Clustering with the Shoal Framework Blog


    Shoal is an open source, Java-based generic clustering framework. It can be used in your applications to add clustering functionalities like load balancing, fault tolerance, or both. Applications using Shoal can share data, communicate via messages with other cluster nodes across the network, and notify of relevant events like the joining, shutdown, and failure of a node or group of nodes. You can take appropriate measures and perform monitoring tasks when these events occur; Shoal forwards a signal to your code to track these notifications.

    Shoal is the clustering framework used by the Glassfish project to implement its application server clustering. One of the benefits to your application is that Shoal abstracts away network details and the network communication API. Under the hood, the default group communication provider uses JXTA for peer-to-peer reliability and scalability.

    Shoal is a lightweight component; you can embed Shoal not only in Java EE applications, but in SE applications too.

    In this article, we'll cover the Shoal architecture and its basic concepts. Then we'll illustrate how to integrate your own code into the clustering infrastructure.

    Learning Shoal's Basic Concepts and Architecture

    We will now discuss the architecture of a Shoal cluster solution and the internal design of Shoal. This section also describes main components, such as Signals and the Distributed State Cache (DSC).

    Understanding Shoal's Architecture

    Shoal's main concept is the group, a virtual union of cluster nodes by a single name. Each group is composed ofmembers. A member is uniquely identified by its member token. A token is just a String, usually a Global Unique ID (GUID), to avoid name collision between the nodes. Also, a member can be a spectator or a core, the difference being that when a core member goes down, all other members are notified of the failure. On the contrary, an spectator's failure is not a significant event that Shoal notifies other members about. Spectators are commonly used for cluster administration and/or monitoring applications.

    A network can host many groups, but be aware that each group consumes additional bandwidth and processing resources.

    The second important concept is the Group Management Service (GMS). It manages the network's groups by keeping track of the members, mediating and facilitating member cooperation and communication.

    Shoal divides groups into functionalities calledcomponents. This division is up to the developer; by default Shoal doesn't impose any structure on the cluster. A group, for example, can have a transaction component and a monitoring component, each processed by one or two physical machines. The API allows the sending of signals to specific nodes that implement a particular component.

    The last significant piece is the signal. Signals are events in the group's lifecycle, such as a new node joining the cluster or another one leaving it. Signals are covered in greater depth below.

    Figure 1 illustrates the physical distribution of a example Shoal group.

    A Shoal system's architecture
    Figure 1. A Shoal system's architecture

    Figure 1 shows three physical machines, with two running a JVM and executing a single instance of the application, and the third machine running two JVMs, each one executing one instance. Each application instance has loaded Shoal's GMS, which discovered the other peers and joined the group.

    Note that all the machines are connected in the same physical network. The physical location is not a concern as long as multicast traffic traverses the relevant nodes.

    Understanding Shoal's Design

    Shoal is comprised of two main parts: the GMS Client API and the GMS Service Provider Interface (SPI). Your application interacts with the API, which in turn uses the SPI to talk to the underlying group communication protocol. The default SPI uses JXTA to provide group services.

    Figure 2 illustrates the dynamics between Shoal's layers, your application, and the network:

    Shoal impact on your application architecture
    Figure 2. Shoal's impact on your application architecture

    Calling Shoal's API, you can:

    • Emit cluster signals: join, failure, and shutdown are some examples. The signals are discussed in depth in the next section.
    • Send messages to other nodes or the whole group.
    • Share data through a Distributed State Cache (DSC) (discussed later).

    The framework calls your code when signals arrive from any node on the group. For this to happen, you must register callback objects as described in the "Listening to the Group's Messages" section.

    Understanding Shoal Lifecycle Signals

    Each node in the group has the following built-in cluster lifecycle signals to both send and receive from other nodes:

    • Join: Shoal tells all members that a new node is joining the group.
    • Joined and ready: The recently joined member is ready to process requests.
    • Failure suspected: Shoal doesn't receive a heartbeat response from a member and it suspects the node has failed. When the failure is confirmed, it sends a failure notify signal.
    • Failure notify: The group tells all the members about the confirmed failure of a node.
    • Failure recovery: The group tells a node to take the place of a failed node. This node must have a previously registered as a failure recovery node.
    • Planned shutdown: Shoal sends a signal to the group notifying an administrative shutdown of a member.

    Shoal can send other signals related to the domain of the application. Those are called message signals. Both signal types can arrive at any time and can encapsulate arbitrary data.

    Taking Advantage of Shoal's Failure Recovery

    The most interesting signals are the failure family, as they enable Shoal's recovery features. When a node fails and then tries to recover, Shoal brings up something called recovery failure fencing. This protects the node from getting inconsistent states and fences its recovery process. A virtual fence is raised for a member's identity inside the group. This is done to protect cluster operations from race conditions. When the fence is lowered for the identity, any member can communicate with the previously fenced one.

    Sometimes another node wants to take the failed node's place. This is achieved by a node getting the failed member's identity. Let's make it clearer with an example.

    When node A goes down, the other nodes in the group get notified via a FailureSuspectedSignal. After a timeout period aFailureNotifySignal is fired. If any of the other members in the group is a FailureRecoveryActionlistener, then these nodes are candidates to substitute the failed one. Let's call one such node "node B."

    At this point, the GMS master node notifies node B that it has to become a recovery delegate for the failed node. It sends the node a FailureRecoverySignal. Finally, if node B acknowledges that signal, it raises the fence, does the recovery work (such as taking over resources or tasks of node A), and lowers the fence. This takes us back to normal operation.

    If the failed node goes up again, it must check whether the fence is raised for its member identity. If it's not fenced, and it wants to recover, it must raise a fence. After the recovery is finished, it lowers the fence again.

    Understanding Distributed State Cache

    Shoal also provides a data sharing mechanism for the group called the Distributed State Cache. Though not a full-blown distributed cache like JBossCache, it allows the nodes to share data.

    The DSC is an associative data structure like ajava.util.Map. The default implementation uses a node-local HashMap, and replicates the changes to the other nodes for better performance. Beware that it doesn't implement a Least Recently Used (LRU) or any other algorithm to eliminate unused elements. You can only put aserializable object into the map as a key, a value, or both.

    In the next section, we'll cover how to integrate Shoal into your application.

    Integrating Shoal into Your Application

    Embedding Shoal into your application is straightforward; the time-consuming part is thinking about what clustering features you really need: load balancing, fault tolerance, or both. If you add something you don't need, the complexity grows and can jeopardize your project.

    Begin by downloading the Shoal distribution from Shoal's site. Unpack and copy both shoal.jar and jxta.jar to your application classpath. That's all you need besides a working network connection.

    It's very useful to test with two instances of the application, so you can see how it responds to the other instance joining and failing, along with message passing. So let's start by creating a group and joining it.

    Starting and Joining a Group

    The first thing we need to create or join a group is a group name. The first node to start the Shoal framework creates the group with this name. If no other nodes for the same group on the network have started, this node becomes the master node. From now on, each node joining the group will acknowledge this node as a master.

    If there's already a group created with the group name, the new node joins as a vanilla member.

    First you must start a new GMSModule by calling theGMSFactory.startGMSModule() method, like the following code fragment does:

    GroupManagementService gms = (GroupManagementService) GMSFactory.startGMSModule(serverName, groupName, memberType, props); 

    The method receives four parameters:

    • groupName identifies the group name to create if it doesn't exist, or to join in the case the group is already created.
    • serverName uniquely identifies this instance in the cluster. You can't create a node with a duplicate name in the cluster.
    • The memberType parameter is an enumcharacterizing the member's role: CORE orSPECTATOR, as discussed in the previous section.
    • The properties you send to start the GMS module are mostly configuration parameters for the underlying SPI, such as JXTA network configuration parameters. These parameters range from the multicast address and port to the discovery and failure timeout and retries.

    In the case where a Shoal creates a new group, it will display a message like the following in your console:

    7:33:14 PM getMemberTokens INFO: GMS View Change Received: Members in view (before change analysis) are : 1: MemberId: node1, MemberType: CORE, Address: urn:jxta:uuid-4879342BEE684EC598A9B8741505B700A62767B3102B4D258A48FDBF1961AC0D03

    This message tells us that a new core member, callednode1, is a member of the group, and shows its unique JXTA identifier. If the group already exists, the message includes the other member JXTA identifiers.

    After the group is created, you must call thejoin() method, as the next code snippet shows:

    try { gms.join(); } catch (GMSException e) { e.printStackTrace(); } 

    After join()ing, Shoal will output a message like the following:

    Nov 29, 2007 7:33:14 PM newViewObserved INFO: Analyzing new membership snapshot received as part of event : MASTER_CHANGE_EVENT

    In this case, the member becomes the master of the group, as it is the creator and only member at this moment.

    Sending Messages

    Once you are part of the group, you can send and receive messages to and from other members. Shoal exposes aGroupHandle object to achieve several tasks, one of which is to send message signals. Let's see how you can use theGroupHandle to send messages in the following fragment:

    gms.getGroupHandle().sendMessage(null, message.getBytes()); 

    This method sends a byte array to all members. The first parameter is the component name; in this case the nullmeans all components. The second is the byte array of a string. TheGroupHandle can also send a message to all members, a specific subset, or just one member by using member tokens. See theGroupHandle Javadocs for more details.

    Listening to the Group's Messages

    To begin listening to the group member messages, you must register a callback with the framework. To do this, simply create a class implementing the CallBack interface. You have to implement just one method: processNotification(). The following code fragment shows an implementation:

    public void processNotification(Signal signal) { signal.acquire(); if (signal instanceof MessageSignal) { System.out.println( ":Message Received from:" + signal.getMemberToken() + ":[" + signal.toString() + "]"); } else { System.out.println( ":Other Notification Received from:" + signal.getMemberToken() + ":[" + signal.toString() + "]"); } signal.release(); } 

    The signal must be acquire()d andrelease()d to avoid race conditions. TheSignal's getMemberToken() method returns the member token of the node that emitted the signal. It's very useful for responding only to the member who sent the signal.

    Once you have the message handling code, you must register theCallBack with Shoal.You can register to receive custom messages directed to a particular component of the cluster by hooking up a MessageActionFactoryImpl and sending acomponentName parameter.

    gms.addActionFactory(new MessageActionFactoryImpl(callBackClass), componentName);

    This particular MessageActionFactory receivesMessageSignals, the only non-lifecycle signals built into Shoal. The ActionFactory will call the instance of the CallBack object, every time aMessageSignal arrives directed to thecomponentName parameter.

    Sharing Data using the DSC

    To share data between nodes without using messages, you can use the DSC. Remember that no automatic reaping of the unused values occurs. The default implementation doesn't have any stale value reaping mechanism or a capacity limit, so you must be very cautious using the default DSC implementation. You can write another implementation to manage DSC according to the application requirements.

    The DSC is accessed via the GroupHandle by invoking the getDistributedStateCache() method, as in the following code:

    DistributedStateCache dsc = gms.getGroupHandle().getDistributedStateCache();

    This gets a reference to the DSC. To add a value, use theadd() method:

    dsc.addToCache( componentName, memberTokenId, key, state);

    There's an individual cache for each group component; you specify in which cache to store the state using thecomponentName parameter. The second parameter is the calling member's token. The third is the key you'll use to recover the state. Finally, the fourth parameter is the state to store in the cache.

    To retrieve the stored value, use the get()method:

     Object o = dsc.getFromCache( componentName, memberTokenId, key) ;

    This retrieves the state, using the key to search it. Finally, to remove an object, use the remove() method:

    dsc.removeFromCache( componentName, memberTokenId, key) ;

    This eliminates the state associated with the key from the DSC. .

    Shutting Down

    Hopefully, after doing some useful work, the node can shut down. The master node can shut down all members in the group. To achieve both you must call the shutdown() method of thegms object. Shutting down members or the whole group sends planned shutdown signals to their members.


    Hopefully this article clarifies some of the less-documented aspects of Shoal and kickstarts new users of the framework.

    Clustering gives you a solution to solve two important problems: fault tolerance and load balancing, a must for enterprise mission-critical applications. Load balancing offers much-needed scalability and fault tolerance means less downtime.

    Shoal makes it easy for you to achieve both load balancing and fault tolerance characteristics in your application. However, the most commonly used scenario is group communication, including message passing and state caching. You can easily add cluster facilities to your application by integrating the Shoal framework into your code.

    To see detailed characteristics and advanced uses of the framework, such as cross subnets groups, take a look at the examplesand design documents included with Shoal.