Managing Timed Tasks Within a Cluster Utilizing The StopLight Framework Blog

Version 2

    Towards a Solution
    Task Management
    Task Monitoring
       Task Monitoring Strategy
    Detection and Management of Dead Tasks
    How Does it All Work?
    Configuring and Using the StopLight Framework

    The increase in demand for large-scale, enterprise application solutions has led to the development of application clustering techniques and technologies. Clustering applications across multiple servers provides applications with the ability to handle large volumes of traffic, and performance can be increased by adding servers to the cluster. In addition to providing scalability, application clusters make the system more robust by allowing for automatic failover when a server fails. This way, when one server goes down, the application continues to run, albeit with slightly decreased performance. While it is true that the current generation of application servers makes it relatively pain-free to create a cluster, there are still several significant, if often overlooked, design issues that must be taken into account once a system is clustered.

    Perhaps the most significant of these issues is how to handle recurring tasks that should not execute concurrently. Scheduled or recurring tasks are used to execute procedures that need to run at certain fixed times or at fixed intervals. Typical examples of scheduled tasks are report generation tasks and tasks that send data to external systems that are only available within a certain timeframe. In order to understand why clustering affects the application's design with regard to the handling of scheduled tasks, it is useful to consider an example.

    For this article, a generic e-commerce web application will be used as an example. In order to allow management to analyze sales trends, profits, inventory, etc., the system has been set up to periodically compile a set of reports and email them to management. Clearly, management does not want to receive multiple emails containing the same reports; yet, this is what will happen if the application contains a basic scheduled task and then the application is clustered. When the appointed time to run the report comes up, all machines in the cluster will generate the same report and send it to management. This can be seen visually in Figure 1.

    Figure 1. Concurrently executing tasks

    Obviously, this is not the desired outcome. What is needed is something that instead allows only a single task within the cluster to execute, while still retaining the benefits of the cluster, such as high availability and scalability.

    Towards a Solution

    Let's look at what features we would like to see in timed tasks executing within a cluster. The StopLight framework, a project hosted on java.net, addresses the issue of managing clustered tasks by dividing the problem into four sections: tasks, task management, task monitoring, and heartbeat monitoring. The task management portion of the framework provides for the registration and scheduling of tasks, while the task monitoring portion of the framework is responsible, through the use of an external semaphore, for determining if a given instance of a task may execute at a given time. The heartbeat monitoring portion of the framework is tasked with determining if a particular server within the cluster is still alive and running or if the server has failed. Figure 2 provides a high-level view of the framework deployed into a cluster.

    Figure 2. StopLight deployment

    Next, we'll examine the various components of the framework so that we can then understand how those components work together within the StopLight framework. Last, we'll discuss installing and configuring StopLight.


    Tasks

    As a clustered task management framework, the definitions of tasks are central to all functionality within the StopLight framework. Tasks are defined in interfaces found in the com.clarkrichey.stopLight.task package. The Task interface is shown below.

    public interface Task extends Runnable {

        /**
         * This is the viewable name for this Task.
         * @return The viewable name for this Task
         */
        String getName();

        /**
         * Used to get the interval that should pass between executions
         * of this task. This interval is specified in milliseconds.
         * @return The interval between run times in milliseconds
         */
        long getRunInterval();

        /**
         * Should be called to initialize a Task to its base state.
         * Must be called before the Task is executed for the first time.
         */
        void initialize();

        /** Used to cancel execution of the Task. */
        void cancel();

        /**
         * Used to get a read-only view of Task information.
         * @return an instance of TaskInfo containing the information
         *         for this Task
         */
        TaskInfo getTaskInfo();
    }

    The Task interface defines the basic characteristics of all tasks. Several methods are defined for the purpose of Task identification, such as getName() and getIcon(). The methods initialize() and run() perform the work of the Task. The initialize() method is guaranteed to be called only once during the lifetime of a Task, when the task is first registered with a TaskManager. The run() method is executed when the Task is scheduled to execute (as defined by the task's runInterval property) and the instance of the Task has been selected as the single instance of that Task in the cluster to execute at this time. The cancel() method is guaranteed to be called only once during the lifetime of a Task, when the task is removed from the TaskManager's list of Tasks scheduled for execution.

    The interface RestartableTask extends Task in order to define tasks that may be safely restarted in the event that they terminate abnormally. RestartableTask defines one additional method, reset(). This method is called when a Task has terminated abnormally and is being resurrected so that it is eligible to be executed again. It is important to note that the initialize() method is not called when the Task is resurrected, so it falls upon the reset() method to provide any initialization, as well as any cleanup, that may be needed as a result of the abnormal termination.
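
    To make the division of labor between initialize() and reset() concrete, here is a minimal sketch. The Task and RestartableTask interfaces below are trimmed, hypothetical stand-ins declared locally so the example compiles on its own, and ExportTask is an invented task; none of this is taken from the StopLight source.

```java
import java.util.ArrayList;
import java.util.List;

class RestartableTaskSketch {

    // Hypothetical stand-ins for the StopLight interfaces, trimmed to the
    // lifecycle methods this sketch needs.
    interface Task extends Runnable {
        void initialize();
        void cancel();
    }

    interface RestartableTask extends Task {
        void reset();
    }

    // Illustrative task that stages rows in a buffer before "sending" them.
    static final class ExportTask implements RestartableTask {
        final List<String> staged = new ArrayList<>();
        int sent;

        @Override public void initialize() {
            // One-time setup when the task is first registered.
            staged.clear();
            sent = 0;
        }

        @Override public void run() {
            staged.add("row-1");
            staged.add("row-2");
            sent += staged.size();   // pretend the rows were delivered
            staged.clear();
        }

        @Override public void reset() {
            // initialize() is not called on resurrection, so discard any
            // half-finished work here while keeping long-lived state.
            staged.clear();
        }

        @Override public void cancel() {
            // Nothing to release in this sketch.
        }
    }

    public static void main(String[] args) {
        ExportTask task = new ExportTask();
        task.initialize();
        task.staged.add("partial-row");  // simulate a run that died midway
        task.reset();                    // the framework resurrects the task
        task.run();
        System.out.println("rows sent after restart: " + task.sent);
    }
}
```

    The point of the sketch is only that reset() has to leave the task in a state from which run() can start cleanly, since nothing else will do so.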

    AbstractTask is an abstract class that is provided as a convenience to developers in the creation of their own concrete Tasks. AbstractTask provides default implementations for all of the methods required by the Task interface, with the exception of initialize(), run(), and cancel(). The implementation of those methods is left to the developers of concrete Tasks. AbstractRestartableTask provides the same convenience to developers of RestartableTasks, requiring only the additional implementation of the reset() method.

    The code for a "Hello World" task that simply prints "Hello World" along with the current time every time it is executed is listed below. This provides a simple example of creating a task by extending AbstractTask.

    import com.clarkrichey.stopLight.task.*;
    import java.util.Calendar;

    public class HelloWorldTask extends AbstractTask {

        public HelloWorldTask() {
            this.description = "A simple task";
            // The assignment target was lost in the original listing;
            // a "name" field on AbstractTask is assumed here.
            this.name = "HelloWorldTask";
            // execute every 10 seconds
            this.runInterval = 10000;
            // no associated icon
            this.icon = null;
        }

        public void cancel() {
            // no need to do anything
        }

        public void initialize() {
            // no need to do anything
        }

        public void run() {
            Calendar now = Calendar.getInstance();
            // getTime() prints a readable date instead of Calendar's field dump
            System.out.println("Hello World! It's " + now.getTime());
        }
    }

    The BasicTask class is provided as an additional convenience to developers. BasicTask extends AbstractTask and is constructed by passing a Runnable to its constructor along with a unique task name. The BasicTask class delegates to the run() method of its Runnable when run() is called. The BasicTask takes no action when either cancel() or initialize() is called. If the Task being deployed requires action to be taken when these methods are invoked, then the use of the BasicTask is not appropriate and it will be necessary to either extend AbstractTask or directly implement the Task interface.

    Below is the code for the HelloWorld class, which functions exactly the same way as the HelloWorldTask shown above. However, instead of extending AbstractTask, the HelloWorld class simply implements Runnable. This class can then be passed in to the constructor for BasicTask, along with its run interval.

    import java.util.Calendar;

    public class HelloWorld implements Runnable {

        /** Creates a new instance of HelloWorld */
        public HelloWorld() {
        }

        public void run() {
            Calendar now = Calendar.getInstance();
            // getTime() prints a readable date instead of Calendar's field dump
            System.out.println("Hello World! It's " + now.getTime());
        }
    }
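
    The delegation that BasicTask is described as performing can be pictured with a small adapter. This is a sketch only: the Task interface is a trimmed, locally declared stand-in, and RunnableBackedTask, along with its constructor signature, is hypothetical rather than the actual BasicTask source.

```java
class BasicTaskSketch {

    // Hypothetical, trimmed copy of the Task contract described in the article.
    interface Task extends Runnable {
        String getName();
        long getRunInterval();
        void initialize();
        void cancel();
    }

    // Adapter that turns any Runnable into a Task by delegation.
    static final class RunnableBackedTask implements Task {
        private final String name;
        private final Runnable delegate;
        private final long runIntervalMillis;

        RunnableBackedTask(String name, Runnable delegate, long runIntervalMillis) {
            this.name = name;
            this.delegate = delegate;
            this.runIntervalMillis = runIntervalMillis;
        }

        @Override public String getName() { return name; }
        @Override public long getRunInterval() { return runIntervalMillis; }
        @Override public void initialize() { /* intentionally a no-op */ }
        @Override public void cancel() { /* intentionally a no-op */ }
        @Override public void run() { delegate.run(); }
    }

    public static void main(String[] args) {
        Task task = new RunnableBackedTask(
                "hello", () -> System.out.println("Hello World!"), 10_000L);
        task.initialize();
        task.run();
        task.cancel();
        System.out.println(task.getName() + " runs every " + task.getRunInterval() + " ms");
    }
}
```

    Because initialize() and cancel() do nothing here, an adapter of this kind only suits work that needs no setup or teardown, which matches the restriction the article places on BasicTask.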

    Task Management

    Classes directly responsible for the management of tasks are found in the com.clarkrichey.stopLight.management package. The TaskManager interface, listed below, describes classes that are responsible for scheduling the execution of tasks. This interface contains methods for registering and removing a Task, as well as methods for setting the monitoring strategy to be used and for retrieving an instance of a registered Task. While the registerTask(), getTask(), and getTasks() methods are self-explanatory, the rest of the methods defined by this interface require some explanation.

    The removeTask() method will remove the specified Task from the TaskManager's list of tasks to be executed, but will not interrupt the Task if it is currently executing. The removeTaskNow() method will remove the specified Task from the TaskManager's list of tasks to be executed, and will terminate the Task's execution if it is currently running.

    The methods setTaskMonitoringStrategy() and getTaskMonitoringStrategy() are used, respectively, for setting and getting the strategy to be used by the TaskWrapper to determine if a Task has terminated abnormally. While the TaskManager is responsible for the scheduling of Tasks, the determination of a Task's health is dictated by the TaskMonitoringStrategy that is being used. Further information on the TaskMonitoringStrategy can be found in the "Task Monitoring Strategy" section below.

    import java.util.List;

    public interface TaskManager extends StopLightManagedComponent {

        /**
         * Used to register a Task with the TaskManager.
         * @param taskName The unique name of the Task
         * @param taskToRegister The Task that will be managed by the
         *        TaskManager
         */
        void registerTask(String taskName, Task taskToRegister);

        /**
         * Used to get a copy of the List of Tasks being managed by this
         * TaskManager.
         * @return The List of Tasks being managed by this TaskManager
         */
        List<Task> getTasks();

        /**
         * Used to get a particular Task that was registered with the
         * TaskManager.
         * @param taskName The name of the Task to be retrieved
         * @return The requested Task. Null if the task is not found
         */
        Task getTask(String taskName);

        /**
         * Used to remove a Task that was registered with this TaskManager.
         * Running Tasks are allowed to complete their execution.
         * @param taskName The name of the Task to be removed
         */
        void removeTask(String taskName);

        /**
         * Used to remove a Task that was registered with this TaskManager.
         * Running Tasks are terminated without being allowed to complete
         * their execution.
         * @param taskName The name of the Task to be removed
         */
        void removeTaskNow(String taskName);

        /**
         * Used to set the TaskMonitoringStrategy that will be used by the
         * TaskManager to determine if a Task is alive or not. Calling
         * setTaskMonitoringStrategy will replace any existing
         * TaskMonitoringStrategy with the one passed in.
         * @param s The TaskMonitoringStrategy to use
         */
        void setTaskMonitoringStrategy(TaskMonitoringStrategy s);

        /**
         * Retrieves the current TaskMonitoringStrategy.
         * @return The current TaskMonitoringStrategy
         */
        TaskMonitoringStrategy getTaskMonitoringStrategy();
    }

    BasicTaskManager is the default implementation of TaskManager that is provided with the StopLight framework. The BasicTaskManager utilizes a ScheduledThreadPoolExecutor as the means of scheduling the Tasks and a TaskWrapper class in order to inject StopLight logic in between the ScheduledThreadPoolExecutor and the actual running of the Task. The details of how this process works are discussed in the "Task Monitoring Strategy" section as well as in the "How Does it All Work?" section. While the framework allows for other TaskManagers to be created, the relatively simple nature of the TaskManager interface and the BasicTaskManager implementation make it much more likely that developers will want to create their own TaskMonitors while reusing the default BasicTaskManager.
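
    The idea of placing a wrapper between a ScheduledThreadPoolExecutor and the task body can be sketched with the JDK alone. The gate below is a plain BooleanSupplier standing in for StopLight's cluster-wide lock check, and the use of scheduleAtFixedRate() is an assumption; the article does not say which scheduling call BasicTaskManager uses. All names are illustrative.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

class SchedulingSketch {

    // Wraps a task so that its body runs only when the gate allows it.
    // In StopLight the gate would be the cluster-wide lock check.
    static Runnable gated(BooleanSupplier mayRun, Runnable task) {
        return () -> {
            if (mayRun.getAsBoolean()) {
                task.run();
            }
        };
    }

    // Fires the wrapper for the given number of ticks and reports how many
    // times the task body actually executed.
    static int runTicks(int ticks, BooleanSupplier mayRun) {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        AtomicInteger executed = new AtomicInteger();
        CountDownLatch remaining = new CountDownLatch(ticks);
        Runnable wrapper = gated(mayRun, executed::incrementAndGet);
        try {
            executor.scheduleAtFixedRate(() -> {
                if (remaining.getCount() == 0) {
                    return;              // ignore ticks past the requested count
                }
                wrapper.run();
                remaining.countDown();
            }, 0, 10, TimeUnit.MILLISECONDS);
            remaining.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for ticks", e);
        } finally {
            executor.shutdownNow();
        }
        return executed.get();
    }

    public static void main(String[] args) {
        System.out.println("gate open:   " + runTicks(3, () -> true));
        System.out.println("gate closed: " + runTicks(3, () -> false));
    }
}
```

    The executor keeps firing on schedule on every node; whether the task body runs is decided entirely inside the wrapper, which is what lets the scheduling code stay unaware of the cluster.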

    The code snippet below illustrates the creation of a BasicTask using the HelloWorld class illustrated earlier and then the subsequent use of the BasicTaskManager to register the task. The StopLightConfigurationManager has not yet been discussed, and will be introduced later.

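    HelloWorld hello = new HelloWorld();
    BasicTask myTask = new BasicTask(hello, 10000);

    StopLightConfigurationManager manager =
        StopLightConfigurationManager.getInstance();
    TaskManager tm = manager.getTaskManager();

    // registerTask() takes a unique task name and the Task itself, not the
    // wrapped Runnable; the name used here is illustrative
    tm.registerTask("HelloWorld", myTask);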

    Task Monitoring

    So far we have examined tasks and task management, two concepts that are central to the StopLight framework. However, in order to understand how the framework actually works, we need to take a look at the task monitor as well. The classes directly involved in task monitoring are found in the com.clarkrichey.stopLight.taskMonitor package and are illustrated in Figure 3 below. TaskMonitor classes, as defined by the TaskMonitor interface, are responsible for ensuring that a given task is only executed on a single instance of the clustered application at any given time. Put differently, it is up to the TaskMonitor to make sure that two instances of the same Task are never executing at the same time. In order to achieve this, the TaskMonitor contains methods for attempting to acquire a lock on a task, as well as a method for releasing that lock and another method for acquiring information on the current lock holder for a particular Task. The StopLight framework ships with an implementation of TaskMonitor, DatabaseTaskMonitor, that uses a database to store lock information. Information about configuring the DatabaseTaskMonitor can be found in the section entitled "Configuring and Using the StopLight Framework".


    Figure 3. Task monitor classes
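
    The lock contract described above (acquire, release, look up the current holder) can be illustrated with an in-memory stand-in. The method names follow the article, but the parameters and return types are assumptions, and a map inside one JVM cannot coordinate a real cluster; it only shows the atomic "first caller wins" behavior that a shared store such as the database has to provide.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class TaskMonitorSketch {

    // Hypothetical shape: the method names follow the article, the
    // parameters and return types are assumptions.
    interface TaskMonitor {
        boolean acquireLock(String taskName, String holderAddress);
        void releaseLock(String taskName);
        String getMonitoringInformation(String taskName);
    }

    // Single-JVM stand-in for the external semaphore.
    static final class InMemoryTaskMonitor implements TaskMonitor {
        private final ConcurrentMap<String, String> holders = new ConcurrentHashMap<>();

        @Override public boolean acquireLock(String taskName, String holderAddress) {
            // putIfAbsent is atomic: exactly one caller sees null and wins.
            return holders.putIfAbsent(taskName, holderAddress) == null;
        }

        @Override public void releaseLock(String taskName) {
            holders.remove(taskName);
        }

        @Override public String getMonitoringInformation(String taskName) {
            // Address of the current holder, or null if the lock is free.
            return holders.get(taskName);
        }
    }

    public static void main(String[] args) {
        TaskMonitor monitor = new InMemoryTaskMonitor();
        System.out.println("node A acquires: " + monitor.acquireLock("report", "10.0.0.1"));
        System.out.println("node B acquires: " + monitor.acquireLock("report", "10.0.0.2"));
        System.out.println("holder: " + monitor.getMonitoringInformation("report"));
        monitor.releaseLock("report");
        System.out.println("node B retries:  " + monitor.acquireLock("report", "10.0.0.2"));
    }
}
```

    Recording the holder's address alongside the lock is what later allows a blocked node to ask whether that holder is still alive.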

    Task Monitoring Strategy

    A classic Strategy pattern is used in order to determine the logic used by the TaskWrapper in deciding the health of a task. The TaskWrapper contains a reference to its TaskMonitoringStrategy, along with a reference to the Task it is wrapping. If the call made by the TaskWrapper to the acquireLock() method of its TaskMonitor returns false, indicating that another instance of the wrapped Task is already executing, the TaskWrapper uses its TaskMonitoringStrategy in order to determine the health of the Task that was reported to be executing by the task monitor.

    TaskMonitoringStrategy is an interface that defines a single overloaded method, isAlive(). This method takes MonitoringInformation, TaskInfo, and either a boolean or an Exception as parameters and returns an enumeration class, MonitoringResult. The MonitoringInformation passed in to the method is acquired from the TaskMonitor and the TaskInfo is acquired from the Task itself. When the overload accepting a boolean is called, that boolean is the result of a call to the isAlive() method of the HeartbeatLocator. A true value indicates that the lock on the Task is believed to be held by a living Task, while a false value indicates that the lock is believed to be held by a dead Task. When the overload accepting an Exception is called, the Exception passed in is the one thrown by the isAlive() method of the HeartbeatLocator. More detail on the HeartbeatLocator can be found in the next section.
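
    The two isAlive() overloads can be sketched as follows. The types are local placeholders rather than the StopLight classes. The boolean mapping mirrors what the article states for the default strategy (ISALIVE for true, ISDEAD for false); returning RETRY when the heartbeat check throws is purely an assumption made for this example.

```java
class MonitoringStrategySketch {

    enum MonitoringResult { ISALIVE, ISDEAD, RETRY }

    // Empty placeholders; the real classes carry lock-holder and task details.
    static final class MonitoringInformation { }
    static final class TaskInfo { }

    interface TaskMonitoringStrategy {
        MonitoringResult isAlive(MonitoringInformation lockInfo, TaskInfo taskInfo,
                                 boolean heartbeatFound);

        MonitoringResult isAlive(MonitoringInformation lockInfo, TaskInfo taskInfo,
                                 Exception heartbeatFailure);
    }

    static final class HeartbeatOnlyStrategy implements TaskMonitoringStrategy {
        @Override
        public MonitoringResult isAlive(MonitoringInformation lockInfo, TaskInfo taskInfo,
                                        boolean heartbeatFound) {
            // Trust the heartbeat result as-is.
            return heartbeatFound ? MonitoringResult.ISALIVE : MonitoringResult.ISDEAD;
        }

        @Override
        public MonitoringResult isAlive(MonitoringInformation lockInfo, TaskInfo taskInfo,
                                        Exception heartbeatFailure) {
            // Assumption: an inconclusive check is retried rather than
            // treated as proof that the lock holder is dead.
            return MonitoringResult.RETRY;
        }
    }

    public static void main(String[] args) {
        TaskMonitoringStrategy strategy = new HeartbeatOnlyStrategy();
        MonitoringInformation lockInfo = new MonitoringInformation();
        TaskInfo taskInfo = new TaskInfo();
        System.out.println(strategy.isAlive(lockInfo, taskInfo, true));
        System.out.println(strategy.isAlive(lockInfo, taskInfo, false));
        System.out.println(strategy.isAlive(lockInfo, taskInfo, new RuntimeException("timeout")));
    }
}
```

    A custom strategy could use the extra parameters, for example to declare a task dead only after the lock has been held for several run intervals.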

    Detection and Management of Dead Tasks

    It is the responsibility of classes implementing the HeartbeatLocator interface to detect Tasks that have terminated abnormally during their run cycle. Classes implementing HeartbeatLocator are used to attempt to determine if an instance of a Task that is holding the execute lock for a given Task type is still executing or if the Task has terminated abnormally without releasing the execute lock. The StopLight framework ships with a default implementation of HeartbeatLocator that uses a servlet for the purpose of determining Task "live-ness." This HeartbeatLocatorServlet implementation and the other classes in the HeartbeatLocator package are detailed in Figure 4. The details of how and when heartbeat detection occurs are covered in the next section.

    Figure 4. HeartbeatLocator classes

    How Does it All Work?

    The functioning of the StopLight framework is best understood through a series of sequence diagrams that illustrate how the framework behaves for a variety of use cases. Figure 5 below illustrates the standard case where a Task successfully acquires the execution lock, executes, and then releases the lock. The ScheduledThreadPoolExecutor calls the run() method of the TaskWrapper class that is used by the TaskManager to wrap the underlying Task.

    The purpose of the TaskWrapper is to inject StopLight framework logic between the scheduled execution of the Task as signaled by the ScheduledThreadPoolExecutor and the actual execution of the local instance of the Task. This is done to ensure that only one instance of the Task is executing at a given time within the cluster. The TaskWrapper calls the acquireLock() method of the TaskMonitor in order to attempt to acquire the execution lock for that particular type of Task. The TaskMonitor accesses the external semaphore (a database, in the case of the default implementation) to determine if the lock is available, and if it is available, to then acquire the lock on behalf of the requestor and return true, as is illustrated in this use case.

    When the TaskWrapper receives the true response from its call to acquireLock(), it then proceeds to execute the wrapped Task's run() method, allowing the Task to execute. Once the run() method completes, the TaskWrapper calls releaseLock() on the TaskMonitor. The TaskMonitor then accesses the external semaphore in order to release the lock for that type of Task. Once this has completed, the executing thread terminates. The process repeats when the Task is next scheduled to run. Figure 5 shows this process.


    Figure 5. Task acquires lock and executes
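
    The standard case in Figure 5 reduces to acquire, run, release. The sketch below shows that shape with hypothetical local types. Releasing the lock in a finally block is a design choice made for this example, so that an exception thrown by the task does not leave the lock behind; the article does not say how the real TaskWrapper handles exceptions.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class LockedRunSketch {

    // Hypothetical, simplified monitor contract for this sketch.
    interface TaskMonitor {
        boolean acquireLock(String taskName);
        void releaseLock(String taskName);
    }

    // Runs the task only if the lock was obtained; returns true if it ran
    // to completion.
    static boolean runIfLockAcquired(TaskMonitor monitor, String taskName, Runnable task) {
        if (!monitor.acquireLock(taskName)) {
            return false;            // another instance holds the lock
        }
        try {
            task.run();
            return true;
        } finally {
            monitor.releaseLock(taskName);   // always give the lock back
        }
    }

    // Trivial one-lock monitor so the sketch runs without a database.
    static final class SingleLockMonitor implements TaskMonitor {
        private final AtomicBoolean held = new AtomicBoolean();

        @Override public boolean acquireLock(String taskName) {
            return held.compareAndSet(false, true);
        }

        @Override public void releaseLock(String taskName) {
            held.set(false);
        }

        boolean isHeld() {
            return held.get();
        }
    }

    public static void main(String[] args) {
        SingleLockMonitor monitor = new SingleLockMonitor();
        boolean ran = runIfLockAcquired(monitor, "report",
                () -> System.out.println("generating report"));
        System.out.println("ran=" + ran + ", lock still held=" + monitor.isHeld());
    }
}
```

    A finally block cannot help when the whole server dies mid-run, which is the situation the heartbeat check exists to detect.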

    An alternative use case to the one above is that when the Task attempts to execute, it is unable to do so because another instance of that Task has already begun executing on one of the other machines in the cluster. The sequence of events that occur in this scenario is very similar to that previously discussed, with the exception that the call to acquireLock() returns false, since another instance of the Task has already acquired the lock.

    When the TaskWrapper is unable to acquire the lock, it needs to attempt to determine if it was unable to acquire the lock because another instance of the Task was able to acquire the lock before this instance, or if the lock was never released from some previous execution due to the abnormal termination of a running instance of the Task. In order to make this determination, the TaskWrapper calls getMonitoringInformation() on the TaskMonitor in order to get all available information about the current holder of the lock. The address of the lock's current holder is then extracted from the returned MonitoringInformation and is passed along to the HeartbeatLocator in the call to isAlive(). The HeartbeatLocator then attempts to ping the clustered application running at the specified address. The results of this attempt are returned to the TaskWrapper. This sequence of events is depicted in Figure 6 below.

    The TaskWrapper next passes the results from the HeartbeatLocator, along with the MonitoringInformation returned by the TaskMonitor, to the isAlive() method of the TaskMonitoringStrategy. The business logic embedded in the TaskMonitoringStrategy uses this information to determine how to proceed. It may return one of three possible results: ISDEAD, ISALIVE, or RETRY. If RETRY is returned, then this entire process will repeat from the point where the TaskWrapper attempts to acquire the lock. If ISALIVE is returned, then the executing thread terminates and the process repeats when the Task is next scheduled to run. The consequences of ISDEAD being returned are explored in the next paragraph. The default implementation of TaskMonitoringStrategy returns ISALIVE if the HeartbeatLocator returned true and ISDEAD if the HeartbeatLocator returned false. Figure 6 illustrates how this works.


    Figure 6. Task fails to acquire lock

    If the use case described above occurred not as a result of another task acquiring the lock prior to the acquireLock() method being called locally, but rather as the result of the lock not being released due to the abnormal termination of a previously executing Task, then the sequence of events depicted in Figure 7 would occur. This sequence is identical to the sequence of events described above up to the point where the TaskMonitoringStrategy returns ISDEAD. Once ISDEAD is returned, then there are two possible outcomes. First, if the Task that we are attempting to execute is not an instance of a RestartableTask, then the executing thread terminates and the process repeats when the Task is next scheduled to run.

    However, if the Task is a RestartableTask, then the following sequence of events occurs, as illustrated in steps 6-8 in Figure 7. First, the releaseLock() method of TaskMonitor is invoked in order to release the lock that was left when the running Task terminated abnormally. Next, the reset() method of the RestartableTask is called in order to allow the RestartableTask the opportunity to clean up any artifacts that may have been left over when the previously running task terminated abnormally. Lastly, the entire process of attempting to acquire the lock begins again (as illustrated in the previous use cases) in order to ensure that only one instance of the Task runs at a time.


    Figure 7. Task fails to acquire lock due to dead task
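
    The three outcomes discussed in this section (ISALIVE, RETRY, and ISDEAD with or without a restartable task) can be combined into one decision loop. Everything below is a hypothetical sketch: the types are local stand-ins, the strategy verdict is supplied through a Supplier rather than a real heartbeat check, and the maxAttempts bound is an addition for this example, since the article does not state whether retries are limited.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

class DeadTaskRecoverySketch {

    enum MonitoringResult { ISALIVE, ISDEAD, RETRY }

    interface TaskMonitor {
        boolean acquireLock(String taskName);
        void releaseLock(String taskName);
    }

    interface RestartableTask extends Runnable {
        void reset();
    }

    static final class SetBackedMonitor implements TaskMonitor {
        final Set<String> held = new HashSet<>();
        @Override public boolean acquireLock(String taskName) { return held.add(taskName); }
        @Override public void releaseLock(String taskName) { held.remove(taskName); }
    }

    static final class CountingTask implements RestartableTask {
        int runs;
        int resets;
        @Override public void run() { runs++; }
        @Override public void reset() { resets++; }
    }

    // One scheduled attempt; returns true if the task body ran.
    static boolean attempt(TaskMonitor monitor, String taskName, Runnable task,
                           Supplier<MonitoringResult> verdict, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (monitor.acquireLock(taskName)) {
                try {
                    task.run();
                    return true;
                } finally {
                    monitor.releaseLock(taskName);
                }
            }
            MonitoringResult result = verdict.get();
            if (result == MonitoringResult.ISALIVE) {
                return false;                     // a live instance is running
            }
            if (result == MonitoringResult.ISDEAD) {
                if (!(task instanceof RestartableTask)) {
                    return false;                 // not safe to restart
                }
                monitor.releaseLock(taskName);    // clear the stale lock
                ((RestartableTask) task).reset(); // clean up after the crash
            }
            // RETRY, or a completed recovery: loop and try to acquire again.
        }
        return false;
    }

    public static void main(String[] args) {
        SetBackedMonitor monitor = new SetBackedMonitor();
        monitor.held.add("report");               // stale lock from a crashed server
        CountingTask task = new CountingTask();
        boolean ran = attempt(monitor, "report", task, () -> MonitoringResult.ISDEAD, 3);
        System.out.println("ran=" + ran + " runs=" + task.runs + " resets=" + task.resets);
    }
}
```

    Note that after the stale lock is released the loop competes for the lock again instead of running immediately, because another node may have performed the same recovery at the same moment.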

    Configuring and Using the StopLight Framework

    The StopLight framework is designed to be used with only minor setup and configuration. In order to use the framework, follow these steps:

    1. Place the StopLight.jar file in your application's classpath.
    2. Create the required database tables. The SQL scripts for creating the necessary tables are contained in the file StopLight.sql.
    3. Modify web.xml to declare the com.clarkrichey.stopLight.heartbeat.HeartbeatLocatorServlet class as a servlet that is loaded at startup.
    4. Modify web.xml to set heartbeatPath as an initialization parameter to the HeartbeatLocatorServlet. The value of the heartbeatPath parameter should be the context path of the web application, plus the URL pattern of the HeartbeatLocatorServlet.
    5. Create new configuration files or modify the default ones (described in detail below).

    The StopLightConfigurationManager provides client code with handles to all of the externally usable portions of the StopLight framework. The StopLightConfigurationManager class must be initialized through its initialize() method before references to these objects are retrieved. If client code attempts to retrieve an object from the StopLightConfigurationManager before invoking initialize(), the StopLightConfigurationManager calls initialize() itself.

    Once the StopLightConfigurationManager has been initialized, Tasks may be registered with the TaskManager that is made available via the StopLightConfigurationManager. Once a Task is registered with the TaskManager, no further steps are typically necessary.

    StopLight is packaged with two default configuration files. First, the default configuration file, StopLightConfiguration.xml, is located in the com/clarkrichey/stopLight directory within the StopLight.jar. This configuration file defines the implementation classes that will be loaded by ConfigurationManager for the TaskManager, TaskMonitor, HeartbeatLocator, and MonitoringStrategy interfaces. The second configuration file, DatabaseMonitorConfiguration.xml, is located in the com/clarkrichey/stopLight/monitoring/database directory within StopLight.jar. This configuration file defines the connection string, user name, password, and database driver class required by the DatabaseTaskMonitor.

    In order to run StopLight with different implementation classes, either modify the StopLightConfiguration.xml file that is packaged with the .jar, or create a new configuration file. If a new StopLightConfiguration.xml file is created, it must be named StopLightConfiguration.xml but it may be placed at any arbitrary location that is accessible from the application at run time. In order to tell the StopLight framework to use the alternate configuration file, set the StopLightConfigDir system property to the absolute path to the directory containing the alternate configuration file. It is important to note that all values present in the StopLightConfiguration.xml file must be present in the alternate configuration file, even if only one value has changed.
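
    As an illustration of the lookup rule just described, the sketch below resolves the alternate file location from the StopLightConfigDir system property. Only the property name and the file name come from the article; the helper class and the idea of returning an empty Optional to mean "use the packaged default" are assumptions made for this example, not StopLight's actual loading code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

class ConfigLocationSketch {

    // Names taken from the article.
    static final String DIR_PROPERTY = "StopLightConfigDir";
    static final String FILE_NAME = "StopLightConfiguration.xml";

    // Returns the alternate configuration file when the property is set;
    // an empty result means the file packaged in the .jar would be used.
    static Optional<Path> alternateConfigFile() {
        String dir = System.getProperty(DIR_PROPERTY);
        if (dir == null || dir.isEmpty()) {
            return Optional.empty();
        }
        // The file name is fixed; only the directory is configurable.
        return Optional.of(Paths.get(dir).resolve(FILE_NAME));
    }

    public static void main(String[] args) {
        System.out.println("without property: " + alternateConfigFile());
        System.setProperty(DIR_PROPERTY, "/opt/app/conf");   // illustrative path
        System.out.println("with property:    " + alternateConfigFile());
    }
}
```

    In a deployed application the property would more typically be supplied on the JVM command line, for example with -DStopLightConfigDir=/opt/app/conf, where the path shown is only a placeholder.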

    If the DatabaseTaskMonitor class that is provided as the default TaskMonitor implementation is to be used, then the default DatabaseMonitorConfiguration.xml file must be modified or an alternate one created. The process for creating a new DatabaseMonitorConfiguration.xml file is the same as the process described for replacing the StopLightConfiguration.xml file.

    The only development necessary for using the StopLight framework is the creation of the Task classes that will be managed by the framework. The creation of new tasks can be done through any of the means described previously in the "Tasks" section, such as by extending AbstractTask or creating a BasicTask.


    While the StopLight framework provides a complete and extensible set of services for managing tasks within a cluster, there is still additional work that could be done on the framework. Additional work on the StopLight framework is being conducted in the java.net StopLight project, which is released under the GNU Lesser General Public License. Additional documentation for the StopLight framework, particularly information on extending the framework, may also be found at that site. The following list enumerates some of the planned enhancements to the framework.

    • Modification of the DatabaseTaskMonitor to support the use of a JNDI-based datasource for connection to the database.
    • Inclusion of a new implementation of the TaskManager interface to support a more robust task scheduling system, such as the Quartz project. Quartz provides a much more robust and flexible mechanism for specifying when tasks should run, but it does not provide any mechanisms for preventing a task in a cluster from running on multiple servers concurrently.