1 Reply Latest reply: Jun 17, 2010 5:21 AM by 755396 RSS

    Processing Pattern 1.2.1 rejected tasks

    755396
      We're seeing the following warning in our logs:

      2010-06-12 16:16:17,295 WARN [Logger@9271164 3.5.3/465p2] (Coherence) (thread=DistributedCache:DistributedServiceForProcessingPattern member=3) Rejecting DequeueTask because TaskProcessor is INACTIVE Class:com.oracle.coherence.patterns.processing.internal.task.DefaultTaskProcessorMediator

      Scanning the code this looks to be related to the expiry of a Lease preventing a TaskProcessorMediator from dequeueing a Task. As far as I can tell the Task is simply redispatched as a result. I'd like a bit more certainty - what I'd really like to see is a bit of detail as to how the lifecycle of a Lease affects the behaviour of the Processing Pattern implementation. In particular, I'd like to know why I'm (apparently) seeing Lease expiry in the first place, and whether there's anything I can do to control the Leasing policy/lifecycle?

      Thanks for any help,

      Simon
        • 1. Re: Processing Pattern 1.2.1 rejected tasks
          755396
          Some further analysis on my application log entries indicates that all the log entries below are symptomatic of a race condition between one process that periodically expires Leases, and a second process that periodically renews Leases. Leases are only actually expired if the second process loses out to the first. The side effect of this is that some Tasks can be executed multiple times, i.e. Tasks must be idempotent.

          Unfortunately the RecoverTasks log entries give no indication of just which Tasks are being recovered. Hopefully I'm not misinterpreting the evidence to fit the story.

          The following log lines, in sequence, broadly show the lifecycle of the Thingybob Task, with some coincidental logs for DoSomething and DoHickey Tasks:

          -- Initial Task dispatch happens on server 2 node D. This results in Task execution commencing on server 1 node A:

          server2-storage-D.log:2010-06-12 16:15:02,209 INFO [Logger@9215997 3.5.3/465p2] (Coherence) (thread=Thread-10 member=7) [LoggingDispatcher] UUID=0x000001292CF15C350A653364FC0A1A9157C27BEF38C75404C2BD886BE6ECBA41, Payload=MyResumableTask:Thingybob Class:com.oracle.coherence.patterns.processing.dispatchers.logging.LoggingDispatcher

          server1-storage-A.log:2010-06-12 16:15:02,250 INFO [GridExecutor:Thread-15] (MyResumableTask) Started running task for Thingybob

          -- Sometime between 1" and 1"20' later the TaskProcessorMediator's Lease is apparently expired by a ServiceLeaseMonitor. This results in Task recovery. I can see three such events in the logs, which tallies with the fact that two Tasks apparently ran for each of DoSomething, DoHickey, and Thingybob. This seems to be down to plain old bad luck since a ClientLeaseMaintainer is supposed to renew the Lease periodically. It's bad luck because the period the ServiceLeaseMonitor and the ClientLeaseMaintainer use to expire or renew a Lease is the same duration, so there's a race condition. At 16:16:17 the ServiceLeaseMonitor won the race.

          server1-storage-C.log:2010-06-12 16:16:17,230 INFO [Logger@9270387 3.5.3/465p2] (Coherence) (thread=Thread-16 member=6) Starting recovery of 1 tasks. Class:com.oracle.coherence.patterns.processing.internal.task.RecoverTasks

          server1-storage-C.log:2010-06-12 16:16:17,239 INFO [Logger@9270387 3.5.3/465p2] (Coherence) (thread=Thread-17 member=6) Starting recovery of 1 tasks. Class:com.oracle.coherence.patterns.processing.internal.task.RecoverTasks

          server2-storage-C:2010-06-12 16:16:17,291 INFO [Logger@9271164 3.5.3/465p2] (Coherence) (thread=Thread-16 member=3) Starting recovery of 1 tasks. Class:com.oracle.coherence.patterns.processing.internal.task.RecoverTasks

          -- The TaskProcessor running on server 2 node C attempts to dequeue a Task but the associated TaskProcessorMediator's Lease has expired, as noted above. Unknown is whether the Task being dequeued is the Thingybob priming Task or something else that happened to be in the queue. It doesn't seem to matter since all three of my Tasks completed successfully:

          server2-storage-C:2010-06-12 16:16:17,295 WARN [Logger@9271164 3.5.3/465p2] (Coherence) (thread=DistributedCache:DistributedServiceForProcessingPattern member=3) Rejecting DequeueTask because TaskProcessor is INACTIVE Class:com.oracle.coherence.patterns.processing.internal.task.DefaultTaskProcessorMediator

          -- The earlier resubmission results in the Thingybob Task being re-dispatched by server 3 node C and executed to completion on server 1 node B. Since the Task that was first submitted is actually still running on server 1 node A, this Task logs completion at about the same time:

          server3-storage-C.log:2010-06-12 16:16:17,300 INFO [Logger@9240084 3.5.3/465p2] (Coherence) (thread=Thread-10 member=14) [LoggingDispatcher] UUID=0x000001292CF15C350A653364FC0A1A9157C27BEF38C75404C2BD886BE6ECBA41, Payload=MyResumableTask:Thingybob Class:com.oracle.coherence.patterns.processing.dispatchers.logging.LoggingDispatcher

          server1-storage-B.log:2010-06-12 16:16:37,563 INFO [GridExecutor:Thread-14] (MyResumableTask) Started running task for Thingybob

          server1-storage-B.log:2010-06-12 16:18:30,892 INFO [GridExecutor:Thread-14] (MyResumableTask) Finished running task for Thingybob

          server1-storage-A.log:2010-06-12 16:18:30,894 INFO [GridExecutor:Thread-15] (MyResumableTask) Finished running task for Thingybob

          server1-storage-B.log:2010-06-12 16:18:30,905 INFO [Logger@9257178 3.5.3/465p2] (Coherence) (thread=GridExecutor:Thread-14 member=15) Executed SKP:{0x000001292CF15C350A653364FC0A1A9157C27BEF38C75404C2BD886BE6ECBA41,0x000001292CF15C350A653364F0C89340F040CE44FDA78EF6CF12C351E6ECBA40} to produce PRIMED Class:com.oracle.coherence.patterns.processing.taskprocessor.TaskRunner

          -- server 2 node C and server 1 node A each logged some odd stuff that I can't definitely account for. I suspect that the first Task running on server 1 node A attempted to update its associated SubmissionResult stored on server 2 node C, but since that Task had previously had its Lease expired, the SubmissionResult was in an unexpected state:

          server2-storage-C:2010-06-12 16:18:30,906 WARN [Logger@9271164 3.5.3/465p2] (Coherence) (thread=DistributedCache:DistributedServiceForProcessingPattern member=3) Could not set SR:{DONE,0x000001292CF15C350A653364F0C89340F040CE44FDA78EF6CF12C351E6ECBA40,1276359397562,95363,1276359302199} to DONE because it wasnt EXECUTING Class:com.oracle.coherence.patterns.processing.internal.DefaultSubmissionResult

          server1-storage-A.log:2010-06-12 16:18:30,906 INFO [Logger@9237753 3.5.3/465p2] (Coherence) (thread=GridExecutor:Thread-15 member=5) Failed to set processing to executed for task SKP:{0x000001292CF15C350A653364FC0A1A9157C27BEF38C75404C2BD886BE6ECBA41,0x000001292CF15C350A653364F0C89340F040CE44FDA78EF6CF12C351E6ECBA40} Class:com.oracle.coherence.patterns.processing.taskprocessor.TaskRunner

          Hope this helps.

          Simon