1 Reply Latest reply: Jun 10, 2010 1:15 AM by 567247

    Processing Pattern 1.2.1 duplicate Task dispatch

    755396
      My application is exhibiting a behaviour that I can't figure out. So far I haven't been able to reproduce the behaviour in my development environment so I need a nudge in the right direction.

      Background: Each node in my application is also a storage-enabled member of a Coherence cluster. When my application starts up, each node submits a ResumableTask to a DefaultProcessingSession. Cluster-wide, I want only one of these submitted Tasks to run, so the first thing my ResumableTask does is check whether another instance of the Task is already running. It does this by using an EntryProcessor to check for an Entry in a replicated cache with default lease-granularity. If the Entry is not present, its value is set and that value is returned; conversely, if an Entry is already present, the stored value is returned. The winning Task is the one that owns the stored value; losing Tasks simply return.
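
      To make the check concrete, here is a minimal sketch of the "first writer wins" logic. It is not my actual code: a plain ConcurrentMap stands in for the replicated cache, putIfAbsent stands in for the EntryProcessor, and the class name, method names and OWNER_KEY are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Stand-in for the election check; a ConcurrentMap replaces the Coherence cache (assumption). */
final class SingleWinnerElection {

    /** Hypothetical key under which the winning Task's identifier is stored. */
    static final String OWNER_KEY = "MyResumableTask.owner";

    private SingleWinnerElection() {}

    /**
     * Stores candidateId if no owner is recorded yet, then returns the recorded owner.
     * This mirrors what the EntryProcessor is described as doing: set-and-return when
     * the Entry is absent, return the stored value when it is present.
     */
    static String claim(ConcurrentMap<String, String> cache, String candidateId) {
        String existing = cache.putIfAbsent(OWNER_KEY, candidateId);
        return existing == null ? candidateId : existing;
    }

    /** A Task proceeds only if the recorded owner is its own identifier. */
    static boolean isWinner(ConcurrentMap<String, String> cache, String candidateId) {
        return candidateId.equals(claim(cache, candidateId));
    }
}
```

      Note that in this sketch a Task that is dispatched a second time with the same identifier still sees itself as the owner, so the check alone does not stop it from running twice.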

      The issue is that one of my Tasks appears to be dispatched twice. Since the stored value is associated with the twice-dispatched Task, the Task also executes twice, ultimately resulting in a NullPointerException for the "slow" Task. This is something I am seeking to avoid.

      Other Tasks, with different behaviour, are being submitted whilst the Task at issue is running.

      I don't make any use of the TaskExecutionEnvironment. Could this help?

      Here are what appear to be the relevant log lines from my application nodes, in time order. Note that "MyResumableTask:-4009345350290292384" is the name of my Task plus a randomly-generated identifier. I'm not sure whether this might be relevant, but I also use a randomly-generated identifier for each ProcessingSession.

      ---
      Host 3, Application Node C:

      2010-05-30 17:11:19,630 INFO [Logger@9221436 3.5.3/465p2] (Coherence) (thread=Thread-10 member=8) [LoggingDispatcher] UUID=0x00000128EA32393F0AF197829A07560408A66113FC245169CB509E4E09DBC5F2, Payload=MyResumableTask:-4009345350290292384 Class:com.oracle.coherence.patterns.processing.dispatchers.logging.LoggingDispatcher

      2010-05-30 17:12:35,436 INFO [Logger@9221436 3.5.3/465p2] (Coherence) (thread=Thread-10 member=8) [LoggingDispatcher] UUID=0x00000128EA32393F0AF197829A07560408A66113FC245169CB509E4E09DBC5F2, Payload=MyResumableTask:-4009345350290292384 Class:com.oracle.coherence.patterns.processing.dispatchers.logging.LoggingDispatcher

      ---
      Host 1, Application Node B:

      2010-05-30 17:24:00,952 INFO [Logger@9226875 3.5.3/465p2] (Coherence) (thread=GridExecutor:Thread-14 member=11) Executed SKP:{0x00000128EA32393F0AF197829A07560408A66113FC245169CB509E4E09DBC5F2,0x00000128EA32393E0AF197826E551231AD6E0A0AE92DCF87C6F4B0DD09DBC5F1} to produce SUCCESS Class:com.oracle.coherence.patterns.processing.taskprocessor.TaskRunner

      ---
      Host 2, Application Node C:

      2010-05-30 17:24:01,296 WARN [Logger@9215997 3.5.3/465p2] (Coherence) (thread=GridExecutor:Thread-14 member=2) TaskRunner - Failed to process SKP:{0x00000128EA32393F0AF197829A07560408A66113FC245169CB509E4E09DBC5F2,0x00000128EA32393E0AF197826E551231AD6E0A0AE92DCF87C6F4B0DD09DBC5F1} due to:
      Portable(java.lang.NullPointerException): Entry is null, key:0x00000128EA32393E0AF197826E551231AD6E0A0AE92DCF87C6F4B0DD09DBC5F1 Class:com.oracle.coherence.patterns.processing.taskprocessor.TaskRunner

      ---

      Any suggestions gratefully received!

      Thanks,

      Simon
        • 1. Re: Processing Pattern 1.2.1 duplicate Task dispatch
          567247
          Simon,

          Sorry for the delay in responding to you.

          The only case where this should be possible is if there is a partition transfer due to a node failure or a cluster starting up. If that is the case, there might be a re-dispatch of a Task that hasn't finished.

          This code has been re-worked for the next release, fixing a number of holes in the implementation.

          If this is repeatable, would you mind running with -Dtangosol.coherence.log.level=7 and sending the logs to me at christer.fahlgren at oracle dot com?
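
          If changing the JVM launch command is awkward, the same system property can probably be set from code instead. The sketch below assumes it runs before any Coherence class is first used, since the log level is presumably read at startup; the class and method names are made up for illustration.

```java
/** Hypothetical helper: sets the Coherence log level property programmatically. */
final class VerboseCoherenceLogging {

    /** Property name taken from the -D flag above. */
    static final String LOG_LEVEL_PROPERTY = "tangosol.coherence.log.level";

    private VerboseCoherenceLogging() {}

    /** Call this first thing in main(), before touching any Coherence API (assumed ordering). */
    static void enable() {
        System.setProperty(LOG_LEVEL_PROPERTY, "7");
    }
}
```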

          Thanks,
          Christer