Latest reply on Apr 2, 2019 9:51 AM by User13419260-Oracle

    Shutdown of one Coherence node causes task execution to halt

    User13419260-Oracle

      Hi Team,

      We are currently on Coherence 12.2.1.2.0 and using the Processing Pattern jar, version 12.5.0.

      Issue description:

      Let's say I have two machines (Machine1 and Machine2) that join and participate in a Coherence cluster; in our terminology these are called Agents.

      We generally use a standalone client (storage disabled) to submit tasks to the cluster above for distributed processing, registering them in overridden coherence-config files (the configuration is provided below for reference).

      When both machines (agents) are up and running and I submit a task from the client, the submission succeeds and I can see the task executing on the server side, as expected.

      As a disaster scenario, I shut down Machine1 and submitted a task from the client on Machine2. I expected the remaining registered agent, Machine2, which is still up and running, to pick up the task. Instead, the task gets stuck and no stack trace is thrown.

      As a first step, I tried to get additional log output by raising the logging severity level to 9 in the coherence-config file, but that didn't help.
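
For reference, the logging severity mentioned above is normally set in the logging-config element of the operational override file. The fragment below is a minimal sketch, assuming the standard Coherence 12.2.1 operational override schema: logging-config sits alongside cluster-config inside the same coherence root element, and the coherence.log.level system property can override the value at startup.

```xml
<coherence>
  <!-- Assumption: placed next to <cluster-config> in the operational override file -->
  <logging-config>
    <!-- 9 is the most verbose Coherence log level -->
    <severity-level system-property="coherence.log.level">9</severity-level>
  </logging-config>
</coherence>
```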

      Then I debugged the Coherence Processing Pattern code and found that the state of the submitted task (submissionResult.getSubmissionState()) does not move from the Assigned state to the Executing state.

      In the failure scenario, the submission state never reaches Executing, so the method shown below does nothing.

      My guess is that when one of the nodes is shut down, the submission state does not move to Executing; hence the required method submissionOutcome.onStarted() is never called, and the task gets stuck.

      I printed the submission state in the failure case and confirmed that it is stuck in the Assigned state. Here are my questions:

      Which listener moves the submission state from Assigned to Executing? (I believe it is onMapEvent that triggers the status change.)

      Can that listener fail because another JVM in the cluster is shut down?

      Please note that I have already configured the following:

      <shutdown-listener>
        <enabled system-property="tangosol.coherence.shutdownhook">none</enabled>
      </shutdown-listener>

      Code snippet from com.oracle.coherence.patterns.processing.internal.DefaultProcessingSession.handleResult():

      else if (submissionOutcome != null) {
          if (submissionResult.getSubmissionState() == SubmissionState.SUSPENDED) {
              submissionOutcome.onSuspended();
          } else if (submissionResult.getSubmissionState() == SubmissionState.EXECUTING) {
              if (submissionResult.getProgress() == null) {
                  submissionOutcome.onStarted();
              } else {
                  submissionOutcome.onProgress(submissionResult.getProgress());
              }
          }
      }

      Sample coherence-config file for Machine1:

      <coherence xml-override="/tangosol-coherence-override.xml">
        <cluster-config>
          <member-identity>
            <cluster-name>CLUSTERNAME</cluster-name>
            <member-name>Machine1</member-name>
          </member-identity>
          <unicast-listener>
            <address>Machine1Host</address>
            <port>12010</port>
            <port-auto-adjust>false</port-auto-adjust>
            <well-known-addresses>
              <!--  Agents -->
              <socket-address id="1">
                <address>Machine1Host</address>
                <port>12010</port>
              </socket-address>
              <socket-address id="2">
                <address>Machine2Host</address>
                <port>12020</port>
              </socket-address>
              <!--  Clients -->
              <socket-address id="3">
                <address>Machine1Host</address>
                <port>12030</port>
              </socket-address>
              <socket-address id="4">
                <address>Machine2Host</address>
                <port>12040</port>
              </socket-address>
            </well-known-addresses>
          </unicast-listener>
          <shutdown-listener>
            <enabled system-property="tangosol.coherence.shutdownhook">none</enabled>
          </shutdown-listener>
        </cluster-config>
      </coherence>

      Sample coherence-config file for Machine2:

      <coherence xml-override="/tangosol-coherence-override.xml">
        <cluster-config>
          <member-identity>
            <cluster-name>CLUSTERNAME</cluster-name>
            <member-name>Machine2</member-name>
          </member-identity>
          <unicast-listener>
            <address>Machine2Host</address>
            <port>12020</port>
            <port-auto-adjust>false</port-auto-adjust>
            <well-known-addresses>
              <!--  Agents -->
              <socket-address id="1">
                <address>Machine1Host</address>
                <port>12010</port>
              </socket-address>
              <socket-address id="2">
                <address>Machine2Host</address>
                <port>12020</port>
              </socket-address>
              <!--  Clients -->
              <socket-address id="3">
                <address>Machine1Host</address>
                <port>12030</port>
              </socket-address>
              <socket-address id="4">
                <address>Machine2Host</address>
                <port>12040</port>
              </socket-address>
            </well-known-addresses>
          </unicast-listener>
          <shutdown-listener>
            <enabled system-property="tangosol.coherence.shutdownhook">none</enabled>
          </shutdown-listener>
        </cluster-config>
      </coherence>

      Sample client config for Client1 (Client2 uses a similar config):

      <coherence xml-override="/tangosol-coherence-override.xml">
        <cluster-config>
          <member-identity>
            <cluster-name>CLUSTERNAME</cluster-name>
            <member-name>Client1</member-name>
          </member-identity>
          <unicast-listener>
            <address>Machine1Host</address>
            <port>12030</port>
            <port-auto-adjust>false</port-auto-adjust>
            <well-known-addresses>
              <!--  Agents -->
              <socket-address id="1">
                <address>Machine1Host</address>
                <port>12010</port>
              </socket-address>
              <socket-address id="2">
                <address>Machine2Host</address>
                <port>12020</port>
              </socket-address>
              <!--  Clients -->
              <socket-address id="3">
                <address>Machine1Host</address>
                <port>12030</port>
              </socket-address>
              <socket-address id="4">
                <address>Machine2Host</address>
                <port>12040</port>
              </socket-address>
            </well-known-addresses>
          </unicast-listener>
          <shutdown-listener>
            <enabled system-property="tangosol.coherence.shutdownhook">none</enabled>
          </shutdown-listener>
        </cluster-config>
      </coherence>

      Please let me know if you need any additional information to look into this.

      Message was edited by: User13419260-Oracle