Unicast Issue

User_MCOJG
edited Aug 28, 2020 3:09PM in Coherence Support

I have 5 nodes on a single server: one proxy node and four storage nodes. I have configured the cluster to use unicast, and each node uses the following override.xml file:

<coherence>
  <license-config>
    <license-mode system-property="tangosol.coherence.mode">prod</license-mode>
  </license-config>
  <cluster-config>
    <member-identity>
      <cluster-name>AggregationService-LinuxUAT</cluster-name>
    </member-identity>
    <unicast-listener>
      <well-known-addresses>
        <socket-address id="1">
          <address id="1">172.30.144.76</address>
          <port>18099</port>
          <!--address id="2">qatritongrd11</address-->
        </socket-address>
      </well-known-addresses>
      <address>172.30.144.76</address>
      <port>18099</port>
      <port-auto-adjust>true</port-auto-adjust>
    </unicast-listener>
    <authorized-hosts>
      <host-address id="1">qatritongrd10</host-address>
      <host-address id="2">qatritongrd11</host-address>
    </authorized-hosts>

...

When I run the nodes on the same server, each one tries to join the cluster, but after a while the nodes drop off. The logs show partitions being rebalanced, but the transfers back up, almost as if the nodes cannot communicate with each other.

Even though I configured unicast, the JMX statistics for each node show multicast as enabled, with a multicast address of /224.12.1.3 and port 12130, in addition to the unicast address and port that I configured.
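
For anyone who wants to compare the same values across nodes, here is a minimal sketch of reading them programmatically over JMX. It assumes the code runs inside a JVM where the Coherence management MBeans are registered on the platform MBean server, that the node MBeans match the pattern Coherence:type=Node,*, and that the attribute names below match the node MBean in your Coherence version. The class name NodeTransportCheck is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class NodeTransportCheck {
    // Assumed attribute names of the Coherence node MBean; verify against your version.
    static final String[] ATTRS = {
        "UnicastAddress", "UnicastPort", "WellKnownAddresses",
        "MulticastEnabled", "MulticastAddress", "MulticastPort"
    };

    // Returns one summary line per node MBean found on the given server.
    static List<String> describeNodes(MBeanServer server) throws Exception {
        List<String> lines = new ArrayList<>();
        ObjectName pattern = new ObjectName("Coherence:type=Node,*");
        for (ObjectName name : server.queryNames(pattern, null)) {
            StringBuilder sb = new StringBuilder("node " + name.getKeyProperty("nodeId"));
            for (String attr : ATTRS) {
                Object value;
                try {
                    value = server.getAttribute(name, attr);
                } catch (Exception e) {
                    // Attribute missing or unreadable in this version.
                    value = "<unavailable>";
                }
                if (value instanceof Object[]) {
                    // Array-valued attributes would otherwise print as a hash.
                    value = Arrays.toString((Object[]) value);
                }
                sb.append(' ').append(attr).append('=').append(value);
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        for (String line : describeNodes(server)) {
            System.out.println(line);
        }
    }
}
```

If no Coherence MBeans are registered in the JVM, the query matches nothing and the program prints nothing.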

My questions: Did I misconfigure something, or do I need to configure something else? How do nodes on the same server communicate with each other in unicast mode? Does each node on the same machine need to be configured with the unicast listener? Why is multicast still shown as enabled in the node MBeans?

Looking at the logs, I see the following:

The service currently has an active outgoing (sent but not yet completed) request for a primary distribution.

Aug 28 01:13:58 qatritongrd10 java[31764]: 3360 scheduled distributions remain to be processed:

Aug 28 01:13:58 qatritongrd10 java[31764]: Transfer of Primary PartitionSet{0..1679} from Member 1

Aug 28 01:13:58 qatritongrd10 java[31764]: Transfer of Backup PartitionSet{0..1679} to Member 1

Aug 28 01:14:00 qatritongrd10 java[31764]: 01:14:00,980  WARN Coherence:3 - 2020-08-28 01:14:00.980 Oracle Coherence GE n/a <Warning> (thread=DistributedCache:dist-securitytransfercache-service, member=3): Current partition distribution has been pending for over 426 seco

Aug 28 01:14:00 qatritongrd10 java[31764]: There are 1681 outgoing backup transfers in progress

Aug 28 01:14:00 qatritongrd10 java[31764]: 1681 scheduled distributions remain to be processed:

Aug 28 01:14:00 qatritongrd10 java[31764]: Transfer of Backup PartitionSet{1680..3360} to Member 1

Aug 28 01:14:01 qatritongrd10 java[31764]: 01:14:01,091  WARN Coherence:3 - 2020-08-28 01:14:01.091 Oracle Coherence GE n/a <Warning> (thread=DistributedCache:dist-currencyholidayoverridecache-service, member=3): Current partition distribution has been pending for over 4

Aug 28 01:14:01 qatritongrd10 java[31764]: There are 64 outgoing backup transfers in progress

Aug 28 01:14:01 qatritongrd10 java[31764]: 64 scheduled distributions remain to be processed:

Aug 28 01:14:01 qatritongrd10 java[31764]: Transfer of Backup PartitionSet{63..126} to Member 1

Aug 28 01:14:24 qatritongrd10 java[31764]: 01:14:24,220  WARN Coherence:3 - 2020-08-28 01:14:24.220 Oracle Coherence GE n/a <Warning> (thread=DistributedCache:dist-brokertradecache-service, member=3): Current partition distribution has been pending for over 450 seconds;

Aug 28 01:14:24 qatritongrd10 java[31764]: There are 1018 outgoing backup transfers in progress

Aug 28 01:14:24 qatritongrd10 java[31764]: 1681 scheduled distributions remain to be processed:

Aug 28 01:14:24 qatritongrd10 java[31764]: Transfer of Backup PartitionSet{1680..3360} to Member 1

Aug 28 01:14:26 qatritongrd10 java[31764]: 01:14:26,428  WARN Coherence:3 - 2020-08-28 01:14:26.428 Oracle Coherence GE n/a <Warning> (thread=DistributedCache:dist-tradeobject-service, member=3): Current partition distribution has been pending for over 450 seconds;

Aug 28 01:14:26 qatritongrd10 java[31764]: There are 140 outgoing backup transfers in progress

Aug 28 01:14:26 qatritongrd10 java[31764]: 140 scheduled distributions remain to be processed:

Aug 28 01:14:26 qatritongrd10 java[31764]: Transfer of Backup PartitionSet{2241..2380} to Member 2

Aug 28 01:14:26 qatritongrd10 java[31764]: 01:14:26,840 ERROR Coherence:3 - 2020-08-28 01:14:26.839 Oracle Coherence GE n/a <Error> (thread=Cluster, member=3): Received cluster heartbeat from the senior Member(Id=1, Timestamp=2020-08-28 00:58:50.254, Address=172.30.144.7

Aug 28 01:14:27 qatritongrd10 java[31764]: 01:14:27,116 ERROR Coherence:3 - 2020-08-28 01:14:27.107 Oracle Coherence GE n/a <Error> (thread=Cluster, member=3): Full Thread Dump:

....

01:24:33,172 ERROR Coherence:3 - 2020-08-28 01:24:33.172 Oracle Coherence GE n/a <Error> (thread=ServiceMonitor, member=n/a): Error while starting cluster: com.tangosol.net.RequestTimeoutException: Timeout during service start:

Aug 28 01:24:33 qatritongrd10 java[31764]: MemberSet=MasterMemberSet(

Aug 28 01:24:33 qatritongrd10 java[31764]: ThisMember=null

Aug 28 01:24:33 qatritongrd10 java[31764]: OldestMember=null

Aug 28 01:24:33 qatritongrd10 java[31764]: ActualMemberSet=MemberSet(Size=0

Aug 28 01:24:33 qatritongrd10 java[31764]: )

Aug 28 01:24:33 qatritongrd10 java[31764]: MemberId|ServiceVersion|ServiceJoined|MemberState

Aug 28 01:24:33 qatritongrd10 java[31764]: RecycleMillis=2400000

Aug 28 01:24:33 qatritongrd10 java[31764]: RecycleSet=MemberSet(Size=0

Aug 28 01:24:33 qatritongrd10 java[31764]: )

Aug 28 01:24:33 qatritongrd10 java[31764]: )

Aug 28 01:24:33 qatritongrd10 java[31764]: )

Aug 28 01:24:33 qatritongrd10 java[31764]: at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onStartupTimeout(Grid.CDB:3)

Aug 28 01:24:33 qatritongrd10 java[31764]: at com.tangosol.coherence.component.util.daemon.queueProcessor.Service.start(Service.CDB:27)

Aug 28 01:24:33 qatritongrd10 java[31764]: at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.start(Grid.CDB:6)

Aug 28 01:24:33 qatritongrd10 java[31764]: at com.tangosol.coherence.component.net.Cluster.startSystemServices(Cluster.CDB:4)

Aug 28 01:24:33 qatritongrd10 java[31764]: at com.tangosol.coherence.component.net.Cluster.onStart(Cluster.CDB:53)

Aug 28 01:24:33 qatritongrd10 java[31764]: at com.tangosol.coherence.component.net.Cluster.start(Cluster.CDB:11)

Aug 28 01:24:33 qatritongrd10 java[31764]: at com.tangosol.coherence.component.util.SafeCluster.startCluster(SafeCluster.CDB:4)

Aug 28 01:24:33 qatritongrd10 java[31764]: at com.tangosol.coherence.component.util.SafeCluster.restartCluster(SafeCluster.CDB:10)

Aug 28 01:24:33 qatritongrd10 java[31764]: at com.tangosol.coherence.component.util.SafeCluster.ensureRunningClusterInternal(SafeCluster.CDB:30)

Aug 28 01:24:33 qatritongrd10 java[31764]: at com.tangosol.coherence.component.util.SafeCluster$EnsureClusterAction.run(SafeCluster.CDB:1)

Aug 28 01:24:33 qatritongrd10 java[31764]: at java.security.AccessController.doPrivileged(Native Method)

Aug 28 01:24:33 qatritongrd10 java[31764]: at javax.security.auth.Subject.doAs(Subject.java:360)