2 Replies Latest reply on Feb 10, 2010 10:33 AM by 751727

    Failure to join a cluster

      Hi,
      We are using Tangosol Coherence for clustering in our product, webMethods Integration Server.
      When our server starts up, it tries to join the cluster.
      Recently we noticed strange behavior when we start our 2 servers simultaneously to join a cluster.
      The first server joins the cluster successfully, but the second server logs the following messages in tangosol.log:

      2010-01-04 05:06:27.562 Tangosol Coherence CE 3.2.2/371 <D5> (thread=Cluster, member=n/a): Received a discovery message that indicates the presence of an existing cluster that does not respond to join requests; this is usually caused by a network layer failure:
      Message "SeniorMemberHeartbeat"
      FromMember=Member(Id=1, Timestamp=2010-01-04 04:59:51.684, Address=, MachineId=18713)
      [000]=Broadcast{PacketType=0x0DDF00D2, ToId=0, FromId=1, Direction=Incoming, ReceivedMillis=05:06:27.562, MessageType=17, MessagePartCount=1, MessagePartIndex=0, Body=0x0000000125F8C6BC84C0A8211900000000000000000000000040005FDE000049190001030208080040404040404000000000000000000000010000000100, Body.length=62}
      Service=ClusterService{Name=Cluster, State=(SERVICE_STARTED, STATE_ANNOUNCE), Id=0, Version=3.2}

      MemberSet=MemberSet(Size=1, BitSetCount=1, ids=[1])
      2010-01-04 05:06:27.563 Tangosol Coherence CE 3.2.2/371 <Error> (thread=Cluster, member=n/a): Failure to join a cluster for 96 seconds; stopping cluster service.
      2010-01-04 05:06:27.563 Tangosol Coherence CE 3.2.2/371 <D5> (thread=Cluster, member=n/a): Service Cluster left the cluster.

      This is peculiar, as this node continues to receive heartbeats from the senior node, but the senior node does not respond to its join requests.

      My question is: in this case, shouldn't the second node form its own cluster when the senior node is not responding? Instead, we find that the node leaves the cluster. Is this the expected behavior, or is this a bug in Coherence?
      Please let me know what troubleshooting options are available for this problem.