1 Reply. Latest reply: Dec 10, 2012 5:56 PM by 913473

    Retrying service handshake request message

    Jonathan.Knight
      Hi,

      Does anyone know what this message means?

      DEBUG Coherence - 2012-04-14 16:29:50.256/11.421 Oracle Coherence GE 3.7.1.3 <D6> (thread=Cluster, member=77): Retrying service handshake request due to a concurrent membership change; resending the request to MemberSet(Size=78, ids=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 78, 79])
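      To check what a line like this actually contains, here is a minimal Python sketch that pulls out the reporting member, the declared MemberSet size and the target ids. The regex and the `parse_retry` helper are hypothetical; they are inferred from this one log line, not from any documented Coherence log format.

```python
import re

# Regex inferred from the single debug line quoted in this post; it is not
# based on any documented Coherence log format.
RETRY_RE = re.compile(
    r"\(thread=(?P<thread>[^,]+), member=(?P<member>\d+)\): "
    r"Retrying service handshake request.*?"
    r"MemberSet\(Size=(?P<size>\d+), ids=\[(?P<ids>[\d, ]*)\]\)"
)


def parse_retry(line):
    """Return (local member id, declared MemberSet size, target ids) or None."""
    m = RETRY_RE.search(line)
    if m is None:
        return None
    ids = [int(x) for x in m.group("ids").split(",") if x.strip()]
    return int(m.group("member")), int(m.group("size")), ids


# Rebuild the quoted line: member 77 resends to ids 1..79 except itself.
sample = (
    "(thread=Cluster, member=77): Retrying service handshake request due to a "
    "concurrent membership change; resending the request to MemberSet(Size=78, ids=["
    + ", ".join(str(i) for i in range(1, 80) if i != 77)
    + "])"
)

member, size, ids = parse_retry(sample)
# Ids between 1 and the highest listed id that are absent from the target set
missing = sorted(set(range(1, max(ids) + 1)) - set(ids))
print(member, size, len(ids), missing)  # 77 78 78 [77]
```

      For this line the declared size matches the number of listed ids, and the only id missing from 1..79 is the reporting member itself, so the request is being resent to every other member known at that moment.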

      I know it is a debug-level message, but we had an issue starting a cluster over the weekend, and we saw this message hundreds of times on various nodes. Then a number of the nodes died with OutOfMemoryError.
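      To put a number on "hundreds of times on various nodes", a small Python sketch like the one below could tally the retry messages per reporting member. The `count_retries` helper and its regex are hypothetical and assume the line format matches the single debug line quoted above.

```python
import re
from collections import Counter

# Hypothetical pattern, inferred from the one debug line quoted in this post.
MEMBER_RE = re.compile(r"member=(\d+)\): Retrying service handshake request")


def count_retries(lines):
    """Count handshake-retry messages per reporting member id."""
    counts = Counter()
    for line in lines:
        m = MEMBER_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


log = [
    "(thread=Cluster, member=77): Retrying service handshake request due to ...",
    "(thread=Cluster, member=77): Retrying service handshake request due to ...",
    "(thread=Cluster, member=12): Retrying service handshake request due to ...",
    "(thread=Cluster, member=12): Some other message",
]
print(count_retries(log).most_common())  # [(77, 2), (12, 1)]
```

      Because `count_retries` accepts any iterable of lines, it can be fed an open file directly, e.g. `count_retries(open(path))` for a hypothetical log path, and the per-member totals compared across the nodes that later ran out of memory.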

      When I look at the heap dumps, the main culprit is Cluster$PacketReciever$InQueue, where the __m_ElementList RecyclingLinkedList has a retained heap size of over 2GB.
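      As a purely illustrative back-of-envelope (not a model of Coherence internals), an unbounded inbound queue grows whenever packets arrive faster than the consuming thread drains them, and the backlog grows linearly with that rate difference. The sketch below computes how long such a queue would take to reach 2GB; every rate and the packet size are hypothetical numbers chosen only to show the arithmetic, and per-object overhead in the JVM is ignored.

```python
def backlog_bytes(arrival_pps, drain_pps, packet_bytes, seconds):
    """Bytes held in an unbounded FIFO after `seconds`, assuming constant
    rates and an initially empty queue. All inputs are hypothetical."""
    excess = max(0.0, arrival_pps - drain_pps)
    return excess * packet_bytes * seconds


def seconds_to_fill(limit_bytes, arrival_pps, drain_pps, packet_bytes):
    """Time until the backlog reaches `limit_bytes`; infinite if the queue
    drains at least as fast as packets arrive."""
    excess = arrival_pps - drain_pps
    if excess <= 0:
        return float("inf")
    return limit_bytes / (excess * packet_bytes)


# Hypothetical rates: 5,000 packets/s more arriving than being drained.
t = seconds_to_fill(2 * 1024**3, arrival_pps=20_000, drain_pps=15_000, packet_bytes=1_500)
print(round(t))  # 286
```

      The point of the sketch is only that even a modest, sustained imbalance fills a multi-gigabyte queue within minutes; the real arrival and drain rates would have to come from the cluster's own statistics.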

      At the point it all went haywire there was no data in the caches; we were just starting the cluster. This cluster is 285 storage nodes with 3GB heaps plus 19 Extend proxy nodes, spread across 19 physical servers. As you can see from the log, we are using 3.7.1.3.
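      The sizing above works out as follows. This short Python sketch assumes the nodes are spread evenly (15 storage JVMs and one proxy per server), which the post does not actually state; the proxy heap size is not given, so it is left out of the heap total.

```python
storage_nodes = 285
proxy_nodes = 19
servers = 19
storage_heap_gb = 3  # per storage JVM, as stated in the post

# Total cluster members once everything has joined
total_jvms = storage_nodes + proxy_nodes  # 304

# Assumes an even spread across the physical servers (not stated in the post)
jvms_per_server = total_jvms / servers  # 16.0
storage_per_server = storage_nodes / servers  # 15.0

# Storage heap only; proxy heap size is unknown
storage_heap_total_gb = storage_nodes * storage_heap_gb  # 855

print(total_jvms, jvms_per_server, storage_per_server, storage_heap_total_gb)
```

      For comparison, the MemberSet in the quoted debug line lists ids only up to 79, so that retry was logged while roughly a quarter of the eventual 304 members had joined.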

      We managed to restart the cluster eventually but I now have the unenviable task of trying to explain what happened.

      JK

      Edited by: Jonathan.Knight on Apr 16, 2012 2:03 PM