I know it is a debug-level message, but we had an issue starting a cluster over the weekend where we saw this message hundreds of times on various nodes. A number of the nodes then died with OutOfMemoryError.
When I look at the heap dumps, the main culprit is Cluster$PacketReceiver$InQueue, where the __m_ElementList RecyclingLinkedList has a retained heap size of over 2GB.
At the point it all went haywire there was no data in the caches; we were just starting the cluster. This cluster is 285 x 3GB heap storage nodes and 19 extend proxy nodes spread across 19 physical servers. As you can see from the log, we are using 18.104.22.168.
We managed to restart the cluster eventually but I now have the unenviable task of trying to explain what happened.
Edited by: Jonathan.Knight on Apr 16, 2012 2:03 PM
Any idea what caused this? We have a large cluster (16GB JVMs) with about 2 TB of total allocated space. We ran into the same issue a couple of days ago, and the cluster shut down on its own. We lost all the data and had to reload it.