I am getting OOM on one node of a 2-node Weblogic cluster when coherence tries to replicate cache data.
I am unable to insert images, so ...
Please download the zip containing screenshots and config files from the link below (ignore the browser error and click the download link at the top-left):
List of files:
3 screenshots from MAT (Eclipse Memory Analyzer Tool)
2 coherence config files
WebLogic Server Version: 10.3.3.0
Can you please share with us the JVM arguments used in the Coherence nodes?
Also, can you tell us the average amount of data you intend to store in Coherence?
My guess is that you are not reclaiming entries appropriately once they become unused, so they eventually no longer fit in the heap.
Finally, are you using J9 as the JVM? Is this an AIX platform?
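If entries really are never reclaimed, one common mitigation is to cap and expire the backing map in the Coherence cache configuration. A hedged sketch (the scheme name and the `high-units`/`expiry-delay` values below are illustrative placeholders, not tuned recommendations; element names are from the Coherence 3.x cache-config schema):

```xml
<distributed-scheme>
  <scheme-name>example-distributed</scheme-name>
  <backing-map-scheme>
    <local-scheme>
      <eviction-policy>LRU</eviction-policy>
      <!-- cap entry count per node; tune to your heap and entry size -->
      <high-units>500000</high-units>
      <!-- reclaim entries not touched for an hour -->
      <expiry-delay>1h</expiry-delay>
    </local-scheme>
  </backing-map-scheme>
  <autostart>true</autostart>
</distributed-scheme>
```

Without some bound like this, a cache that is only ever written to will grow until the heap is exhausted.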
weblogic 17118 17034 12 04:55 ? 00:06:19 /apps/sw/oracle/mw/jrockit-x64/bin/java -jrockit -Xms6g -Xmx6g -Dweblogic.Name=HWP-NGI-1 -Djava.security.policy=/apps/sw/oracle/mw/app_18.104.22.168/wlserver_10.3/server/lib/weblogic.policy -Dweblogic.ProductionModeEnabled=true -Xmanagement:ssl=false,authenticate=false,port=7099 -Dweblogic.nodemanager.ServiceEnabled=true -Dweblogic.security.SSL.ignoreHostnameVerification=true -Dweblogic.ReverseDNSAllowed=false -Dweblogic.Stdout=/logs/servers/HWP-NGI-1/HWP-NGI-1.out -Djava.util.logging.config.file=gs_logging.properties -da -Dplatform.home=/apps/sw/oracle/mw/app_22.214.171.124/wlserver_10.3 -Dwls.home=/apps/sw/oracle/mw/app_126.96.36.199/wlserver_10.3/server -Dweblogic.home=/apps/sw/oracle/mw/app_188.8.131.52/wlserver_10.3/server -Ddomain.home=/apps/services/mw/HIAS-WLS-PAT -Dcommon.components.home=/apps/sw/oracle/mw/app_184.108.40.206/oracle_common -Djrf.version=11.1.1 -Djrockit.optfile=/apps/sw/oracle/mw/app_220.127.116.11/oracle_common/modules/oracle.jrf_11.1.1/jrocket_optfile.txt
More screenshots are in the zip (you need to unzip it).
Object cached in Coherence: ScheduledSegment. This is a huge list (> 3 million entries), pushed to Coherence in batches (average 100 objects per batch)
JVM - jRockit JDK 1.6.0_26
OS Platform - Linux CentOS 5.6 (Same kernel as Red Hat Enterprise 5.6)
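The batching pattern described above can be sketched as follows. This is a hedged illustration, not the poster's actual code: `ScheduledSegment` is reduced to a stub, and a plain `java.util.Map` stands in for the Coherence `NamedCache` (whose `putAll` has the same shape and turns each batch into a single network call):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchPutExample {
    // Stub standing in for the real ScheduledSegment entity.
    static class ScheduledSegment {
        final long id;
        ScheduledSegment(long id) { this.id = id; }
    }

    // Push entries in fixed-size batches instead of one put() per entry.
    // With Coherence this would be NamedCache.putAll(batch), which takes a
    // Map just like Map.putAll and cuts per-entry round-trips.
    static void putInBatches(Map<Long, ScheduledSegment> cache,
                             Iterable<ScheduledSegment> segments,
                             int batchSize) {
        Map<Long, ScheduledSegment> batch = new HashMap<>();
        for (ScheduledSegment s : segments) {
            batch.put(s.id, s);
            if (batch.size() >= batchSize) {
                cache.putAll(batch);   // one bulk call per full batch
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            cache.putAll(batch);       // flush the final partial batch
        }
    }

    public static void main(String[] args) {
        Map<Long, ScheduledSegment> cache = new HashMap<>();
        List<ScheduledSegment> segments = new ArrayList<>();
        for (long i = 0; i < 250; i++) segments.add(new ScheduledSegment(i));
        putInBatches(cache, segments, 100);  // batches of 100, as above
        System.out.println(cache.size());    // 250
    }
}
```

Note that batching only reduces call overhead; it does not reduce the total heap the entries occupy once stored.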
Based on your response, I made a few calculations, considering:
- 6GB of Heap per Node
- 1.5KB average entry size (based on my experience with similar entities)
Each heap should be able to hold at most ~4M entries. So there are two scenarios to consider:
1) Your cache scheme is based on partitioning: your total storage capacity would be 12GB, since you have two nodes. In this case, you may be suffering from bad GC configuration. A heap dump would help us assist you better.
2) Your cache scheme is based on replication: your total storage capacity would be 6GB, since each entry is replicated to every node. In this case, check your cache's size to verify that you are not storing more than ~4M entries.
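The arithmetic behind the ~4M figure can be checked directly (6 GB per node and 1.5 KB per entry are the assumptions stated above; real per-entry overhead in Coherence will push the effective limit lower):

```java
public class CacheCapacityEstimate {
    public static void main(String[] args) {
        // Assumptions from the thread: 6 GB heap per node, ~1.5 KB per entry.
        long heapPerNodeBytes = 6L * 1024 * 1024 * 1024;
        long entryBytes = 1536;                 // 1.5 KB average entry size
        int nodes = 2;

        long entriesPerNode = heapPerNodeBytes / entryBytes;
        System.out.println(entriesPerNode);     // ~4.2M entries per node

        // Partitioned scheme: data is spread across nodes, capacity adds up.
        System.out.println(entriesPerNode * nodes);

        // Replicated scheme: every node holds a full copy, so total
        // capacity equals a single node's capacity.
        System.out.println(entriesPerNode);
    }
}
```

Prints 4194304, 8388608, and 4194304: roughly 4M entries per node, 8M total when partitioned, 4M total when replicated.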