We have a web-based content management system deployed on a cluster of 6 app server nodes, one node per machine. The memory-related JVM options for all these servers are identical, as follows.
JAVA_OPTIONS="-XX:NewRatio=4 -XX:TargetSurvivorRatio=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:SurvivorRatio=4 -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -Xloggc:<log path> -XX:+UseParallelOldGC"
Salient features of this configuration are:
1. max and min heap size set to 2 GB
2. New ratio set to 4, meaning about 1/5 of the total heap, i.e. roughly 400 MB, should be allocated to the new generation
3. explicit GC is disabled
4. the parallel collector is used for the old generation
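As a sanity check on the arithmetic behind point 2 (my own calculation, assuming a 2 GB heap and the flags above):

```java
// Sanity check of the generation sizing implied by the startup flags.
// Assumes -Xms/-Xmx of 2 GB; the MB figures are truncated integer division.
public class GenSizing {
    public static void main(String[] args) {
        long heap = 2048L * 1024 * 1024;             // 2 GB heap
        int newRatio = 4;                            // -XX:NewRatio=4 -> old:young = 4:1
        int survivorRatio = 4;                       // -XX:SurvivorRatio=4 -> eden:survivor = 4:1
        long young = heap / (newRatio + 1);          // young gen is 1/5 of the heap
        long survivor = young / (survivorRatio + 2); // eden + two survivor spaces
        System.out.println("expected young gen = " + young / (1024 * 1024) + " MB");
        System.out.println("expected survivor  = " + survivor / (1024 * 1024) + " MB");
    }
}
```

So the new generation should come out around 409 MB, which matches the ~400 MB I quote below.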
The actual memory allocation by the JVM (as printed in the GC log) differs considerably from the startup parameters above on 4 of the 6 nodes; on the other 2 it matches them.
All nodes run the same app server version (WebLogic 9.2) on the same Sun JVM.
The GC log for the nodes that follow the startup parameters (call them group A) looks like this:
115002.287: [GC [PSYoungGen: 292636K->4774K(309632K)] 1913160K->1625386K(1988992K), 0.0298902 secs]
115052.791: [GC [PSYoungGen: 292134K->17553K(301568K)] 1912746K->1641982K(1980928K), 0.0612448 secs]
115063.928: [GC [PSYoungGen: 301521K->20383K(307648K)] 1925950K->1660326K(1987008K), 0.0905133 secs]
115070.851: [GC [PSYoungGen: 301023K->17394K(294784K)] 1940966K->1674667K(1974144K), 0.0862440 secs]
115077.942: [GC [PSYoungGen: 294770K->15302K(309696K)] 1952043K->1683397K(1989056K), 0.0766161 secs]
115078.019: [Full GC [PSYoungGen: 15302K->0K(*309696K*)] [ParOldGen: 1668094K->698215K(1630208K)] 1683397K->698215K(*1939904K*) [PSPermGen: 77517K->77487K(262144K)], 8.3740731 secs]
We can see that the total heap allocation matches what was specified, about 2 GB. The new generation is around 300 MB here, though I have often seen it grow to 400 MB.
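One way I could compare the specified sizes with what the JVM has actually committed, independently of the GC log, is the standard java.lang.management API (available since Java 5, so it should work on these JVMs). A minimal sketch; the pool names like "PS Eden Space" are collector-specific:

```java
// Sketch: print each memory pool's maximum size as the running JVM sees it.
// Uses only the standard management API (Java 5+).
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class PoolSizes {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            long max = pool.getUsage().getMax();
            System.out.println(pool.getName() + ": max = "
                    + (max < 0 ? "undefined" : (max / (1024 * 1024)) + " MB"));
        }
    }
}
```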
The 4 nodes that do not follow this pattern (call them group B) produce the following GC log.
75508.021: [GC [PSYoungGen: 42998K->1368K(44416K)] 1712098K->1671475K(1723776K), 0.0304173 secs]
75523.338: [GC [PSYoungGen: 42947K->2101K(44352K)] 1713054K->1673104K(1723712K), 0.0312454 secs]
75538.383: [GC [PSYoungGen: 43677K->1460K(44416K)] 1714680K->1674079K(1723776K), 0.0305500 secs]
75553.920: [GC [PSYoungGen: 43060K->1541K(44352K)] 1715679K->1675125K(1723712K), 0.0308016 secs]
75566.451: [GC [PSYoungGen: 43141K->1381K(44224K)] 1716725K->1676131K(1723584K), 0.0308855 secs]
75577.718: [GC [PSYoungGen: 42981K->1460K(44160K)] 1717731K->1677193K(1723520K), 0.0305216 secs]
75587.308: [GC [PSYoungGen: 43050K->1418K(44096K)] 1718782K->1678154K(1723456K), 0.0308643 secs]
75598.593: [GC [PSYoungGen: 43018K->991K(44032K)] 1719754K->1678798K(1723392K), 0.0468571 secs]
75598.640: [Full GC [PSYoungGen: 991K->0K(*44032K*)] [ParOldGen: 1677806K->493810K(1679360K)] 1678798K->493810K(*1723392K*) [PSPermGen: 73874K->73848K(262144K)], 6.4483570 secs]
The maximum space allocated to the new generation is around 40 MB, only 10% of what we would expect from the settings. The old generation is sized as expected. This leads to a new-generation collection every 10-20 seconds. Each collection is cheap, only 30-40 ms, but I can't explain why only 10% of the expected space is allocated to the new generation.
There are some differences between the group A and group B nodes, though I don't see how they could account for such a gap. The differences are:
Group A: Sun Fire 1500K with 4 dual-core CPUs (16 threads) and 16 GB RAM (around 40% in use)
Group B: Sun Fire T-2000 with one quad-core CPU, 4 threads per core (16 hardware threads), and 8 GB RAM (only around 40% in use)
Group A: Solaris 9
Group B: Solaris 10
Though the app server, JVM and installed apps are the same, there is some difference in how the apps are used.
Group A: used only for batch operations.
Group B: used for real-time requests from users.
I can try explicitly specifying the amount of memory to use for the new generation on the group B nodes, though I'm not sure whether that will have any impact on the actual allocation.
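Concretely, what I have in mind is pinning the new generation instead of relying on NewRatio. This is an untested sketch; 410m is my rounding of the ~1/5 of a 2 GB heap that NewRatio=4 implies:

```shell
# Hypothetical addition for the group B nodes: fix the new generation size
# explicitly rather than leaving it to NewRatio and the collector's sizing.
JAVA_OPTIONS="$JAVA_OPTIONS -XX:NewSize=410m -XX:MaxNewSize=410m"
```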
I can provide further details on the setup if required.
The JVM uses quite a few default settings. If there is a way to print those, maybe we can compare them and see whether they differ between the two groups.
One more observation: on group A servers a new-generation collection happens every 10 seconds or less; on group B servers it happens every 10-20 seconds. I am wondering whether the throughput collector sizes the new generation to keep the GC interval or GC time within some internal target, no matter how much space is made available - perhaps the adaptive size policy (-XX:+UseAdaptiveSizePolicy, which I believe is on by default with the parallel collectors) is resizing it. I have not specified any such goals myself, though. Is there a way to see these defaults?
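Two ways I might try to inspect the effective defaults; exact flag availability depends on the Sun JVM version in use, so treat these as assumptions to verify:

```shell
# Newer Sun JVMs (6u21 and later, I believe) can dump every flag's final value:
java -XX:+PrintFlagsFinal -version

# On the 1.5-era JVMs that ship with WebLogic 9.2, jmap can at least show
# the live heap sizing of a running server process:
jmap -heap <pid>
```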