
One of the old bits of tuning advice, given when Java memory management was not nearly as capable as it is today, was to set max heap equal to min heap. After all, we don't really want the JVM messing around with memory when it should be getting on with things. Fast forward a few years and the adaptive memory management picture has matured considerably. So much so that setting -Xmx equal to -Xms would now be considered bad practice. Unfortunately, those of us railing against the tide created by those who practice "tuning by folklore" have had our voices drowned out by the countless admin manuals that say "Setup, step 12: in the console window set max heap to 2048m and min heap to 2048m," and by the overwhelming noise in the blog-o-sphere parroting this advice. As I've written before in this blog, setting max heap equal to min heap turns off the JVM's ability to adapt to changing conditions in your Java application's run time. But maybe no more.

Those working on the long-awaited G1 collector have just published a change request stating:

"The idea is to give G1 some room to allow it to use its heuristics for calculating the young gen size even when the heap size is fixed.

CR:
7113021 G1: automatically enable young gen size auto-tuning when -Xms==-Xmx"

This is what I like about the JVM: if we can't change people's ideas about how to tune the JVM, we'll just change the JVM to match people's ideas of... how to tune the JVM.

As far as I know, this will apply only to the G1 collector. So the best advice currently available for those needing or wanting adaptive GC is to use a parallel collector with -Xms set to the maximum amount of heap you'd like to use and -Xmx set higher to provide a wee bit of headroom just in case business picks up.
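For illustration only (the flag values below are made up, not a recommendation), a quick way to confirm that the JVM really was started with room to grow, i.e. with -Xms below -Xmx, is to compare the committed heap against the maximum:

```java
// Quick check that the heap still has headroom to grow, i.e. that -Xms was set
// below -Xmx (e.g. launched with something like -XX:+UseParallelGC -Xms2g -Xmx3g;
// illustrative values only, not a recommendation).
public class HeapHeadroom {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long committedMb = rt.totalMemory() / (1024 * 1024); // heap currently committed
        long maxMb = rt.maxMemory() / (1024 * 1024);         // ceiling the heap may grow to
        System.out.println("committed heap: " + committedMb + " MB");
        System.out.println("max heap:       " + maxMb + " MB");
        System.out.println("headroom:       " + (maxMb - committedMb) + " MB");
    }
}
```

If headroom comes back as zero, the collector has nowhere to adapt to.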

Just wrapped up my last performance tuning course for this year, and for the second time running, some members of my Parisian group had the opportunity to run the exercises on virtualized hardware. Granted, the underlying hardware wasn't quite top shelf, but that same hardware had been used un-virtualized in past offerings. And the point is to learn how to diagnose Java performance bottlenecks, and in that context the OS and hardware aren't so important.

I say that some of the group used virtualized hardware because the rest of the group (wisely) chose to use their own laptops. The mix of hardware that has been used for the exercises contains all of the usual suspects: MacBook Pro, Dell, Sony, HP, Lenovo, Siemens, and a bunch of other brands. Operating systems included XP, Windows 7, OSX, and Linux. I've also run the benchmarks on Solaris (just to round out the list). At the end of the exercise the tradition is to share results.

The first benchmark is single threaded and, as it turns out, it surprisingly stresses the thread scheduler while trying to use the RTC. The results of the bench are very predictable in that XP completes the 1 second unit of work in 1.8-1.9 seconds, while OSX and Linux run in about 1.1-1.2 seconds. Out of the box, Solaris runs in about 10.100 seconds; set the clock resolution kernel parameter to 1ms and the unit of work runs in about 1.150 seconds. Windows 7 runs in just over 1.000-1.010 seconds. Note that results are entirely dependent upon the OS: reboot the same hardware into a different OS and the benchmark results change accordingly.
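The course benchmark itself isn't reproduced here, but a minimal sketch of the kind of test that exposes this behaviour is a nominal one-second unit of work built from 1 ms sleeps. How far the wall-clock time overshoots one second reflects the OS timer resolution and scheduler, which is why the same code varies so widely across operating systems.

```java
// Illustrative stand-in for the course benchmark: a nominal "1 second unit of work"
// made of 1000 one-millisecond sleeps. The overshoot beyond 1000 ms reflects the
// OS timer resolution and thread scheduler behaviour.
public class SleepResolution {
    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            Thread.sleep(1);   // nominally 1000 x 1 ms = 1 second of "work"
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("1 second unit of work took " + elapsedMs + " ms");
    }
}
```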

The second benchmark tries to measure the multi-threaded throughput of various Map implementations. As you can imagine, the benchmark is sensitive to the number of cores along with the choice of OS. Again, XP scores poorly, whereas Linux and Mac OSX run neck and neck, with Windows 7 leading the way, for the synchronized collections. Linux, OSX and Windows 7 run neck and neck when benching the java.util.concurrent or non-blocking (Cliff Click's NonBlockingHashMap) maps.
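Again, this is not the course benchmark, but a rough sketch of the idea: hammer a map with put/get traffic from one thread per core and count operations. The thread count, key range, and run length below are arbitrary illustrative choices; swapping in other Map implementations (such as the NonBlockingHashMap) follows the same pattern.

```java
// Rough sketch of a multi-threaded Map throughput test: compare a synchronized
// HashMap against ConcurrentHashMap across all available cores.
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

public class MapThroughputBench {

    static long opsPerSecond(Map<Integer, Integer> map, int threads, int seconds)
            throws InterruptedException {
        AtomicLong totalOps = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(seconds);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                long ops = 0;
                while (System.nanoTime() < deadline) {
                    int key = rnd.nextInt(10_000);
                    map.put(key, key);
                    map.get(key);
                    ops += 2;
                }
                totalOps.addAndGet(ops);  // add each worker's count once, at the end
            });
        }
        pool.shutdown();
        pool.awaitTermination(seconds + 10, TimeUnit.SECONDS);
        return totalOps.get() / seconds;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("synchronized HashMap: "
                + opsPerSecond(Collections.synchronizedMap(new HashMap<>()), threads, 5)
                + " ops/s");
        System.out.println("ConcurrentHashMap:    "
                + opsPerSecond(new ConcurrentHashMap<>(), threads, 5)
                + " ops/s");
    }
}
```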

The fun comes when any of these benches are run virtualized on any of these operating systems: the change in results is astounding. The 1.200 second run time balloons to 2.3-2.8 seconds. Throughput on the other benchmark drops from tens of millions of operations per second to a couple of million for the synchronized collections. The results are not as devastating for the non-blocking implementation.

My guess is that virtualization, in the first bench, is adding additional stress to the already overworked thread scheduler. I've not thought through why the second bench is so adversely affected. Bottom line: running these benches in a virtualized environment doesn't paint a very pretty performance picture. After running the benchmarks, the group started trading their virtualization war stories. My response: you can't virtualize yourself into more hardware, but you can certainly virtualize yourself out of it. That said, I still like virtualization.