I'm one of the developers on Apache Lucene, trying to run performance
tests of Lucene's new near real-time search feature...
But, I'm getting erratic behaviour out of the hotspot compiler,
whereby the compiled code can run up to ~60% slower depending on what
code has run before it, which really makes my testing hard!
Somehow, hotspot seems to get tricked into compiling the code very
badly, and then never recompiles it.
I'm running with JDK 1.6.0_17-b04, on a CentOS 5.4 (Linux) box. I run
java like this:
java -server -Xms1g -Xmx1g
I've tried various advanced -XX options to try to work around this,
to no avail.
After whittling down my test, I managed to get a very simple
standalone test (depends only on the Lucene 3.0 JAR) that shows the
problem. The test first runs unrelated "warmup" code, then runs a
fixed search test multiple times, printing the fastest run.
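To make the shape of the test concrete, it's roughly the following (the names here are mine, not the actual test's, and the placeholder loop stands in for the real Lucene search body):

```java
// Sketch of the harness shape (hypothetical names, not the real test):
// run one of the unrelated warmup variants, then time the fixed search
// repeatedly and keep only the fastest run to filter out noise.
public class HarnessSketch {

    // Stand-in for the fixed search test; the real body runs Lucene queries.
    static long timeFixedSearch() {
        long t0 = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += i;                        // placeholder work
        }
        if (sum == 42) System.out.print(""); // keep the loop from being dead code
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        // warmup();  // one of the 4 unrelated warmup variants would run here
        long best = Long.MAX_VALUE;
        for (int iter = 0; iter < 10; iter++) {
            best = Math.min(best, timeFixedSearch());
        }
        System.out.println("fastest run: " + (best / 1000000) + " msec");
    }
}
```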
There are 4 options for "warmup", and when I run the test with each of
these 4, it runs as fast as 726 msec and as slow as 1160 msec. The numbers
are very stable (low noise) so I'm pretty sure this is accurately
measuring what hotspot had compiled.
I turned on -XX:+PrintOptoAssembly (downloaded the fastdebug JDK), and
it's really weird -- even very low level methods like readVInt (which
reads a variable-length encoded integer, a serious hot spot in Lucene)
were compiled differently, depending on what code ran during "warmup".
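For reference, the decoding loop is essentially this (a sketch in the style of Lucene's readVInt, not copied verbatim) -- the method is tiny, which is exactly why the quality of its compiled code matters so much on hot search paths:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

public class VIntSketch {
    // Each byte carries 7 payload bits; a set high bit means another
    // byte follows. Small values (< 128) decode in a single byte,
    // which is the common case in Lucene's postings.
    static int readVInt(DataInput in) throws IOException {
        byte b = in.readByte();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readByte();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    public static void main(String[] args) throws IOException {
        // 0x85 0x02 encodes 261: (0x85 & 0x7F) + (0x02 << 7) = 5 + 256
        DataInput in = new DataInputStream(
            new ByteArrayInputStream(new byte[] {(byte) 0x85, 0x02}));
        System.out.println(readVInt(in)); // prints 261
    }
}
```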
Any ideas? For my testing I really just want consistency, but, going
forward I also would like to somehow "bias" hotspot to not produce the
60% slower compiled code...
The "compilation planning", and the whole discussion/referenced papers
from there, are very interesting. It sounds like I'm indeed running
up against the complexities in the implicit "performance model" that
hotspot presents. I just wish I had a simple way to "manage", or even understand
the differences. I'll try to study the output of
-XX:+PrintCompilation to see if I can gain any insight.... but this
situation pretty much makes it impossible for me to do the performance
testing I had wanted to do (the 60% perf hit due to different hotspot
decisions completely obliterates what I'm trying to measure!).
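In case it helps anyone trying to reproduce this, enabling the logging looks like this (the class name is just a placeholder for my test):

```shell
# Log JIT compilation events (method names, OSR compiles, any
# deoptimizations) so the different warmup runs can be diffed side by side.
java -server -Xms1g -Xmx1g -XX:+PrintCompilation MyTest > compile.log 2>&1
```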