9 Replies Latest reply on Feb 16, 2006 4:46 PM by 843829

    Why does hprof=heap=dump have so much overhead?

      I udnerstand why the HPROF option heap=sites incurs a massive performance overhead; it has to intercept every allocation and record the current call stack.

      However, I don't understand why the HPROF option heap=dump incurs so much of a performance overhead. Presumably it could do nothing until invoked, and only then trace from the system roots the entire heap.

      Can anyone speak to why it doesn't work that way?

      - Gordon @ IA
        • 1. Re: Why does hprof=heap=dump have so much overhead?
          The reason it is so expensive is because all object allocation is instrumented so as to record the stack trace where it was allocated.

          The following blog contains many useful information:
          • 2. Re: Why does hprof=heap=dump have so much overhead?
            Traditionally agents like hprof had to be loaded into the virtual machine at startup, and this was the only way to capture these object allocations. The new hprof in the JDK 5.0 release (Tiger) was written using the newer VM interface JVM TI and this new hprof was mostly meant to reproduce the functionality of the old hprof from JDK 1.4.2 that used JVMPI. (Just FYI: run 'java -Xrunhprof:help' for help on hprof).

            The JDK 5.0 hprof will at startup, instrument java.lang.Object.<init>() and all classes and methods that use the newarray bytecodes. This instrumentation doesn't take long and is just an initial startup cost, it's the run time and what happens then that is the performance bottleneck. At run time, as any object is allocated, the instrumented methods trigger an extra call into a Java tracker class which in turn makes a JNI call into the hprof agent and native code. At that point, hprof needs to track all the objects that are live (the JVM TI free event tells it when an object is freed), which takes a table inside the hprof agent and memory space. So if the machine you are using is low on RAM, using hprof will cause drastic slowdowns, you might try heap=sites which uses less memory but just tracks allocations based on site of allocation not individual objects.

            The more likely run time performance issue is that at each allocation, hprof wants to get the stack trace, this can be expensive, depends on how many objects are allocated. You could try using depth=0 and see if the stack trace samples are a serious issue for your situation. If you don't need stack traces, then you would be better off looking at the pmap command that gets you an hprof binary dump on the fly, no overhead, then you can use jhat (or HAT) to browse the heap. This may require use of the JDK 6 (Mustang) release for this experiment, see http://mustang.dev.java.net for the free downloads of JDK 6 (Mustang).

            There is an RFE for hprof to allow the tracking of allocations to be turned on/off in the Java tracker methods that were injected, at the Java source level. But this would require adding some Java APIs to control sun/tools/hprof/Tracker which is in rt.jar. This is very possible and more with the JVM TI interfaces.

            If you haven't tried the NetBeans Profiler (http://www.netbeans.org) you may want to look at it. It does take an incremental approach to instrumentation and tries to focus in on the areas of interest and allows you to limit the overhead of the profiler. It works with the latest JDK 5 (Tiger) update release, see http://java.sun.com/j2se.

            Oh yes, also look at some of the JVM TI demos that come with the JDK 5 download. Look in the demo/jvmti directory and try the small agents HeapTracker and HeapViewer, they have much lower overhead and the binaries and all the source is right there for you to just use or modify and customize for yourself.

            Hope this helps.

            • 3. Re: Why does hprof=heap=dump have so much overhead?
              Thanks for your answers, that is a great blog entry and it seems like in 6.0 the built-in heap dump is exactly what I'd expected -- a trace from roots when you want a dump, with no intrumentation overhead in regular ops.

              The behavior of 5.0 hprof still baffles. When "heap=dump" is selected, as far as I can tell, there's no way to dump allocation sites info, so I had assumed it wouldn't be collected at all (and that the 'depth' setting was thus irrelevant with heap=dump). I'll try the 'depth=0' next time I do a 'heap=dump' hprof run, to see if it helps.

              Kelly, I will take a look at the 5.0 JVM TI examples -- but in general, do you think that a heap dump based not on tracking every alloc/free but rather just tracing from roots at dump time is possible with 5.0 JVM TI?
              • 4. Re: Why does hprof=heap=dump have so much overhead?
                With hprof a 'trace' is a stack trace in the Java code that shows you where the Java object was allocated, the so called allocation site.

                I think the kind of trace you want is to follow the references so that you can determine what is preventing an object from being garbage collected. You should look at HAT or the newer jhat in JDK 6 for that abaility.

                Hprof has always dumped stack traces for the allocation sites, but you can't depend on a stack trace being available in all cases. If you look at the java.hprof.txt file you will
                see TRACE records, and references to those traces with something like trace=NNNNN.
                If the TRACE record is empty, that means that the object was likely allocated before the VM was fully initialized or perhaps during the VM shutdown, where getting stack traces is difficult if not impossible. For baby Hello world programs or small programs, you may see more empty traces than real traces, but on longer running apps most of the TRACE records should be there.

                In Tiger (JDK 5), look into jps (to find the VM pid), and 'jmap -heap PID', then compare that heap dump to the one coming from hprof, there will be no TRACE records, or maybe just one empty one that everyone refers to.

                Using jmap, the VM can be any old VM running, doesn't need to have been started up with any options, but does need to be running. The ' jmap -heap PID' command will tell you lots about the heap state.

                If you are a HAT user (http://hat.dev.java.net) you can use 'jmap -heap:format=b' and get a HPROF binary dump that HAT can read, then HAT allows you to use your browser to browse the heap and it's connections between objects. But again, there is no stack traces as to where the objects were allocated from, that's what HPROF will do and is probably a large part of the performance cost (getting a stack trace on every object allocation). But if you want to walk around object references, you should give HAT or jhat a try.

                Our goal in the future is to enable more 'on demand' type things like jmap, so that re-starting VM's with special options won't be as necessary in the future. The JDK 6 (Mustang at http://mustang.dev.java.net) includes what we call "on demand" agent loading, so that JVM TI agents (like hprof and debugger backends) could attach themselves to a running Java process, in the past VM agents could only be included in the VM process at VM startup.

                Occasionally we tweek hprof to add a new feature, but getting detailed heap profiling has been traditionally left to the Java heap profiling tool vendors, and there are many good ones out there. With the newer JVM TI in JDK 5+ the profiling tools have a whole new set of low level tools, so I expect them to start taking advantage of JVM TI soon if they haven't already and get off of that ancient JVMPI. ;^)

                • 5. Re: Why does hprof=heap=dump have so much overhead?
                  Aha, 'jmap -heap:format=b' is what I was expecting; I didn't realize that worked in 1.5. I had expected hprof=heap=dump to be analogous, because the hprof docs only indirectly imply that heap=dump also collects much of the heap=sites info, too.

                  Thanks for your help.

                  Alas, neither the heap dump from hprof nor later ones retrieved with jmap will open in 'hat'.

                  The one from hprof fails thus:
                  bin/hat ../OOM.hprof
                  Started HTTP server on port 7000
                  Reading from ../OOM.hprof...
                  Dump file created Sat Jan 24 18:35:34 PST 1970
                  java.io.IOException: Thread 200014 not found for JNI local ref
                  at hat.parser.HprofReader.getThreadObjectFromSequence(HprofReader.java:5 77)
                  at hat.parser.HprofReader.readHeapDump(HprofReader.java:379)
                  at hat.parser.HprofReader.read(HprofReader.java:221)
                  at hat.parser.Reader.readFile(Reader.java:90)
                  at hat.Main.main(Main.java:149)

                  ...while the ones from 'jmap' fail with...
                  bin/hat post-OOM-heap.bin
                  Started HTTP server on port 7000
                  Reading from post-OOM-heap.bin...
                  Dump file created Mon Nov 14 16:58:59 PST 2005
                  java.io.IOException: Stack trace not found for serial # 0
                  at hat.parser.HprofReader.getStackTraceFromSerial(HprofReader.java:611)
                  at hat.parser.HprofReader.readArray(HprofReader.java:777)
                  at hat.parser.HprofReader.readHeapDump(HprofReader.java:436)
                  at hat.parser.HprofReader.read(HprofReader.java:221)
                  at hat.parser.Reader.readFile(Reader.java:90)
                  at hat.Main.main(Main.java:149)

                  This rings some bells... I feel like I've tried 'hat' several times before and never gotten past this.

                  Can I presume this is because the hprof and jmap heap dumps, pre-6.0, aren't necessarily getting a consistent heap dump?
                  • 6. Re: Why does hprof=heap=dump have so much overhead?
                    If you can download the Mustang (JDK 6) binary build from http://mustang.dev.java.net, the jhat (new version of HAT) should work better on the second case. You only need to use the jhat out of JDK 6 in this case, no need to re-generate that file.

                    The first problem with the Thread not found is a bug in hprof that should be fixed in JDK 5 update 7 release (not out yet), and also in JDK 6. In dumping out the threads, hprof was using the "live thread" view, but should have been using a list of allocated java.lang.Thread objects. The problem happens when the hprof dump happens as threads are being terminated. Unfortunately the HAT code was rather particular about thread local objects refering to a thread object that wasn't in the thread list, and it was correct. This was one of the reasons why we wanted to include HAT as a permanent member of the JDK (as jhat in JDK 6) so that we could find and fix these kind of issues. Sorry about that, we didn't detect it in the JDK 5 release until late, and it seemed to only happen on applications with either lots of threads, or ones that are terminated by the VM shutdown maybe?

                    Can you try JDK 6?

                    • 7. Re: Why does hprof=heap=dump have so much overhead?
                      Heap dumps produced by jmap don't have object allocation site information. So, please use -stack=false option with HAT. With jhat and jmap of Mustang, this option is not needed anymore. There have been many bug fixes and feature enhancements in jhat (for eg. OQL - Object Query Language -- more details: http://blogs.sun.com/roller/page/sundararajan?entry=what_s_in_my_java). As Kelly mentioned, Mustang's jhat can parse heap dumps produced from older releases of JDK
                      • 8. Re: Why does hprof=heap=dump have so much overhead?
                        JHat doesn't seem to want to work with heap dumps produced by jmap from JDK 1.5.0.

                        When I try to run it (jhat from Mustang 70), I get the following error:

                        java.io.IOException: Unrecognized magic number: -2147483648
                        at com.sun.tools.hat.internal.parser.Reader.readFile(Reader.java:100)
                        at com.sun.tools.hat.Main.main(Main.java:163)
                        • 9. Re: Why does hprof=heap=dump have so much overhead?
                          For what it's worth, I'm still getting a substantially identical stack to the first one above with the latest binary JDK 6 release jdk-6-rc-bin-b71-linux-i586-09_feb_2006.bin.

                          java.io.IOException: Thread 200186 not found for JNI local ref
                          at com.sun.tools.hat.internal.parser.HprofReader.getThreadObjectFromSequence(HprofReader.java:622)
                          at com.sun.tools.hat.internal.parser.HprofReader.readHeapDump(HprofReader.java:443)
                          at com.sun.tools.hat.internal.parser.HprofReader.read(HprofReader.java:223)
                          at com.sun.tools.hat.internal.parser.Reader.readFile(Reader.java:91)
                          at com.sun.tools.hat.Main.main(Main.java:143)

                          I generated java.hprof on 1.5.0_05-83 running on Macintosh OS X, and tried running jhat on Linux x86. I've tried several times and have been unable to get any dump that jhat will read; they all fail with this problem.

                          Next move: I'll try running the whole system on JDK 6 and see if that works any better.