10 Replies Latest reply: May 27, 2011 3:44 AM by pele

    Heisenberg effect - slowness of JVM execution that can't be profiled

    pele
      Hi all,

      I'm turning to this forum to ask whether anyone has had similar problems or has any idea how to approach solving them. I already filed a bug report some time ago: [http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7030307] but since I haven't been able to create a test case to reproduce it, it was closed. I'll just paste the report here:

      FULL PRODUCT VERSION :
      java version "1.6.0_25-ea"
      Java(TM) SE Runtime Environment (build 1.6.0_25-ea-b01)
      Java HotSpot(TM) 64-Bit Server VM (build 20.0-b08, mixed mode)

      the same observed also with latest Java 7 beta:

      java version "1.7.0-ea"
      Java(TM) SE Runtime Environment (build 1.7.0-ea-b131)
      Java HotSpot(TM) 64-Bit Server VM (build 21.0-b02, mixed mode)


      FULL OS VERSION :
      SunOS nobiex1 5.10 Generic_141444-09 sun4v sparc SUNW,Sun-Blade-T6320

      EXTRA RELEVANT SYSTEM CONFIGURATION :
      32 GB of RAM

      A DESCRIPTION OF THE PROBLEM :
      We have an application that loads almost 20,000 classes, uses about 17 GB of heap (when stabilized) on each computer and is distributed over several computers. Lately we have observed that a slowness has crept into the execution of the code. We haven't yet established the exact cause or the time when this started to happen, because by the time it was apparent that it was causing trouble, neither reverting the application version nor the JVM helped to overcome the issue. We have ruled out GC pauses - we don't have GC problems. To illustrate the slowness: a simple HTTP request on an embedded Jetty engine that returns a static file served off a local filesystem takes about 2 seconds.

      What we did first was to manually insert instrumentation into the application code that measured wall-clock time of the whole request processing and also CPU time (using java.lang.management.ThreadMXBean). We found that the thread that takes 2 seconds to process the simple local-file-serving request uses very little CPU time, so it seems it must be suspended by something. We have not been able to tell what causes this suspension of the thread, because when we load the YourKit Java profiler agent into the JVM, the slowness disappears. It's like the Heisenberg effect - the measurement changes the world so that the problem disappears. We have turned on HotSpot compilation debug output and could not detect any suspicious behavior. We would like to submit a reproducible bug report, but I'm afraid we can't do that at the moment since we have been unable to isolate the problem yet. So if you have any idea how to investigate this further, you can contact xxxxx@xxxxx . Just to whet your curiosity, here are the JVM flags we are currently using:

      -Xmx26624m
      -XX:MaxPermSize=1024m
      -XX:+DisableExplicitGC
      -XX:+UseParallelGC
      -XX:ParallelGCThreads=32
      -XX:+UseParallelOldGC
      -XX:LargePageSizeInBytes=256m
      -XX:+UseLargePages
      -XX:+UseCompressedOops
      -XX:+MaxFDLimit
      -XX:ReservedCodeCacheSize=512m
      -agentpath:/app/tbill/thinkbill/system/yjp-9.5.4/bin/solaris-sparc-64/libyjpagent.so
      com.marand.thinkbill.system.Boot


      THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Did not try

      THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Don't have exact steps yet

      REPRODUCIBILITY :
      This bug can be reproduced always.

      CUSTOMER SUBMITTED WORKAROUND :
      Loading Yourkit Java profiler agent into the JVM.
      Posted Date : 2011-03-23 16:45:40.0

      Submitted On 17-MAY-2011
      5er_levart
      I have some further information for you. We are still experiencing this, even with the latest JDK 7 ea-142 build. We found out that attaching to the JVM with jvisualvm and starting a "Sampling CPU" profiling session early in the startup sequence of our APP inhibits the slow-down behavior, even if jvisualvm is later detached from the JVM. If we attach jvisualvm for the first time later, after APP start-up, when the slow-down behavior has already started to build up, then attaching/profiling does not prevent it. In either case the profiler only shows what is going on in the Java code, and there's nothing unusual there. I think the problem will have to be tracked down by profiling the JVM internals. Can you provide some pointers on how one can approach that?


      Submitted On 18-MAY-2011
      5er_levart
      Further testing on our part with different platforms shows that this problem is not observed on the Linux/amd64 or Solaris/amd64 platforms; it only happens on Solaris/SPARC. We have observed the problem with different JVMs (1.6.0_25, 7-ea-142) and different OS versions (Solaris 11 Express/SPARC, Solaris 10/SPARC).
        • 1. Re: Heisenberg effect - slowness of JVM execution that can't be profiled
          EJP
          This sounds more like a network or file-system delay than a code problem.
          • 2. Re: Heisenberg effect - slowness of JVM execution that can't be profiled
            802316
            I would use the released version rather than the "-ea" early access edition. It is possible there is some sort of extra checking overhead or minor bug which is impacting you.

            You say you added instrumentation for your "simple local-file-serving request"; what is the breakdown of where the time is spent? I don't imagine every stage of your code is running equally slowly (and compare it with your YourKit run). Try to add timestamp samples which are a few milliseconds apart. Even at microsecond resolution I find this useful.
            • 3. Re: Heisenberg effect - slowness of JVM execution that can't be profiled
              pele
              @EJP: No, it's not FS or network related. I'm making the HTTP request measurement via the localhost interface (on the same machine) and get consistent results: about 2 seconds per HTTP request that returns a small file (1 KB) served off the local FS (UFS on Solaris 10 or ZFS on Solaris 11). It should be noted that this slowness disappears if the YourKit Java profiler agent is loaded into the JVM at startup (the same requests return in about 1 ms) or if jvisualvm is attached to the JVM process early in the startup sequence of the JVM/APP and a "Sampling CPU" profiling session is started.

              @Peter Lawrey: I did mention that the same is happening on the released 1.6.0_25 (and 1.6.0_24 and 1.6.0_23) and on the latest 7 ea-142 prerelease. The problem with breaking down timings is that the "slowness" disappears when I try to profile it. So far I have tried two different Java profilers (YourKit and jvisualvm/NetBeans) and both of them change the world enough that the problem is gone!
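
              To be concrete, the per-request measurement we already have in place is essentially the following (a simplified sketch of what we do; the class name and log format are just illustrative, the real calls sit inside our request handling code):

              import java.lang.management.ManagementFactory;
              import java.lang.management.ThreadMXBean;

              /** Minimal per-request timing: wall-clock vs. CPU time of the current thread. */
              public class RequestTiming {
                  private static final ThreadMXBean TMB = ManagementFactory.getThreadMXBean();

                  /** Call at the start of request processing. Returns {wallStartNanos, cpuStartNanos}. */
                  public static long[] begin() {
                      return new long[] { System.nanoTime(), TMB.getCurrentThreadCpuTime() };
                  }

                  /** Call at the end of request processing; logs both times for the current thread. */
                  public static void end(String label, long[] t0) {
                      long wallMs = (System.nanoTime() - t0[0]) / 1000000L;
                      long cpuMs = (TMB.getCurrentThreadCpuTime() - t0[1]) / 1000000L;
                      // a large gap between wall and CPU time means the thread was mostly blocked or descheduled
                      System.out.println(label + ": wall=" + wallMs + " ms, cpu=" + cpuMs + " ms");
                  }
              }

              This is how we see the ~2 second wall time with very little CPU time for the same request.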

              Currently I'm diving deep into the DTraceToolkit. I'll post when I have anything to report.
              • 4. Re: Heisenberg effect - slowness of JVM execution that can't be profiled
                jschellSomeoneStoleMyAlias
                I would suppose that profiling is going to impact what hotspot does.

                You could manually insert timing into the code and log the results. Presuming that doesn't impact the problem, you might find the source.
                We have an application that loads almost 20,000 classes, uses about 17 GB of heap
                20,000 classes isn't that much

                17 GB is. Presumably you are not claiming that the 20,000 classes by themselves lead to the 17 GB space, because that would mean that the average class size is 850 KB. And if that is the case you have a serious design/implementation problem.

                So presuming it isn't, that means you are loading class instances or in some other way increasing the data. Reducing that should still produce the problem. And if it doesn't, that would suggest that the nature of the data itself is the problem.
                • 5. Re: Heisenberg effect - slowness of JVM execution that can't be profiled
                  pele
                  jschell wrote:
                  I would suppose that profiling is going to impact what hotspot does.

                  You could manually insert timing into the code and log the results. Presuming that doesn't impact the problem, you might find the source.
                  Instead, I went and studied something new: DTrace. A revolutionary tool.

                  We have an application that loads almost 20,000 classes, uses about 17 GB of heap
                  20,000 classes isn't that much

                  17 GB is. Presumably you are not claiming that the 20,000 classes by themselves lead to the 17 GB space, because that would mean that the average class size is 850 KB. And if that is the case you have a serious design/implementation problem.
                  No, of course not. Classes take approx. 150 MB in PermGen. We have big caches (using Coherence) for data objects...

                  But let's get to the point. I think I have pinpointed calls in Java code that take an unusually long time to complete. How long would you say the following Java method might take to complete:

                  Thread.currentThread()

                  Would you be surprised if it took more than 1 second on an otherwise idle system? Here's the proof. This is the jstack() output of one such call:

                  lwpcond_wait (1207496110 nanos):

                  libc.so.1`_lwp_cond_wait+0x54
                  libjvm.so`__1cCosNPlatformEventEpark6M_v_+0x108
                  libjvm.so`__1cKParkCommon6FpnJParkEvent_l_i_+0x50
                  libjvm.so`__1cHMonitorFILock6MpnGThread__v_+0xa4
                  libjvm.so`__1cHMonitorbClock_without_safepoint_check6MpnGThread__v_+0x8
                  libjvm.so`__1cUSafepointSynchronizeFblock6FpnKJavaThread__v_+0x1fc
                  libjvm.so`__1cKJavaThreadbScheck_safepoint_and_suspend_for_native_trans6Fp0_v_+0x150
                  libjvm.so`__1cKJavaThreadbOcheck_special_condition_for_native_trans6Fp0_v_+0x4
                  java/lang/System.currentTimeMillis()J*
                  com/tangosol/util/Base.getSafeTimeMillis()J*
                  0xffffffff60005ef8
                  com/tangosol/coherence/component/util/Daemon.onWait()V*
                  0xffffffff60005fdc
                  0xffffffff60005fdc
                  0xffffffff60006534
                  0xffffffff6000024c
                  libjvm.so`__1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_+0x1f8
                  libjvm.so`__1cJJavaCallsEcall6FpnJJavaValue_nMmethodHandle_pnRJavaCallArguments_pnGThread__v_+0x34
                  libjvm.so`__1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_+0x148
                  libjvm.so`__1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_+0x48
                  libjvm.so`__1cMthread_entry6FpnKJavaThread_pnGThread__v_+0x140
                  libjvm.so`__1cKJavaThreadRthread_main_inner6M_v_+0x44
                  libjvm.so`java_start+0x17c
                  libc.so.1`_lwp_start


                  I think I've got enough stuff to file another bug report.
                  • 6. Re: Heisenberg effect - slowness of JVM execution that can't be profiled
                    pele
                    Oops, that was the wrong stack trace. Here's the one I meant to show, which affects the execution of the HTTP request:


                    lwpcond_wait (1185943336 nanos):

                    libc.so.1`_lwp_cond_wait+0x54
                    libjvm.so`__1cCosNPlatformEventEpark6M_v_+0x108
                    libjvm.so`__1cKParkCommon6FpnJParkEvent_l_i_+0x50
                    libjvm.so`__1cHMonitorFILock6MpnGThread__v_+0xa4
                    libjvm.so`__1cHMonitorbClock_without_safepoint_check6MpnGThread__v_+0x8
                    libjvm.so`__1cUSafepointSynchronizeFblock6FpnKJavaThread__v_+0x1fc
                    libjvm.so`__1cKJavaThreadbScheck_safepoint_and_suspend_for_native_trans6Fp0_v_+0x150
                    libjvm.so`JVM_CurrentThread+0xe8
                    java/lang/Thread.currentThread()Ljava/lang/Thread;*
                    0xffffffff60005e60
                    0xffffffff60005fdc
                    0xffffffff60005fdc
                    0xffffffff60006404
                    org/mortbay/jetty/HttpParser.parseNext()J*
                    0xffffffff60005ef8
                    0xffffffff60006450
                    0xffffffff60006534
                    0xffffffff60006534
                    StubRoutines (1)
                    libjvm.so`__1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_+0x1f8
                    libjvm.so`__1cJJavaCallsEcall6FpnJJavaValue_nMmethodHandle_pnRJavaCallArguments_pnGThread__v_+0x34
                    libjvm.so`__1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_+0x148
                    libjvm.so`__1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_+0x48
                    libjvm.so`__1cMthread_entry6FpnKJavaThread_pnGThread__v_+0x140
                    libjvm.so`__1cKJavaThreadRthread_main_inner6M_v_+0x44
                    libjvm.so`java_start+0x17c
                    libc.so.1`_lwp_start


                    ... but even System.currentTimeMillis() should not take as much as 1 second or more to complete. All of these are native methods, and there's some synchronization going on in the native world of the JVM that blocks execution of these "innocent" Java platform methods for more than a second...
                    • 7. Re: Heisenberg effect - slowness of JVM execution that can't be profiled
                      802316
                      As I said, you should do your own custom profiling. Take a time stamp at key phases in your application and compare the timing of each section - with no YourKit or VisualVM, and also with profiling on for comparison. Getting a DTrace stack trace is interesting, but why not take a `jstack` stack trace, which will give you information about threads and locks (e.g. if one thread is waiting for another)?
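
                      For example, something along these lines around each request (just a sketch; the class and phase names are made up):

                      /** Collects the elapsed time between named phases of a single request. */
                      public class PhaseTimer {
                          private final StringBuilder log = new StringBuilder();
                          private long last = System.nanoTime();

                          /** Records the time spent since the previous mark under the given phase name. */
                          public void mark(String phase) {
                              long now = System.nanoTime();
                              log.append(phase).append('=').append((now - last) / 1000).append("us ");
                              last = now;
                          }

                          /** One line per request, e.g. "parse=...us read=...us write=...us". */
                          public String report() {
                              return log.toString();
                          }
                      }

                      Create one PhaseTimer per request, call mark("parse"), mark("read"), mark("write") (or whatever your phases are), print report() at the end, and then compare the lines from a slow run against a profiled run.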
                      • 8. Re: Heisenberg effect - slowness of JVM execution that can't be profiled
                        pele
                        Peter Lawrey wrote:
                        As I said, you should do your own custom profiling. Take a time stamp at key phases in your application and compare the timing of each section - with no YourKit or VisualVM, and also with profiling on for comparison. Getting a DTrace stack trace is interesting, but why not take a `jstack` stack trace, which will give you information about threads and locks (e.g. if one thread is waiting for another)?
                        I think I have already identified the "section" of code that slows down the simple local-file-serving HTTP request (I can't insert timing statements into Jetty easily without modifying its code). I used the following DTrace script, which prints jstack() at every return from a call into the libc.so library that lasts longer than 1100 milliseconds:


                        #!/usr/sbin/dtrace -Zs

                        /*
                        * Usage:
                        * 1. trace_big_call_stacks.d -c "java ..."
                        * 2. trace_big_call_stacks.d -p JAVA_PID
                        *
                        * This script traces call stacks of calls to libc.so library that take more than 1100 msecs...
                        *
                        */

                        #pragma D option quiet
                        #pragma D option destructive
                        #pragma D option defaultargs
                        #pragma D option bufsize=16m
                        #pragma D option aggrate=100ms


                        BEGIN
                        {
                        SAMPLE_NAME = "trace big call stacks";

                        printf("BEGIN %s\n\n", SAMPLE_NAME);
                        }

                        pid$target:libc.so::entry
                        {
                        self->start[probefunc] = timestamp;
                        }

                        pid$target:libc.so::return
                        /self->start[probefunc] && timestamp - self->start[probefunc] > 1100000000/
                        {
                        this->duration = timestamp - self->start[probefunc];
                        @quants[probefunc] = quantize(this->duration);

                        printf("\n%s (%d nanos):\n", probefunc, this->duration);

                        jstack(50,4096);
                        }

                        pid$target:libc.so::return
                        /self->start[probefunc]/
                        {

                        self->start[probefunc] = 0;
                        }

                        :::END
                        {
                        printf("\nDistributions of time by function:\n");
                        printa(@quants);

                        printf("\nEND %s\n", SAMPLE_NAME);
                        }


                        ...among the Java platform calls to native methods that are expected to block (Object.wait, PlainDatagramSocketImpl.receive0, etc.), this script also prints out some Java platform calls to native methods that are normally expected to return in an instant but, as the jstack output above shows, block for more than a second. As these calls occur during HTTP request processing, they cause large slowdowns in request execution.

                        To debug this further, I would need some tool that examines the state of the LWP mutex in question (which thread holds the mutex and why). Java profilers only show thread contention that occurs in Java code. Do you know of a tool that does this for native Solaris LWP mutexes too? Could DTrace also help here? I'll see...
                        • 9. Re: Heisenberg effect - slowness of JVM execution that can't be profiled
                          pele
                          Just informing the thread that this issue is being tracked now as bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7046791

                          The JVM is periodically doing cleanup (not GC) of some internal structures, and it does that while in the so-called "safepoint" mode (any thread that tries to interfere is blocked). It's almost like GC. The cleanup task that takes so long to process is "deflating idle monitors". It turns out that in our application this cleanup takes about half a second, and then there is only a very short window in which other threads are allowed to progress before the JVM drops into "safepoint" mode again...
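
                          For anyone who wants to see these pauses without attaching a profiler: as far as I can tell, HotSpot itself can print them in the 6/7 builds we use, e.g. with

                          -XX:+PrintGCApplicationStoppedTime
                          -XX:+PrintSafepointStatistics
                          -XX:PrintSafepointStatisticsCount=1

                          The first prints the time application threads were stopped at each safepoint (not just for GC), and the safepoint statistics break each pause down into spin/block/sync/cleanup/vmop time.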

                          It seems that the JVM is hitting performance limitations with our app. We are going to investigate which part of our (or our vendor's) code is using monitors in a way that results in these anomalies. I'll post what I find here (and in the bug report).
                          • 10. Re: Heisenberg effect - slowness of JVM execution that can't be profiled
                            pele
                            Just informing that we solved the problem by specifying the following JVM option: -XX:MonitorBound=16384 (new in 1.6.0_23+). It instructs the JVM to trigger monitor deflation as soon as the number of monitors exceeds the bound, which increases the frequency of deflations while the monitor allocation rate is high but keeps the number of allocated monitors bounded. This has a positive effect on our app's performance, since the monitor allocation rate is high only at boot-up time and later drops to "normal", together with the frequency of deflations (approx. 1/sec), which then no longer last half a second or more, so their effect on overall app performance is negligible.

                            Thanks for listening.