5 Replies Latest reply: Dec 26, 2008 1:02 PM by 843829

    JDK 1.6 on Solaris. Multiple java processes and thread freezes

    843829
      Hi, we've come across a really weird behavior on the Solaris JVM, reported by a customer of ours.
      Our server application consists of multiple threads. Normally we see them all running within a single Java process, and all is fine.

      At some point, and only on Solaris 10, the main Java process appears to start a second Java process. This is not our code trying to execute some other application or command; it is the JVM itself forking a new copy of itself. I assumed this was due to some JVM behaviour on Solaris that uses multiple processes once the number of threads exceeds 128. However, at the time of the spawn there are fewer than 90 threads running.

      In any case, once this second process starts, some of the threads of the application (incidentally, they're the first threads created by the application at startup, in the first threadgroup) stop working. Our application dumps a list of all threads in the system every ten minutes, and even when they're not working, the threads are still there. Our logs also show that when the second process starts, these threads were not in the running state. They had just completed their operations and were sleeping in their thread pool, in a wait() call. Once the second process starts, jobs for these threads just queue up, and the wait() does not return, even after another thread has done a notify() to inform them of the new jobs.
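For readers unfamiliar with the pattern described above, here is a minimal sketch of a wait()/notify() job queue. It is not our actual thread pool; the class and method names (GuardedJobQueue, submit, take, pending) are hypothetical. The timed wait is an added assumption: it lets a worker re-check the queue even if a notify() is never delivered. If a worker still makes no progress with a timed wait, that would suggest the thread is stuck below the level of Java monitors.

```java
import java.util.LinkedList;

// Hypothetical names throughout; this is not the application's real pool.
class GuardedJobQueue {
    private final LinkedList<Runnable> jobs = new LinkedList<Runnable>();

    // Producer side: enqueue a job and wake one waiting worker.
    public synchronized void submit(Runnable job) {
        jobs.addLast(job);
        notify();
    }

    // Worker side: block until a job is available.
    // Returns null if timeoutMillis elapses first or the thread is interrupted.
    public synchronized Runnable take(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (jobs.isEmpty()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null; // timed out with nothing queued
            }
            try {
                wait(remaining); // loop re-checks the condition after every wakeup
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return null;
            }
        }
        return jobs.removeFirst();
    }

    // Number of jobs currently queued; useful for logging a backlog.
    public synchronized int pending() {
        return jobs.size();
    }

    public static void main(String[] args) {
        GuardedJobQueue q = new GuardedJobQueue();
        q.submit(new Runnable() {
            public void run() {
                System.out.println("job ran");
            }
        });
        Runnable job = q.take(1000);
        if (job != null) {
            job.run();
        }
    }
}
```

A worker that logs whenever take() returns null while pending() is non-zero would give a direct signal that wakeups are being lost.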

      Even more interesting, when the customer manually runs kill -9 on the second process, without doing anything in our application, all threads that were 'frozen' start working again immediately. This (and the fact that this never happens on other OSes) makes us think that this is some sort of problem (or misconfiguration) specific to the Solaris JVM, and not our application.

      The customer initially reported this with JDK 1.5.0_12. We told them to upgrade to the latest JDK 1.6 update 6, but the problem remains. No special JVM switches are used (apart from -Xms32m -Xmx256m). We're really at a dead end here in diagnosing this problem, as it clearly seems to be outside our app. Any suggestion?
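As an aid to diagnosis, the periodic thread listing mentioned above could also record each thread's state, so that a frozen worker shows up as WAITING or BLOCKED in the log. The following is only a sketch under that assumption, not what our application currently does; ThreadStateDump and dump() are hypothetical names. It relies on Thread.getAllStackTraces() and Thread.getState(), both available since Java 5.

```java
import java.util.Map;

// Hypothetical helper; not part of the application described in this thread.
class ThreadStateDump {

    // Builds one line per live thread: name, state, and the top stack frame if any.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        Map<Thread, StackTraceElement[]> all = Thread.getAllStackTraces();
        for (Map.Entry<Thread, StackTraceElement[]> e : all.entrySet()) {
            Thread t = e.getKey();
            StackTraceElement[] frames = e.getValue();
            sb.append(t.getName()).append(" state=").append(t.getState());
            if (frames.length > 0) {
                sb.append(" at ").append(frames[0]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Comparing two consecutive dumps would show whether the affected threads are parked in Object.wait() the whole time or are stuck somewhere else.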
        • 1. Re: JDK 1.6 on Solaris. Multiple java processes and thread freezes
          843829
          Hi,

          I am seeing this exact same behavior with JRE 1.5.0_12 on Solaris 10. Is there a fix for this problem?

          Thanks in advance!

          Regards.
          • 2. Re: JDK 1.6 on Solaris. Multiple java processes and thread freezes
            843829
            ...We're really at a dead end here in diagnosing this problem, as it clearly seems to be outside our app. Any suggestion?
            This is the first I am hearing of a problem of such a bizarre nature.

            What did truss reveal as to the identity of the thread(s) forking off the new process(es)
            and the circumstances under which this was happening? What did ptree reveal as to
            the ancestry of the descendant processes? For a sampling of "parent-child" pairs
            in this tree/chain, could you post the complete pstack of the two processes?

            Do you have a test case or a reproducible set-up that you could take to
            your Sun support representative to open a ticket? Please make sure to quote the
            exact version of Solaris 10 you were using when this phenomenon was observed.

            thanks.
            • 3. Re: JDK 1.6 on Solaris. Multiple java processes and thread freezes
              843829
              I have had this issue for a while and have not been able to solve it.

              I have raised a number of incidents with Sun for this, but unfortunately they are just ignored and I don't get any feedback.

              Has anyone got anywhere with this issue?
              • 4. Re: JDK 1.6 on Solaris. Multiple java processes and thread freezes
                843829
                We're also seeing this on a Solaris 10 machine. Our Java process sits for a long time, doing the same thing (which is sending UDP broadcasts and listening for a response), and at some point many hours later another process is forked from the JVM. This new process has the original JVM as its parent PID, and has not consumed any CPU (ps reports CPU usage of 0:00). The command line of the second process appears to be identical, indicating this is just a plain fork, and not an exec.
                • 5. Re: JDK 1.6 on Solaris. Multiple java processes and thread freezes
                  843829
                  Actually, we've discovered that that's not really what was going on. I still believe there's a bug in the JVM, but the fork was happening because our Java code tries to exec a command line tool once a minute. After hours of this, we get a rogue child process with this stack (which is where we are forking this command line tool once a minute):
                  JVM version is 1.5.0_08-b03 
                  Thread t@38: (state = IN_NATIVE)
                   - java.lang.UNIXProcess.forkAndExec(byte[], byte[], int, byte[], int, byte[], boolean, java.io.FileDescriptor, java.io.FileDescriptor, java.io.FileDescriptor) @bci=168980456 (Interpreted frame)
                   - java.lang.UNIXProcess.forkAndExec(byte[], byte[], int, byte[], int, byte[], boolean, java.io.FileDescriptor, java.io.FileDescriptor, java.io.FileDescriptor) @bci=0 (Interpreted frame)
                   - java.lang.UNIXProcess.<init>(byte[], byte[], int, byte[], int, byte[], boolean) @bci=62, line=53 (Interpreted frame)
                   - java.lang.ProcessImpl.start(java.lang.String[], java.util.Map, java.lang.String, boolean) @bci=182, line=65 (Interpreted frame)
                   - java.lang.ProcessBuilder.start() @bci=112, line=451 (Interpreted frame)
                   - java.lang.Runtime.exec(java.lang.String[], java.lang.String[], java.io.File) @bci=16, line=591 (Interpreted frame)
                   - java.lang.Runtime.exec(java.lang.String, java.lang.String[], java.io.File) @bci=69, line=429 (Interpreted frame)
                   - java.lang.Runtime.exec(java.lang.String) @bci=4, line=326 (Interpreted frame)
                  ...
                   - java.lang.Thread.run() @bci=11, line=595 (Interpreted frame)
                  There are also several dozen other threads, all producing the same output:
                  Thread t@32: (state = BLOCKED)
                  Error occurred during stack walking:
                  sun.jvm.hotspot.debugger.DebuggerException: can't map thread id to thread handle!
                       at sun.jvm.hotspot.debugger.proc.ProcDebuggerLocal.getThreadIntegerRegisterSet0(Native Method)
                       at sun.jvm.hotspot.debugger.proc.ProcDebuggerLocal.getThreadIntegerRegisterSet(ProcDebuggerLocal.java:364)
                       at sun.jvm.hotspot.debugger.proc.sparc.ProcSPARCThread.getContext(ProcSPARCThread.java:35)
                       at sun.jvm.hotspot.runtime.solaris_sparc.SolarisSPARCJavaThreadPDAccess.getCurrentFrameGuess(SolarisSPARCJavaThreadPDAccess.java:108)
                       at sun.jvm.hotspot.runtime.JavaThread.getCurrentFrameGuess(JavaThread.java:252)
                       at sun.jvm.hotspot.runtime.JavaThread.getLastJavaVFrameDbg(JavaThread.java:211)
                       at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:50)
                       at sun.jvm.hotspot.tools.JStack.run(JStack.java:41)
                       at sun.jvm.hotspot.tools.Tool.start(Tool.java:204)
                       at sun.jvm.hotspot.tools.JStack.main(JStack.java:58)
                  I'm pretty sure this is because the fork part of UNIXProcess.forkAndExec uses the Solaris fork1 system call, which duplicates only the calling thread. The Java-level state in the child therefore still lists all those threads, whereas the actual threads don't exist in that process.

                  It seems to me that something is broken in the native code of UNIXProcess.forkAndExec; it did the fork, but not the exec, and this exec thread just sits there forever. And of course, it's still holding all the file descriptors of the original process, which means that if we decide to restart our process, we can't reopen our sockets for listening or whatever else we want to do.
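For context, a once-a-minute exec of a command line tool might look roughly like the sketch below. This is an illustration, not our actual code, and the names ExecOnce and run are hypothetical. It closes the child's stdin, drains the merged stdout/stderr, and reaps the child, so that normal runs do not leak descriptors or leave zombies. It would not help with a child that forks but never reaches exec, since that happens inside the JVM's native code before any of this Java-level cleanup applies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper; a sketch of careful child-process handling.
class ExecOnce {

    // Runs the command, appends its combined stdout/stderr to 'output',
    // and returns the child's exit code.
    static int run(StringBuilder output, String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so one reader drains both
        Process p = pb.start();
        try {
            p.getOutputStream().close(); // the child receives no stdin from us
            BufferedReader in =
                    new BufferedReader(new InputStreamReader(p.getInputStream()));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    output.append(line).append('\n');
                }
            } finally {
                in.close();
            }
            return p.waitFor(); // reap the child so it does not linger
        } finally {
            p.destroy(); // harmless if the child already exited
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java ExecOnce <command> [args...]");
            return;
        }
        StringBuilder out = new StringBuilder();
        int code = run(out, args);
        System.out.print(out);
        System.out.println("exit code: " + code);
    }
}
```

Draining the output before calling waitFor() matters because a child that fills its pipe buffer will block, and the parent would then wait forever.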

                  There is another possibility, which I can't completely rule out: this child process just happened to be the one that was forked when the parent process called Runtime.halt(), which is how the Java process exits. In that case we would have exited halfway through a Runtime.exec() and left this child process stuck. But I don't think that's what happens. From the data we collected, we see this same child process created at some point in time, and it doesn't go away.

                  Yes, I realize that my JVM is very old, but I cannot find any bug fixes in the release notes that claim to fix something like this. And since this only happens once every day or two, I'm reluctant to just throw a new JVM at this--although I'm sure I will shortly.

                  Has anyone else seen anything like this?