6 Replies Latest reply: Sep 4, 2013 1:28 PM by keithk2

    How can a JVM terminate with an exit code of 141 and no other diagnostics?

    keithk2

      Hello,

       

      We are encountering a JVM process that dies with little explanation other than an exit code of 141. No hotspot error file (hs_err_*) or crash dump is produced.  To date, the process runs anywhere from 30 minutes to 8 days before the problem occurs. The last application log entry is always the report of a lost SSL connection, the result of a thrown SSLException.  (The exception itself is unavailable at this time because the JVM dies before it is logged; working on that.)

       

      How can a JVM produce an exit code of 141, and nothing else?  Can anyone suggest ideas for capturing additional diagnostic information?  Any help would be greatly appreciated!  Environment and efforts to date are described below.

       

      Thanks,

       

      -KK

       

       

      Host machine: 8x Xeon server with 256GB memory, RHEL 6 (or RHEL 5.5) 64-bit

      Java: Oracle Java SE 7u21 (or 6u26)

       

      java version "1.7.0_21"

      Java(TM) SE Runtime Environment (build 1.7.0_21-b11)

      Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

       

      JVM arguments:

      -XX:+UseConcMarkSweepGC

      -XX:+CMSIncrementalMode

      -XX:+CMSClassUnloadingEnabled

      -XX:MaxPermSize=256m

      -XX:NewSize=64m

      -Xms128m

      -Xmx1037959168

      -Djava.awt.headless=true

      -Djava.security.egd=file:///dev/./urandom

       

      Diagnostics attempted to date:

      1. LD_PRELOAD=libjsig.so.   A modified version of libjsig.so was created to report all signal handler registrations and to report SIGPIPE signals received.  (Exit code 141 could be interpreted as 128+SIGPIPE(13).)  No JNI libraries are registering any signal handlers, and no SIGPIPE signal is reported by the library for the duration of the JVM run.  Calls to ::exit() are also intercepted and reported.  No call to exit() is reported.
      2. Inspect /var/log/messages for any indication that the OS killed the process, e.g. via the Out Of Memory (OOM) Killer.  Nothing found.
      3. Set ‘ulimit -c unlimited’, in case the default limit of 0 (zero) was preventing a core file from being written.  Still no core dump.
      4. ‘top’ reports the VIRT size of the process can grow to 20GB or more in a matter of hours, which is unusual compared to other JVM processes.  The RES (resident set size) does not grow beyond about 375MB, however, which is considered normal.

        This JVM process creates many short-lived Thread objects by way of a thread pool, averaging 1 thread every 2 seconds, and these objects end up referenced only by a Weak reference.   The CMS collector seems lazy about collecting these, and upwards of 2000 Thread objects have been seen (in heap dumps) held only by Weak references.  (The Java heap averages about 100MB, so the collector is not under any pressure.) However, a forced collection (via jconsole) cleans out the Thread objects as expected.  Any relationship of this to the VIRT size or the JVM disappearance, however, cannot be established.

        The process also uses NIO and direct buffers, and maintains a DirectByteBuffer cache. There is some DirectByteBuffer churn. MBeans report stats like:

        Direct buffer pool: allocated=669 (20,824,064 bytes), released=665 (20,725,760), active=4 (98,304)  [note: equals 2x 32K buffers and 2x 16K buffers]
        java.nio.BufferPool > direct: Count=18, MemoryUsed=1343568, TotalCapacity=1343568

        These numbers appear normal and also do not seem to correlate with the VIRT size or the JVM disappearance.
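For anyone wanting to log the same java.nio.BufferPool figures periodically from inside the process (so the last values before the JVM disappears end up in the application log), here is a minimal sketch using the standard BufferPoolMXBean from Java 7. The class name DirectPoolStats and the output format are my own choices for illustration, not anything from our application. As far as I can tell these figures only cover Java-visible direct buffers, so they would not necessarily account for everything that contributes to VIRT.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper: reads the platform "direct" buffer pool statistics.
public class DirectPoolStats {

    /** Returns the platform MXBean for the "direct" buffer pool, or null if absent. */
    public static BufferPoolMXBean directPool() {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    /** Formats one line of statistics suitable for a periodic log entry. */
    public static String snapshot() {
        BufferPoolMXBean pool = directPool();
        if (pool == null) {
            return "direct pool not available";
        }
        return String.format("direct: Count=%d, MemoryUsed=%d, TotalCapacity=%d",
                pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
    }

    public static void main(String[] args) {
        // Hold one direct buffer so the pool is non-empty while we sample it.
        ByteBuffer held = ByteBuffer.allocateDirect(32 * 1024);
        System.out.println(snapshot());
        System.out.println("held capacity=" + held.capacity());
    }
}
```

Calling snapshot() from a scheduled task and writing the result to the application log would give a time series to compare against the VIRT growth reported by 'top'.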
        • 1. Re: How can a JVM terminate with an exit code of 141 and no other diagnostics?
          jschellSomeoneStoleMyAlias

          >How can a JVM produce an exit code of 141, and nothing else?

           

          Because it executes some java code that does the following and nothing else.

           

          Runtime.getRuntime().exit(141)

          • 2. Re: How can a JVM terminate with an exit code of 141 and no other diagnostics?
            keithk2

            That's correct, but my code contains no such call, and neither does the JVM, as far as I can see (searching OpenJDK source).  And even if the code existed, it's not being executed, or the LD_PRELOAD library would report it.  For example, running the following application ..

             

            public class GoodbyeWorld {
                public static void main(String[] args) {
                    Runtime.getRuntime().exit(141);
                }
            }

             

            .. produces a diagnostic of ..

             

            JSIG: exit(141) called

            JSIG: Call stack has 10 frames:

            JSIG: /opt/rxadvantage/lib/linux-amd64/libjsigdebug.so [0x2b5484f87bff]

            JSIG: /opt/rxadvantage/lib/linux-amd64/libjsigdebug.so(exit+0x29) [0x2b5484f88a04]

            JSIG: /opt/jdk1.7.0_21/jre/lib/amd64/server/libjvm.so [0x2b548590db67]

            JSIG: /opt/jdk1.7.0_21/jre/lib/amd64/server/libjvm.so [0x2b5485c6e6cc]

            JSIG: /opt/jdk1.7.0_21/jre/lib/amd64/server/libjvm.so [0x2b5485c6d0e0]

            JSIG: /opt/jdk1.7.0_21/jre/lib/amd64/server/libjvm.so [0x2b5485c6d666]

            JSIG: /opt/jdk1.7.0_21/jre/lib/amd64/server/libjvm.so [0x2b5485c6dd00]

            JSIG: /opt/jdk1.7.0_21/jre/lib/amd64/server/libjvm.so [0x2b5485b07010]

            JSIG: /lib64/libpthread.so.0 [0x34ab00683d]

            JSIG: /lib64/libc.so.6(clone+0x6d) [0x34aa0d4f8d]

             

             

            So, more precisely, my question is, given a JVM on a RHEL6 platform which is running an application that does not call exit(), what can cause it to abort with an exit code of 141, bypass the JVM's exception handler, and not produce an entry in the system log, a system error message, a heap dump, or any other artifacts that normally accompany a severe JVM crash or shutdown?

            • 3. Re: How can a JVM terminate with an exit code of 141 and no other diagnostics?
              jschellSomeoneStoleMyAlias

              JNI code that calls an OS system exit() API method would also produce the 141 exit code.

              • 4. Re: How can a JVM terminate with an exit code of 141 and no other diagnostics?
                keithk2

                True, but the JNI call would still be reported by the LD_PRELOAD intercept, unless the native code could somehow circumvent that.  Using a test similar to GoodbyeWorld (shown below), I verified that the JNI call to exit() is reported.  In the failure case, no call to exit() is reported.

                 

                Can an OS (or a manual) 'kill' specify an exit code?  Where could "141" be coming from?

                 

                Thanks,

                 

                -K2

                 

                === GoodbyeWorldFromJNI.java ===

                package com.attachmate.test;

                public class GoodbyeWorldFromJNI
                {
                    public static final String LIBRARY_NAME = "goodbye";

                    static {
                        try {
                            System.loadLibrary(LIBRARY_NAME);
                        } catch (UnsatisfiedLinkError error) {
                            System.err.println("Failed to load " + System.mapLibraryName(LIBRARY_NAME));
                        }
                    }

                    private static native void callExit(int exitCode);

                    public static void main(String[] args) {
                        callExit(141);
                    }
                }

                 

                === goodbye.c ===

                #include <stdlib.h>
                #include "goodbye.h"  // javah generated header file

                JNIEXPORT void JNICALL Java_com_attachmate_test_GoodbyeWorldFromJNI_callExit
                  (JNIEnv *env, jclass theClass, jint exitCode)
                {
                    exit(exitCode);
                }

                 

                === script.sh ===

                #!/bin/bash -v
                uname -a
                export PATH=/opt/jre1.7.0_25/bin:$PATH
                java -version
                pwd
                LD_PRELOAD=./lib/linux-amd64/libjsigdebug.so java -classpath classes -Djava.library.path=lib/linux-amd64 com.attachmate.test.GoodbyeWorldFromJNI > stdout.txt
                echo $?
                tail stdout.txt

                 

                === script output ===

                [keithk@keithk-RHEL5-dev goodbyeJNI]$ ./script.sh

                #!/bin/bash -v

                uname -a

                Linux keithk-RHEL5-dev 2.6.18-164.2.1.el5 #1 SMP Mon Sep 21 04:37:42 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

                export PATH=/opt/jre1.7.0_25/bin:$PATH

                java -version

                java version "1.7.0_25"

                Java(TM) SE Runtime Environment (build 1.7.0_25-b15)

                Java HotSpot(TM) 64-Bit Server VM (build 23.25-b01, mixed mode)

                pwd

                /tmp/goodbyeJNI

                LD_PRELOAD=./lib/linux-amd64/libjsigdebug.so java -classpath classes -Djava.library.path=lib/linux-amd64 com.attachmate.test.GoodbyeWorldFromJNI > stdout.txt

                echo $?

                141

                tail stdout.txt

                 

                JSIG: exit(141) called

                JSIG: Call stack has 4 frames:

                JSIG: ./lib/linux-amd64/libjsigdebug.so [0x2b07dc1bdc2f]

                JSIG: ./lib/linux-amd64/libjsigdebug.so(exit+0x29) [0x2b07dc1bea41]

                JSIG: /tmp/goodbyeJNI/lib/linux-amd64/libgoodbye.so [0x2aaab3e82547]

                JSIG: [0x2aaaab366d8e]       

                === ===

                • 5. Re: How can a JVM terminate with an exit code of 141 and no other diagnostics?
                  keithk2

                  The Linux 'strace' utility reports:

                  17:45:52.333755 +++ killed by SIGPIPE +++

                   

                  So exit code 141 apparently is from SIGPIPE (13)  + 128.
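The arithmetic can be sketched as follows. This is only an illustration of the shell convention of reporting 128 plus the signal number for a signal-terminated process; the class name ExitCodeDecoder is made up, and the table lists just a few Linux signal numbers rather than all of them. Note that the status alone cannot tell a real signal death apart from a process that deliberately calls exit(141), which is why the strace output was needed to settle it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: interprets a shell exit status as 128 + signal number.
public class ExitCodeDecoder {

    // A few common Linux signal numbers (not exhaustive).
    private static final Map<Integer, String> SIGNALS = new HashMap<Integer, String>();
    static {
        SIGNALS.put(1, "SIGHUP");
        SIGNALS.put(2, "SIGINT");
        SIGNALS.put(6, "SIGABRT");
        SIGNALS.put(9, "SIGKILL");
        SIGNALS.put(11, "SIGSEGV");
        SIGNALS.put(13, "SIGPIPE");
        SIGNALS.put(15, "SIGTERM");
    }

    /** Returns the signal number implied by the exit status, or -1 if none is implied. */
    public static int signalNumber(int exitStatus) {
        return (exitStatus > 128 && exitStatus < 256) ? exitStatus - 128 : -1;
    }

    /** Produces a human-readable interpretation of the exit status. */
    public static String describe(int exitStatus) {
        int signo = signalNumber(exitStatus);
        if (signo < 0) {
            return "exit status " + exitStatus + " (no signal implied)";
        }
        String name = SIGNALS.get(signo);
        return "exit status " + exitStatus + " = 128 + " + signo
                + " (" + (name != null ? name : "unknown signal") + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(141));
    }
}
```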

                   

                  A workaround for this might be to suppress SIGPIPE per socket, either by passing the MSG_NOSIGNAL flag to send() on Linux or by setting the SO_NOSIGPIPE socket option on BSD UNIX, but the Java Socket implementation doesn't expose either.  Does the JVM internally manipulate these?  Is there any way they can be set on a Socket object?

                  • 6. Re: How can a JVM terminate with an exit code of 141 and no other diagnostics?
                    keithk2

                    We've discovered that a native library loaded via a PAM configuration is replacing the JVM's signal handler for SIGPIPE, reverting to the default handler.  How this library is circumventing libjsig.so is TBD.
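For anyone chasing a similar problem, one way to check the SIGPIPE disposition from inside the running JVM on Linux is to read the SigIgn and SigCgt masks from /proc/self/status. The sketch below is Linux-only, and the class and method names (SigpipeCheck, classify, and so on) are made up for illustration. My assumption is that a healthy HotSpot JVM reports SIGPIPE as "caught" because it installs its own handler; a result of "default" would be consistent with the handler having been replaced, as in our case.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.math.BigInteger;

// Hypothetical diagnostic: reports how this process currently treats SIGPIPE.
public class SigpipeCheck {

    static final int SIGPIPE = 13; // Linux signal number

    /** True if the bit for the given signal is set in a /proc hex mask (bit signo-1). */
    public static boolean inMask(String hexMask, int signo) {
        return new BigInteger(hexMask.trim(), 16).testBit(signo - 1);
    }

    /** Classifies a signal as "caught", "ignored" or "default" from the two masks. */
    public static String classify(String sigIgn, String sigCgt, int signo) {
        if (inMask(sigCgt, signo)) {
            return "caught";
        }
        if (inMask(sigIgn, signo)) {
            return "ignored";
        }
        return "default";
    }

    /** Reads one field (e.g. "SigCgt") from /proc/self/status; null if not found. */
    public static String readStatusField(String field) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("/proc/self/status"));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(field + ":")) {
                    return line.substring(field.length() + 1).trim();
                }
            }
            return null;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String ign = readStatusField("SigIgn");
        String cgt = readStatusField("SigCgt");
        if (ign == null || cgt == null) {
            System.out.println("signal masks not available");
            return;
        }
        System.out.println("SIGPIPE disposition: " + classify(ign, cgt, SIGPIPE));
    }
}
```

Logging this at startup and again after the PAM-related library has been loaded would show whether the disposition changed in between.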