4 Replies Latest reply on Jun 19, 2011 10:32 PM by user10623291

    Application crashes in JNI layer on linux, runs fine on Solaris

      We recently ported our C++ application from Solaris SPARC to Linux. The application uses JNI to access a third-party Java-based tool. It runs fine on Solaris, but on Linux we get a SIGSEGV. The interesting part is that when we run a large volume of requests through and keep the process busy, it works fine. As soon as we give it a little "break" and then submit another request, it crashes in libjvm with the call stack below. Could this have something to do with garbage collection? Our application is 64-bit and uses 64-bit JNI libraries. We are using Java 1.6 U26.

      Program received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 0x43002940 (LWP 9646)]
      0x00002aaaab35d0ed in resource_allocate_bytes(unsigned long) ()
      from .........jdk1.6.0_26/jre/lib/amd64/server/libjvm.so
      (gdb) where
      #0 0x00002aaaab35d0ed in resource_allocate_bytes(unsigned long) ()
      from .........jdk1.6.0_26/jre/lib/amd64/server/libjvm.so
      #1 0x00002aaaab428aeb in UNICODE::as_utf8(unsigned short*, int) ()
      from .......jdk1.6.0_26/jre/lib/amd64/server/libjvm.so
      #2 0x00002aaaab0c25ac in java_lang_String::as_utf8_string(oopDesc*, int, int) ()
      from .........jdk1.6.0_26/jre/lib/amd64/server/libjvm.so
      #3 0x00002aaaab0f9c71 in jni_GetStringUTFRegion ()
      from .........jdk1.6.0_26/jre/lib/amd64/server/libjvm.so
      #4 0x00002aaaaaac22ac in Java_java_lang_Class_forName0 ()
      from .........jdk1.6.0_26/jre/lib/amd64/libjava.so
      #5 0x00002aaac6b7103c in ?? ()
      #6 0x00000000edf454a0 in ?? ()
      #7 0x0000000000000000 in ?? ()
        • 1. Re: Application crashes in JNI layer on linux, runs fine on Solaris
          FYI, the crash seems to be occurring when executing Class.forName().
          • 2. Re: Application crashes in JNI layer on linux, runs fine on Solaris
            In C++ it is relatively easy to write code which happens to work but contains a bug nevertheless.

            If the JVM fails only when you load your JNI, it is because the JNI is corrupting the memory.
            It could be incorrectly handling Java Objects in a way which doesn't cause a problem on Solaris but does break on Linux x64. I would review how Objects are handled in the JNI layer.
            • 3. Re: Application crashes in JNI layer on linux, runs fine on Solaris
              This sounds like a garbage-collection issue, so I would make sure you have read and understood the sections of the JNI Specification starting with Referencing Java Objects, and make sure:

              - your JNI code complies with that in every respect

              - it tests the result of every JNI call via ExceptionOccurred(), ExceptionDescribe(), etc.

              - it checks for zero returns where necessary

              and does not proceed blindly after an error, exception, or zero return.

              By default, the JVM side of JNI doesn't do any checking at all; it is entirely up to you (e.g. to make sure that you don't pass zero as a method descriptor).
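A minimal sketch of the checks listed above, under stated assumptions: the helper names `reportPendingException` and `findClassChecked` are hypothetical and do not come from the original application. The helpers are written as templates so that they accept a real `JNIEnv*` from `<jni.h>` (which provides `ExceptionCheck`, `ExceptionDescribe`, `ExceptionClear` and `FindClass`) but can also be tried out against a stand-in object without a running JVM.

```cpp
#include <cstdio>

// Reports and clears a pending Java exception; returns true if there was one.
// `Env` is a template parameter so the helper accepts a real JNIEnv (from
// <jni.h>) but can also be exercised against a stand-in without a JVM.
template <typename Env>
bool reportPendingException(Env* env, const char* what) {
    if (!env->ExceptionCheck()) {
        return false;
    }
    std::fprintf(stderr, "JNI call failed: %s\n", what);
    env->ExceptionDescribe();  // print the Java stack trace to stderr
    env->ExceptionClear();     // leave no exception pending for later calls
    return true;
}

// FindClass wrapper that never hands back a handle after a failure.
template <typename Env>
auto findClassChecked(Env* env, const char* name)
    -> decltype(env->FindClass(name)) {
    auto cls = env->FindClass(name);
    // Check the exception first so it is always cleared, then the zero return.
    if (reportPendingException(env, name) || cls == nullptr) {
        return nullptr;
    }
    return cls;
}
```

With a real JVM this might be used as `jclass cls = findClassChecked(env, "com/example/Tool");`, where the class name is only a placeholder. The caller still has to test the result for zero and stop there rather than carry on with an invalid handle.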
              • 4. Re: Application crashes in JNI layer on linux, runs fine on Solaris
                Thanks everyone. You guys pushed me to do my homework and do some heavy debugging. First, I found the -Xcheck:jni option very useful.
                It turns out quite a few things were wrong:
                1. One static method was being invoked as an instance method rather than a static one:
                JNIEnv::CallLongMethod instead of JNIEnv::CallStaticLongMethod. Note that the program seemed to work just fine until I turned on JNI checking.
                2. The Java class handles were not globalized, even though they were used from different threads and from different procedures after the procedure that obtained the handle had exited. The solution was to "globalize" the handle before saving it off:

                jclass clsid = myJNI_ENV->FindClass(QualifiedClassName);
                /* error checking here */
                jclass globid = (jclass)myJNI_ENV->NewGlobalRef(clsid);

                3. The objects that were accessed across different methods were not globalized. Similar to the class ids, I added NewGlobalRef right after the creation:

                jobject objid = myJNI_ENV->NewObject(classid, constructorMethodID /* , constructor arguments */);
                /* error checking here */
                jobject globobjid = myJNI_ENV->NewGlobalRef(objid);

                4. And finally, a multithreading issue. The Java methods were being called from different threads. The application tried to handle that by calling
                JavaVM->AttachCurrentThread((void **)&mpoJNI_ENV, NULL)
                The problem was detecting whether we were dealing with a new thread or not. Since the threads are created completely outside our control in the middleware layer (Orbix, to be precise), we would save and compare the thread IDs. The problem is that, on Linux, pthread_self() seems to always return the same ID even though we were dealing with an entirely different thread! We used the Linux-specific gettid() instead and it worked! The app stopped crashing and the JNI check stopped complaining.
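The thread-ID part of item 4 can be sketched as follows. This is an illustration under stated assumptions, not the code from the application: the function names `currentTid` and `tidOfNewThread` are hypothetical, and the raw `syscall(SYS_gettid)` form is used because older glibc releases do not provide a `gettid()` wrapper.

```cpp
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>
#include <thread>

// Kernel thread ID of the calling thread (Linux only). The raw system call
// is used because older glibc releases ship no gettid() wrapper.
inline pid_t currentTid() {
    return static_cast<pid_t>(::syscall(SYS_gettid));
}

// Runs a short-lived worker thread and returns the kernel TID it observed.
inline pid_t tidOfNewThread() {
    pid_t tid = 0;
    std::thread worker([&tid] { tid = currentTid(); });
    worker.join();  // join() synchronizes, so reading tid afterwards is safe
    return tid;
}
```

One possible explanation for the pthread_self() behaviour described above (an assumption, not something verified against this application) is that POSIX allows a pthread_t value to be reused once its thread has terminated, so a later thread can present the same ID as an earlier one. Kernel TIDs can eventually be recycled as well, only less readily. An alternative that avoids tracking IDs altogether is to ask the VM directly: JavaVM::GetEnv returns JNI_EDETACHED for a thread that is not attached, and calling AttachCurrentThread on a thread that is already attached is a no-op.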