3 Replies Latest reply on Feb 26, 2008 3:36 PM by Linda Lee-Oracle

    NPE during environment startup (recovery)

      While testing our app, we simulated a JVM failure by forcibly killing the process with kill -9. When we brought the app back up, we saw the following errors. I'm not sure what to make of them:

      22 Feb 2008 10:47:36,440 (:com.foo.bar.app.TomcatListener) ERROR Error initializing Store
      com.foo.bar.ApplicationException: Failed to initialize BDB Environment

      Caused by: com.sleepycat.je.DatabaseException: (JE 3.2.68) last LSN=0x0/0x1d2
      at com.sleepycat.je.recovery.RecoveryManager.traceAndThrowException(RecoveryManager.java:2365)
      at com.sleepycat.je.recovery.RecoveryManager.redoLNs(RecoveryManager.java:1182)
      at com.sleepycat.je.recovery.RecoveryManager.buildTree(RecoveryManager.java:440)
      at com.sleepycat.je.recovery.RecoveryManager.recover(RecoveryManager.java:153)
      at com.sleepycat.je.dbi.EnvironmentImpl.<init>(EnvironmentImpl.java:348)
      at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:102)
      at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:54)
      at com.sleepycat.je.Environment.<init>(Environment.java:103)
      Caused by: java.lang.NullPointerException
      at com.sleepycat.je.utilint.DbLsn.compareTo(DbLsn.java:76)
      at com.sleepycat.je.recovery.RecoveryManager.redoLNs(RecoveryManager.java:1081)
      ... 34 more

      I have the following questions:
      * What is causing the NPE? Is this something that you'd expect from a JVM sudden death scenario?
      * If I should "expect" this sort of thing under those failure conditions, how (during startup/recovery) can I detect that this will happen (aside from catching a fairly arbitrary DatabaseException or NPE)?
      * Assuming I know how to detect this at startup/recovery, how can I gracefully get going again?

        • 1. Re: NPE during environment startup (recovery)
          Linda Lee-Oracle
          Our apologies, this is our bug. This can only happen if JE is killed off when the log is less than 605 bytes, before the standard first checkpoint is written. JE should never fail at recovery time in this way.

          In this case, the log has no user data at all and holds only the beginning of JE's internal structures, so you can safely remove the log file and restart. We'll also contact you shortly with a fix for the problem. The fix is trivial; frankly, all the time will go into writing a unit test.

          • 2. Re: NPE during environment startup (recovery)
            Thanks for the quick reply. Is there a workaround available? Maybe a config value we can set to avoid this scenario?

            • 3. Re: NPE during environment startup (recovery)
              Linda Lee-Oracle
              "Is there a workaround available? Maybe a config value we can set to avoid this scenario?"
              No, I'm afraid there's no workaround. However, you can recognize this case by the size of the log: it is smaller than 1345 bytes (not the 605 I reported earlier). In that case, the log can be moved aside with no loss of data.
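
              The size check described above could be sketched as a pre-startup step, run before the Environment is opened. This is only an illustration, not part of JE: the class and method names are hypothetical, it assumes the environment directory holds exactly one `.jdb` log file, and the `.aside` suffix is an arbitrary choice. Only the 1345-byte threshold comes from the reply above.

              ```java
              import java.io.IOException;
              import java.nio.file.DirectoryStream;
              import java.nio.file.Files;
              import java.nio.file.Path;
              import java.nio.file.StandardCopyOption;
              import java.util.ArrayList;
              import java.util.List;

              // Hypothetical helper, not part of the JE API.
              class TinyLogCheck {
                  // Threshold quoted in the reply above; logs smaller than this hold no user data.
                  static final long MIN_RECOVERABLE_LOG_BYTES = 1345;

                  // Collects the JE log files (*.jdb) found directly in the environment directory.
                  static List<Path> logFiles(Path envHome) throws IOException {
                      List<Path> logs = new ArrayList<>();
                      try (DirectoryStream<Path> stream = Files.newDirectoryStream(envHome, "*.jdb")) {
                          for (Path p : stream) {
                              logs.add(p);
                          }
                      }
                      return logs;
                  }

                  // True only when there is a single log file and it is below the threshold.
                  // Assumption: with more than one log file, this bug is not the cause.
                  static boolean hasOnlyTinyInitialLog(Path envHome) throws IOException {
                      List<Path> logs = logFiles(envHome);
                      return logs.size() == 1
                              && Files.size(logs.get(0)) < MIN_RECOVERABLE_LOG_BYTES;
                  }

                  // Renames the tiny log to "<name>.aside" so a fresh environment can be created.
                  // Returns the new path, or null if the directory did not match the pattern.
                  static Path moveTinyLogAside(Path envHome) throws IOException {
                      if (!hasOnlyTinyInitialLog(envHome)) {
                          return null;
                      }
                      Path log = logFiles(envHome).get(0);
                      Path target = log.resolveSibling(log.getFileName() + ".aside");
                      return Files.move(log, target, StandardCopyOption.REPLACE_EXISTING);
                  }
              }
              ```

              Renaming rather than deleting keeps the original file available in case it is needed for diagnosis later.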

              The next patch release will contain this fix, which will be labeled in the change log with the tag [#16016].