This discussion is archived
14 Replies Latest reply: May 15, 2012 3:54 PM by greybird RSS

EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown

909806 Newbie
Currently Being Moderated
An application server was accidentally rebooted and so my BDB environment wasn't closed cleanly, and now I'm getting the following exception. Is there a way to recover from this? (No other process is likely to have deleted any files.)

java.lang.IllegalMonitorStateException
     at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryReleaseShared(ReentrantReadWriteLock.java:363)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.releaseShared(AbstractQueuedSynchronizer.java:1317)
     at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.unlock(ReentrantReadWriteLock.java:745)
     at com.sleepycat.je.latch.SharedLatch.release(SharedLatch.java:191)
     at com.sleepycat.je.tree.IN.releaseLatch(IN.java:396)
     at com.sleepycat.je.recovery.RecoveryManager.replaceOrInsert(RecoveryManager.java:1723)
     at com.sleepycat.je.recovery.RecoveryManager.replayOneIN(RecoveryManager.java:976)
     at com.sleepycat.je.recovery.RecoveryManager.readINs(RecoveryManager.java:846)
     at com.sleepycat.je.recovery.RecoveryManager.buildINs(RecoveryManager.java:621)
     at com.sleepycat.je.recovery.RecoveryManager.buildTree(RecoveryManager.java:506)
     at com.sleepycat.je.recovery.RecoveryManager.recover(RecoveryManager.java:175)
     at com.sleepycat.je.dbi.EnvironmentImpl.finishInit(EnvironmentImpl.java:529)
     at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:204)
     at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:230)
     at com.sleepycat.je.Environment.<init>(Environment.java:212)
     at com.sleepycat.je.Environment.<init>(Environment.java:166)
     at voldemort.store.bdb.BdbStorageConfiguration.getEnvironment(BdbStorageConfiguration.java:170)
     at voldemort.store.bdb.BdbStorageConfiguration.getStore(BdbStorageConfiguration.java:122)
     at voldemort.server.storage.StorageService.startInner(StorageService.java:203)
     at voldemort.server.AbstractService.start(AbstractService.java:62)
     at voldemort.server.VoldemortServer.startInner(VoldemortServer.java:214)
     at voldemort.server.AbstractService.start(AbstractService.java:62)
     at voldemort.server.VoldemortServer.main(VoldemortServer.java:261)
Exception in thread "main" voldemort.store.StorageInitializationException: com.sleepycat.je.EnvironmentFailureException: (JE 4.0.92) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 4.0.92) /data/voldemort/bdb fetchTarget of 0x9eea1/0x1945f9d parent IN=3061966 IN class=com.sleepycat.je.tree.IN lastFullVersion=0x9eed7/0x372284d parent.getDirty()=false state=0 LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.
     at voldemort.store.bdb.BdbStorageConfiguration.getStore(BdbStorageConfiguration.java:136)
     at voldemort.server.storage.StorageService.startInner(StorageService.java:203)
     at voldemort.server.AbstractService.start(AbstractService.java:62)
     at voldemort.server.VoldemortServer.startInner(VoldemortServer.java:214)
     at voldemort.server.AbstractService.start(AbstractService.java:62)
     at voldemort.server.VoldemortServer.main(VoldemortServer.java:261)
Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 4.0.92) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 4.0.92) /data/voldemort/bdb fetchTarget of 0x9eea1/0x1945f9d parent IN=3061966 IN class=com.sleepycat.je.tree.IN lastFullVersion=0x9eed7/0x372284d parent.getDirty()=false state=0 LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.
     at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:197)
     at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1403)
     at com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1279)
     at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:245)
     at com.sleepycat.je.log.SyncedLogManager.flushInternal(SyncedLogManager.java:49)
     at com.sleepycat.je.log.LogManager.flush(LogManager.java:1105)
     at com.sleepycat.je.dbi.EnvironmentImpl.finishInit(EnvironmentImpl.java:539)
     at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:204)
     at com.sleepycat.je.Environment.makeEnvironmentImpl(Environment.java:230)
     at com.sleepycat.je.Environment.<init>(Environment.java:212)
     at com.sleepycat.je.Environment.<init>(Environment.java:166)
     at voldemort.store.bdb.BdbStorageConfiguration.getEnvironment(BdbStorageConfiguration.java:170)
     at voldemort.store.bdb.BdbStorageConfiguration.getStore(BdbStorageConfiguration.java:122)
     ... 5 more
Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 4.0.92) /data/voldemort/bdb fetchTarget of 0x9eea1/0x1945f9d parent IN=3061966 IN class=com.sleepycat.je.tree.IN lastFullVersion=0x9eed7/0x372284d parent.getDirty()=false state=0 LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.
     at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1241)
     at com.sleepycat.je.tree.IN.findParent(IN.java:2704)
     at com.sleepycat.je.tree.Tree.getParentINForChildIN(Tree.java:879)
     at com.sleepycat.je.tree.Tree.getParentINForChildIN(Tree.java:809)
     at com.sleepycat.je.recovery.RecoveryManager.replaceOrInsertChild(RecoveryManager.java:2031)
     at com.sleepycat.je.recovery.RecoveryManager.replaceOrInsert(RecoveryManager.java:1703)
     at com.sleepycat.je.recovery.RecoveryManager.replayOneIN(RecoveryManager.java:976)
     at com.sleepycat.je.recovery.RecoveryManager.readINs(RecoveryManager.java:846)
     at com.sleepycat.je.recovery.RecoveryManager.buildINs(RecoveryManager.java:621)
     at com.sleepycat.je.recovery.RecoveryManager.buildTree(RecoveryManager.java:506)
     at com.sleepycat.je.recovery.RecoveryManager.recover(RecoveryManager.java:175)
     at com.sleepycat.je.dbi.EnvironmentImpl.finishInit(EnvironmentImpl.java:529)
     ... 11 more
Caused by: java.io.FileNotFoundException: /data/voldemort/bdb/0009eea1.jdb (No such file or directory)
     at java.io.RandomAccessFile.open(Native Method)
     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
     at com.sleepycat.je.log.FileManager$1.<init>(FileManager.java:993)
     at com.sleepycat.je.log.FileManager.openFileHandle(FileManager.java:992)
     at com.sleepycat.je.log.FileManager.getFileHandle(FileManager.java:888)
     at com.sleepycat.je.log.LogManager.getLogSource(LogManager.java:1073)
     at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:779)
     at com.sleepycat.je.log.LogManager.getLogEntryAllowInvisibleAtRecovery(LogManager.java:743)
     at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1225)
     ... 22 more
  • 1. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    greybird Expert
    Currently Being Moderated
    Hi,

    Sorry to hear you've run into this. I see you're using JE 4.0.92 and the current release of JE is 5.0. I can find the following fixes in the change logs
    http://www.oracle.com/technetwork/database/berkeleydb/downloads/index-098622.html
    since that release, that may possibly cause the problem you've reported.

    4.0.117 (latest 4.0 patch release) -- I see two potentially related fixes, [#17252] and [#18663], but they both have to do with duplicates, and your stack trace does not indicate that duplicates are being processed. If your app has a database configured for duplicates, then it may be possible that these bugs are applicable, but it is difficult to know for sure. See the change log for details.

    4.1.17 (latest 4.1 patch release) -- [#19160] (only possible if using HA/replication) and [#20732] (see change log for bug signature) are potential fixes.

    If you upgrade to the latest JE release (which is strongly advised), be sure to see the upgrade notes in the JE 5.0 change log.

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
  • 2. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    909806 Newbie
    Currently Being Moderated
    Thanks for the information. Yes the database is configured for duplicates. BTW it's being used inside of Voldemort. (That's also the reason I'm currently on the old version of BDB--I don't believe they've tested with a newer version yet, and ran into a potential performance problem in the 4.1 series--see BerkeleyDB Je 4.0.90 to 4.1.7 update yields to increased put latency But I could potentially update my version.

    Due to the usage of duplicates, I believe that I can't upgrade to 5.0 without doing a data conversion first, right?

    So what's the best way forward? Do you expect that upgrading to a newer version will fix the current problem? (That is, I can't quite tell if the newer versions would allow me to recover at this point, or if they would just have prevented the problem initially and it's too late.) Failing that, is there a way to recover the data, or the majority of it? (I have a backup that's fairly recent, but of course it would be lacking some data.)

    Thanks again.
  • 3. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    909806 Newbie
    Currently Being Moderated
    Forgot to mention: I'm not using HA/replication.

    Also it could be relevant: That IllegalMonitorStateException at the beginning of the stack track happend both times I tried to open the environment. Thought I'd call that out, in case it is indicative of a specific.
  • 4. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    greybird Expert
    Currently Being Moderated
    Some of the bugs I mentioned are fixed by simply upgrading to a new version, some aren't and require that you recreate your data set. To avoid spending a very large amount of time analyzing your logs to see which bug it is, I suggest you simply try a more recent version, using a copy of your environment directory, as an experiment.

    The first step I suggest is to try upgrading to JE 4.0.117, since this is the closest to what is tested with Voldemort and has two fixes for duplicate DBs. This upgrade should be very safe.

    Second (if 4.0.117 doesn't work), I would try 4.1.17. And if that doesn't work, JE 5.0. To upgrade to JE 5.0, you do have to run an upgrade utility (see the change log), but it is not a data conversion.

    If upgrading does not fix the problem, you can recover your data using the "DbDump -r" utility to dump each database, followed by DbLoad. The -r option requires enough memory to hold all transactions, so make the heap as large as possible. With "DbDump -R" the memory requirement is avoided, but the result is much less desirable.

    http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/util/DbDump.html
    >
    -r - Salvage data from a possibly corrupt file. When used on a uncorrupted database, this option should return equivalent data to a normal dump, but most likely in a different order. The data for each database is saved into <databaseName>.dump files in the current directory.

    -R - Aggressively salvage data from a possibly corrupt file. The -R flag differs from the -r option in that it will return all possible data from the file at the risk of also returning already deleted or otherwise nonsensical items. Data dumped in this fashion will almost certainly have to be edited by hand or other means before the data is ready for reload into another database. The data for each database is saved into <databaseName>.dump files in the current directory.
    >

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
  • 5. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    greybird Expert
    Currently Being Moderated
    I don't think the IllegalMonitorStateException is significant; it's a side effect.

    You mentioned recovering from a backup, and I just realized that I gave you instructions for salvaging data from the environment that can't be opened.

    If you can simply restore from a backup, that's by far the simplest thing. When you do that, also be sure to at least upgrade to 4.0.117. If possible, upgrade to 4.1.17, since it has additional fixes for bugs that could have caused the problem.

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
  • 6. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    909806 Newbie
    Currently Being Moderated
    Thanks, I'll try the newer versions. Yes I was interested in the recovery option too, since that might potentially recover more data than using a backup (which will be missing at least some data, since it's not up-to-the-minute). But I'll try opening with the newer versions first, and then go from there.

    -- Jeff
  • 7. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    909806 Newbie
    Currently Being Moderated
    Just to close the loop, I tried both JE 4.0.117 and 4.1.17 and still encountered the same error, and running the DbPreUpgrade_4_1 tool to enable use of 5.0 also fails with the same error message. So I think my only option is to restore from a backup. (My data set is too large to hold the entire tree in memory, so it sounds like "DbDump -r" won't work, and the "-R" option also sounds like it isn't workable, since it would require fixing up the resulting data by hand, and that isn't feasible.)
  • 8. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    greybird Expert
    Currently Being Moderated
    Thanks for letting us know. Sorry to hear that you must lose some data, but I'm glad that at least you have a backup.
    --mark                                                                                                                                                                                                                                                           
  • 9. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    909806 Newbie
    Currently Being Moderated
    Another follow-up: It turned out that my backup had the file that it was complaining was missing, so I tried adding that one file back and was then able to open the environment and recover! Of note, it was one of the last few files created the last time the environment was closed (though not the very last file). Wanted to mention this, in case it comes up again.
  • 10. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    greybird Expert
    Currently Being Moderated
    I'm glad you were able to recover!

    Even so I strongly suggest upgrading to 4.0.117, since this has fixes that may have caused the problem, and of course you don't want it to happen again. I suggest 4.0.117 because it is very likely not to have compatibility problems with Voldemort, which you said is tested with 4.0.92.

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
  • 11. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    909806 Newbie
    Currently Being Moderated
    Yep I'm actually giving 4.1.17 a try on a few instances (including the one that had this issue). So far so good!
  • 12. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    greybird Expert
    Currently Being Moderated
    Great!

    I followed the link to the Voldemort performance issue with JE 4.1, the one you mentioned above, and it didn't have enough info to say what exactly the problem was or how serious, and it looks like it may not have been followed up. Hopefully Voldemort will be tested with JE 5.0 at some point, since there were many performance enhancements in JE 5.

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
  • 13. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    909806 Newbie
    Currently Being Moderated
    BTW I had this happen again, this time under 4.1.17. Since I am not using deferred-write databases, I assume this is different from the similar-looking file-deletion issue fixed in 4.1.20, right?

    I saw what looks to be the same error in a separate forum topic recently, so I thought I'd mention it, since it seems to be more than a one-off. In my case I've only seen this under abrupt termination circumstances (accidental host reboot in one case, exception during environment close() in the other case).
  • 14. Re: EnvironmentFailureException with LOG_FILE_NOT_FOUND after abrupt shutdown
    greybird Expert
    Currently Being Moderated
    Right, there is no such known problem with JE 4.1.17, other than the one with deferred-write DBs. Thanks for mentioning it.
    --mark                                                                                                                                                                                                                                                                       

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points