This discussion is archived
8 Replies Latest reply: Jul 11, 2012 2:00 AM by 943066 RSS

LOG_FILE_NOT_FOUND bug possible in current BDB JE?

943066 Newbie
Currently Being Moderated
I've seen references to the LOG_FILE_NOT_FOUND bug in older BDB JE versions (4.x and 5 <= 5.0.34(, however, I seem to be suffering something similar with 5.0.48.

I have a non-transactional, deferred-write DB that seems to have gotten itself into an inconsistent state. It was fine loading several million records, but after ~8 hours of operation, bailed out with:

------------
com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 5.0.55) /tmp/data/index fetchTarget of 0x9f1/0x24d34eb parent IN=44832 IN class=com.sleepycat.je.tree.BIN lastFullVersion=0xdcf/0x5a96c91 lastLoggedVersion=0xdcf/0x5a96c91 parent.getDirty()=true state=0 LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.
     at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1429)
     at com.sleepycat.je.tree.BIN.fetchTarget(BIN.java:1251)
     at com.sleepycat.je.dbi.CursorImpl.fetchCurrent(CursorImpl.java:2229)
     at com.sleepycat.je.dbi.CursorImpl.getCurrentAlreadyLatched(CursorImpl.java:1434)
     at com.sleepycat.je.Cursor.searchInternal(Cursor.java:2716)
     at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2576)
     at com.sleepycat.je.Cursor.searchNoDups(Cursor.java:2430)
     at com.sleepycat.je.Cursor.search(Cursor.java:2397)
     at com.sleepycat.je.Database.get(Database.java:1042)
     at com.xxxx.db.BDBCalendarStorageBackend.indexCalendar(BDBCalendarStorageBackend.java:95)
     at com.xxxx.indexer.TicketIndexer.indexDeltaLogs(TicketIndexer.java:201)
     at com.xxxx.indexer.DeltaLogLoader.run(DeltaLogLoader.java:87)
Caused by: java.io.FileNotFoundException: /tmp/data/index/000009f1.jdb (No such file or directory)
     at java.io.RandomAccessFile.open(Native Method)
     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:101)
     at com.sleepycat.je.log.FileManager$6.<init>(FileManager.java:1282)
     at com.sleepycat.je.log.FileManager.openFileHandle(FileManager.java:1281)
     at com.sleepycat.je.log.FileManager.getFileHandle(FileManager.java:1147)
     at com.sleepycat.je.log.LogManager.getLogSource(LogManager.java:1102)
     at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:808)
     at com.sleepycat.je.log.LogManager.getLogEntryAllowInvisibleAtRecovery(LogManager.java:772)
     at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1412)
     ... 11 more
------------

Subsequent opens/use on the DB pretty much instantly yield the same error. I tried upgrading to 5.0.55 (hence the ver in the output above) but still get the same error.

As a recovery attempt, I used DbDump to try to dump the DB, however, it failed with a similar error. Enabling salvage mode enabled me to successfuly dump it, however, reloading it into a clean environment by programmatically running DbLoad.load() (so I can setup my env) caused the following error (after about 30% of the DB has restored):

------------
Exception in thread "main" com.sleepycat.je.EnvironmentFailureException: (JE 5.0.55) Node 11991 should have been split before calling insertEntry UNEXPECTED_STATE: Unexpected internal state, may have side effects. fetchTarget of 0x25/0x155a822 parent IN=2286 IN class=com.sleepycat.je.tree.IN lastFullVersion=0x3e/0x118d8f6 lastLoggedVersion=0x3e/0x118d8f6 parent.getDirty()=false state=0
     at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:376)
     at com.sleepycat.je.tree.IN.insertEntry1(IN.java:2326)
     at com.sleepycat.je.tree.IN.insertEntry(IN.java:2296)
     at com.sleepycat.je.tree.BINDelta.reconstituteBIN(BINDelta.java:216)
     at com.sleepycat.je.tree.BINDelta.reconstituteBIN(BINDelta.java:144)
     at com.sleepycat.je.log.entry.BINDeltaLogEntry.getIN(BINDeltaLogEntry.java:53)
     at com.sleepycat.je.log.entry.BINDeltaLogEntry.getResolvedItem(BINDeltaLogEntry.java:43)
     at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1422)
     at com.sleepycat.je.tree.Tree.searchSubTreeUntilSplit(Tree.java:1786)
     at com.sleepycat.je.tree.Tree.searchSubTreeSplitsAllowed(Tree.java:1729)
     at com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1296)
     at com.sleepycat.je.tree.Tree.findBinForInsert(Tree.java:2205)
     at com.sleepycat.je.dbi.CursorImpl.putInternal(CursorImpl.java:834)
     at com.sleepycat.je.dbi.CursorImpl.put(CursorImpl.java:779)
     at com.sleepycat.je.Cursor.putAllowPhantoms(Cursor.java:2243)
     at com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2200)
     at com.sleepycat.je.Cursor.putNotify(Cursor.java:2117)
     at com.sleepycat.je.Cursor.putNoDups(Cursor.java:2052)
     at com.sleepycat.je.Cursor.putInternal(Cursor.java:2020)
     at com.sleepycat.je.Database.putInternal(Database.java:1324)
     at com.sleepycat.je.Database.put(Database.java:1194)
     at com.sleepycat.je.util.DbLoad.loadData(DbLoad.java:544)
     at com.sleepycat.je.util.DbLoad.load(DbLoad.java:414)
     at com.xxxx.db.BDBCalendarStorageBackend.loadBDBDump(BDBCalendarStorageBackend.java:254)
     at com.xxxx.cli.BDBTool.run(BDBTool.java:49)
     at com.xxxx.cli.AbstractBaseCommand.execute(AbstractBaseCommand.java:114)
     at com.xxxx.cli.BDBTool.main(BDBTool.java:69)
------------

The only other slightly exotic thing I'm using is a custom partial BTree comparator, however, it quite happily loaded/updated literally tens of millions of records for hours before the FileNotFound error cropped up, so it seems unlikely that would be the cause.

Any ideas?

Thanks in advance,

fb.
  • 1. Re: LOG_FILE_NOT_FOUND bug possible in current BDB JE?
    greybird Expert
    Currently Being Moderated
    The only other slightly exotic thing I'm using is a custom partial BTree comparator, however, it quite happily loaded/updated literally tens of millions of records for hours before the FileNotFound error cropped up, so it seems unlikely that would be the cause.
    Could you please post the source code for this comparator? It is the first suspect. An incorrect comparator could cause this error -- in fact an incorrect comparator could create any number of problems. And it could happen after loading a large number of records.

    We do not know of a bug in JE 5 that could cause LOG_FILE_NOT_FOUND. If there is such a bug, the only way to find it would be for you to: 1) reproduce it with EnvironmentConfig.CLEANER_EXPUNGE set to "false", 2) save the home directory, 3) contact us and we'll work with you to analyze the logs (we'll need to ask you to send specific log files to us).

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
  • 2. Re: LOG_FILE_NOT_FOUND bug possible in current BDB JE?
    943066 Newbie
    Currently Being Moderated
    Thanks heaps for that - is there a way I can communicate with you/your team off-forum?

    Keen to share log files/etc but would prefer not to have the data contained be public links/etc for obvious reasons.

    Thanks!

    fb.
  • 3. Re: LOG_FILE_NOT_FOUND bug possible in current BDB JE?
    greybird Expert
    Currently Being Moderated
    Sure, you can email your comparator code to me, mark.hayes at o.com (o == oracle).
    --mark                                                                                                                                                                                   
  • 4. Re: LOG_FILE_NOT_FOUND bug possible in current BDB JE?
    943066 Newbie
    Currently Being Moderated
    Just updating this thread to contain the conclusion (thanks to Mark's help) -

    The root cause here was to do with my custom Btree comparator, which did something unexpected.

    Ostensibly, I was using a partial comparator to use one portion of the key for testing equality and a different part for sort order.

    If you imagine a composite key made up of three components: C1:C2:C3

    Sorting was being done on C1:C2, whilst uniqueness (ie: equality) on C1:C3 - the idea being is that if keys had matching C1:C3 values, they would be replaced (rather than duplicated).

    This appears to work at a high level, but causes very subtle/confusing problems. Because a BTree uses the comparator to traverse the tree, it is possible to attempt to unintentionally insert a key in such a way that the existing "duplicate" was never seen.

    Given the theoretical keys (C1:C2:C3):

    a:1:20
    a:3:30
    a:5:40

    All are valid as C1:C3 are all unique (a, 20), (a, 30), (a, 40), however, if you now went to insert the key: "a:4:20", we'd have a problem.

    The intended behaviour would be to update the existing "a:1:20" key, as C1:C3 are duplicates (a, 20), however, instead what would happen is we'd unintentionally insert a new key.

    This is because the btree will use C1:C2 as the sort comparator (returning -1 or 1) and only examine the keys a:3:30 and a:5:40 as candidates for duplication, missing the a:1:20 key entirely.

    This is the same sort of trickyness you can run into in Java proper if you override hashCode and not equals/etc - it's effectively breaking the "expected contract" for a Btree.

    My solution is to switch to using a secondary database for providing differentiated "uniqueness" vs "ordering".

    Thanks heaps to Mark for working through this with me.

    fb.
  • 5. Re: LOG_FILE_NOT_FOUND bug possible in current BDB JE?
    greybird Expert
    Currently Being Moderated
    Thanks heaps to Mark for working through this with me.
    You're welcome. Thanks for following up and explaining it for the benefit of others. And I'm very glad it wasn't a JE bug!
    My solution is to switch to using a secondary database for providing differentiated "uniqueness" vs "ordering".
    An index for uniqueness may be a good solution. But as you said in email, it adds significant overhead (memory and disk). This overhead can be minimized by keeping your keys (primary and secondary) as small as possible, and enabling key prefixing.

    I'd also like to point out that adding a secondary isn't always the best choice. For example, if the number of keys with the same C1 value is fairly small, another way of checking for uniqueness (when inserting) is to iterate over them, looking for a match on C1:C3. The cost of this iteration may be less than the cost of maintaining a uniqueness index. To make this work, you'll have to use Serializable isolation during the iteration, to prevent another thread from inserting a key in that range.

    If you're pushing the performance limits of your hardware, it may be worth trying more than one such approach and comparing the performance. If performance is not a big concern, then the additional index is the simplest approach to get right.

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
  • 6. Re: LOG_FILE_NOT_FOUND bug possible in current BDB JE?
    943066 Newbie
    Currently Being Moderated
    I'd also like to point out that adding a secondary isn't always the best choice. For example, if the number of keys with the same C1 value is fairly small, another way of checking for uniqueness (when inserting) is to iterate over them, looking for a match on C1:C3. The cost of this iteration may be less than the cost of maintaining a uniqueness index. To make this work, you'll have to use Serializable isolation during the iteration, to prevent another thread from inserting a key in that range.
    Yup, I've considered this as an alternative implementation. The number of possible keys sharing C1 is actually relatively small (<350) so it may end up being quicker to use this approach.

    I haven't done a performance comparison yet (actually having some slight issues with secondary database consistency) but will see how I go with the implementation. Thanks for the tip in key-prefixing too, I actually hadn't seen that and it should help quite a bit in my case.

    By the way, does this operate on literal prefixes (ie: at a byte array level) or does it consider the comparator? In my case, the literal prefix and effective prefix differ.
  • 7. Re: LOG_FILE_NOT_FOUND bug possible in current BDB JE?
    greybird Expert
    Currently Being Moderated
    By the way, does this operate on literal prefixes (ie: at a byte array level) or does it consider the comparator? In my case, the literal prefix and effective prefix differ.
    It's just the literal prefix, sorry. To handle anything else there would need to be another callback that splits/combines a key into prefix and suffix, and we don't have such a callback.

    You may find it to be worthwhile to reorganize the key so fields are in sorted order, even if it requires reloading your data, at some point in time. Key size has a big impact.

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
  • 8. Re: LOG_FILE_NOT_FOUND bug possible in current BDB JE?
    943066 Newbie
    Currently Being Moderated
    Hey Mark,

    Thanks heaps for that - I've re-organized my key serialization so it creates useful prefixes.

    However, you'll never believe it, but I've also struck some additional oddball issues: com.sleepycat.je.SecondaryIntegrityException when deleting (sorry).

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points