This discussion is archived
1 2 3 Previous Next 32 Replies Latest reply: Jan 2, 2008 5:08 AM by greybird RSS

(JE 3.2.44) Couldn't open file

536712 Newbie
Currently Being Moderated
Hello!

We recently upgraded to newer version of BDB JE, and failed with random, but weird issue:
17/Oct/2007 04:49:38 ERROR [pool-97-thread-2] - com.sleepycat.je.DatabaseException: (JE 3.2.44) Database r
elations_database14269 id=26 rootLsn=0xffffffff/0xffffffff  IN type=DBIN/2 id=29976291 not expected on INL
ist
com.sleepycat.je.DatabaseException: (JE 3.2.44) Database relations_database14269 id=26 rootLsn=0xffffffff/
0xffffffff  IN type=DBIN/2 id=29976291 not expected on INList
        at com.sleepycat.je.evictor.Evictor.selectIN(Evictor.java:511)
        at com.sleepycat.je.evictor.Evictor.evictBatch(Evictor.java:354)
        at com.sleepycat.je.evictor.Evictor.doEvict(Evictor.java:249)
        at com.sleepycat.je.evictor.Evictor.doCriticalEviction(Evictor.java:274)
        at com.sleepycat.je.dbi.CursorImpl.close(CursorImpl.java:684)
        at com.sleepycat.je.Cursor.close(Cursor.java:262)
        at com.sleepycat.je.Cursor.close(Cursor.java:250)
        at com.sleepycat.je.Database.putInternal(Database.java:667)
        at com.sleepycat.je.Database.put(Database.java:609)
        at com.redwerk.webcrawler.storage.url.bdbje.BDBJEStorage.saveRelations(BDBJEStorage.java:361)
        at com.redwerk.webcrawler.storage.url.bdbje.BDBJEStorage.update(BDBJEStorage.java:396)
        at com.redwerk.webcrawler.storage.url.RetryHandlingStorageDecorator.update(RetryHandlingStorageDec
orator.java:159)
        at com.redwerk.webcrawler.storage.url.CountHandlingStorageDecorator.update(CountHandlingStorageDec
orator.java:86)
        at com.redwerk.webcrawler.filter.record.UrlProcessedFilter._accept(UrlProcessedFilter.java:34)
        at com.redwerk.webcrawler.filter.record.ChainedFilter.accept(ChainedFilter.java:23)
        at com.redwerk.webcrawler.filter.record.ChainedFilter.accept(ChainedFilter.java:25)
        at com.redwerk.webcrawler.Crawler.addRecord(Crawler.java:731)
        at com.redwerk.webcrawler.ConsumerThread.run(ConsumerThread.java:301)
        at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:44
2)
        at edu.emory.mathcs.backport.java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:305)
        at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:135)
        at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:990)
        at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:531)
        at java.lang.Thread.run(Thread.java:595)
17/Oct/2007 04:49:38 ERROR [pool-97-thread-4] - com.sleepycat.je.DatabaseException: (JE 3.2.44) fetchTarge
t of 0x1d3/0x472b30 parent IN=32738188 lastFullVersion=0x1d3/0x479cdc parent.getDirty()=true state=2 com.s
leepycat.je.log.LogFileNotFoundException: (JE 3.2.44) 0x1d3/0x472b30 (JE 3.2.44) Couldn't open file db/000
001d3.jdb: db/000001d3.jdb (No such file or directory)
com.sleepycat.je.DatabaseException: (JE 3.2.44) fetchTarget of 0x1d3/0x472b30 parent IN=32738188 lastFullV
ersion=0x1d3/0x479cdc parent.getDirty()=true state=2 com.sleepycat.je.log.LogFileNotFoundException: (JE 3.
2.44) 0x1d3/0x472b30 (JE 3.2.44) Couldn't open file db/000001d3.jdb: db/000001d3.jdb (No such file or dire
ctory)
        at com.sleepycat.je.tree.ChildReference.fetchTarget(ChildReference.java:143)
        at com.sleepycat.je.tree.DIN.getDupCountLN(DIN.java:147)
        at com.sleepycat.je.dbi.CursorImpl.lockDupCountLN(CursorImpl.java:2409)
        at com.sleepycat.je.tree.Tree.insertDuplicate(Tree.java:2703)
        at com.sleepycat.je.tree.Tree.insert(Tree.java:2652)
        at com.sleepycat.je.dbi.CursorImpl.putLN(CursorImpl.java:1004)
        at com.sleepycat.je.dbi.CursorImpl.put(CursorImpl.java:1027)
        at com.sleepycat.je.Cursor.putAllowPhantoms(Cursor.java:871)
        at com.sleepycat.je.Cursor.putNoNotify(Cursor.java:808)
        at com.sleepycat.je.Cursor.putInternal(Cursor.java:762)
        at com.sleepycat.je.Database.putInternal(Database.java:663)
        at com.sleepycat.je.Database.put(Database.java:609)
        at com.redwerk.webcrawler.storage.url.bdbje.BDBJEStorage.saveRelations(BDBJEStorage.java:361)
        at com.redwerk.webcrawler.storage.url.bdbje.BDBJEStorage.update(BDBJEStorage.java:396)
        at com.redwerk.webcrawler.storage.url.RetryHandlingStorageDecorator.update(RetryHandlingStorageDec
orator.java:159)
        at com.redwerk.webcrawler.storage.url.CountHandlingStorageDecorator.update(CountHandlingStorageDec
orator.java:86)
        at com.redwerk.webcrawler.filter.record.UrlProcessedFilter._accept(UrlProcessedFilter.java:34)
        at com.redwerk.webcrawler.filter.record.ChainedFilter.accept(ChainedFilter.java:23)
        at com.redwerk.webcrawler.filter.record.ChainedFilter.accept(ChainedFilter.java:25)
        at com.redwerk.webcrawler.filter.record.ChainedFilter.accept(ChainedFilter.java:25)
        at com.redwerk.webcrawler.Crawler.addRecord(Crawler.java:731)
        at com.redwerk.webcrawler.ConsumerThread.run(ConsumerThread.java:301)
        at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:44
2)
        at edu.emory.mathcs.backport.java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:305)
        at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:135)
        at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:990)
        at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:531)
        at java.lang.Thread.run(Thread.java:595)
Caused by: com.sleepycat.je.log.LogFileNotFoundException: (JE 3.2.44) 0x1d3/0x472b30 (JE 3.2.44) Couldn't 
open file db/000001d3.jdb: db/000001d3.jdb (No such file or directory)
        at com.sleepycat.je.log.LogManager.getLogSource(LogManager.java:760)
        at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:594)
        at com.sleepycat.je.log.LogManager.get(LogManager.java:735)
        at com.sleepycat.je.tree.ChildReference.fetchTarget(ChildReference.java:135)
        ... 27 more
We checked our code and found we are not doing any removal of JE files at this stage (we had to implement removal of JE files BEFORE thread start to avoid cases when corrupted environment causes current task to not start, probably there's a better way? However this is definitely not a cause of the error, because removal is performed before environment is created)

The je.properties we're using is:
je.cleaner.threads=7
je.maxMemoryPercent=40
je.cleaner.deadlockRetry=7
#je.cleaner.minFileUtilization=20
je.cleaner.readSize=1048576
je.cleaner.lookAheadCacheSize=262144
#je.evictor.lruOnly=false
#je.evictor.nodesPerScan=100
je.log.faultReadSize=102400
So the question is - is this is some bug in BDB JE, or we just missed things?
  • 1. Re: (JE 3.2.44) Couldn't open file
    greybird Expert
    Currently Being Moderated
    Hi Eugene,
    We recently upgraded to newer version of BDB JE, and
    failed with random, but weird issue:
    Were any of the log files for this environment created with an earlier version of JE? If so, what version?

    Are you using DeferredWrite? There were bugs in earlier JE releases involving DeferredWrite and database removal/truncation and duplicates. I see from the information in your post that you are using database removal/truncation and duplicates.
    We checked our code and found we are not doing any removal
    of JE files at this stage (we had to implement
    removal of JE files BEFORE thread start to avoid
    cases when corrupted environment causes current task
    to not start, probably there's a better way? However
    this is definitely not a cause of the error, because
    removal is performed before environment is created)
    I'm not sure I understand. Before opening an environment, if you wish to clear out all data in the environment directory and start from scratch, then the right thing to do is delete all the log files. If that's what you're doing, that's fine.

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
  • 2. Re: (JE 3.2.44) Couldn't open file
    536712 Newbie
    Currently Being Moderated
    Were any of the log files for this environment
    created with an earlier version of JE? If so, what
    version?
    No, there were no files from another version of BDB JE
    Are you using DeferredWrite? There were bugs in
    earlier JE releases involving DeferredWrite and
    database removal/truncation and duplicates. I see
    from the information in your post that you are using
    database removal/truncation and duplicates.
    We are not using DeferredWrite
    I'm not sure I understand. Before opening an
    environment, if you wish to clear out all data in the
    environment directory and start from scratch, then
    the right thing to do is delete all the log files.
    If that's what you're doing, that's fine.
    Yup, we are deleting old log files (*.jdb) from the directory where environment was created right before we are re-using the directory for creation of new environment.
  • 3. Re: (JE 3.2.44) Couldn't open file
    greybird Expert
    Currently Being Moderated
    No, there were no files from another version of BDB
    JE
    We are not using DeferredWrite
    In that case, this must be a problem we've not seen before.

    Please save the log files from the environment directory with this problem, as we may need them later for debugging.

    The best way for us to debug this problem is for you to do the following:

    1) Set the following environment config property so that cleaned log files are not deleted. They will be renamed to .del instead of being deleted.

    je.cleaner.expunge=false

    2) Run the application and reproduce the problem.

    3) Create a zip or tar file of the entire environment directory.

    4) Send me email (mark.hayes at o....com) and I'll give you an FTP address for uploading the archive. Send all exceptions from your output log in the email.

    If the archive is too large for uploading (500 MB or larger) then please send me the exceptions and a list of the log file names, and I'll ask you to upload specific files instead of all files.

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
  • 4. Re: (JE 3.2.44) Couldn't open file
    greybird Expert
    Currently Being Moderated
    Eugene, I have another question for you.
    17/Oct/2007 04:49:38 ERROR [pool-97-thread-2] -
    com.sleepycat.je.DatabaseException: (JE 3.2.44)
    Database relations_database14269 id=26
    rootLsn=0xffffffff/0xffffffff IN type=DBIN/2
    id=29976291 not expected on INList
    The internal information in the above exception message indicates that the relations_database14269 is either:

    A) a DeferredWrite database that was never written to disk

    or

    B) an empty database, and no record was ever added to this database, and the database was removed or truncated.

    Since you say you are not using DeferredWrite, it must be B.

    Could you please describe the life cycle of relations_database14269?

    Thanks,
    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
  • 5. Re: (JE 3.2.44) Couldn't open file
    greybird Expert
    Currently Being Moderated
    Eugene,

    I'm having a difficult time figuring out how the following exception could possibly occur, if there were no other problems prior to this. Is this the first exception you see in the log?

    If you see other exceptions, before or after this one, please send all the stack traces.

    Thanks,
    --mark
    17/Oct/2007 04:49:38 ERROR [pool-97-thread-2] -
    com.sleepycat.je.DatabaseException: (JE 3.2.44)
    Database relations_database14269 id=26
    rootLsn=0xffffffff/0xffffffff IN type=DBIN/2
    id=29976291 not expected on INL
    ist
    com.sleepycat.je.DatabaseException: (JE 3.2.44)
    Database relations_database14269 id=26
    rootLsn=0xffffffff/
    0xffffffff IN type=DBIN/2 id=29976291 not expected
    on INList
    at
    com.sleepycat.je.evictor.Evictor.selectIN(Evictor.java
    :511)
    at
    com.sleepycat.je.evictor.Evictor.evictBatch(Evictor.ja
    va:354)
    at
    ....
  • 6. Re: (JE 3.2.44) Couldn't open file
    536712 Newbie
    Currently Being Moderated
    Could you please describe the life cycle of relations_database14269?
    this database is created when worker thread is started, then worker thread creates a pool of internal threads which are interacting with this database, and after these threads finish their job, the database is deleted using Environment's methods, when next time the worker thread is invoked, it clears the *.jdb files from the directory

    I have a question - does the je.lck file have some special meaning, may it be a case the environment information is stored in it somehow?
  • 7. Re: (JE 3.2.44) Couldn't open file
    536712 Newbie
    Currently Being Moderated
    Eugene,

    I'm having a difficult time figuring out how the
    following exception could possibly occur, if there
    were no other problems prior to this. Is this the
    first exception you see in the log?

    If you see other exceptions, before or after this
    one, please send all the stack traces.
    No, there are no exceptions at all, this is firstmost exception I can see in logs
  • 8. Re: (JE 3.2.44) Couldn't open file
    greybird Expert
    Currently Being Moderated
    I have a question - does the je.lck file have some
    special meaning, may it be a case the environment
    information is stored in it somehow?
    No, it is only used to prevent multiple processes from writing to the environment at the same time.

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
  • 9. Re: (JE 3.2.44) Couldn't open file
    greybird Expert
    Currently Being Moderated
    Eugene,

    I'm working on a theory about why the first exception occurred. And I'm trying to connect the first exception "...not expected on INList" to the second exception "...Couldn't open file...".

    I see that both exceptions have the same time stamp, so they occur in the same second.

    I see that in both exceptions, your application is calling Database.put from BDBJEStorage.java:361.

    Can I conclude that in both cases, you are calling Database.put for the relations_database14269 database, that is named in the first exception? In other words, is this the database that is accessed by BDBJEStorage.java:361?

    Is your application retrying the operation after the first exception occurs, and that's why we see another exception immediately afterwards at the same line number?

    Thanks,
    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
  • 10. Re: (JE 3.2.44) Couldn't open file
    536712 Newbie
    Currently Being Moderated
    Eugene,

    I'm working on a theory about why the first exception
    occurred. And I'm trying to connect the first
    exception "...not expected on INList" to the second
    exception "...Couldn't open file...".

    I see that both exceptions have the same time stamp,
    so they occur in the same second.

    I see that in both exceptions, your application is
    calling Database.put from BDBJEStorage.java:361.

    Can I conclude that in both cases, you are calling
    Database.put for the relations_database14269
    database, that is named in the first exception? In
    other words, is this the database that is accessed by
    BDBJEStorage.java:361?
                    for (Iterator it = record.getParents().iterator(); it.hasNext();) {
                        Integer parent_id = (Integer) it.next();
                        binding.objectToEntry(parent_id, valueEntry);
                        if (cursor.getSearchBoth(keyEntry, valueEntry,
                                LockMode.DEFAULT) != OperationStatus.SUCCESS)
    line 361:                        relationDB.put(txn, keyEntry, valueEntry);
                    }
    Is your application retrying the operation after the
    first exception occurs, and that's why we see another
    exception immediately afterwards at the same line
    number?
    No, in this method the exception is being thrown to the caller, so no immediate retry of the put operation occurs
  • 11. Re: (JE 3.2.44) Couldn't open file
    greybird Expert
    Currently Being Moderated
    Eugene,

    Thanks for answering my 2nd question about retries. Could you please answer my 1st question also:
    Can I conclude that in both cases, you are calling
    Database.put for the relations_database14269
    database, that is named in the first exception? In
    other words, is this the database that is accessed by
    BDBJEStorage.java:361?
    Is the database being accessed by this code: relations_database14269?

    Thanks,
    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
  • 12. Re: (JE 3.2.44) Couldn't open file
    greybird Expert
    Currently Being Moderated
    Eugene,
    Could you please describe the life cycle of
    relations_database14269?

    this database is created when worker thread is
    started, then worker thread creates a pool of
    internal threads which are interacting with this
    database, and after these threads finish their job,
    the database is deleted using Environment's methods,
    when next time the worker thread is invoked, it
    clears the *.jdb files from the directory
    If I understand what you said above, the relations DB is created only once per JE Environment. Is that correct?

    In other words, this is the sequence:

    1) Delete all .jdb files from directory
    2) Open the environment (new Environment(...))
    3) Create the relations DB (env.openDatabase)
    4) Access the relations DB with worker threads
    5) Close the relations DB (Database.close)
    6) Remove the relations DB (Environment.removeDatabase)
    7) Close the environment (Environment.close)

    Is that correct?

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
  • 13. Re: (JE 3.2.44) Couldn't open file
    536712 Newbie
    Currently Being Moderated
    Exactly!

    Finally we chosen to not remove database through the environment but rather just close environment and then remove files. However this did not fix the issue, and such weird errors with absent files still do occur during step 4, when the threads are working with the databases.

    I would assume something could be wrong with cleaner threads?
  • 14. Re: (JE 3.2.44) Couldn't open file
    greybird Expert
    Currently Being Moderated
    Hi Eugene,

    Based on the information you've given me, I suspect this is a JE bug and is an interaction between the cleaner or checkpointer and the removeDatabase method. And although the symptoms are different, I suspect it is the same problem reported here:

    Re: DatabaseException

    As I mentioned in the post above, this may take us several weeks to resolve. I have your email and I will send you a note directly when I have more information on this problem.

    I apologize for the inconvenience.

    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
1 2 3 Previous Next