1 2 3 Previous Next 32 Replies Latest reply: Jan 2, 2008 7:08 AM by Greybird-Oracle RSS

    (JE 3.2.44) Couldn't open file

    536712
      Hello!

      We recently upgraded to newer version of BDB JE, and failed with random, but weird issue:
      17/Oct/2007 04:49:38 ERROR [pool-97-thread-2] - com.sleepycat.je.DatabaseException: (JE 3.2.44) Database r
      elations_database14269 id=26 rootLsn=0xffffffff/0xffffffff  IN type=DBIN/2 id=29976291 not expected on INL
      ist
      com.sleepycat.je.DatabaseException: (JE 3.2.44) Database relations_database14269 id=26 rootLsn=0xffffffff/
      0xffffffff  IN type=DBIN/2 id=29976291 not expected on INList
              at com.sleepycat.je.evictor.Evictor.selectIN(Evictor.java:511)
              at com.sleepycat.je.evictor.Evictor.evictBatch(Evictor.java:354)
              at com.sleepycat.je.evictor.Evictor.doEvict(Evictor.java:249)
              at com.sleepycat.je.evictor.Evictor.doCriticalEviction(Evictor.java:274)
              at com.sleepycat.je.dbi.CursorImpl.close(CursorImpl.java:684)
              at com.sleepycat.je.Cursor.close(Cursor.java:262)
              at com.sleepycat.je.Cursor.close(Cursor.java:250)
              at com.sleepycat.je.Database.putInternal(Database.java:667)
              at com.sleepycat.je.Database.put(Database.java:609)
              at com.redwerk.webcrawler.storage.url.bdbje.BDBJEStorage.saveRelations(BDBJEStorage.java:361)
              at com.redwerk.webcrawler.storage.url.bdbje.BDBJEStorage.update(BDBJEStorage.java:396)
              at com.redwerk.webcrawler.storage.url.RetryHandlingStorageDecorator.update(RetryHandlingStorageDec
      orator.java:159)
              at com.redwerk.webcrawler.storage.url.CountHandlingStorageDecorator.update(CountHandlingStorageDec
      orator.java:86)
              at com.redwerk.webcrawler.filter.record.UrlProcessedFilter._accept(UrlProcessedFilter.java:34)
              at com.redwerk.webcrawler.filter.record.ChainedFilter.accept(ChainedFilter.java:23)
              at com.redwerk.webcrawler.filter.record.ChainedFilter.accept(ChainedFilter.java:25)
              at com.redwerk.webcrawler.Crawler.addRecord(Crawler.java:731)
              at com.redwerk.webcrawler.ConsumerThread.run(ConsumerThread.java:301)
              at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:44
      2)
              at edu.emory.mathcs.backport.java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:305)
              at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:135)
              at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
      java:990)
              at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
      .java:531)
              at java.lang.Thread.run(Thread.java:595)
      17/Oct/2007 04:49:38 ERROR [pool-97-thread-4] - com.sleepycat.je.DatabaseException: (JE 3.2.44) fetchTarge
      t of 0x1d3/0x472b30 parent IN=32738188 lastFullVersion=0x1d3/0x479cdc parent.getDirty()=true state=2 com.s
      leepycat.je.log.LogFileNotFoundException: (JE 3.2.44) 0x1d3/0x472b30 (JE 3.2.44) Couldn't open file db/000
      001d3.jdb: db/000001d3.jdb (No such file or directory)
      com.sleepycat.je.DatabaseException: (JE 3.2.44) fetchTarget of 0x1d3/0x472b30 parent IN=32738188 lastFullV
      ersion=0x1d3/0x479cdc parent.getDirty()=true state=2 com.sleepycat.je.log.LogFileNotFoundException: (JE 3.
      2.44) 0x1d3/0x472b30 (JE 3.2.44) Couldn't open file db/000001d3.jdb: db/000001d3.jdb (No such file or dire
      ctory)
              at com.sleepycat.je.tree.ChildReference.fetchTarget(ChildReference.java:143)
              at com.sleepycat.je.tree.DIN.getDupCountLN(DIN.java:147)
              at com.sleepycat.je.dbi.CursorImpl.lockDupCountLN(CursorImpl.java:2409)
              at com.sleepycat.je.tree.Tree.insertDuplicate(Tree.java:2703)
              at com.sleepycat.je.tree.Tree.insert(Tree.java:2652)
              at com.sleepycat.je.dbi.CursorImpl.putLN(CursorImpl.java:1004)
              at com.sleepycat.je.dbi.CursorImpl.put(CursorImpl.java:1027)
              at com.sleepycat.je.Cursor.putAllowPhantoms(Cursor.java:871)
              at com.sleepycat.je.Cursor.putNoNotify(Cursor.java:808)
              at com.sleepycat.je.Cursor.putInternal(Cursor.java:762)
              at com.sleepycat.je.Database.putInternal(Database.java:663)
              at com.sleepycat.je.Database.put(Database.java:609)
              at com.redwerk.webcrawler.storage.url.bdbje.BDBJEStorage.saveRelations(BDBJEStorage.java:361)
              at com.redwerk.webcrawler.storage.url.bdbje.BDBJEStorage.update(BDBJEStorage.java:396)
              at com.redwerk.webcrawler.storage.url.RetryHandlingStorageDecorator.update(RetryHandlingStorageDec
      orator.java:159)
              at com.redwerk.webcrawler.storage.url.CountHandlingStorageDecorator.update(CountHandlingStorageDec
      orator.java:86)
              at com.redwerk.webcrawler.filter.record.UrlProcessedFilter._accept(UrlProcessedFilter.java:34)
              at com.redwerk.webcrawler.filter.record.ChainedFilter.accept(ChainedFilter.java:23)
              at com.redwerk.webcrawler.filter.record.ChainedFilter.accept(ChainedFilter.java:25)
              at com.redwerk.webcrawler.filter.record.ChainedFilter.accept(ChainedFilter.java:25)
              at com.redwerk.webcrawler.Crawler.addRecord(Crawler.java:731)
              at com.redwerk.webcrawler.ConsumerThread.run(ConsumerThread.java:301)
              at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:44
      2)
              at edu.emory.mathcs.backport.java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:305)
              at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:135)
              at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
      java:990)
              at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
      .java:531)
              at java.lang.Thread.run(Thread.java:595)
      Caused by: com.sleepycat.je.log.LogFileNotFoundException: (JE 3.2.44) 0x1d3/0x472b30 (JE 3.2.44) Couldn't 
      open file db/000001d3.jdb: db/000001d3.jdb (No such file or directory)
              at com.sleepycat.je.log.LogManager.getLogSource(LogManager.java:760)
              at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:594)
              at com.sleepycat.je.log.LogManager.get(LogManager.java:735)
              at com.sleepycat.je.tree.ChildReference.fetchTarget(ChildReference.java:135)
              ... 27 more
      We checked our code and found we are not doing any removal of JE files at this stage (we had to implement removal of JE files BEFORE thread start to avoid cases when corrupted environment causes current task to not start, probably there's a better way? However this is definitely not a cause of the error, because removal is performed before environment is created)

      The je.properties we're using is:
      je.cleaner.threads=7
      je.maxMemoryPercent=40
      je.cleaner.deadlockRetry=7
      #je.cleaner.minFileUtilization=20
      je.cleaner.readSize=1048576
      je.cleaner.lookAheadCacheSize=262144
      #je.evictor.lruOnly=false
      #je.evictor.nodesPerScan=100
      je.log.faultReadSize=102400
      So the question is - is this is some bug in BDB JE, or we just missed things?
        • 1. Re: (JE 3.2.44) Couldn't open file
          Greybird-Oracle
          Hi Eugene,
          We recently upgraded to newer version of BDB JE, and
          failed with random, but weird issue:
          Were any of the log files for this environment created with an earlier version of JE? If so, what version?

          Are you using DeferredWrite? There were bugs in earlier JE releases involving DeferredWrite and database removal/truncation and duplicates. I see from the information in your post that you are using database removal/truncation and duplicates.
          We checked our code and found we are not doing any removal
          of JE files at this stage (we had to implement
          removal of JE files BEFORE thread start to avoid
          cases when corrupted environment causes current task
          to not start, probably there's a better way? However
          this is definitely not a cause of the error, because
          removal is performed before environment is created)
          I'm not sure I understand. Before opening an environment, if you wish to clear out all data in the environment directory and start from scratch, then the right thing to do is delete all the log files. If that's what you're doing, that's fine.

          --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
          • 2. Re: (JE 3.2.44) Couldn't open file
            536712
            Were any of the log files for this environment
            created with an earlier version of JE? If so, what
            version?
            No, there were no files from another version of BDB JE
            Are you using DeferredWrite? There were bugs in
            earlier JE releases involving DeferredWrite and
            database removal/truncation and duplicates. I see
            from the information in your post that you are using
            database removal/truncation and duplicates.
            We are not using DeferredWrite
            I'm not sure I understand. Before opening an
            environment, if you wish to clear out all data in the
            environment directory and start from scratch, then
            the right thing to do is delete all the log files.
            If that's what you're doing, that's fine.
            Yup, we are deleting old log files (*.jdb) from the directory where environment was created right before we are re-using the directory for creation of new environment.
            • 3. Re: (JE 3.2.44) Couldn't open file
              Greybird-Oracle
              No, there were no files from another version of BDB
              JE
              We are not using DeferredWrite
              In that case, this must be a problem we've not seen before.

              Please save the log files from the environment directory with this problem, as we may need them later for debugging.

              The best way for us to debug this problem is for you to do the following:

              1) Set the following environment config property so that cleaned log files are not deleted. They will be renamed to .del instead of being deleted.

              je.cleaner.expunge=false

              2) Run the application and reproduce the problem.

              3) Create a zip or tar file of the entire environment directory.

              4) Send me email (mark.hayes at o....com) and I'll give you an FTP address for uploading the archive. Send all exceptions from your output log in the email.

              If the archive is too large for uploading (500 MB or larger) then please send me the exceptions and a list of the log file names, and I'll ask you to upload specific files instead of all files.

              --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
              • 4. Re: (JE 3.2.44) Couldn't open file
                Greybird-Oracle
                Eugene, I have another question for you.
                17/Oct/2007 04:49:38 ERROR [pool-97-thread-2] -
                com.sleepycat.je.DatabaseException: (JE 3.2.44)
                Database relations_database14269 id=26
                rootLsn=0xffffffff/0xffffffff IN type=DBIN/2
                id=29976291 not expected on INList
                The internal information in the above exception message indicates that the relations_database14269 is either:

                A) a DeferredWrite database that was never written to disk

                or

                B) an empty database, and no record was ever added to this database, and the database was removed or truncated.

                Since you say you are not using DeferredWrite, it must be B.

                Could you please describe the life cycle of relations_database14269?

                Thanks,
                --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                • 5. Re: (JE 3.2.44) Couldn't open file
                  Greybird-Oracle
                  Eugene,

                  I'm having a difficult time figuring out how the following exception could possibly occur, if there were no other problems prior to this. Is this the first exception you see in the log?

                  If you see other exceptions, before or after this one, please send all the stack traces.

                  Thanks,
                  --mark
                  17/Oct/2007 04:49:38 ERROR [pool-97-thread-2] -
                  com.sleepycat.je.DatabaseException: (JE 3.2.44)
                  Database relations_database14269 id=26
                  rootLsn=0xffffffff/0xffffffff IN type=DBIN/2
                  id=29976291 not expected on INL
                  ist
                  com.sleepycat.je.DatabaseException: (JE 3.2.44)
                  Database relations_database14269 id=26
                  rootLsn=0xffffffff/
                  0xffffffff IN type=DBIN/2 id=29976291 not expected
                  on INList
                  at
                  com.sleepycat.je.evictor.Evictor.selectIN(Evictor.java
                  :511)
                  at
                  com.sleepycat.je.evictor.Evictor.evictBatch(Evictor.ja
                  va:354)
                  at
                  ....
                  • 6. Re: (JE 3.2.44) Couldn't open file
                    536712
                    Could you please describe the life cycle of relations_database14269?
                    this database is created when worker thread is started, then worker thread creates a pool of internal threads which are interacting with this database, and after these threads finish their job, the database is deleted using Environment's methods, when next time the worker thread is invoked, it clears the *.jdb files from the directory

                    I have a question - does the je.lck file have some special meaning, may it be a case the environment information is stored in it somehow?
                    • 7. Re: (JE 3.2.44) Couldn't open file
                      536712
                      Eugene,

                      I'm having a difficult time figuring out how the
                      following exception could possibly occur, if there
                      were no other problems prior to this. Is this the
                      first exception you see in the log?

                      If you see other exceptions, before or after this
                      one, please send all the stack traces.
                      No, there are no exceptions at all, this is firstmost exception I can see in logs
                      • 8. Re: (JE 3.2.44) Couldn't open file
                        Greybird-Oracle
                        I have a question - does the je.lck file have some
                        special meaning, may it be a case the environment
                        information is stored in it somehow?
                        No, it is only used to prevent multiple processes from writing to the environment at the same time.

                        --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                        • 9. Re: (JE 3.2.44) Couldn't open file
                          Greybird-Oracle
                          Eugene,

                          I'm working on a theory about why the first exception occurred. And I'm trying to connect the first exception "...not expected on INList" to the second exception "...Couldn't open file...".

                          I see that both exceptions have the same time stamp, so they occur in the same second.

                          I see that in both exceptions, your application is calling Database.put from BDBJEStorage.java:361.

                          Can I conclude that in both cases, you are calling Database.put for the relations_database14269 database, that is named in the first exception? In other words, is this the database that is accessed by BDBJEStorage.java:361?

                          Is your application retrying the operation after the first exception occurs, and that's why we see another exception immediately afterwards at the same line number?

                          Thanks,
                          --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                          • 10. Re: (JE 3.2.44) Couldn't open file
                            536712
                            Eugene,

                            I'm working on a theory about why the first exception
                            occurred. And I'm trying to connect the first
                            exception "...not expected on INList" to the second
                            exception "...Couldn't open file...".

                            I see that both exceptions have the same time stamp,
                            so they occur in the same second.

                            I see that in both exceptions, your application is
                            calling Database.put from BDBJEStorage.java:361.

                            Can I conclude that in both cases, you are calling
                            Database.put for the relations_database14269
                            database, that is named in the first exception? In
                            other words, is this the database that is accessed by
                            BDBJEStorage.java:361?
                                            for (Iterator it = record.getParents().iterator(); it.hasNext();) {
                                                Integer parent_id = (Integer) it.next();
                                                binding.objectToEntry(parent_id, valueEntry);
                                                if (cursor.getSearchBoth(keyEntry, valueEntry,
                                                        LockMode.DEFAULT) != OperationStatus.SUCCESS)
                            line 361:                        relationDB.put(txn, keyEntry, valueEntry);
                                            }
                            Is your application retrying the operation after the
                            first exception occurs, and that's why we see another
                            exception immediately afterwards at the same line
                            number?
                            No, in this method the exception is being thrown to the caller, so no immediate retry of the put operation occurs
                            • 11. Re: (JE 3.2.44) Couldn't open file
                              Greybird-Oracle
                              Eugene,

                              Thanks for answering my 2nd question about retries. Could you please answer my 1st question also:
                              Can I conclude that in both cases, you are calling
                              Database.put for the relations_database14269
                              database, that is named in the first exception? In
                              other words, is this the database that is accessed by
                              BDBJEStorage.java:361?
                              Is the database being accessed by this code: relations_database14269?

                              Thanks,
                              --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
                              • 12. Re: (JE 3.2.44) Couldn't open file
                                Greybird-Oracle
                                Eugene,
                                Could you please describe the life cycle of
                                relations_database14269?

                                this database is created when worker thread is
                                started, then worker thread creates a pool of
                                internal threads which are interacting with this
                                database, and after these threads finish their job,
                                the database is deleted using Environment's methods,
                                when next time the worker thread is invoked, it
                                clears the *.jdb files from the directory
                                If I understand what you said above, the relations DB is created only once per JE Environment. Is that correct?

                                In other words, this is the sequence:

                                1) Delete all .jdb files from directory
                                2) Open the environment (new Environment(...))
                                3) Create the relations DB (env.openDatabase)
                                4) Access the relations DB with worker threads
                                5) Close the relations DB (Database.close)
                                6) Remove the relations DB (Environment.removeDatabase)
                                7) Close the environment (Environment.close)

                                Is that correct?

                                --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                • 13. Re: (JE 3.2.44) Couldn't open file
                                  536712
                                  Exactly!

                                  Finally we chosen to not remove database through the environment but rather just close environment and then remove files. However this did not fix the issue, and such weird errors with absent files still do occur during step 4, when the threads are working with the databases.

                                  I would assume something could be wrong with cleaner threads?
                                  • 14. Re: (JE 3.2.44) Couldn't open file
                                    Greybird-Oracle
                                    Hi Eugene,

                                    Based on the information you've given me, I suspect this is a JE bug and is an interaction between the cleaner or checkpointer and the removeDatabase method. And although the symptoms are different, I suspect it is the same problem reported here:

                                    Re: DatabaseException

                                    As I mentioned in the post above, this may take us several weeks to resolve. I have your email and I will send you a note directly when I have more information on this problem.

                                    I apologize for the inconvenience.

                                    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                    1 2 3 Previous Next