3 Replies Latest reply: Jun 13, 2014 7:27 AM by user733634 RSS

    Issue with DiskOrderedCursor and WriteDeferredDB 5.0.97

    5267b12c-9d06-4ac5-b3ae-02e876ffc12a

      Hi,

       

      I am using the following environment configuration:

      final EnvironmentConfig envConfig = new EnvironmentConfig();
      envConfig.setAllowCreate(true);
      envConfig.setSharedCache(true);
      envConfig.setCachePercent(10);

      envConfig.setConfigParam(EnvironmentConfig.LOG_FILE_MAX, "100000000")

      with the following DB config:

      final DatabaseConfig databaseConfig = new DatabaseConfig();
      databaseConfig.setAllowCreate(true);
      databaseConfig.setDeferredWrite(true);

       

      to create the db calling:

      environment.openDatabase(NoTransaction, databaseId, databaseConfig);

       

      Then i am simply using put() to place records in the db and using no transaction.

       

      I want to read the records back in any order as quick as possible so am using:

       

      DiskOrderedCursorConfig cursorConfig = new DiskOrderedCursorConfig();

      try (DiskOrderedCursor cursor = db.openCursor(cursorConfig)) {

      DatabaseEntry nextKey = new DatabaseEntry();
      DatabaseEntry nextData = new DatabaseEntry();
      OperationStatus status = cursor.getNext(nextKey, nextData, null);

      while (SUCCESS == status) {

      //.do some stuff with entries

      status = cursor.getNext(nextKey, nextData, null);}}

       

      The problem i am having is that i can read the first record everytime and then i get the following:

       

      com.sleepycat.je.EnvironmentFailureException: (JE 5.0.97) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 5.0.97) F:\calculationservice\workspace\workflow-uatcrttorqt2wf3\replication java.io.FileNotFoundException: F:\calculationservice\workspace\workflow-uatcrttorqt2wf3\replication\ffffffff.jdb (The system cannot find the file specified) LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.

          at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:210)

          at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)

          at com.sleepycat.je.dbi.DiskOrderedCursorImpl.checkEnv(DiskOrderedCursorImpl.java:234)

          at com.sleepycat.je.DiskOrderedCursor.checkState(DiskOrderedCursor.java:367)

          at com.sleepycat.je.DiskOrderedCursor.getNext(DiskOrderedCursor.java:324)

          at rbccm.workflow.dependency.resiliency.replication.BerkleyReplicatedThreadStackStore.allThreadStacksFor(BerkleyReplicatedThreadStackStore.java:133)

          at rbccm.workflow.dependency.resiliency.replication.ReplicatedThreadStackRepository.restore(ReplicatedThreadStackRepository.java:180)

          at rbccm.workflow.dependency.resiliency.replication.ReplicatedThreadStackRepository.restore(ReplicatedThreadStackRepository.java:169)

          at rbccm.workflow.dependency.resiliency.replication.ReplicatedThreadStackRestarter.tryRestore(ReplicatedThreadStackRestarter.java:67)

          at rbccm.workflow.dependency.resiliency.replication.ReplicatedThreadStackRestarter.restoreAndRestart(ReplicatedThreadStackRestarter.java:60)

          at rbccm.workflow.dependency.resiliency.replication.ThreadStackReplicationControllerAndMonitor.restoreAndPauseReplication(ThreadStackReplicationControllerAndMonitor.java:47)

          at rbccm.workflow.dependency.resiliency.ThreadStackResiliencyMonitorAndController.restoreAndPauseReplication(ThreadStackResiliencyMonitorAndController.java:90)

          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

          at java.lang.reflect.Method.invoke(Method.java:601)

          at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)

          at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)

          at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235)

          at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)

          at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)

          at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)

          at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:792)

          at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1486)

          at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:96)

          at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1327)

          at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1419)

          at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:847)

          at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)

          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

          at java.lang.reflect.Method.invoke(Method.java:601)

          at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)

          at sun.rmi.transport.Transport$1.run(Transport.java:177)

          at sun.rmi.transport.Transport$1.run(Transport.java:174)

          at java.security.AccessController.doPrivileged(Native Method)

          at sun.rmi.transport.Transport.serviceCall(Transport.java:173)

          at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)

          at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)

          at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)

          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

          at java.lang.Thread.run(Thread.java:722)

          Suppressed: com.sleepycat.je.EnvironmentFailureException: (JE 5.0.97) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 5.0.97) F:\calculationservice\workspace\workflow-uatcrttorqt2wf3\replication java.io.FileNotFoundException: F:\calculationservice\workspace\workflow-uatcrttorqt2wf3\replication\ffffffff.jdb (The system cannot find the file specified) LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.

              at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:210)

              at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1594)

              at com.sleepycat.je.dbi.DiskOrderedCursorImpl.checkEnv(DiskOrderedCursorImpl.java:234)

              at com.sleepycat.je.DiskOrderedCursor.close(DiskOrderedCursor.java:214)

              at rbccm.workflow.dependency.resiliency.replication.BerkleyReplicatedThreadStackStore.allThreadStacksFor(BerkleyReplicatedThreadStackStore.java:139)

              ... 36 more

          Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 5.0.97) F:\calculationservice\workspace\workflow-uatcrttorqt2wf3\replication java.io.FileNotFoundException: F:\calculationservice\workspace\workflow-uatcrttorqt2wf3\replication\ffffffff.jdb (The system cannot find the file specified) LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.

              at com.sleepycat.je.log.LogManager.getLogEntryHandleFileNotFound(LogManager.java:920)

              at com.sleepycat.je.dbi.DiskOrderedScanner.fetchEntry(DiskOrderedScanner.java:575)

              at com.sleepycat.je.dbi.DiskOrderedScanner.fetchAndProcessLNs(DiskOrderedScanner.java:511)

              at com.sleepycat.je.dbi.DiskOrderedScanner.scan(DiskOrderedScanner.java:261)

              at com.sleepycat.je.dbi.DiskOrderedCursorImpl$1.run(DiskOrderedCursorImpl.java:129)

          Caused by: java.io.FileNotFoundException: F:\calculationservice\workspace\workflow-uatcrttorqt2wf3\replication\ffffffff.jdb (The system cannot find the file specified)

              at java.io.RandomAccessFile.open(Native Method)

              at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)

              at java.io.RandomAccessFile.<init>(RandomAccessFile.java:118)

              at com.sleepycat.je.log.FileManager$DefaultRandomAccessFile.<init>(FileManager.java:3179)

              at com.sleepycat.je.log.FileManager$6.createFile(FileManager.java:3207)

              at com.sleepycat.je.log.FileManager.openFileHandle(FileManager.java:1276)

              at com.sleepycat.je.log.FileManager.getFileHandle(FileManager.java:1148)

              at com.sleepycat.je.log.LogManager.getLogSource(LogManager.java:1199)

              at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:903)

              at com.sleepycat.je.log.LogManager.getLogEntry(LogManager.java:857)

              at com.sleepycat.je.log.LogManager.getLogEntryHandleFileNotFound(LogManager.java:918)

              ... 4 more

          [CIRCULAR REFERENCE:com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 5.0.97) F:\calculationservice\workspace\workflow-uatcrttorqt2wf3\replication java.io.FileNotFoundException: F:\calculationservice\workspace\workflow-uatcrttorqt2wf3\replication\ffffffff.jdb (The system cannot find the file specified) LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is invalid and must be closed.]

       

      This only seems to happen i have put a large number of records into the DB as sometimes it does work succesfully. When performing the cursor scan nothing else is running against the db.

       

      When I use a normal cursor with null tx and null config everything works fine regardless of the number of records..

      Its also important to note that if i do a db.sync before the cursor scan I do not get an exception.

       

      Please  could you advise?

       

      Many thanks in advance

       

      Dan

        • 1. Re: Issue with DiskOrderedCursor and WriteDeferredDB 5.0.97
          user733634

          Hi,

           

          Thanks for reporting this issue. Yes, this is indeed a bug, specific to deferred-write databases and DiscOrderedCursor. We think we know what the problem is. We will work on a fix and let you know.

           

          Yes, db.sync does indeed "fix" the problem, but it is just a rather expensive workaround.

           

          Another thing that may help is to increase the size of the queue used by the DiscOrderedCursor (the queue size can be set via DiscOrderedCursorConfig; the default is 1000). However, this will, in general, consume more memory, and since the bug occurs when the queue gets full, you may still run into it.

           

          Best,

          Markos.

          • 2. Re: Issue with DiskOrderedCursor and WriteDeferredDB 5.0.97
            5267b12c-9d06-4ac5-b3ae-02e876ffc12a

            I have seen it work when the database has only a small number of entries is this because the queue is not getting full? if i set the queue to be the same as the number of entries would this help?

            • 3. Re: Issue with DiskOrderedCursor and WriteDeferredDB 5.0.97
              user733634

              Hi again,

               

              Sorry for the late reply. We are very close to our next release deadline and things are rather busy these days.

               

              Yes, making the queue size the same as the number of records should fix the problem. But you may be consuming too much memory, depending on the number of records, their size, and also on the relative "speed" of the producer and consumer threads. Here is a rough explanation of what is going on:

               

              The producer thread is one created internally by the DiskOrderedCursor. It scans the DB and puts the records in the queue. The consumer thread is your application thread; the one that calls the getNext() method. It removes records from the queue. The producer will "immediately" put in the queue records that are already cached in memory, as long as the queue is not full. If the queue is full, it will remember the physical (on-disk) address of the record, and will read the record from disk at a later time. The problem with a deferred-write (DW) DB is that some records may have not been written at all to disk by the time the DiskOrderedCursor starts, and as a result, they have invalid disk addresses. Nevertheless , the current implementation will remember these invalid addresses (if the queue is full) and will crash when it tries to use them later.

               

              Our current simple fix is to just skip records that have never been written to disk. The fix will be in the next JE release (6.1), which should be available in about a month. Can you wait until then to receive the fix? In the mean time, you could use db.sync, or play with the queue size, or use a non-transactional, non-DW DB. Because DW can be used with any non-transactional DB, after you upgrade to JE-6.1, you can open and use this non-transactional DB in DW mode, without the need to convert any data.

               

              Notice that the above fix is compatible with the current API contract of DiskOrderedCursor, which states that the result of a DOS may be missing some records (please see the javadoc for DiskOrderedCursor). We have ideas about greatly reducing the DOS inconsistencies described in the javadoc, but I cannot say right now when those will be implemented (certainly not in 6.1).

               

              I hope this helps,

              Markos.