2 Replies Latest reply on Dec 17, 2019 1:04 PM by 940250

    Master fails unexpectedly due to Feeder Output Halted log file reading at file 0x334f63 offset 0x8ce9e offset(decimal)=577182 prev=0x8ce6c

    940250

      Hi

      We have run into another occurrence of the issue reported as part of thread https://community.oracle.com/thread/4195598 . As that discussion thread is already archived, I decided to open a new discussion thread.

       

      We have 3-nodes BDB JE HA cluster running with BDB JE 7.4.5.

       

      Yesterday, the master node reported EnvironmentFailureException from Feeder Output thread. The exception was reported 3 times from different classes. The stack traces are provided below:

      2019-11-06 19:33:54,031 ERROR [Feeder Output for node_1] (c.s.j.r.i.RepImpl) - node_2 Halted log file reading at file 0x334f63 offset 0x8ce9e offset(decimal)=577182 prev=0x8ce6c:

      entry=DEL_LN_TXtype=31,version=14)

      prev=0x8ce6c

      size=27

      Next entry should be at 0x8cecf

      com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects.

              at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)

              at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)

              at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)

              at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)

              at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)

              at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)

              at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

       

       

      2019-11-06 19:33:54,032 ERROR [Feeder Output for node_1] (c.s.j.r.i.n.Feeder) - node_2 Unexpected exception: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000

              at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)

              at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)

              at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)

              at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)

              at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)

              at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)

              at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

       

       

      2019-11-06 19:33:54,034 ERROR [Feeder Output for node_1] (c.s.j.r.i.n.Feeder) - node_2 Uncaught exception in feeder thread Thread[Feeder Output for node_1,5,main](JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000

              at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)

              at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)

              at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)

              at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)

              at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)

              at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)

              at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

      After the above, a warning below was reported:

      2019-11-06 19:33:54,034 WARN  [MASTER node_2(3)] (c.s.j.r.i.n.FeederManager) - node_2 Feeder manager unexpected interrupt

      After that, the node was detached with the following exception

      2019-11-06 19:33:54,055 ERROR [DETACHED node_2(3)] (c.s.j.r.i.n.RepNode) - node_2 node_2(3):/path/to/node_2/data:DETACHED node_2(3) exited unexpectedly with exception com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3)com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node3_2(3)

              at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)

              at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)

              at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)

              at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)

              at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)

              at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)

              at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

      The exception was also reported via the exception listener as below

      2019-11-06 19:33:54,061 ERROR [DETACHED node_2(3)] (o.a.q.s.s.b.r.ReplicatedEnvironmentFacade) - Asynchronous exception thrown by BDB thread 'DETACHED node_2(3)'

      com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.

              at com.sleepycat.je.utilint.StoppableThread.handleUncaughtException(StoppableThread.java:243)

              at com.sleepycat.je.utilint.StoppableThread$UncaughtHandler.uncaughtException(StoppableThread.java:274)

              at java.lang.Thread.dispatchUncaughtException(Thread.java:1959)

      Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3)

              at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)

              at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)

              at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)

              at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)

              at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)

              at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)

              at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

       

      The same exception was reported again via exception listener from a cleaner thread

      2019-11-06 19:33:54,094 ERROR [Cleaner-1] (o.a.q.s.s.b.r.ReplicatedEnvironmentFacade) - Asynchronous exception thrown by BDB thread 'Cleaner-1'

      com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.

              at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)

              at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)

              at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:459)

              at com.sleepycat.je.log.LogBufferPool.writeDirty(LogBufferPool.java:419)

              at com.sleepycat.je.log.LogBufferPool.bumpAndWriteDirty(LogBufferPool.java:373)

              at com.sleepycat.je.log.LogBufferPool.getWriteBuffer(LogBufferPool.java:320)

              at com.sleepycat.je.log.LogManager.serialLogWork(LogManager.java:651)

              at com.sleepycat.je.log.LogManager.serialLog(LogManager.java:476)

              at com.sleepycat.je.log.LogManager.logItem(LogManager.java:423)

              at com.sleepycat.je.log.LogManager.log(LogManager.java:338)

              at com.sleepycat.je.tree.LN.logInternal(LN.java:716)

              at com.sleepycat.je.tree.LN.optionalLog(LN.java:419)

              at com.sleepycat.je.dbi.CursorImpl.updateRecordInternal(CursorImpl.java:1677)

              at com.sleepycat.je.dbi.CursorImpl.updateCurrentRecord(CursorImpl.java:1085)

              at com.sleepycat.je.dbi.DbTree$RewriteMapLN.doWork(DbTree.java:847)

              at com.sleepycat.je.tree.Tree.withRootLatchedExclusive(Tree.java:513)

              at com.sleepycat.je.dbi.DbTree.modifyDbRoot(DbTree.java:815)

              at com.sleepycat.je.dbi.DbTree.modifyDbRoot(DbTree.java:737)

              at com.sleepycat.je.cleaner.UtilizationProfile.removePerDbMetadata(UtilizationProfile.java:355)

              at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:893)

              at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:361)

              at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)

              at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

              at java.lang.Thread.run(Thread.java:748)

      Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.

              at com.sleepycat.je.utilint.StoppableThread.handleUncaughtException(StoppableThread.java:243)

              at com.sleepycat.je.utilint.StoppableThread$UncaughtHandler.uncaughtException(StoppableThread.java:274)

              at java.lang.Thread.dispatchUncaughtException(Thread.java:1959)

      Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3)

              at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)

              at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)

              at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)

              at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)

              at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)

              at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)

              at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

       

       

      The application commits BDB JE transactions  using com.sleepycat.je.Transaction#commit(com.sleepycat.je.Durability) (where durability is configured as new Durability( SyncPolicy.NO_SYNC,  SyncPolicy.NO_SYNC, ReplicaAckPolicy.SIMPLE_MAJORITY))

      The transaction syncing to disk is performed in a separate thread by invoking com.sleepycat.je.Environment#flushLog(true) (Essentially, after transaction commit the application is awaiting for #flushLog(true) to complete for multiple parallel transactions committed with local policy SyncPolicy.NO_SYNC ).  The syncing policy SyncPolicy.NO_SYNC is used intentionally as local syncing policy, in order to aggregate the syncing to disk of multiple parallel transactions using Environment#flushLog for performance optimization.

       

      Whilst invoking com.sleepycat.je.Environment#flushLog(true), the EnvironmentFailureException was re-thrown again:

      2019-11-06 19:33:54,094 ERROR [Commit-Thread-nodes] (o.a.q.s.u.ServerScopedRuntimeException) - Exception whilst syncing data to disk

      com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.

              at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)

              at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)

              at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:459)

              at com.sleepycat.je.log.LogBufferPool.writeDirty(LogBufferPool.java:419)

              at com.sleepycat.je.log.LogManager.flushInternal(LogManager.java:1270)

              at com.sleepycat.je.log.LogManager.flushSync(LogManager.java:1210)

              at com.sleepycat.je.dbi.EnvironmentImpl.flushLog(EnvironmentImpl.java:2265)

              at com.sleepycat.je.Environment.flushLog(Environment.java:1638)

              at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.flushLog(ReplicatedEnvironmentFacade.java:746)

              at org.apache.qpid.server.store.berkeleydb.CoalescingCommiter$CommitThread.processJobs(CoalescingCommiter.java:201)

              at org.apache.qpid.server.store.berkeleydb.CoalescingCommiter$CommitThread.run(CoalescingCommiter.java:180)

      Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.

              at com.sleepycat.je.utilint.StoppableThread.handleUncaughtException(StoppableThread.java:243)

              at com.sleepycat.je.utilint.StoppableThread$UncaughtHandler.uncaughtException(StoppableThread.java:274)

              at java.lang.Thread.dispatchUncaughtException(Thread.java:1959)

      Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3)

              at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)

              at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)

              at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)

              at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)

              at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)

              at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)

              at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

       

       

      Usually, on running into BDB  JE issues reported with EnvironmentFailureException the application tries to recover from the problem by recreating the Environment instance and re-joining the new environment instance to the group, though, the exception thrown from EnvironmentImpl.flushLog are treated as terminal and cause JVM halt.

       

      Potentially, the code can be changed to try to recreate the Environment even on exceptions thrown from EnvironmentImpl.flushLog, though that might results in not exactly right reaction on critical system issues like running out of dusk space or disk failures. I am not sure that Environment recreation would be a feasible approach to work-around the issue.

       

      What could be the best way to work around the reported problem?

      Can upgrade to BDB JE version 7.5.11 help to mitigate  the issue?

       

      If more details about the issue is required, I can try to provide them. Though, the BDB JE files have gone after manual restart of the application. The restart recovers BDB JE environment.

       

      Kind Regards,

      Alex

        • 1. Re: Master fails unexpectedly due to Feeder Output Halted log file reading at file 0x334f63 offset 0x8ce9e offset(decimal)=577182 prev=0x8ce6c
          Laurenfoutz-Oracle

          Sorry for the late response to this.

           

          Upgrading BDB JE to the latest version is always recommended.  What I believe is happening here is that since you are using a NO_SYNC policy and flushing the logs in a separate thread, you are hitting a rare race condition where the VLSN the thread is looking for has moved out of the in-memory VLSN cache, but has not been flushed to disk yet.  So when another thread goes looking for it in the logs it fails since the log is not on disk yet.  Improvements to the VLSN caching since 7.5.11 should hopefully prevent this issue in the future.

           

          As for why multiple threads report the issue.  Environment failures exceptions are actually propagate to all BDB JE threads to force them to shut down, even though only the FeederOutput thread actually experience the exception.

           

          If upgrading does not solve the issue, save all the je.info.# files before you restart the node and post them to this thread.  That will give me more information on what is causing the issue.

           

          Lauren Foutz

          • 2. Re: Master fails unexpectedly due to Feeder Output Halted log file reading at file 0x334f63 offset 0x8ce9e offset(decimal)=577182 prev=0x8ce6c
            940250

            Hi Lauren,

             

            Thanks a lot for the reply and suggestion to upgrade.

             

            I actually tried to upgrade, but, I ran into an issue with 7.5.11 reported in https://community.oracle.com/thread/4303487

            When number of bdb je files drops to one, the 7.5.11 environment cannot start. I've seen this issue happens only on upgrade from older version to newer one.  I've seen another thread Issue Upgrading JE 7.4.5 to JE 7.5.11 reporting the same problem.

             

            I could not reproduce it when environment is created from scratch using version 7.5.11. However, I do not know in what other circumstances the number of BDB JE files can reduce to one file.

             

            Do you know whether it is an upgrade only issue?

             

            If it is an upgrade only issue I can potentially work-around the problem. However, it the issue can  manifest on other conditions, the upgrade to 7.5.11 does not look like a safe option to me.

             

            Kind Regards,

            Alex