Forum Stats

  • 3,759,174 Users
  • 2,251,510 Discussions
  • 7,870,525 Comments

Discussions

Master fails unexpectedly due to Feeder Output Halted log file reading at file 0x334f63 offset 0x8ce

User_BZQZ5
User_BZQZ5 Member Posts: 14 Blue Ribbon
edited Dec 17, 2019 8:04AM in Berkeley DB Java Edition

Hi

We have run into another occurrence of the issue reported as part of thread https://community.oracle.com/thread/4195598 . As that discussion thread is already archived, I decided to open a new discussion thread.

We have 3-nodes BDB JE HA cluster running with BDB JE 7.4.5.

Yesterday, the master node reported EnvironmentFailureException from Feeder Output thread. The exception was reported 3 times from different classes. The stack traces are provided below:

2019-11-06 19:33:54,031 ERROR [Feeder Output for node_1] (c.s.j.r.i.RepImpl) - node_2 Halted log file reading at file 0x334f63 offset 0x8ce9e offset(decimal)=577182 prev=0x8ce6c:entry=DEL_LN_TXtype=31,version=14)prev=0x8ce6csize=27Next entry should be at 0x8cecfcom.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects.        at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)        at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)        at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)        at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)        at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)        at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)        at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)2019-11-06 19:33:54,032 ERROR [Feeder Output for node_1] (c.s.j.r.i.n.Feeder) - node_2 Unexpected exception: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000        at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)        at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)        at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)        at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)        at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)        at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)        at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)2019-11-06 19:33:54,034 ERROR [Feeder Output for node_1] (c.s.j.r.i.n.Feeder) - node_2 Uncaught exception in feeder thread Thread[Feeder Output for node_1,5,main](JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000        at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)        at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)        at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)        at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)        at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)        at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)        at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

After the above, a warning below was reported:

2019-11-06 19:33:54,034 WARN  [MASTER node_2(3)] (c.s.j.r.i.n.FeederManager) - node_2 Feeder manager unexpected interrupt

After that, the node was detached with the following exception

2019-11-06 19:33:54,055 ERROR [DETACHED node_2(3)] (c.s.j.r.i.n.RepNode) - node_2 node_2(3):/path/to/node_2/data:DETACHED node_2(3) exited unexpectedly with exception com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3)com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node3_2(3)        at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)        at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)        at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)        at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)        at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)        at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)        at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

The exception was also reported via the exception listener as below

2019-11-06 19:33:54,061 ERROR [DETACHED node_2(3)] (o.a.q.s.s.b.r.ReplicatedEnvironmentFacade) - Asynchronous exception thrown by BDB thread 'DETACHED node_2(3)'com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.        at com.sleepycat.je.utilint.StoppableThread.handleUncaughtException(StoppableThread.java:243)        at com.sleepycat.je.utilint.StoppableThread$UncaughtHandler.uncaughtException(StoppableThread.java:274)        at java.lang.Thread.dispatchUncaughtException(Thread.java:1959)Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3)        at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)        at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)        at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)        at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)        at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)        at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)        at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

The same exception was reported again via exception listener from a cleaner thread

2019-11-06 19:33:54,094 ERROR [Cleaner-1] (o.a.q.s.s.b.r.ReplicatedEnvironmentFacade) - Asynchronous exception thrown by BDB thread 'Cleaner-1'com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.        at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)        at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:459)        at com.sleepycat.je.log.LogBufferPool.writeDirty(LogBufferPool.java:419)        at com.sleepycat.je.log.LogBufferPool.bumpAndWriteDirty(LogBufferPool.java:373)        at com.sleepycat.je.log.LogBufferPool.getWriteBuffer(LogBufferPool.java:320)        at com.sleepycat.je.log.LogManager.serialLogWork(LogManager.java:651)        at com.sleepycat.je.log.LogManager.serialLog(LogManager.java:476)        at com.sleepycat.je.log.LogManager.logItem(LogManager.java:423)        at com.sleepycat.je.log.LogManager.log(LogManager.java:338)        at com.sleepycat.je.tree.LN.logInternal(LN.java:716)        at com.sleepycat.je.tree.LN.optionalLog(LN.java:419)        at com.sleepycat.je.dbi.CursorImpl.updateRecordInternal(CursorImpl.java:1677)        at com.sleepycat.je.dbi.CursorImpl.updateCurrentRecord(CursorImpl.java:1085)        at com.sleepycat.je.dbi.DbTree$RewriteMapLN.doWork(DbTree.java:847)        at com.sleepycat.je.tree.Tree.withRootLatchedExclusive(Tree.java:513)        at com.sleepycat.je.dbi.DbTree.modifyDbRoot(DbTree.java:815)        at com.sleepycat.je.dbi.DbTree.modifyDbRoot(DbTree.java:737)        at com.sleepycat.je.cleaner.UtilizationProfile.removePerDbMetadata(UtilizationProfile.java:355)        at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:893)        at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:361)        at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)        at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)        at java.lang.Thread.run(Thread.java:748)Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.        at com.sleepycat.je.utilint.StoppableThread.handleUncaughtException(StoppableThread.java:243)        at com.sleepycat.je.utilint.StoppableThread$UncaughtHandler.uncaughtException(StoppableThread.java:274)        at java.lang.Thread.dispatchUncaughtException(Thread.java:1959)Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3)        at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)        at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)        at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)        at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)        at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)        at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)        at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

The application commits BDB JE transactions  using com.sleepycat.je.Transaction#commit(com.sleepycat.je.Durability) (where durability is configured as new Durability( SyncPolicy.NO_SYNC,  SyncPolicy.NO_SYNC, ReplicaAckPolicy.SIMPLE_MAJORITY))

The transaction syncing to disk is performed in a separate thread by invoking com.sleepycat.je.Environment#flushLog(true) (Essentially, after transaction commit the application is awaiting for #flushLog(true) to complete for multiple parallel transactions committed with local policy SyncPolicy.NO_SYNC ).  The syncing policy SyncPolicy.NO_SYNC is used intentionally as local syncing policy, in order to aggregate the syncing to disk of multiple parallel transactions using Environment#flushLog for performance optimization.

Whilst invoking com.sleepycat.je.Environment#flushLog(true), the EnvironmentFailureException was re-thrown again:

2019-11-06 19:33:54,094 ERROR [Commit-Thread-nodes] (o.a.q.s.u.ServerScopedRuntimeException) - Exception whilst syncing data to diskcom.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.        at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)        at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:459)        at com.sleepycat.je.log.LogBufferPool.writeDirty(LogBufferPool.java:419)        at com.sleepycat.je.log.LogManager.flushInternal(LogManager.java:1270)        at com.sleepycat.je.log.LogManager.flushSync(LogManager.java:1210)        at com.sleepycat.je.dbi.EnvironmentImpl.flushLog(EnvironmentImpl.java:2265)        at com.sleepycat.je.Environment.flushLog(Environment.java:1638)        at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.flushLog(ReplicatedEnvironmentFacade.java:746)        at org.apache.qpid.server.store.berkeleydb.CoalescingCommiter$CommitThread.processJobs(CoalescingCommiter.java:201)        at org.apache.qpid.server.store.berkeleydb.CoalescingCommiter$CommitThread.run(CoalescingCommiter.java:180)Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.        at com.sleepycat.je.utilint.StoppableThread.handleUncaughtException(StoppableThread.java:243)        at com.sleepycat.je.utilint.StoppableThread$UncaughtHandler.uncaughtException(StoppableThread.java:274)        at java.lang.Thread.dispatchUncaughtException(Thread.java:1959)Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3)        at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)        at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)        at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)        at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)        at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)        at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)        at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)        at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

Usually, on running into BDB  JE issues reported with EnvironmentFailureException the application tries to recover from the problem by recreating the Environment instance and re-joining the new environment instance to the group, though, the exception thrown from EnvironmentImpl.flushLog are treated as terminal and cause JVM halt.

Potentially, the code can be changed to try to recreate the Environment even on exceptions thrown from EnvironmentImpl.flushLog, though that might results in not exactly right reaction on critical system issues like running out of dusk space or disk failures. I am not sure that Environment recreation would be a feasible approach to work-around the issue.

What could be the best way to work around the reported problem?

Can upgrade to BDB JE version 7.5.11 help to mitigate  the issue?

If more details about the issue is required, I can try to provide them. Though, the BDB JE files have gone after manual restart of the application. The restart recovers BDB JE environment.

Kind Regards,

Alex

Answers

  • Laurenfoutz-Oracle
    Laurenfoutz-Oracle Member Posts: 310
    edited Dec 4, 2019 3:59PM

    Sorry for the late response to this.

    Upgrading BDB JE to the latest version is always recommended.  What I believe is happening here is that since you are using a NO_SYNC policy and flushing the logs in a separate thread, you are hitting a rare race condition where the VLSN the thread is looking for has moved out of the in-memory VLSN cache, but has not been flushed to disk yet.  So when another thread goes looking for it in the logs it fails since the log is not on disk yet.  Improvements to the VLSN caching since 7.5.11 should hopefully prevent this issue in the future.

    As for why multiple threads report the issue.  Environment failures exceptions are actually propagate to all BDB JE threads to force them to shut down, even though only the FeederOutput thread actually experience the exception.

    If upgrading does not solve the issue, save all the je.info.# files before you restart the node and post them to this thread.  That will give me more information on what is causing the issue.

    Lauren Foutz

  • User_BZQZ5
    User_BZQZ5 Member Posts: 14 Blue Ribbon
    edited Dec 17, 2019 8:04AM

    Hi Lauren,

    Thanks a lot for the reply and suggestion to upgrade.

    I actually tried to upgrade, but, I ran into an issue with 7.5.11 reported in https://community.oracle.com/thread/4303487

    When number of bdb je files drops to one, the 7.5.11 environment cannot start. I've seen this issue happens only on upgrade from older version to newer one.  I've seen another thread Issue Upgrading JE 7.4.5 to JE 7.5.11 reporting the same problem.

    I could not reproduce it when environment is created from scratch using version 7.5.11. However, I do not know in what other circumstances the number of BDB JE files can reduce to one file.

    Do you know whether it is an upgrade only issue?

    If it is an upgrade only issue I can potentially work-around the problem. However, it the issue can  manifest on other conditions, the upgrade to 7.5.11 does not look like a safe option to me.

    Kind Regards,

    Alex

  • User_KPX6C
    User_KPX6C Member Posts: 1 Green Ribbon
    edited Aug 3, 2021 9:16AM

    Hi Alex:

    I have the some problem that you have met and I still not find an appropriate method to solve it.By th way ,did you solve this question eventually ? Or are there any followings after your last edited answer ? I'm really interested in this question and would like to hear from you.

    I tried the solution of QPID-8500 and found it useless.

    Looking forward to your reply .

    Best regards,

    Jackson