1 Reply Latest reply on Mar 23, 2019 4:16 PM by Greybird-Oracle

    Master node in 2-node BDB JE HA cluster unexpectedly shutting down feeder for replica with : " Reason: Expected bytes: 6 read bytes: 0 write time:  0ms Avg write time: 26us"

    mayo

      Hi,

      The master node failed with the following:

       

      2019-03-21 13:48:06.603 UTC INFO [broker-logic1-10.240.128.66] Shutting down feeder for replica broker-logic2-10.240.136.211 Reason: Expected bytes: 6 read bytes: 0 write time:  668ms Avg write time: 5us

      2019-03-21 13:48:06.604 UTC INFO [broker-logic1-10.240.128.66] Feeder Output for broker-logic2-10.240.136.211 soft shutdown initiated.

      2019-03-21 13:48:10.604 UTC WARNING [broker-logic1-10.240.128.66] Soft shutdown failed for thread:Thread[Feeder Output for broker-logic2-10.240.136.211,5,main] after waiting for 4000ms resorting to interrupt.

      2019-03-21 13:48:18.604 UTC SEVERE [broker-logic1-10.240.128.66] Thread[Feeder Output for broker-logic2-10.240.136.211,5,main] shutdown via interrupt FAILED. Thread still alive despite waiting for 8000ms.

      ...

      2019-03-21 13:48:19.047 UTC INFO [broker-logic1-10.240.128.66] Releasing commit block latch

      2019-03-21 13:48:19.075 UTC SEVERE [broker-logic1-10.240.128.66]

      com.sleepycat.je.EnvironmentWedgedException: (JE 7.5.11) Environment must be closed, caused by: com.sleepycat.je.EnvironmentWedgedException: Environment invalid because of previous exception: (JE 7.5.11) broker-logic1-10.240.128.66(1):/stage/broker-logic1 Thread[Feeder Output for broker-logic2-10.240.136.211,5,main] shutdown via interrupt FAILED. Thread still alive despite waiting for 8000ms. WEDGED: An internal thread could not be stopped. The current process must be shut down and restarted before re-opening the Environment. A full thread dump has been logged. Environment is invalid and must be closed.

        at com.sleepycat.je.EnvironmentWedgedException.wrapSelf(EnvironmentWedgedException.java:77)

        at com.sleepycat.je.EnvironmentWedgedException.wrapSelf(EnvironmentWedgedException.java:49)

        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1750)

        at com.sleepycat.je.log.LogManager.serialLog(LogManager.java:475)

        at com.sleepycat.je.log.LogManager.logItem(LogManager.java:425)

        at com.sleepycat.je.log.LogManager.log(LogManager.java:340)

        at com.sleepycat.je.tree.LN.logInternal(LN.java:716)

        at com.sleepycat.je.tree.LN.log(LN.java:448)

        at com.sleepycat.je.cleaner.FileProcessor.processFoundLN(FileProcessor.java:1448)

        at com.sleepycat.je.cleaner.FileProcessor.processLN(FileProcessor.java:1170)

        at com.sleepycat.je.cleaner.FileProcessor.processFile(FileProcessor.java:995)

        at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:489)

        at com.sleepycat.je.cleaner.Cleaner.doClean(Cleaner.java:670)

        at com.sleepycat.je.dbi.EnvironmentImpl.invokeCleaner(EnvironmentImpl.java:2299)

        at com.sleepycat.je.Environment.cleanLogFile(Environment.java:1839)

        at com.tencent.hippo.broker.service.impl.DefStoreManagerService$CleanLogFileTask.run(DefStoreManagerService.java:956)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

        at java.lang.Thread.run(Thread.java:748)

        Caused by: com.sleepycat.je.EnvironmentWedgedException: Environment invalid because of previous exception: (JE 7.5.11) broker-logic1-10.240.128.66(1):/stage/broker-logic1 Thread[Feeder Output for broker-logic2-10.240.136.211,5,main] shutdown via interrupt FAILED. Thread still alive despite waiting for 8000ms. WEDGED: An internal thread could not be stopped. The current process must be shut down and restarted before re-opening the Environment. A full thread dump has been logged. Environment is invalid and must be closed.

        at com.sleepycat.je.utilint.StoppableThread.shutdownThread(StoppableThread.java:386)

        at com.sleepycat.je.rep.impl.node.Feeder.shutdown(Feeder.java:570)

        at com.sleepycat.je.rep.impl.node.Feeder$InputThread.run(Feeder.java:762)

       

       

      It looks like there are two problems:

      1. The master node broker-logic1-10.240.128.66 expected to read 6 bytes but actually read 0 bytes.

      2. The feeder tried to shut down but failed.

       

      What could cause "Expected bytes: 6 read bytes: 0", and is there any way to solve this problem?

       

      Best Regards,

      Mayo

        • 1. Re: Master node in 2-node BDB JE HA cluster unexpectedly shutting down feeder for replica with : " Reason: Expected bytes: 6 read bytes: 0 write time:  0ms Avg write time: 26us"
          Greybird-Oracle

          > 2019-03-21 13:48:06.603 UTC INFO [broker-logic1-10.240.128.66] Shutting down feeder for replica broker-logic2-10.240.136.211 Reason: Expected bytes: 6 read bytes: 0 write time:  668ms Avg write time: 5us

           

          This means the replica's connection was closed; it is not an error on this node. Check the replica's log.
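To illustrate why a closed connection shows up as "Expected bytes: 6 read bytes: 0", here is a minimal sketch in plain Java NIO. It is not JE's actual code: the class `HeaderReadSketch`, the method `readHeader`, and the assumption that the 6 bytes are a fixed-size message header are all hypothetical. It only shows that a reader waiting for a fixed number of bytes sees end-of-stream with nothing read when the peer has closed the connection.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

public class HeaderReadSketch {
    // Assumption for illustration: a fixed 6-byte header, matching "Expected bytes: 6".
    static final int HEADER_SIZE = 6;

    /** Reads exactly HEADER_SIZE bytes, or throws EOFException if the peer closed first. */
    static ByteBuffer readHeader(ReadableByteChannel channel) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(HEADER_SIZE);
        while (header.hasRemaining()) {
            int n = channel.read(header);
            if (n < 0) {
                // End-of-stream: the other side closed the connection.
                throw new EOFException("Expected bytes: " + HEADER_SIZE
                        + " read bytes: " + header.position());
            }
        }
        header.flip();
        return header;
    }

    public static void main(String[] args) throws IOException {
        // An empty stream stands in for a connection the peer has already closed.
        ReadableByteChannel closedPeer =
                Channels.newChannel(new ByteArrayInputStream(new byte[0]));
        try {
            readHeader(closedPeer);
        } catch (EOFException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the reader itself has done nothing wrong; the message only reports that the other end went away, which is why the replica's log is the place to look for the cause.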

           

          >   Caused by: com.sleepycat.je.EnvironmentWedgedException: Environment invalid because of previous exception: (JE 7.5.11) broker-logic1-10.240.128.66(1):/stage/broker-logic1 Thread[Feeder Output for broker-logic2-10.240.136.211,5,main] shutdown via interrupt FAILED. Thread still alive despite waiting for 8000ms. WEDGED: An internal thread could not be stopped. The current process must be shut down and restarted before re-opening the Environment. A full thread dump has been logged. Environment is invalid and must be closed.

           

          As the message mentions, a full thread dump was logged that may provide more insight.
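If the logged thread dump is hard to find, or the problem recurs, a dump can also be captured from inside the process with the standard `java.lang.management` API. The following is only a sketch: the class `ThreadDumpSketch` and the helper `dumpThreads` are hypothetical names, and filtering on the thread name "Feeder Output" is an assumption based on the thread name in the log above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {
    /** Returns the state and stack of every live thread whose name contains nameFragment. */
    static String dumpThreads(String nameFragment) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Only request lock details when the JVM supports them.
        ThreadInfo[] infos = bean.dumpAllThreads(
                bean.isObjectMonitorUsageSupported(),
                bean.isSynchronizerUsageSupported());
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : infos) {
            if (info == null || !info.getThreadName().contains(nameFragment)) {
                continue;
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints nothing unless a thread with this name exists in the current JVM.
        System.out.print(dumpThreads("Feeder Output"));
    }
}
```

The top frames of the stuck thread show what it was blocked on when the interrupt failed. Running `jstack <pid>` against the broker process gives the same information from outside the JVM.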

           

          --mark