6 Replies Latest reply on Feb 16, 2019 12:46 AM by Greybird-Oracle

    Master node in 3-node BDB JE HA cluster unexpectedly fails with: Halted log file reading at file 0x2c51 offset 0x34b793 offset(decimal)=3454867 prev=0x34b4f3

    940250

      Hi,

      As per subject the master node in 3-node BDB JE HA group had failed with the following:

       

      2019-01-14 19:07:19,296 ERROR [Feeder Output for node_3] (c.s.j.r.i.RepImpl) - [node_2] Halted log file reading at file 0x2c51 offset 0x34b793 offset(decimal)=3454867 prev=0x34b4f3:

      entry=INS_LN_TXtype=33,version=8)

      prev=0x34b4f3

      size=3883

      Next entry should be at 0x34c6d4

      com.sleepycat.je.EnvironmentFailureException: (JE 5.0.104) want to read 151,849,824 but reader at 151,849,829 UNEXPECTED_STATE: Unexpected internal state, may have side effects.

              at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:426)

              at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:288)

              at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:309)

              at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:297)

              at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:229)

              at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:272)

              at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:64)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:761)

       

      2019-01-14 19:07:19,298 ERROR [Feeder Output for node_3] (c.s.j.r.i.n.Feeder) - [node_2] Unexpected exception: (JE 5.0.104) want to read 151,849,824 but reader at 151,849,829 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=151,849,824 waitTime=1000com.sleepycat.je.EnvironmentFailureException: (JE 5.0.104) want to read 151,849,824 but reader at 151,849,829 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=151,849,824 waitTime=1000

              at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:426)

              at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:288)

              at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:309)

              at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:297)

              at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:229)

              at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:272)

              at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:64)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:761)

       

      2019-01-14 19:07:19,299 ERROR [Feeder Output for node_3] (c.s.j.r.i.n.Feeder) - [node_2] Uncaught exception in feeder thread Thread[Feeder Output for node_3,5,main](JE 5.0.104) want to read 151,849,824 but reader at 151,849,829 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=151,849,824 waitTime=1000com.sleepycat.je.EnvironmentFailureException: (JE 5.0.104) want to read 151,849,824 but reader at 151,849,829 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=151,849,824 waitTime=1000

              at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:426)

              at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:288)

              at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:309)

              at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:297)

              at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:229)

              at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:272)

              at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:64)

              at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:761)

       

       

      It looks like master node "node_2"  failed to read data from the disk in order to send it over to "node_3".

       

      The master node detached after that. The impacted environment was restarted without any issue and was able to operate as Replica.

       

      As per logs the BDB JE version in use is 5.0.104.

       

      What could be root cause of the issue with unexpected read offset ( want to read 151,849,824 but reader at 151,849,829)? Is it an existing defect or some sort of environmental issue?

      If it is a defect, would upgrade to the latest 7.x version can solve the problem?

       

      Kind Regards,

      Alex

       

      Message was edited by: 940250

        • 1. Re: Master node in 3-node BDB JE HA cluster unexpectedly fails with: Halted log file reading at file 0x2c51 offset 0x34b793 offset(decimal)=3454867 prev=0x34b4f3
          3886863

          We are running on 7.4.5 version of JE and still saw this issue in our production environment on Friday. I double if version upgrade is gonna solve this issue.

          • 2. Re: Master node in 3-node BDB JE HA cluster unexpectedly fails with: Halted log file reading at file 0x2c51 offset 0x34b793 offset(decimal)=3454867 prev=0x34b4f3
            Greybird-Oracle

            70b2bf78-1fd8-45d0-85ca-769a33cd0c35, could you please post the full stack trace that you're seeing in JE 7.4.5? It may provide a clue if we compare it to the stack trace of the OP.

             

            I've searched our bug reports and elsewhere, and these two reports are the only reports we've seen of this particular problem.

             

            --mark

            • 3. Re: Master node in 3-node BDB JE HA cluster unexpectedly fails with: Halted log file reading at file 0x2c51 offset 0x34b793 offset(decimal)=3454867 prev=0x34b4f3
              Greybird-Oracle

              Alex / 940250,

               

              If you want to keep pursuing this, there is some information you could dump from the jdb files where the problem occurred, that might help us to narrow this down. Do you still have the 0x2c51 file (00002c51.jdb)?

               

              If so, please run the following command and direct output to a file (it will be large).

               

              java -jar DbPrintLog -h <DIR-WITH-JDB-FILE> -s 0x2c51 -e 0x2c52 > outfile

               

              In the outfile there will be lines that look like this:

               

              <entry lsn="0x2c51/0x115" ...lots of data here.../>

               

              The lsn is the file number / offset of the entry. Look for an entry with offset 0x34b793, or at close to that as you can find, and please post several entries before and after it.

               

              You'll see VLSNs in the output as well and we'll be looking for the 151,849,824 and 151,849,829 VLSN values. One of these VLSNs should be in the entries that I asked for above.

               

              Looking at this should tell us whether the information in the file itself is corrupted, or the problem is elsewhere.

               

              Thanks,

              --mark

              • 4. Re: Master node in 3-node BDB JE HA cluster unexpectedly fails with: Halted log file reading at file 0x2c51 offset 0x34b793 offset(decimal)=3454867 prev=0x34b4f3
                940250

                Hi Mark,

                The jdb file has gone.

                I will try to collect the jdb files on next issue re-occurrence. It happens once or twice per month.

                 

                Kind Regards,

                Alex

                • 5. Re: Master node in 3-node BDB JE HA cluster unexpectedly fails with: Halted log file reading at file 0x2c51 offset 0x34b793 offset(decimal)=3454867 prev=0x34b4f3
                  3886863

                  Hello Mark,

                  Below is he entire stack trace of the server. I wanted to try the steps you mentioned above but the JDB files are gone for me as well. I will do that next time it happens.

                  2019-02-08 15:43:27.774 UTC SEVERE [tds5_B] Halted log file reading at file 0x6028b9 offset 0x7fb73e offset(decimal)=8369982 prev=0x7fa621:

                  entry=UPD_LN_TXtype=35,version=14)

                  prev=0x7fa621

                  size=223

                  Next entry should be at 0x7fb833

                  com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 1,658,315,645 but reader at 1,658,315,647 UNEXPECTED_STATE: Unexpected internal state, may have side effects.

                          at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)

                          at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)

                          at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)

                          at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)

                          at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)

                          at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)

                          at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)

                          at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)

                          at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

                   

                  2019-02-08 15:43:27.822 UTC SEVERE [tds5_B] checkpointId=1658750

                  com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) tds5_B(6): com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 1,658,315,645 but reader at 1,658,315,647 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=1,658,315,645 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for tds5_P,5,main] Originally thrown by HA thread: MASTER tds5_B(6) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.

                          at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)

                          at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)

                          at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:669)

                          at com.sleepycat.je.cleaner.Cleaner.updateFilesAtCheckpointEnd(Cleaner.java:1022)

                          at com.sleepycat.je.recovery.Checkpointer.doCheckpoint(Checkpointer.java:910)

                          at com.sleepycat.je.recovery.Checkpointer.onWakeup(Checkpointer.java:559)

                          at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

                          at java.lang.Thread.run(Thread.java:748)

                  Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) tds5_B(6):tds5_B/bdb com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 1,658,315,645 but reader at 1,658,315,647 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=1,658,315,645 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for tds5_P,5,main] Originally thrown by HA thread: MASTER tds5_B(6) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.

                          at com.sleepycat.je.utilint.StoppableThread.handleUncaughtException(StoppableThread.java:243)

                          at com.sleepycat.je.utilint.StoppableThread$UncaughtHandler.uncaughtException(StoppableThread.java:274)

                          at java.lang.Thread.dispatchUncaughtException(Thread.java:1959)

                  Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 1,658,315,645 but reader at 1,658,315,647 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=1,658,315,645 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for tds5_P,5,main] Originally thrown by HA thread: MASTER tds5_B(6)

                          at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)

                          at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)

                          at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)

                          at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)

                          at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)

                          at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)

                          at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)

                          at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)

                          at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)

                  • 6. Re: Master node in 3-node BDB JE HA cluster unexpectedly fails with: Halted log file reading at file 0x2c51 offset 0x34b793 offset(decimal)=3454867 prev=0x34b4f3
                    Greybird-Oracle

                    Thanks for this additional info. It is interesting that the VLSNs are only 2 apart, rather then 4 apart in the other case.