Forum Stats

  • 3,769,993 Users
  • 2,253,045 Discussions
  • 7,875,263 Comments

Discussions

InsufficientLogException

mayo
mayo Member Posts: 4 Red Ribbon
edited Aug 16, 2017 11:31PM in Berkeley DB Java Edition

Hi all, we are using BDB JE 6.4.9 in production environment. We bulid a ReplicationEnvironment with two node using HA mode and only the master node can provide service.

Our application runs well for all the past time before it met a network problem  that the apllication runs on master node is still alive but couldn't connect to the replica and any other host. This last about 5 minutes, during this period the replica "DesignatePrimary" itself so that it can provide service. But when the network is rebuild, we meet some problem when the original master try to re-join the group:

com.sleepycat.je.rep.InsufficientLogException: (JE 6.4.9) Environment must be closed, caused by: com.sleepycat.je.rep.InsufficientLogException: Environment invalid because of previous exception: (JE 6.4.9) logic2-10.254.81.237(1):/logic2 INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed. Originally thrown by HA thread: REPLICA logic2-10.254.81.237(1) Originally thrown by HA thread: REPLICA logic2-10.254.81.237(1)

        at com.sleepycat.je.rep.InsufficientLogException.wrapSelf(InsufficientLogException.java:129)

        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1673)

        at com.sleepycat.je.Environment.checkEnv(Environment.java:2486)

        at com.sleepycat.je.rep.ReplicatedEnvironment.getGroup(ReplicatedEnvironment.java:784)

        at com.tencent.hippo.broker.service.impl.DefStoreManagerService.getBrokerGroup(DefStoreManagerService.java:706)

        at com.tencent.hippo.broker.service.impl.DefaultBrokerService$HeartBeatTask.run(DefaultBrokerService.java:1536)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

        at java.lang.Thread.run(Thread.java:745)

Caused by: com.sleepycat.je.rep.InsufficientLogException: Environment invalid because of previous exception: (JE 6.4.9) logic2-10.254.81.237(1):/logic2 INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed. Originally thrown by HA thread: REPLICA logic2-10.254.81.237(1) Originally thrown by HA thread: REPLICA logic2-10.254.81.237(1)

        at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupLogRefresh(ReplicaFeederSyncup.java:665)

        at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:335)

        at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:184)

        at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:742)

        at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:505)

        at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:440)

        at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1746)

we search the je.info file and find something confusing. In the original master node which try to re-join to group as replica because the original replica has DesignatePrimary the je.info shows:

2017-08-09 08:02:48.320 UTC INFO [logic2-10.254.81.237] Replica loop started with master:  logic2-10.254.89.154(2)

2017-08-09 08:02:48.321 UTC INFO [logic2-10.254.81.237] Replica-feeder handshake start

2017-08-09 08:02:48.335 UTC INFO [logic2-10.254.81.237] Replica-feeder  logic2-10.254.89.154 handshake completed.

2017-08-09 08:02:48.336 UTC INFO [logic2-10.254.81.237] Replica syncup after election to verify master: logic2-10.254.89.154(2) elected master: logic2-10.254.89.154(2)

2017-08-09 08:02:48.336 UTC INFO [logic2-10.254.81.237] Replica-feeder  logic2-10.254.89.154 syncup started. Replica range: first=52,024,819,277 last=52,753,505,140 sync=52,753,505,140 txnEnd=52,753,505,140

2017-08-09 08:02:49.151 UTC INFO [logic2-10.254.81.237] matchpointVLSN of 52,753,505,132 precedes a checkpoint end, needs network restore.

2017-08-09 08:02:49.152 UTC INFO [logic2-10.254.81.237] Releasing commit block latch

2017-08-09 08:02:49.152 UTC INFO [logic2-10.254.81.237] Replica-feeder  logic2-10.254.89.154 syncup ended. Elapsed time: 816ms

2017-08-09 08:02:49.152 UTC WARNING [ logic2-10.254.81.237] Exiting inner Replica loop with exception com.sleepycat.je.rep.InsufficientLogException: (JE 6.4.9)  logic2-10.254.81.237(1):/logic2 INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed.

com.sleepycat.je.rep.InsufficientLogException: (JE 6.4.9)  logic2-10.254.81.237(1): /logic2 INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed.

        at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupLogRefresh(ReplicaFeederSyncup.java:665)

        at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:335)

        at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:184)

        at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:742)

        at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:505)

        at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:440)

        at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1746)

2017-08-09 08:02:49.152 UTC INFO [ logic2-10.254.81.237] Replica stats - Lag waits: 0 Lag wait time: 0ms.  VLSN waits: 0 Lag wait time: 0ms.

2017-08-09 08:02:49.153 UTC INFO [ logic2-10.254.81.237] RepNode main thread shutting down.

In the original replica which has performed  DesignatePrimary, the je.info shows:

2017-08-09 08:02:48.382 UTC INFO [logic2-10.254.89.154] Feeder accepted connection from [email protected]

2017-08-09 08:02:48.393 UTC INFO [logic2-10.254.89.154] Feeder-replica handshake start

2017-08-09 08:02:48.395 UTC INFO [logic2-10.254.89.154] Feeder-replica  logic2-10.254.81.237 handshake completed.

2017-08-09 08:02:48.395 UTC INFO [logic2-10.254.89.154] Feeder-replica  logic2-10.254.81.237 syncup started. Feeder range: first=52,025,747,787 last=52,753,526,236 sync=52,753,526,236 txnEnd=52,753,526,236

2017-08-09 08:02:49.211 UTC INFO [logic2-10.254.89.154] Feeder-replica  logic2-10.254.81.237 syncup ended. Elapsed time: 816ms

2017-08-09 08:02:49.211 UTC INFO [logic2-10.254.89.154] Matchpoint vlsn 52,753,505,132 requested by node:  logic2-10.254.81.237(1) was outside the VLSN range: [52,025,747,787-52,753,526,236]

2017-08-09 08:02:49.212 UTC INFO [logic2-10.254.89.154] Shutting down feeder for replica  logic2-10.254.81.237 Reason: Matchpoint vlsn 52,753,505,132 requested by node:  logic2-10.254.81.237(1) was outside the VLSN range: [52,025,747,787-52,753,526,236] write time:  0ms Avg write time: 9us

2017-08-09 08:02:49.212 UTC INFO [logic2-10.254.89.154] Feeder Output soft shutdown initiated.

We can find that the original master report the matchpointVLSN of 52,753,505,132 , and the original replica's Feeder range: first=52,025,747,787 last=52,753,526,236 , but in the je.info of the original replica  : Matchpoint vlsn 52,753,505,132 requested by node:  logic2-10.254.81.237(1) was outside the VLSN range: [52,025,747,787-52,753,526,236] make us confused.

Can any one provide any suggestions?

This discussion has been closed.