Skip to Main Content

Berkeley DB Family

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

InsufficientLogException

mayoAug 16 2017 — edited Aug 16 2017

Hi all, we are using BDB JE 6.4.9 in production environment. We bulid a ReplicationEnvironment with two node using HA mode and only the master node can provide service.

Our application runs well for all the past time before it met a network problem  that the apllication runs on master node is still alive but couldn't connect to the replica and any other host. This last about 5 minutes, during this period the replica "DesignatePrimary" itself so that it can provide service. But when the network is rebuild, we meet some problem when the original master try to re-join the group:

com.sleepycat.je.rep.InsufficientLogException: (JE 6.4.9) Environment must be closed, caused by: com.sleepycat.je.rep.InsufficientLogException: Environment invalid because of previous exception: (JE 6.4.9) logic2-10.254.81.237(1):/logic2 INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed. Originally thrown by HA thread: REPLICA logic2-10.254.81.237(1) Originally thrown by HA thread: REPLICA logic2-10.254.81.237(1)

        at com.sleepycat.je.rep.InsufficientLogException.wrapSelf(InsufficientLogException.java:129)

        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1673)

        at com.sleepycat.je.Environment.checkEnv(Environment.java:2486)

        at com.sleepycat.je.rep.ReplicatedEnvironment.getGroup(ReplicatedEnvironment.java:784)

        at com.tencent.hippo.broker.service.impl.DefStoreManagerService.getBrokerGroup(DefStoreManagerService.java:706)

        at com.tencent.hippo.broker.service.impl.DefaultBrokerService$HeartBeatTask.run(DefaultBrokerService.java:1536)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

        at java.lang.Thread.run(Thread.java:745)

Caused by: com.sleepycat.je.rep.InsufficientLogException: Environment invalid because of previous exception: (JE 6.4.9) logic2-10.254.81.237(1):/logic2 INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed. Originally thrown by HA thread: REPLICA logic2-10.254.81.237(1) Originally thrown by HA thread: REPLICA logic2-10.254.81.237(1)

        at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupLogRefresh(ReplicaFeederSyncup.java:665)

        at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:335)

        at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:184)

        at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:742)

        at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:505)

        at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:440)

        at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1746)

we search the je.info file and find something confusing. In the original master node which try to re-join to group as replica because the original replica has DesignatePrimary the je.info shows:

2017-08-09 08:02:48.320 UTC INFO [logic2-10.254.81.237] Replica loop started with master:  logic2-10.254.89.154(2)

2017-08-09 08:02:48.321 UTC INFO [logic2-10.254.81.237] Replica-feeder handshake start

2017-08-09 08:02:48.335 UTC INFO [logic2-10.254.81.237] Replica-feeder  logic2-10.254.89.154 handshake completed.

2017-08-09 08:02:48.336 UTC INFO [logic2-10.254.81.237] Replica syncup after election to verify master: logic2-10.254.89.154(2) elected master: logic2-10.254.89.154(2)

2017-08-09 08:02:48.336 UTC INFO [logic2-10.254.81.237] Replica-feeder  logic2-10.254.89.154 syncup started. Replica range: first=52,024,819,277 last=52,753,505,140 sync=52,753,505,140 txnEnd=52,753,505,140

2017-08-09 08:02:49.151 UTC INFO [logic2-10.254.81.237] matchpointVLSN of 52,753,505,132 precedes a checkpoint end, needs network restore.

2017-08-09 08:02:49.152 UTC INFO [logic2-10.254.81.237] Releasing commit block latch

2017-08-09 08:02:49.152 UTC INFO [logic2-10.254.81.237] Replica-feeder  logic2-10.254.89.154 syncup ended. Elapsed time: 816ms

2017-08-09 08:02:49.152 UTC WARNING [ logic2-10.254.81.237] Exiting inner Replica loop with exception com.sleepycat.je.rep.InsufficientLogException: (JE 6.4.9)  logic2-10.254.81.237(1):/logic2 INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed.

com.sleepycat.je.rep.InsufficientLogException: (JE 6.4.9)  logic2-10.254.81.237(1): /logic2 INSUFFICIENT_LOG: Log files at this node are obsolete. Environment is invalid and must be closed.

        at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.setupLogRefresh(ReplicaFeederSyncup.java:665)

        at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.verifyRollback(ReplicaFeederSyncup.java:335)

        at com.sleepycat.je.rep.stream.ReplicaFeederSyncup.execute(ReplicaFeederSyncup.java:184)

        at com.sleepycat.je.rep.impl.node.Replica.initReplicaLoop(Replica.java:742)

        at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoopInternal(Replica.java:505)

        at com.sleepycat.je.rep.impl.node.Replica.runReplicaLoop(Replica.java:440)

        at com.sleepycat.je.rep.impl.node.RepNode.run(RepNode.java:1746)

2017-08-09 08:02:49.152 UTC INFO [ logic2-10.254.81.237] Replica stats - Lag waits: 0 Lag wait time: 0ms.  VLSN waits: 0 Lag wait time: 0ms.

2017-08-09 08:02:49.153 UTC INFO [ logic2-10.254.81.237] RepNode main thread shutting down.

In the original replica which has performed  DesignatePrimary, the je.info shows:

2017-08-09 08:02:48.382 UTC INFO [logic2-10.254.89.154] Feeder accepted connection from com.sleepycat.je.rep.utilint.net.SimpleDataChannel@7761796a

2017-08-09 08:02:48.393 UTC INFO [logic2-10.254.89.154] Feeder-replica handshake start

2017-08-09 08:02:48.395 UTC INFO [logic2-10.254.89.154] Feeder-replica  logic2-10.254.81.237 handshake completed.

2017-08-09 08:02:48.395 UTC INFO [logic2-10.254.89.154] Feeder-replica  logic2-10.254.81.237 syncup started. Feeder range: first=52,025,747,787 last=52,753,526,236 sync=52,753,526,236 txnEnd=52,753,526,236

2017-08-09 08:02:49.211 UTC INFO [logic2-10.254.89.154] Feeder-replica  logic2-10.254.81.237 syncup ended. Elapsed time: 816ms

2017-08-09 08:02:49.211 UTC INFO [logic2-10.254.89.154] Matchpoint vlsn 52,753,505,132 requested by node:  logic2-10.254.81.237(1) was outside the VLSN range: [52,025,747,787-52,753,526,236]

2017-08-09 08:02:49.212 UTC INFO [logic2-10.254.89.154] Shutting down feeder for replica  logic2-10.254.81.237 Reason: Matchpoint vlsn 52,753,505,132 requested by node:  logic2-10.254.81.237(1) was outside the VLSN range: [52,025,747,787-52,753,526,236] write time:  0ms Avg write time: 9us

2017-08-09 08:02:49.212 UTC INFO [logic2-10.254.89.154] Feeder Output soft shutdown initiated.

We can find that the original master report the matchpointVLSN of 52,753,505,132 , and the original replica's Feeder range: first=52,025,747,787 last=52,753,526,236 , but in the je.info of the original replica  : Matchpoint vlsn 52,753,505,132 requested by node:  logic2-10.254.81.237(1) was outside the VLSN range: [52,025,747,787-52,753,526,236] make us confused.

Can any one provide any suggestions?

Comments

Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Sep 13 2017
Added on Aug 16 2017
0 comments
381 views