3 Replies Latest reply: Jun 11, 2012 4:34 PM by Linda Lee-Oracle RSS

    InsufficientAcksException is thrown on commit by designated primary master

    940250
      Hi,

      I have a 2 node cluster where Master node is a designated primary. I am using je 5.0.48, durability is SYNC,NO_SYNC,SIMPLE_MAJORITY.
      When replica become unavailable due to various reasons (killing, network split, etc) and transaction is committed at the same time, an exception InsufficientAcksException is thrown from Transaction#commit.

      I understand the reasons why the transaction commit throws this exception in multiple node case or 2 node case where master is not designated primary:
      the master in a replicated environment did not receive enough replica acknowledgments, although the commit succeeded locally.

      However, I was surprised to see this exception thrown in 2 node case with a Master designated as a primary.

      Is it a bug or an intended behavior?

      IMHO, InsufficientAcksException should not be thrown on replica unavailability on Transaction#commit as Master should become an active primary and should be able to continue working having transaction committed locally.

      In order to deal with this situation in my application code I had to add a catch block to ignore InsufficientAcksException on commit (for 2 node cluster with a Master designated as a primary).
      However, this looks like a hack for me.
      Can anyone confirm that it is the right thing to do with my HA configuration?

      Here is my exception stack trace:


      com.sleepycat.je.rep.InsufficientAcksException: (JE 5.0.48) Transaction: -20144 VLSN: 70,246, initiated at: 21:28:20. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. M
      issing replica acks: 1. Timeout: 15000ms. FeederState=Node-5001(1)[MASTER]
      No feeders.

      at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:169)
      at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1048)
      at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:149)
      at com.sleepycat.je.txn.Txn.commit(Txn.java:725)
      at com.sleepycat.je.txn.Txn.commit(Txn.java:584)
      at com.sleepycat.je.Transaction.commit(Transaction.java:313)          


      Here is the JE finest logs for the stack trace above:


      <...snipped...>
      120601 21:28:20:252 FINEST [Node-5001] Database.delete txnId=-20143 key=128 0 0 0 0 0 13 18
      120601 21:28:20:253 FINEST [Node-5001] vlsnIndex put 70,242
      120601 21:28:20:253 FINER [Node-5001] Sent to Node-5002: Entry DEL_LN_TX/8 vlsn=70,242 isReplicated="1" txn=-20143
      120601 21:28:20:253 FINER [Node-5001] Delete bin=94 lnIdx=3 oldLnLsn=0x0/0x5256be newLnLsn=0x1/0x9023cc
      120601 21:28:20:253 FINEST [Node-5001] Database.delete txnId=-20143 key=128 0 0 0 0 0 13 18
      120601 21:28:20:253 FINEST [Node-5001] vlsnIndex put 70,243
      120601 21:28:20:253 FINER [Node-5001] Sent to Node-5002: Entry DEL_LN_TX/8 vlsn=70,243 isReplicated="1" txn=-20143
      120601 21:28:20:253 FINER [Node-5001] Delete bin=95 lnIdx=3 oldLnLsn=0x0/0x525752 newLnLsn=0x1/0x9023f6
      120601 21:28:20:253 FINE [Node-5001] Txn -20143 requires: 1 active: 1 replica acks. Commit Policy: SIMPLE_MAJORITY
      120601 21:28:20:255 FINEST [Node-5001] vlsnIndex put 70,244
      120601 21:28:20:255 FINER [Node-5001] Sent to Node-5002: Commit Commit/8 vlsn=70,244 isReplicated="1"
      120601 21:28:20:511 FINE [Node-5001] Ack for: -20143
      120601 21:28:20:511 FINE [Node-5001] Commit: id = -20143 numWriteLocks=2 numReadLocks = 0
      120601 21:28:20:590 FINEST [Node-5001] Database.delete txnId=-20144 key=116 205 166 55 128 170 70 68 59 224 59 216 47 247 102 241 128 0 0 0 0 0 13 19
      120601 21:28:20:591 FINEST [Node-5001] vlsnIndex put 70,245
      120601 21:28:20:591 FINER [Node-5001] Sent to Node-5002: Entry DEL_LN_TX/8 vlsn=70,245 isReplicated="1" txn=-20144
      120601 21:28:20:591 FINER [Node-5001] Delete bin=96 lnIdx=3 oldLnLsn=0x0/0x52609e newLnLsn=0x1/0x90244c
      120601 21:28:20:594 FINE [Node-5001] Txn -20144 requires: 1 active: 1 replica acks. Commit Policy: SIMPLE_MAJORITY
      120601 21:28:20:595 FINEST [Node-5001] vlsnIndex put 70,246
      120601 21:28:20:595 FINER [Node-5001] Sent to Node-5002: Commit Commit/8 vlsn=70,246 isReplicated="1"
      120601 21:28:20:843 INFO [Node-5001] Shutting down feeder for replica Node-5002 Reason: Expected bytes: 6 read bytes: 0 write time: 2,509ms Avg write time: 3,237us
      120601 21:28:21:598 INFO [Node-5001] Feeder output for replica Node-5002 shutdown. feeder VLSN: 70,247 currentTxnEndVLSN: 70,246
      120601 21:28:21:598 FINE [Node-5001] Thread[Feeder Output for Node-5002,5,] has exited.
      120601 21:28:21:598 FINEST [Node-5001] (Node-5002(2))java.nio.channels.SocketChannel[closed] isOpen=false
      120601 21:28:23:909 FINE [Node-5001] InCompress.doCompress called, queue size: 3
      120601 21:28:23:909 FINEST [Node-5001] Delete: in=96 index=0
      120601 21:28:23:909 FINEST [Node-5001] Delete: in=96 index=0
      120601 21:28:23:909 FINEST [Node-5001] Delete: in=96 index=0
      120601 21:28:23:909 FINEST [Node-5001] Delete: in=95 index=0
      120601 21:28:23:909 FINEST [Node-5001] Delete: in=95 index=0
      120601 21:28:23:910 FINEST [Node-5001] Delete: in=95 index=0
      120601 21:28:23:910 FINEST [Node-5001] Delete: in=95 index=0
      120601 21:28:23:910 FINEST [Node-5001] Delete: in=94 index=0
      120601 21:28:23:910 FINEST [Node-5001] Delete: in=94 index=0
      120601 21:28:23:910 FINEST [Node-5001] Delete: in=94 index=0
      120601 21:28:23:910 FINEST [Node-5001] Delete: in=94 index=0
      120601 21:28:28:912 FINE [Node-5001] InCompress.doCompress called, queue size: 1
      120601 21:28:33:914 FINE [Node-5001] InCompress.doCompress called, queue size: 1
      120601 21:28:35:600 INFO [Node-5001] (JE 5.0.48) Transaction: -20144 VLSN: 70,246, initiated at: 21:28:20. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 15000ms. FeederState=Node-5001(1)[MASTER]
      No feeders.

      120601 21:28:35:600 FINE [Node-5001] Commit: id = -20144 numWriteLocks=1 numReadLocks = 0
      120601 21:28:38:917 FINE [Node-5001] InCompress.doCompress called, queue size: 1
      120601 21:28:38:917 FINEST [Node-5001] Delete: in=96 index=0
      120601 21:28:55:608 INFO [Node-5001] Primary activated; quorum is one.
      120601 21:28:55:608 FINEST [Node-5001] Database.delete txnId=-20145 key=128 0 0 0 0 0 13 19
      120601 21:28:55:609 FINEST [Node-5001] vlsnIndex put 70,247
      120601 21:28:55:609 FINER [Node-5001] Delete bin=94 lnIdx=0 oldLnLsn=0x0/0x525bde newLnLsn=0x1/0x9024ad
      120601 21:28:55:609 FINEST [Node-5001] Database.delete txnId=-20145 key=128 0 0 0 0 0 13 19
      120601 21:28:55:609 FINEST [Node-5001] vlsnIndex put 70,248
      120601 21:28:55:609 FINER [Node-5001] Delete bin=95 lnIdx=0 oldLnLsn=0x0/0x525c72 newLnLsn=0x1/0x9024d7
      120601 21:28:55:610 FINE [Node-5001] Txn -20145 requires: 0 active: 0 replica acks. Commit Policy: SIMPLE_MAJORITY
      120601 21:28:55:612 FINEST [Node-5001] vlsnIndex put 70,249
      120601 21:28:55:612 FINE [Node-5001] Commit: id = -20145 numWriteLocks=2 numReadLocks = 0
      <...snipped...>


      Please, let me know if you need any other information from me in order to investigate this issue.

      Kind regards,
      Alex
        • 1. Re: InsufficientAcksException is thrown on commit by designated primary master
          Linda Lee-Oracle
          Alex,
          However, I was surprised to see this exception thrown in 2 node case with a Master designated as a primary.
          Is it a bug or an intended behavior?
          This is not the intended behavior, and as you should not have to catch and ignore an InsufficientAcksException. At first glance, this seems to be a bug on our part. Seeing the FINE output in the log was very helpful. We noted this section:

          120601 21:28:20:591 FINER [Node-5001] Delete bin=96 lnIdx=3 oldLnLsn=0x0/0x52609e newLnLsn=0x1/0x90244c
          *120601 21:28:20:594 FINE [Node-5001] Txn -20144 requires: 1 active: 1 replica acks. Commit Policy: SIMPLE_MAJORITY*
          120601 21:28:20:595 FINEST [Node-5001] vlsnIndex put 70,246
          120601 21:28:20:595 FINER [Node-5001] Sent to Node-5002: Commit Commit/8 vlsn=70,246 isReplicated="1"
          *120601 21:28:20:843 INFO [Node-5001] Shutting down feeder for replica Node-5002 Reason: Expected bytes: 6 read bytes: 0 write time: 2,509ms Avg write time: 3,237us*
          *120601 21:28:21:598 INFO [Node-5001] Feeder output for replica Node-5002 shutdown. feeder VLSN: 70,247 currentTxnEndVLSN: 70,246*
          120601 21:28:21:598 FINE [Node-5001] Thread[Feeder Output for Node-5002,5,] has exited.
          120601 21:28:21:598 FINEST [Node-5001] (Node-5002(2))java.nio.channels.SocketChannel[closed] isOpen=false
          120601 21:28:23:909 FINE [Node-5001] InCompress.doCompress called, queue size: 3
          120601 21:28:23:909 FINEST [Node-5001] Delete: in=96 index=0
          120601 21:28:23:909 FINEST [Node-5001] Delete: in=96 index=0
          ...

          120601 21:28:23:910 FINEST [Node-5001] Delete: in=94 index=0
          120601 21:28:28:912 FINE [Node-5001] InCompress.doCompress called, queue size: 1
          120601 21:28:33:914 FINE [Node-5001] InCompress.doCompress called, queue size: 1
          *120601 21:28:35:600 INFO [Node-5001] (JE 5.0.48) Transaction: -20144 VLSN: 70,246, initiated at: 21:28:20. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 1. Missing replica acks: 1. Timeout: 15000ms. FeederState=Node-5001(1)[MASTER]*

          which tells us that the replica went down in the middle of the transaction. We think that this is a case where the "activation" of the primary happens after an ack request was sent out, but before it was received.

          We'll want to write a test case for this, and suppress the InsufficientAcksException in this event. Thanks for providing this detailed log. We'll post a confirmation of this hypothesis after we have checked this out. (And we also thank you for the other test case about failure during application schema change, which we still need to work on).

          Regards,

          Linda
          • 2. Re: InsufficientAcksException is thrown on commit by designated primary master
            524761
            Linda Lee wrote:
            We'll post a confirmation of this hypothesis after we have checked this out.
            This was indeed a bug. We've made a fix, which will be in the next release of JE (our internal SR #21536).
            • 3. Re: InsufficientAcksException is thrown on commit by designated primary master
              Linda Lee-Oracle
              Alex,

              This bug fix has been included in today's release of JE 5.0.55 and JE 4.1.21. Thank you for reporting the problem and including the detailed logs, which made it possible to diagnose the bug.

              Regards,

              The JE team.