7 Replies Latest reply: Aug 14, 2012 8:42 PM by Greybird-Oracle RSS

    Intermittent Lock Timeout Exceptions with JE 5.0.58

    954435
      Hi,

      During tests, the system continues to experience Lock Timeout problems from time to time, even with the latest version 5.0.58. The collection contains data which is typically removed shortly after inserted, so records usually have a short lifetime. There are multiple writer threads and 1 delete / reader thread.

      Any tips? Note the timeout is already a healthy 3 minutes long, so I don't think that's the problem. Stack trace and details below.

      Maybe the best thing to do is to reduce the lock timeout in order to not block the data pipeline, if a simple retry fixes it?

      Thanks,
      Karl


      ERROR@00:08:03 com.sleepycat.je.LockTimeoutException: (JE 5.0.58) Lock expired. Locker 879535510 -1_dbreader--Thread-5_ThreadLocker: waited for lock on database=persist#ListenerStore#com.procon.data.msg.local.BerkeleyDBMessage LockAddr:
      892328566 LSN=0xffffffff/0x5882af type=READ grant=WAIT_NEW timeoutMillis=180000 startTime=1344323103818 endTime=1344323283850
      Owners: [<LockInfo locker="1242902705 -1_CAL-dbwriter--Thread-18_ThreadLocker" type="WRITE"/>]
      Waiters: []

      com.sleepycat.je.txn.LockManager.newLockTimeoutException(LockManager.java:664)
      com.sleepycat.je.txn.LockManager.makeTimeoutMsgInternal(LockManager.java:623)
      com.sleepycat.je.txn.SyncedLockManager.makeTimeoutMsg(SyncedLockManager.java:97)
      com.sleepycat.je.txn.LockManager.lockInternal(LockManager.java:390)
      com.sleepycat.je.txn.LockManager.lock(LockManager.java:276)
      com.sleepycat.je.txn.BasicLocker.lockInternal(BasicLocker.java:118)
      com.sleepycat.je.txn.Locker.lock(Locker.java:443)
      com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2589)
      com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2390)
      com.sleepycat.je.dbi.CursorImpl.searchAndPosition(CursorImpl.java:2118)
      com.sleepycat.je.Cursor.searchInternal(Cursor.java:2666)
      com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2576)
      com.sleepycat.je.Cursor.searchNoDups(Cursor.java:2430)
      com.sleepycat.je.Cursor.search(Cursor.java:2397)
      com.sleepycat.je.Cursor.readPrimaryAfterGet(Cursor.java:3703)
      com.sleepycat.je.SecondaryCursor.readPrimaryAfterGet(SecondaryCursor.java:1470)
      com.sleepycat.je.SecondaryCursor.retrieveNext(SecondaryCursor.java:1448)
      com.sleepycat.je.SecondaryCursor.getNext(SecondaryCursor.java:560)
      com.sleepycat.util.keyrange.RangeCursor.doGetNext(RangeCursor.java:897)
      com.sleepycat.util.keyrange.RangeCursor.getNext(RangeCursor.java:451)
      com.sleepycat.persist.BasicCursor.next(BasicCursor.java:80)
      com.sleepycat.persist.BasicIterator.hasNext(BasicIterator.java:49)
      com.procon.data.msg.local.BerkeleyDBMessageStore.queryMessages(BerkeleyDBMessageStore.java:498)
      com.procon.listener.DatabaseReader.runMain(DatabaseReader.java:161)
      com.procon.base.BasicRunnable.run(BasicRunnable.java:274)
      java.lang.Thread.run(Thread.java:662)
        • 1. Re: Intermittent Lock Timeout Exceptions with JE 5.0.58
          Greybird-Oracle
          Karl,

          With a 3 minute timeout, you may have a true deadlock (which are sometimes expected) or a bug in your app where you neglect to close a cursor or close a transaction. Increasing the lock timeout is usually not the solution, but the first step is to diagnose what is causing it, and you haven't given us enough information to do that. You haven't said what isolation modes you're using, what transactions are active, how many records are involved in each transactions, etc.

          I suggest that you read:
          http://docs.oracle.com/cd/E17277_02/html/TransactionGettingStarted/index.html
          in particular the Concurrency chapter.

          Also see:
          http://www.oracle.com/technetwork/database/berkeleydb/je-faq-096044.html#HowdoIdebugalocktimeout
          for how to get more information about this particular lock conflict.

          --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
          • 2. Re: Intermittent Lock Timeout Exceptions with JE 5.0.58
            954435
            Hi Mark,

            Thanks for your quick reply.

            The Application code model does not use transactions. The application writes new records, updates them one time (based on primary key), reads them (based on state), and then deletes them (based on primary key).

            This is because the effective use-case is a mem-cache (with of course the disk backup provided by BerkeleyDB JE).

            I will enable the additional LockTimeout debugging data next, and post the results.

            thanks,
            -karl

            Edited by: kgierach1 on Aug 8, 2012 12:39 PM
            • 3. Re: Intermittent Lock Timeout Exceptions with JE 5.0.58
              Greybird-Oracle
              Are there secondary keys (secondary indexes)?
              • 4. Re: Intermittent Lock Timeout Exceptions with JE 5.0.58
                954435
                Yes, there are 3 secondary indexes in this collection:

                /* accessors */
                PrimaryIndex<Long,BerkeleyDBMessage> messageById;
                SecondaryIndex<Integer,Long,BerkeleyDBMessage> messageByState;
                SecondaryIndex<Integer,Long,BerkeleyDBMessage> messageByFwdState;
                SecondaryIndex<Date,Long,BerkeleyDBMessage> messageByDate;

                thanks,
                -karl
                • 5. Re: Intermittent Lock Timeout Exceptions with JE 5.0.58
                  Greybird-Oracle
                  When using secondary databases and when performing concurrent reads and writes, lock timeouts due to deadlocks are expected and must be handled by your application by retrying.

                  See:
                  Problem with SecondaryDatabase

                  You can confirm that a deadlock between a primary and secondary record is occurring using the debugging information described in the FAQ:
                  http://www.oracle.com/technetwork/database/berkeleydb/je-faq-096044.html#HowdoIdebugalocktimeout

                  --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                  • 6. Re: Intermittent Lock Timeout Exceptions with JE 5.0.58
                    954435
                    Thanks Mark,

                    I have reproduced this with dumping the lock table. I did not see any evidence of deadlock in that printout (between threads). I'm happy to provide more details if necessary.

                    That said, the application retry strategy seems to be working OK, but I'd prefer to avoid the lock timeouts if possible.

                    -karl
                    • 7. Re: Intermittent Lock Timeout Exceptions with JE 5.0.58
                      Greybird-Oracle
                      I have reproduced this with dumping the lock table. I did not see any evidence of deadlock in that printout (between threads). I'm happy to provide more details if necessary.
                      You may misunderstand. The evidence of the deadlock is based on the debug output, you won't see a regular Java deadlock -- not sure if that's what you were looking for.
                      That said, the application retry strategy seems to be working OK, but I'd prefer to avoid the lock timeouts if possible.
                      When reading and writing concurrently, and using secondary databases, you cannot avoid deadlocks, which manifest as lock timeouts. Retrying is the only option.

                      --mark