7 Replies Latest reply: Oct 11, 2012 3:51 PM by llowrey RSS

    Cursor LockTimeoutException

    llowrey
      I have a database that I am treating like a queue. Multiple threads poll this 'queue' by opening cursors and pulling out the first record that hasn't already been claimed by another thread. Multiple writer threads append to the queue (keyed by sequence).

      After a few minutes, or a few hours, I'll get a LockTimeoutException in a reader thread which is caused by a writer thread transaction.

      My logs indicate the activity is:

      reader:
           TransactionConfig conf = new TransactionConfig();
           conf.setReadCommitted(true);
           txn = env.beginTransaction(null, conf);
           database.openCursor(txn, CursorConfig.READ_COMMITTED);
           loop: cursor.getNext(keyEntry, valueEntry, null);

      while the reader is looping through getNext...

      writer:
           TransactionConfig conf = new TransactionConfig();
           conf.setReadCommitted(true);
           txn = env.beginTransaction(null, conf);
           ...
           sequence.get(null, 1);
           database.put(txn, keyEntry, valueEntry);
           txn.commit();

      It seems that it's the moment the writer tries to commit that everything grinds to a halt for the 500ms lock timeout. The reader thread then throws the LockTimeoutException and then the writer finishes it's commit operation and continues.

      The exception looks like this:
      com.sleepycat.je.LockTimeoutException: (JE 5.0.58) Lock expired. Locker 844587133 -1_http_ReadCommittedLocker: waited for lock on database=http LockAddr:109037661 LSN=0x1/0x1d098d type=READ grant=WAIT_NEW timeoutMillis=500 startTime=1349247160593 endTime=1349247161095
      Owners: [<LockInfo locker="1631737567 132015_writer-5_Txn" type="WRITE"/>]
      Waiters: []

      at com.sleepycat.je.txn.LockManager.newLockTimeoutException(LockManager.java:664)
      at com.sleepycat.je.txn.LockManager.makeTimeoutMsgInternal(LockManager.java:623)
      at com.sleepycat.je.txn.SyncedLockManager.makeTimeoutMsg(SyncedLockManager.java:97)
      at com.sleepycat.je.txn.LockManager.lockInternal(LockManager.java:390)
      at com.sleepycat.je.txn.LockManager.lock(LockManager.java:276)
      at com.sleepycat.je.txn.BasicLocker.lockInternal(BasicLocker.java:118)
      at com.sleepycat.je.txn.ReadCommittedLocker.lockInternal(ReadCommittedLocker.java:84)
      at com.sleepycat.je.txn.Locker.lock(Locker.java:443)
      at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2589)
      at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2390)
      at com.sleepycat.je.dbi.CursorImpl.fetchCurrent(CursorImpl.java:2221)
      at com.sleepycat.je.dbi.CursorImpl.getCurrentAlreadyLatched(CursorImpl.java:1434)
      at com.sleepycat.je.dbi.CursorImpl.getNext(CursorImpl.java:1561)
      at com.sleepycat.je.Cursor.positionAllowPhantoms(Cursor.java:2368)
      at com.sleepycat.je.Cursor.positionNoDups(Cursor.java:2298)
      at com.sleepycat.je.Cursor.position(Cursor.java:2285)
      at com.sleepycat.je.Cursor.getNext(Cursor.java:1126)



      Should my reader threads be using Serializable Isolation when opening the transaction for reading from the cursor?

      My assumption had been that since I am reading committed data only that the new record added by the writer would not be seen by my reader thread and that the readers would thus not interfere with the writers. Where am I going wrong?
        • 1. Re: Cursor LockTimeoutException
          greybird
          Hi,

          READ_COMMITTED is the right mode for what you're doing, but it won't prevent blocking. getNext will block if the next record is write-locked but not yet committed, and it will timeout if the writer txn takes longer than the timeout interval.

          There are two ways to approach this.

          One approach is to try to reduce the time taken by the write, and perhaps also increase the reader's timeout interval to account for outlier writes (those that take longer than average). For example, I see that you're using the default durability, which is SYNC and requires flushing all the way to the device at commit time. Since you said the problem occurs during the writer commit, this is probably related.

          Another approach is to try to avoid any blocking by readers. From what you said, this is requirement -- is that correct? If so, we can explore ideas for doing that.

          But I'm not clear on exactly how your queue works because you didn't describe deletions. Do the readers delete the records after processing them? Using the same cursor as for reading?

          In any case, please see the instructions in the FAQ for enabling lock debugging. This will cause more info to be output in the exception message and tell us a lot more about what's happening.

          --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
          • 2. Re: Cursor LockTimeoutException
            llowrey
            Thank you for your reply.

            I actually am setting durability to COMMIT_WRITE_NO_SYNC when opening the environment. I've been running on mechanicals so I turned off sync but still want writes on commit so I won't lose data if the JVM crashes. I'll eventually run this on SSDs and will turn on SYNC at that time.

            I'm surprised that a write would take more than 500ms. That seems like a very long time to service a small write request (less than one 4k page).

            The extra logging shows that the writer acquired the lock by doing a put() and the reader is blocked on getNext().

            WRT deletes, once the reader acquires an entry it closes the cursor and aborts the transaction. I figured it'd probably be cheaper to abort the txn than to commit since there were no changes. The reader then works on the item and then, later (probably less than 10ms) will delete the entry by calling delete(). The delete is done in a new transaction which is immediately committed.

            So far I have not seen any lock timeouts related to the delete operation.

            Ideally, when walking the cursor I would like to walk past any locked items, even if it means returning no results. So, I am very interested in your technique to avoid blocking.

            Thank you for your assistance!
            • 3. Re: Cursor LockTimeoutException
              greybird
              Since you have multiple reader threads, in one reader how do you currently determine whether a record, returned by getNext, has been claimed by another reader?
              • 4. Re: Cursor LockTimeoutException
                greybird
                I'm surprised that a write would take more than 500ms. That seems like a very long time to service a small write request (less than one 4k page).
                When using WRITE_NO_SYNC, each txn commit will write the data to the file system, but often the file system buffers it. When JE closes one log file and opens another, it does an explicit fsync. The fsync will flush out all pending data in the file system buffers. This happens within the scope of the txn commit, if we happen to start a new log file as part of that commit. So much more than one 4k page may be written.

                Some file systems can be tuned to trickle out buffered data, to reduce the impact of an explicit fsync. What file system are you using?

                --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                • 5. Re: Cursor LockTimeoutException
                  llowrey
                  greybird wrote:
                  Since you have multiple reader threads, in one reader how do you currently determine whether a record, returned by getNext, has been claimed by another reader?
                  I'm using a ConcurrentHashMap to track which thread has 'claimed' which record. The reader calls putIfAbsent() which is supposed to be atomic. If another thread has claimed the record the call will return a non-null value.
                  greybird wrote:
                  I'm surprised that a write would take more than 500ms. That seems like a very long time to service a small write request (less than one 4k page).
                  When using WRITE_NO_SYNC, each txn commit will write the data to the file system, but often the file system buffers it. When JE closes one log file and opens another, it does an explicit fsync. The fsync will flush out all pending data in the file system buffers. This happens within the scope of the txn commit, if we happen to start a new log file as part of that commit. So much more than one 4k page may be written.

                  Some file systems can be tuned to trickle out buffered data, to reduce the impact of an explicit fsync. What file system are you using?
                  OK. That makes a lot more sense. The FS is ext3 running on a two disk (raid1) mirror. The OS is RHEL 5.7. If I am not mistaken, linux is still brain damaged and stops the world when a sync is requested and writes out all dirty pages and not just those for the file in question. I plan to run this on a solaris host eventually which does not suffer that particular ailment.

                  The machine currently has:
                  /proc/sys/vm/dirty_background_ratio 10
                  /proc/sys/vm/dirty_ratio 40

                  While this machine only has 4GB of ram, 10% is still a lot. Do you have recommended settings?

                  --Larkin                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                  • 6. Re: Cursor LockTimeoutException
                    greybird
                    While this machine only has 4GB of ram, 10% is still a lot. Do you have recommended settings?
                    We choose a vm.dirty_background_ratio that gets us closest to 10MB. However, a ratio of 5 is the effective minimum in some kernels.
                    I'm using a ConcurrentHashMap to track which thread has 'claimed' which record. The reader calls putIfAbsent() which is supposed to be atomic. If another thread has claimed the record the call will return a non-null value.
                    I see. Based on what you've told me, I can think of two ways to do it.

                    1) In your current approach, if I understand, you're using a separate Cursor in each reader thread, and you'd like to skip over locked records because those records are being processed by other readers. And you'd also like to skip over a locked record that is being inserted, although I'm not sure why, perhaps just to avoid the lock timeout.

                    The only way to skip over a locked record with a Cursor is to use dirty-read (LockMode.READ_UNCOMMITTED). This will read a record regardless of whether it is committed or locked by another txn.

                    When a reader claims a record (using putIfAbsent in your CHM), then it should read it again with the default lock mode for the cursor (READ_COMMITTED in your case). This will lock it, which will prevent accidentally deleting it in another txn and ensure that it is indeed committed.

                    As I haven't implemented this, I may be overlooking something. But in general it is possible to use dirty-read in this manner.

                    2) Perhaps a simpler approach (and probably best because it is simpler), is to use a single reader thread that adds records to a BlockingQueue (of some appropriate size), and a ThreadPoolExecutor that takes records off the queue and processes them. This is a fairly standard approach, and allows you to easily adjust the size of the thread pool as needed.

                    In this case the reader thread can simply use READ_COMMITTED (as you do currently). It will block when it encounters a record being inserted (and not yet committed), but this is desirable. You may want to use a very large lock timeout, or simply retry when you get the timeout exception.

                    When there are no more records, the reader's cursor will remain positioned on the last record read. A special case is when the reader thread is positioned on a record that is to be deleted. The reader's cursor will hold a lock, preventing the deletion. Perhaps you have a way of dealing with that -- I don't know all the details of what you're doing of course. One approach is to close the cursor and commit the txn when getNext returns NOTFOUND, and then reopen the cursor and restart your scan.

                    Oh, on your earlier comment on abort: When there are no writes in a txn, abort and commit are equivalent -- they do nothing except release the locks held by the txn. Calling abort, as you're doing, is fine.

                    --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                    • 7. Re: Cursor LockTimeoutException
                      llowrey
                      Thank you for all of your help.

                      I increased the lock timeout to 60s and started collecting statistics. I found that I'm seeing some timeouts in the 100-1000ms range but nothing larger. I have been able to run w/o incident all week by applying some of your suggestions and the larger lock timeout. Thanks again!