3 Replies Latest reply: Jul 19, 2012 10:04 AM by Greybird-Oracle RSS

    Lock/isolation with secondary databases

    943066
      Hi all,

      Some more questions from me - sorry for the never-ending stream, but I'm hoping the solutions at least will help others who potentially strike the same or similar issues.

      My environment is a reasonable size (~21M rows, ~10G) and is made up of a primary DB enforcing uniqueness on keys and secondary to provide an alternate sort order for traversal.

      Ostensibly, all writes occur in a single thread on the primary DB (naturally), whilst ostensibly all reads occur on the secondary in multiple threads.

      I noticed that I very quickly got lock timeout errors looking something like:
      -----
      04:45:30.349 [Grizzly(14)] ERROR c.a.w.web.resources.SearchResource - (JE 5.0.55) Lock expired. Locker 902555936 -1_Grizzly(14)_ThreadLocker: waited for lock on database=calendar_routecost LockAddr:2118856719 LSN=0xaf8/0x95e879 type=READ grant=WAIT_NEW timeoutMillis=500 startTime=1342586729848 endTime=1342586730348
      Owners: [<LockInfo locker="2925905 3536_Loader_Txn" type="WRITE"/>]
      Waiters: []

      com.sleepycat.je.LockTimeoutException: (JE 5.0.55) Lock expired. Locker 902555936 -1_Grizzly(14)_ThreadLocker: waited for lock on database=calendar_routecost LockAddr:2118856719 LSN=0xaf8/0x95e879 type=READ grant=WAIT_NEW timeoutMillis=500 startTime=1342586729848 endTime=1342586730348
      Owners: [<LockInfo locker="2925905 3536_Loader_Txn" type="WRITE"/>]
      Waiters: []
      -----

      I've switched all my reads to READ_UNCOMMITTED in an effort to keep the time that records are locked to an absolute minimum, however, in the case above it's obviously WRITE locks in the Loader thread that's causing the lock timeout (the locked database mentioned is the secondary in the example).

      From my reading of the documentation, it appears that all records modified within a transaction retain exclusive locks during the entire lifetime of the transaction - is this correct? If so, doesn't this somewhat defeat the purpose of transactional isolation if a transaction takes an exclusive lock on records whilst modifying them?

      In my case, transactions were large and fairly long-lived (ie: up to 4 seconds) which was well in excess of read timeouts. I found if I drastically reduced my transaction size, I was able to keep transaction times <500ms, which means the lock timeout doesn't occur, however, it seems like a very fragile solution.

      Additionally, I am accessing the secondary DB using a StoredSortedMap, rather than directly through cursors, which from my understanding is transaction-aware providing the DB is configured as being transactional (which mine is).

      So, my questions are:

      - Is this the right approach to avoiding lock timeouts?
      - Is there a way to give readers lock preference to writers? (I'd prefer writes block than reads)
      - Is it better to make transactions tiny and frequent (ie: thousands per second) to avoid this?
      - Is there an advantage to switching off the Collections interface to the DB?
        • 1. Re: Lock/isolation with secondary databases
          Greybird-Oracle
          I've switched all my reads to READ_UNCOMMITTED in an effort to keep the time that records are locked to an absolute minimum, however, in the case above it's obviously WRITE locks in the Loader thread that's causing the lock timeout (the locked database mentioned is the secondary in the example).
          Using read-uncommitted is a good way to reduce contention, if you can really live with the fact that the data you read is not committed and its transaction may be aborted (undone).
          From my reading of the documentation, it appears that all records modified within a transaction retain exclusive locks during the entire lifetime of the transaction - is this correct? If so, doesn't this somewhat defeat the purpose of transactional isolation if a transaction takes an exclusive lock on records whilst modifying them?
          On the contrary, locking is what isolates the transactions. Holding the write lock until the end of the txn is textbook two-phase locking, although some databases use other techniques. Have you read the "Writing Transactional Applications" guide that's in our docs? It explains locking and how to write your application to deal with lock exceptions.
          In my case, transactions were large and fairly long-lived (ie: up to 4 seconds) which was well in excess of read timeouts. I found if I drastically reduced my transaction size, I was able to keep transaction times <500ms, which means the lock timeout doesn't occur, however, it seems like a very fragile solution.
          In all databases, the smaller the transaction the better. If you can reduce your txn size, the real question is why wouldn't you.

          If you have good reasons for having a long transaction, adjusting the lock timeout may be necessary. The default of 500ms is just that, a default, and there is nothing wrong with changing it.
          Additionally, I am accessing the secondary DB using a StoredSortedMap, rather than directly through cursors, which from my understanding is transaction-aware providing the DB is configured as being transactional (which mine is).
          Yes, the collections API uses the per-thread txn (if you've set one) and uses auto-commit otherwise.
          - Is this the right approach to avoiding lock timeouts?
          In addition to what I said above, you will probably need to do retries, as described in the guide I mentioned.
          - Is there a way to give readers lock preference to writers? (I'd prefer writes block than reads)
          No, there isn't.
          - Is it better to make transactions tiny and frequent (ie: thousands per second) to avoid this?
          Absolutely.
          - Is there an advantage to switching off the Collections interface to the DB?
          In general, no.

          --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
          • 2. Re: Lock/isolation with secondary databases
            943066
            Hi Mark,

            That's super-helpful, thank you. I've shrunk txn size down to very small - I guess the reason I didn't originally is I had it stuck in my head (from SQL land) that many tiny transactions are bad (ie: with postgres, you'd checkpoint, rather than begin/commit many small txns).

            Glad to hear the fix is so easy! :)

            Btw, how should txn's in read-only cursors be handled, and is there a dramatic difference to cursors with an txn specified vs not if it's not writing anything?

            It seems slightly weird to go:

            ------
            txn = dbEnv.beginTransaction(null, null);
            Cursor cur = db.openCursor(txn, null);

            // .. read some things with cursor

            cur.close();
            txn.commit(); // ? or abort or does it matter?
            ------

            Cheers!

            fb.
            • 3. Re: Lock/isolation with secondary databases
              Greybird-Oracle
              Btw, how should txn's in read-only cursors be handled, and is there a dramatic difference to cursors with an txn specified vs not if it's not writing anything?
              By read-only cursors I think you just mean read-only transactions (whether cursors are used or not). When you commit or abort, no commit/abort entry is written to the log -- it is optimized away. There is no difference between commit and abort in this case.

              --mark