4 Replies Latest reply: Aug 8, 2008 6:45 PM by Greybird-Oracle RSS

    Questions on concurrency control

    571477
      Hi all,

      I've tried to run a benchmark that mimics the TPC-B but with the isolation level serializable the number of aborts is extremely high. To understand why I started browsing the Sleepycat's code and I could not understand the following points:

      1 - In method Cursor.java:searchExactAndRangeLock, there is
      /* The keyChange value is independent of the status value. */
      noNextKeyFound = !result.keyChange;
      /* Lock the EOF node if no more records, whether or not more dups. */
      if (noNextKeyFound) {
      cursorImpl.lockEofNode(LockType.RANGE_READ);

      Is this right ? I mean every time a "get" is issued the end of file is locked even when the key exists.
      Could I replace noNextKeyFound = !result.keyChange; by noNextKeyFound = (result.status != OperationStatus.SUCCESS); ?

      2 - Cursor.java:putNoNotify
      DatabaseEntry nextKey = new DatabaseEntry
      (key.getData(), key.getOffset(), key.getSize());
      DatabaseEntry nextData = new DatabaseEntry
      (data.getData(), data.getOffset(), data.getSize());
      nextKey.setPartial(0, 0, true);
      nextData.setPartial(0, 0, true);

      nextKeyCursor.lockNextKeyForInsert(key, data, nextKey, nextData);

      The value of "nextKey" does not always match the next key available in the database. By the way, I've changed the signature of the method nextKeyCursor.lockNextKeyForInsert to return which was supposed to be next key value.

      3 - I did not see how phantoms generated by deletes are avoided.


      Cheers.
        • 1. Re: Questions on concurrency control
          Greybird-Oracle
          Hi Alfranio,

          It's great that you're doing this type of testing and that you've dug into the source code -- we appreciate it, and I'm sure others will benefit from your work.
          1 - In method Cursor.java:searchExactAndRangeLock,
          there is
          /* The keyChange value is independent of the status
          value. */
          noNextKeyFound = !result.keyChange;
          /* Lock the EOF node if no more records, whether or
          not more dups. */
          if (noNextKeyFound) {
          cursorImpl.lockEofNode(LockType.RANGE_READ);

          Is this right ? I mean every time a "get" is issued
          the end of file is locked even when the key exists.
          Could I replace noNextKeyFound = !result.keyChange;
          by noNextKeyFound = (result.status !=
          OperationStatus.SUCCESS); ?
          I've done a little testing and it looks like there is a problem here. I'm not sure yet whether the fix is as simple as you suggest. I will be digging deeper and adding more cases to our test suite, and I'll get back to you on the results sometime next week.
          2 - Cursor.java:putNoNotify
          DatabaseEntry nextKey = new DatabaseEntry
          (key.getData(), key.getOffset(), key.getSize());
          DatabaseEntry nextData = new DatabaseEntry
          (data.getData(), data.getOffset(), data.getSize());
          nextKey.setPartial(0, 0, true);
          nextData.setPartial(0, 0, true);

          nextKeyCursor.lockNextKeyForInsert(key, data,
          nextKey, nextData);

          The value of "nextKey" does not always match the next
          key available in the database. By the way, I've
          changed the signature of the method
          nextKeyCursor.lockNextKeyForInsert to return which
          was supposed to be next key value.
          Can you give an example of what you're seeing? -- The keys in the database, the key being inserted, and the nextKey that is locked (incorrectly). Or if you have one, please send a test program that shows the problem.
          3 - I did not see how phantoms generated by deletes
          are avoided.
          In JE, a deleted record is left in the Btree until after the transaction ends, and the deleted record remains locked. Other transactions attempting to insert a record with the same key are blocked. This is true no matter what isolation level is used. Does this answer your question?

          Thanks again for contributing your efforts on this.

          --mark                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
          • 2. Re: Questions on concurrency control
            571477
            Hi Mark,
            Can you give an example of what you're seeing? --
            The keys in the database, the key being inserted, and
            the nextKey that is locked (incorrectly). Or if you
            have one, please send a test program that shows the
            problem.
            A simple application is sequentially inserting the following keys on behalf of a single transaction: k1, k2, k4 and k5.

            The output of the first run:
            range-01: k1 -- k1
            range-02: k1 -- EOF
            range-01: k2 -- k2
            range-02: k2 -- k1
            range-01: k4 -- k4
            range-02: k4 -- k2
            range-01: k5 -- k5
            range-02: k5 -- k4

            The output of successive runs:
            range-01: k1 --
            range-02: k1 -- k2
            range-01: k2 --
            range-02: k2 -- k4
            range-01: k4 -- k4
            range-02: k4 -- k4
            range-01: k5 -- k5
            range-02: k5 -- k4

            I don't understand these values.. :(

            By the way, I don't know how to attach files by means of the interface provided in the forum, so I am copying the modifications done to the source code. Let me know if you need any else.

            Cheers.

            alfranio

            === modified file 'src/com/sleepycat/je/Cursor.java'
            --- src/com/sleepycat/je/Cursor.java 2008-06-12 16:21:24 +0000
            +++ src/com/sleepycat/je/Cursor.java 2008-08-04 19:08:15 +0000
            @@ -1375,6 +1375,31 @@
            }
            }

            + private CursorImpl getCursor(CursorImpl orig) throws DatabaseException {
            + CursorImpl dup = null;
            +
            + if (orig != null) {
            + dup = orig.cloneCursor(true);
            + }
            + return (dup);
            + }
            +
            + private OperationStatus rangeLockCurrentPosition(DatabaseEntry key, DatabaseEntry data, CursorImpl orig)
            + throws DatabaseException {
            + OperationStatus status = OperationStatus.NOTFOUND;
            + CursorImpl dup = getCursor(orig);
            +
            + if (dup != null) {
            + try {
            + status = dup.getCurrent(key, data, LockType.NONE);
            + } finally {
            + dup.close();
            + }
            + }
            +
            + return (status);
            + }
            +
            /**
            * Performs the put operation but does not notify triggers (does not
            * perform secondary updates). Prevents phantoms.
            @@ -1399,7 +1424,45 @@
            nextKeyCursor = new CursorImpl(dbImpl, nextKeyLocker);
            /* Perform eviction for user cursors. */
            nextKeyCursor.setAllowEviction(true);
            - nextKeyCursor.lockNextKeyForInsert(key, data);
            +
            + DatabaseEntry nextKey = new DatabaseEntry
            + (key.getData(), key.getOffset(), key.getSize());
            + DatabaseEntry nextData = new DatabaseEntry
            + (data.getData(), data.getOffset(), data.getSize());
            + nextKey.setPartial(0, 0, true);
            + nextData.setPartial(0, 0, true);
            +
            + nextKeyCursor.lockNextKeyForInsert(key, data, nextKey, nextData);
            +
            + if (nextKey.getData() == null) {
            + try {
            + System.out.println("range-01: " + new String(key.getData(),"UTF-8") + " -- EOF");
            + } catch (UnsupportedEncodingException e) {
            + e.printStackTrace(System.err);
            + }
            + } else {
            + try {
            + System.out.println("range-01: " + new String(key.getData(),"UTF-8") + " -- " + new String(nextKey.getData(),"UTF-8"));
            + } catch (UnsupportedEncodingException e) {
            + e.printStackTrace(System.err);
            + }
            + }
            +
            + DatabaseEntry endKey = new DatabaseEntry();
            + DatabaseEntry endData = new DatabaseEntry();
            + if (rangeLockCurrentPosition(endKey,endData,nextKeyCursor) != OperationStatus.SUCCESS) {
            + try {
            + System.out.println("range: " + new String(key.getData(),"UTF-8") + " -- EOF");
            + } catch (UnsupportedEncodingException e) {
            + e.printStackTrace(System.err);
            + }
            + } else {
            + try {
            + System.out.println("range: " + new String(key.getData(),"UTF-8") + " -- " + new String(endKey.getData(),"UTF-8"));
            + } catch (UnsupportedEncodingException e) {
            + e.printStackTrace(System.err);
            + }
            + }
            }

            /* Perform the put operation. */

            === modified file 'src/com/sleepycat/je/dbi/CursorImpl.java'
            --- src/com/sleepycat/je/dbi/CursorImpl.java 2008-06-12 16:21:24 +0000
            +++ src/com/sleepycat/je/dbi/CursorImpl.java 2008-08-04 18:28:04 +0000
            @@ -955,15 +955,10 @@
            * records following the given key and datum, lock the special EOF node
            * for the databaseImpl.
            */
            - public void lockNextKeyForInsert(DatabaseEntry key, DatabaseEntry data)
            + public OperationStatus lockNextKeyForInsert(DatabaseEntry key, DatabaseEntry data,
            + DatabaseEntry nextKey, DatabaseEntry nextData)
            throws DatabaseException {
            -
            - DatabaseEntry tempKey = new DatabaseEntry
            - (key.getData(), key.getOffset(), key.getSize());
            - DatabaseEntry tempData = new DatabaseEntry
            - (data.getData(), data.getOffset(), data.getSize());


            - tempKey.setPartial(0, 0, true);
            - tempData.setPartial(0, 0, true);
            + OperationStatus status = OperationStatus.NOTFOUND;
            boolean lockedNextKey = false;

            /* Don't search for data if duplicates are not configured. */
            @@ -988,7 +983,6 @@
            * If we didn't match the key, skip over duplicates to the next
            * key with getNextNoDup.
            */
            - OperationStatus status;
            if ((searchResult & EXACT_KEY) != 0) {
            status = getNext
            (tempKey, tempData, LockType.RANGE_INSERT, true, true);
            • 3. Re: Questions on concurrency control
              Greybird-Oracle
              1 - In method Cursor.java:searchExactAndRangeLock,
              there is
              /* The keyChange value is independent of the
              status
              value. */
              noNextKeyFound = !result.keyChange;
              /* Lock the EOF node if no more records, whether
              or
              not more dups. */
              if (noNextKeyFound) {
              cursorImpl.lockEofNode(LockType.RANGE_READ);

              Is this right ? I mean every time a "get" is
              issued
              the end of file is locked even when the key
              exists.
              Could I replace noNextKeyFound =
              !result.keyChange;
              by noNextKeyFound = (result.status !=
              OperationStatus.SUCCESS); ?
              I've done a little testing and it looks like there is
              a problem here. I'm not sure yet whether the fix is
              as simple as you suggest. I will be digging deeper
              and adding more cases to our test suite, and I'll get
              back to you on the results sometime next week.
              Alfranio, it turns out that you are exactly correct. This is a bug that reduces concurrency for Serializable isolation, and the problem code is exactly what you have pointed out.

              We've added to our test suite to thoroughly cover this issue and potential related issues, and we've added a fix that will be in our next JE 3.3 patch release. For now, I'll copy the patch itself below.

              Next I'll take a look at your issue (2) and get back to on that in the next few days. When that is resolved, I'll make a JE jar with these fixes available on request.

              Thanks again!
              --mark
              Index: src/com/sleepycat/je/Cursor.java
              ===================================================================
              RCS file: /a/CVSROOT/je/src/com/sleepycat/je/Cursor.java,v
              retrieving revision 1.216
              diff -c -w -r1.216 Cursor.java
              *** src/com/sleepycat/je/Cursor.java    10 Jun 2008 02:52:08 -0000      1.216
              --- src/com/sleepycat/je/Cursor.java    7 Aug 2008 01:00:22 -0000
              ***************
              *** 1698,1704 ****
                            SearchMode.SET_RANGE : SearchMode.BOTH_RANGE;
              
                        KeyChangeStatus result = null;
              -         boolean noNextKeyFound;
              
                        CursorImpl dup =
                            beginRead(false /* searchAndPosition will add cursor */);
              --- 1698,1703 ----
              ***************
              *** 1714,1722 ****
                                (dup, key, data, searchLockType, advanceLockType, searchMode,
                                 true /*advanceAfterRangeSearch*/);
                
              -             /* The keyChange value is independent of the status value. */
              -             noNextKeyFound = !result.keyChange;
              - 
                            /* If the key changed, then we do not have an exact match. */
                            if (result.keyChange && result.status == OperationStatus.SUCCESS) {
                                result.status = OperationStatus.NOTFOUND;
              --- 1713,1718 ----
              ***************
              *** 1726,1733 ****
                                         result.status == OperationStatus.SUCCESS);
                        }
              
              !         /* Lock the EOF node if no more records, whether or not more dups. */
              !         if (noNextKeyFound) {
                            cursorImpl.lockEofNode(LockType.RANGE_READ);
                        }
              
              --- 1722,1732 ----
                                         result.status == OperationStatus.SUCCESS);
                        }
              
              !         /*
              !          * Lock the EOF node if there was no exact match and we did not
              !          * range-lock the next record.
              !          */
              !         if (result.status != OperationStatus.SUCCESS && !result.keyChange) {
                            cursorImpl.lockEofNode(LockType.RANGE_READ);
                        }
              • 4. Re: Questions on concurrency control
                Greybird-Oracle
                Alfranio, on your issue (2), I recreated the situation you described and everything looks OK -- I don't see a problem.

                In the code you added, the wrong results were printed. First, if CursorImpl.lockNextKeyForInsert does not successfully lock the next key, then it is invalid to print out the contents of the DatabaseEntry params -- these are only meaningful if the operation succeeds. Second, if you call setPartial, the entry won't be returned, which is why you were seeing a null data array in some cases.

                I changed your range-01 code so it works with CursorImpl. And I deleted your range-02 code (actually this was just called "range" in the diff you sent, I don't think you sent the diff you were running with). The results are exactly as you would except.

                The initial run prints:

                range: k1 -- EOF
                range: k2 -- EOF
                range: k4 -- EOF
                range: k5 -- EOF

                And subsequent runs print:

                range: k1 -- k2
                range: k2 -- k4
                range: k4 -- k5
                range: k5 -- EOF

                Below is the diff I used, which I believe is equivalent to what your code was attempting. Below that is the test program I used.

                Please let me know whether this answers your questions and whether you agree that this is correct.
                Thanks,
                --mark
                Index: src/com/sleepycat/je/Cursor.java
                ===================================================================
                RCS file: /a/CVSROOT/je/src/com/sleepycat/je/Cursor.java,v
                retrieving revision 1.216.2.1
                diff -c -w -r1.216.2.1 Cursor.java
                *** src/com/sleepycat/je/Cursor.java    7 Aug 2008 17:04:46 -0000       1.216.2.1
                --- src/com/sleepycat/je/Cursor.java    8 Aug 2008 23:31:41 -0000
                ***************
                *** 11,16 ****
                --- 11,17 ----
                  import java.util.logging.Level;
                  import java.util.logging.Logger;
                
                + import com.sleepycat.bind.tuple.StringBinding;
                  import com.sleepycat.je.dbi.CursorImpl;
                  import com.sleepycat.je.dbi.DatabaseImpl;
                  import com.sleepycat.je.dbi.GetMode;
                ***************
                *** 1399,1405 ****
                                  nextKeyCursor = new CursorImpl(dbImpl, nextKeyLocker);
                                  /* Perform eviction for user cursors. */
                                  nextKeyCursor.setAllowEviction(true);
                !                 nextKeyCursor.lockNextKeyForInsert(key, data);
                              }
                
                              /* Perform the put operation. */
                --- 1400,1421 ----
                                  nextKeyCursor = new CursorImpl(dbImpl, nextKeyLocker);
                                  /* Perform eviction for user cursors. */
                                  nextKeyCursor.setAllowEviction(true);
                ! 
                !                 DatabaseEntry nextKey = new DatabaseEntry
                !                        (key.getData(), key.getOffset(), key.getSize());
                !                 DatabaseEntry nextData = new DatabaseEntry
                !                        (data.getData(), data.getOffset(), data.getSize());
                ! 
                !                 if (nextKeyCursor.lockNextKeyForInsert
                !                     (key, data, nextKey, nextData)) {
                !                     System.out.println
                !                         ("range: " + StringBinding.entryToString(key) +
                !                          " -- " + StringBinding.entryToString(nextKey));
                !                 } else {
                !                     System.out.println
                !                         ("range: " + StringBinding.entryToString(key) +
                !                          " -- EOF");
                !                 }
                              }
                
                              /* Perform the put operation. */
                Index: src/com/sleepycat/je/dbi/CursorImpl.java
                ===================================================================
                RCS file: /a/CVSROOT/je/src/com/sleepycat/je/dbi/CursorImpl.java,v
                retrieving revision 1.348
                diff -c -w -r1.348 CursorImpl.java
                *** src/com/sleepycat/je/dbi/CursorImpl.java    20 May 2008 03:27:34 -0000      1.348
                --- src/com/sleepycat/je/dbi/CursorImpl.java    8 Aug 2008 23:31:44 -0000
                ***************
                *** 955,969 ****
                       * records following the given key and datum, lock the special EOF node
                       * for the databaseImpl.
                       */
                !     public void lockNextKeyForInsert(DatabaseEntry key, DatabaseEntry data)
                          throws DatabaseException {
                
                -         DatabaseEntry tempKey = new DatabaseEntry
                -             (key.getData(), key.getOffset(), key.getSize());
                -         DatabaseEntry tempData = new DatabaseEntry
                -             (data.getData(), data.getOffset(), data.getSize());
                -         tempKey.setPartial(0, 0, true);
                -         tempData.setPartial(0, 0, true);
                          boolean lockedNextKey = false;
                
                          /* Don't search for data if duplicates are not configured. */
                --- 955,966 ----
                       * records following the given key and datum, lock the special EOF node
                       * for the databaseImpl.
                       */
                !     public boolean lockNextKeyForInsert(DatabaseEntry key,
                !                                         DatabaseEntry data,
                !                                         DatabaseEntry tempKey,
                !                                         DatabaseEntry tempData)
                          throws DatabaseException {
                
                          boolean lockedNextKey = false;
                
                          /* Don't search for data if duplicates are not configured. */
                ***************
                *** 1011,1016 ****
                --- 1008,1014 ----
                          if (!lockedNextKey) {
                              lockEofNode(LockType.RANGE_INSERT);
                          }
                +         return lockedNextKey;
                      }
                
                      /**
                -----------------------------------------
                import java.io.File;
                import com.sleepycat.bind.tuple.StringBinding;
                import com.sleepycat.je.Database;
                import com.sleepycat.je.DatabaseConfig;
                import com.sleepycat.je.DatabaseEntry;
                import com.sleepycat.je.Environment;
                import com.sleepycat.je.EnvironmentConfig;
                import com.sleepycat.je.OperationStatus;
                import com.sleepycat.je.Transaction;
                import com.sleepycat.je.TransactionConfig;
                
                public class Test {
                    public static void main(String[] args) throws Exception {
                        EnvironmentConfig envConfig = new EnvironmentConfig();
                        envConfig.setAllowCreate(true);
                        envConfig.setTransactional(true);
                        Environment env = new Environment(new File("./tmp"), envConfig);
                
                        DatabaseConfig dbConfig = new DatabaseConfig();
                        dbConfig.setAllowCreate(true);
                        dbConfig.setTransactional(true);
                        Database db = env.openDatabase(null, "foo", dbConfig);
                
                        TransactionConfig txnConfig = new TransactionConfig();
                        txnConfig.setSerializableIsolation(true);
                        Transaction txn = env.beginTransaction(null, txnConfig);
                
                        put(db, "k1", "1");
                        put(db, "k2", "2");
                        put(db, "k4", "4");
                        put(db, "k5", "5");
                
                        txn.abort();
                        db.close();
                        env.close();
                    }
                
                    static void put(Database db, String key, String data) throws Exception {
                        DatabaseEntry keyEntry = new DatabaseEntry();
                        DatabaseEntry dataEntry = new DatabaseEntry();
                        StringBinding.stringToEntry(key, keyEntry);
                        StringBinding.stringToEntry(data, dataEntry);
                        OperationStatus status = db.put(null, keyEntry, dataEntry);
                        if (status != OperationStatus.SUCCESS) {
                            System.out.println(status);
                        }
                    }
                }