6 Replies. Latest reply: Oct 30, 2011 1:29 PM by greybird

DB hangs and CPU 100% when SecondaryCursor.getSearchKeyRange

894743 Newbie
Here's the thread dump taken while the DB is stuck:


"10746124@qtp-25919971-36" prio=10 tid=0x8f5c8000 nid=0x4115 runnable [0x8c54b000]
java.lang.Thread.State: RUNNABLE
at com.sleepycat.je.dbi.CursorImpl.hashCode(CursorImpl.java:262)
at java.util.HashMap.put(HashMap.java:372)
at java.util.HashSet.add(HashSet.java:200)
at com.sleepycat.je.utilint.TinyHashSet.add(TinyHashSet.java:93)
at com.sleepycat.je.tree.BIN.addCursor(BIN.java:453)
at com.sleepycat.je.dbi.CursorImpl.addCursor(CursorImpl.java:609)
at com.sleepycat.je.dbi.CursorImpl.searchAndPosition(CursorImpl.java:2187)
at com.sleepycat.je.Cursor.searchInternal(Cursor.java:2097)
at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2067)
at com.sleepycat.je.Cursor.search(Cursor.java:1935)
at com.sleepycat.je.SecondaryCursor.search(SecondaryCursor.java:1363)
at com.sleepycat.je.SecondaryCursor.getSearchKeyRange(SecondaryCursor.java:1175)

...the frames below this point are just a single call to getSearchKeyRange.
  • 1. Re: DB hangs and CPU 100% when SecondaryCursor.getSearchKeyRange
    894743 Newbie
    This happens on JE 4.1.7 and 4.1.10.
  • 2. Re: DB hangs and CPU 100% when SecondaryCursor.getSearchKeyRange
    greybird Expert
    There is no known bug that will cause SecondaryCursor.getSearchKeyRange to loop, so you will need to give us enough information to reproduce this, so that we can see what is happening.

    I only see one running thread in the dump. Is that correct? The thread is runnable, so it must be looping. Please take 5 or more full (complete stack) thread dumps while it is looping, so we can see where the loop is occurring.

    Also please send your environment config settings, and the SecondaryConfig and DatabaseConfig settings for the secondary and primary database.

    Even better, if you can reproduce this in a small test program, then send the source code.

    Thanks,
    --mark
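
The request above for several full thread dumps is usually met by running `jstack <pid>` or `kill -3 <pid>` a few seconds apart. Where that is not convenient, a rough in-process equivalent can be sketched as below. The class name `ThreadDumpSketch`, the helper `dumpAllThreads`, and the 2-second interval are illustrative assumptions, not something taken from the thread.

```java
import java.util.Map;

public class ThreadDumpSketch {

    // Builds a text dump of every live thread with its complete stack.
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Five dumps, spaced apart, so a loop shows up as the same frames repeating.
        for (int i = 1; i <= 5; i++) {
            System.out.println("=== dump " + i + " ===");
            System.out.println(dumpAllThreads());
            Thread.sleep(2000); // assumed interval
        }
    }
}
```

If the same few frames appear at the top of the looping thread in every dump, the loop lies at or above the deepest frame they share.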
  • 3. Re: DB hangs and CPU 100% when SecondaryCursor.getSearchKeyRange
    894743 Newbie
    This seems to happen when the DB has problematic secondary indices. I've traced some of the code: the thread is looping in the while(true) loop at SecondaryCursor line 1363, as shown in http://www.docjar.com/html/api/com/sleepycat/je/SecondaryCursor.java.html

    The infinite loop seems to be caused by a failed readPrimaryAfterGet() call at line 1535 (SecondaryCursor.java), where the secondary key generated does not match the key found (due to a corrupted secondary index).

    Truncating and rebuilding the secondary index resolves the problem, but it would be helpful if a corrupted index could be detected and an exception thrown, as at line 1507, so that we can automatically truncate and rebuild the corrupted index. Right now the cause of the corruption is unknown.
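
The mismatch described above (the key regenerated from the primary record differs from the key stored in the secondary index) can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: it does not use the JE API, the two maps merely stand in for the primary and secondary databases, and `IndexCheckSketch`, `findStaleEntries`, and the `name:age` record format are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class IndexCheckSketch {

    // Returns the secondary keys whose entry no longer agrees with the primary data.
    // primary:   primaryKey   -> record data
    // secondary: secondaryKey -> primaryKey (toy model: one primary per secondary key)
    static List<String> findStaleEntries(Map<String, String> primary,
                                         Map<String, String> secondary,
                                         Function<String, String> keyCreator) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, String> e : secondary.entrySet()) {
            String data = primary.get(e.getValue());
            // Stale if the primary record is gone, or if the key derived from it now differs.
            if (data == null || !e.getKey().equals(keyCreator.apply(data))) {
                stale.add(e.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Map<String, String> primary = new HashMap<>();
        primary.put("p1", "alice:30");
        primary.put("p2", "bob:41");

        Map<String, String> secondary = new TreeMap<>();
        secondary.put("alice", "p1");
        secondary.put("bobby", "p2"); // deliberately inconsistent entry

        // Hypothetical key creator: the secondary key is the text before ':'.
        Function<String, String> keyCreator = d -> d.substring(0, d.indexOf(':'));

        System.out.println(findStaleEntries(primary, secondary, keyCreator));
    }
}
```

A scan of this kind reports inconsistent entries instead of retrying on them, which is the behaviour the post asks for: detect the corruption, raise an error, and let the application rebuild the index.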
  • 4. Re: DB hangs and CPU 100% when SecondaryCursor.getSearchKeyRange
    greybird Expert
    You are right that an exception should be thrown by JE in this case, if possible. I'll look into doing this.

    Please confirm that you are using READ_UNCOMMITTED -- is that correct?

    Also, are you using transactions for write operations? That should prevent secondary corruption.

    Thanks for reporting this.
    --mark
  • 5. Re: DB hangs and CPU 100% when SecondaryCursor.getSearchKeyRange
    894743 Newbie
    1. Yes, I'm using READ_UNCOMMITTED for better performance.
    2. Yes, I'm using transactions to write data. We have found one cause of the corrupted index: we used the time when creating secondary keys, so changing the server timezone causes a mismatch between index entries created before and after the change. We're still investigating other possible causes.

    Thanks!!
    Jeff
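
The timezone cause mentioned above can be reproduced outside the database. The thread does not show how the time-based key was built, so the sketch below assumes a hypothetical key that formats a record timestamp as an hour bucket (`yyyyMMddHH`); `TimeKeyDemo`, `zoneDependentKey`, and `stableKey` are illustrative names only.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimeKeyDemo {

    private static final DateTimeFormatter HOUR_BUCKET = DateTimeFormatter.ofPattern("yyyyMMddHH");

    // Fragile: the key depends on the zone in effect when it is computed
    // (in a real server this would typically be the JVM default zone).
    static String zoneDependentKey(long epochMillis, ZoneId zone) {
        return HOUR_BUCKET.withZone(zone).format(Instant.ofEpochMilli(epochMillis));
    }

    // Stable: always formats in UTC, so the same record always yields the same key.
    static String stableKey(long epochMillis) {
        return HOUR_BUCKET.withZone(ZoneOffset.UTC).format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long t = 0L; // the epoch, 1970-01-01T00:00:00Z
        System.out.println(zoneDependentKey(t, ZoneId.of("Asia/Tokyo"))); // key written before a zone change
        System.out.println(zoneDependentKey(t, ZoneOffset.UTC));          // key recomputed after the change
        System.out.println(stableKey(t));
    }
}
```

The first two lines differ for the same record, which is exactly the situation in which a key recomputed from the primary data stops matching the entry already stored in the secondary index. Deriving the key from zone-independent data avoids that drift.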
  • 6. Re: DB hangs and CPU 100% when SecondaryCursor.getSearchKeyRange
    greybird Expert
    This will be fixed in JE 5.0. In the change log you'll see this ticket number: [#20822].
    Thanks for reporting this.
    --mark
