DB hangs at 100% CPU in SecondaryCursor.getSearchKeyRange
Here's the thread dump taken while the DB is stuck:
"10746124@qtp-25919971-36" prio=10 tid=0x8f5c8000 nid=0x4115 runnable [0x8c54b000]
...below is just a one time call to getSearchKeyRange
There is no known bug that would cause SecondaryCursor.getSearchKeyRange to loop, so you will need to give us enough information to reproduce this so that we can see what is happening.
I only see one thread in the dump running, is that correct? The thread is runnable and it must be looping. Please take 5 or more full (complete stack) thread dumps while it is looping, so we can see where the loop is occurring.
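If running jstack against the process (or sending it kill -3) is inconvenient, the dumps can also be captured from inside the JVM. The following is only a rough sketch using the standard java.lang.management API; the class name ThreadDumpSampler and its methods are made up for illustration and are not part of JE.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JE): collects several full thread dumps.
public class ThreadDumpSampler {

    /** Returns one dump of all live threads, with complete stacks, as text. */
    public static String dumpOnce() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip monitor/synchronizer details, which not every JVM supports.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            // ThreadInfo.toString() truncates the stack, so print every frame explicitly.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Takes 'count' dumps, pausing 'intervalMillis' between consecutive dumps. */
    public static List<String> sample(int count, long intervalMillis) {
        List<String> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            dumps.add(dumpOnce());
            if (i < count - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                    break;
                }
            }
        }
        return dumps;
    }

    public static void main(String[] args) {
        // Five dumps, two seconds apart, printed to stdout.
        int n = 1;
        for (String dump : sample(5, 2000)) {
            System.out.println("=== dump " + (n++) + " ===");
            System.out.println(dump);
        }
    }
}
```

Comparing the top frames of the looping thread across the dumps should show which part of the loop it keeps returning to.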
Also please send your environment config settings, and the SecondaryConfig and DatabaseConfig settings for the secondary and primary database.
Even better, if you can reproduce this in a small test program, then send the source code.
This seems to happen when the DB has problematic secondary indices. I've traced some of the code: the thread is looping in the while(true) loop at SecondaryCursor line 1363, as shown in http://www.docjar.com/html/api/com/sleepycat/je/SecondaryCursor.java.html
The reason for the infinite loop seems to be a failed readPrimaryAfterGet() call at line 1535 (SecondaryCursor.java), where the generated secondary key does not match the key found (due to a corrupted secondary index).
Truncating and rebuilding the secondary index resolves the problem, but it would be helpful if a corrupted index could be detected and an exception thrown, as at line 1507, so that we could automatically truncate and rebuild it. Right now the cause of the corrupted secondary index is unknown.
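To illustrate the kind of consistency check being asked for, here is a minimal in-memory sketch. It does not use the JE API: plain Java maps stand in for the primary database and the secondary index, and the class SecondaryIndexCheck, the method findMismatches, and the keyCreator function are all hypothetical names chosen for this example. The idea is simply to regenerate the secondary key from each primary record and report index entries that do not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory model (not the JE API): maps stand in for the databases.
public class SecondaryIndexCheck {

    /**
     * Returns the secondary keys whose entry is inconsistent with the primary:
     * either the primary record is missing, or the key regenerated from the
     * record differs from the key stored in the index.
     */
    public static <PK, SK, V> List<SK> findMismatches(
            Map<SK, PK> secondary,       // secondary key -> primary key
            Map<PK, V> primary,          // primary key -> record
            Function<V, SK> keyCreator)  // derives the secondary key from a record
    {
        List<SK> bad = new ArrayList<>();
        for (Map.Entry<SK, PK> entry : secondary.entrySet()) {
            V record = primary.get(entry.getValue());
            if (record == null || !entry.getKey().equals(keyCreator.apply(record))) {
                bad.add(entry.getKey());
            }
        }
        return bad;
    }
}
```

An application could run a check of this shape and, if the returned list is non-empty, trigger its own truncate-and-rebuild of the index instead of waiting for a read to spin.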
1. Yes, I'm using READ_UNCOMMITTED for better performance.
2. Yes, I'm using transactions to write data. We have found that one cause of the corrupted index is that we used time in the creation of the secondary index, and changing the server timezone causes a mismatch between indices created before and after the change. We're still investigating other possible causes.
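The timezone effect can be shown with a small sketch. It assumes the secondary key was derived from a local-time rendering of a timestamp (here a yyyyMMdd day string); the actual key format in our application may differ, and the class and method names below are made up for illustration. The same instant yields different keys under different server zones, while deriving the key in UTC gives the same result regardless of server configuration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical example: how a time-based key can depend on the server's time zone.
public class TimeKeyExample {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Fragile: the key changes if the server's zone changes. */
    public static String zoneDependentKey(long epochMillis, ZoneId serverZone) {
        return DAY.format(Instant.ofEpochMilli(epochMillis).atZone(serverZone));
    }

    /** Stable: always derived in UTC, independent of server settings. */
    public static String utcKey(long epochMillis) {
        return DAY.format(Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        long t = Instant.parse("2020-01-01T02:00:00Z").toEpochMilli();
        System.out.println(zoneDependentKey(t, ZoneId.of("America/New_York"))); // 20191231
        System.out.println(zoneDependentKey(t, ZoneId.of("Asia/Tokyo")));       // 20200101
        System.out.println(utcKey(t));                                          // 20200101
    }
}
```

If the key creator behaves like zoneDependentKey, a key regenerated after a timezone change no longer matches the entry written earlier, which is the mismatch described above.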