This discussion is archived
12 Replies. Latest reply: Oct 11, 2012 3:06 PM by greybird

DiskOrderedScan hangs under concurrent load

vinothchandar Newbie
Hi,

I have been testing DiskOrderedScan (DOScan) with live traffic simultaneously hitting the database. At a high enough throughput (~700 ops/s), the scan simply stalls. I have multiple threads serving live traffic (gets + updates)
and one thread scanning the keys with the following config. For each key it obtains from the scan, it does a get():

DiskOrderedCursorConfig docc = new DiskOrderedCursorConfig();
docc.setInternalMemoryLimit(64 * 1024 * 1024); // 64 MB for the scan's internal structures
docc.setKeysOnly(true);                        // return keys without fetching values
docc.setMaxSeedMillisecs(500);                 // cap the time spent seeding the scan from the cache

I took a stack dump of the hung process. The following are the stacks of the DO producer thread and the scanning thread.

"DiskOrderedScan Producer Thread for Thread[voldemort-niosocket-server2,5,main]" daemon prio=10 tid=0x00007f930498c000 nid=0x30d1 waiting on condition [0x00007f92fa25f000]
java.lang.Thread.State: TIMED_WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <0x00000002bae00078> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
     at java.util.concurrent.ArrayBlockingQueue.offer(ArrayBlockingQueue.java:287)
     at com.sleepycat.je.dbi.DiskOrderedCursorImpl$2.processLSN(DiskOrderedCursorImpl.java:415)
     at com.sleepycat.je.dbi.SortedLSNTreeWalker.callProcessLSNHandleExceptions(SortedLSNTreeWalker.java:581)
     at com.sleepycat.je.dbi.SortedLSNTreeWalker.processResidentChild(SortedLSNTreeWalker.java:488)
     at com.sleepycat.je.dbi.DiskOrderedCursorImpl$DiskOrderedCursorTreeWalker.processResidentChild(DiskOrderedCursorImpl.java:236)
     at com.sleepycat.je.dbi.SortedLSNTreeWalker.accumulateLSNs(SortedLSNTreeWalker.java:463)
     at com.sleepycat.je.dbi.DiskOrderedCursorImpl$DiskOrderedCursorTreeWalker.walkInternal(DiskOrderedCursorImpl.java:206)
     at com.sleepycat.je.dbi.SortedLSNTreeWalker.walk(SortedLSNTreeWalker.java:315)
     at com.sleepycat.je.dbi.DiskOrderedCursorImpl$1.run(DiskOrderedCursorImpl.java:358)

"voldemort-niosocket-server2" daemon prio=10 tid=0x00007f951cc31800 nid=0x5738 waiting on condition [0x00007f93fc4c3000]
java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <0x00000003021a79d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
     at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
     at com.sleepycat.je.latch.SharedLatch.acquireShared(SharedLatch.java:149)
     at com.sleepycat.je.tree.IN.latchShared(IN.java:484)
     at com.sleepycat.je.tree.Tree.getRootINInternal(Tree.java:2060)
     at com.sleepycat.je.tree.Tree.getRootIN(Tree.java:2036)
     at com.sleepycat.je.tree.Tree.search(Tree.java:1215)
     at com.sleepycat.je.dbi.CursorImpl.searchAndPosition(CursorImpl.java:2069)
     at com.sleepycat.je.Cursor.searchInternal(Cursor.java:2666)
     at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2576)
     at com.sleepycat.je.Cursor.searchNoDups(Cursor.java:2430)
     at com.sleepycat.je.Cursor.search(Cursor.java:2397)
     - locked <0x000000029823d400> (a com.sleepycat.je.Cursor)
     at com.sleepycat.je.Database.get(Database.java:1042)
     at voldemort.store.bdb.BdbStorageEngine.get(BdbStorageEngine.java:233)
     at voldemort.store.bdb.BdbStorageEngine.get(BdbStorageEngine.java:73)
     at voldemort.server.protocol.admin.FetchEntriesStreamRequestHandler.handleRequest(FetchEntriesStreamRequestHandler.java:69)
     at voldemort.server.niosocket.AsyncRequestHandler.handleStreamRequestInternal(AsyncRequestHandler.java:305)
     at voldemort.server.niosocket.AsyncRequestHandler.handleStreamRequest(AsyncRequestHandler.java:240)
     at voldemort.server.niosocket.AsyncRequestHandler.write(AsyncRequestHandler.java:203)
     at voldemort.utils.SelectorManagerWorker.run(SelectorManagerWorker.java:100)
     at voldemort.utils.SelectorManager.run(SelectorManager.java:194)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)

The live-traffic threads have the following stack:

"voldemort-niosocket-server50" daemon prio=10 tid=0x00007f951cbf6000 nid=0x572e waiting on condition [0x00007f93fc8c7000]
java.lang.Thread.State: WAITING (parking)
     at sun.misc.Unsafe.park(Native Method)
     - parking to wait for <0x00000003021a79d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:941)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1261)
     at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:594)
     at com.sleepycat.je.latch.SharedLatch.acquireShared(SharedLatch.java:149)
     at com.sleepycat.je.tree.IN.latchShared(IN.java:484)
     at com.sleepycat.je.tree.Tree.getRootINInternal(Tree.java:2060)
     at com.sleepycat.je.tree.Tree.getRootIN(Tree.java:2036)
     at com.sleepycat.je.tree.Tree.search(Tree.java:1215)
     at com.sleepycat.je.dbi.CursorImpl.searchAndPosition(CursorImpl.java:2069)
     at com.sleepycat.je.Cursor.searchInternal(Cursor.java:2666)
     at com.sleepycat.je.Cursor.searchAllowPhantoms(Cursor.java:2576)
     at com.sleepycat.je.Cursor.searchNoDups(Cursor.java:2430)
     at com.sleepycat.je.Cursor.search(Cursor.java:2397)
     - locked <0x00000002997ab480> (a com.sleepycat.je.Cursor)
     at com.sleepycat.je.Database.get(Database.java:1042)
     at voldemort.store.bdb.BdbStorageEngine.get(BdbStorageEngine.java:233)
     at voldemort.store.bdb.BdbStorageEngine.get(BdbStorageEngine.java:73)
     at voldemort.store.rebalancing.RedirectingStore.get(RedirectingStore.java:136)
     at voldemort.store.rebalancing.RedirectingStore.get(RedirectingStore.java:60)
     at voldemort.store.invalidmetadata.InvalidMetadataCheckingStore.get(InvalidMetadataCheckingStore.java:105)
     at voldemort.store.invalidmetadata.InvalidMetadataCheckingStore.get(InvalidMetadataCheckingStore.java:41)
     at voldemort.store.DelegatingStore.get(DelegatingStore.java:60)
     at voldemort.store.stats.StatTrackingStore.get(StatTrackingStore.java:66)
     at voldemort.store.stats.StatTrackingStore.get(StatTrackingStore.java:39)
     at voldemort.server.protocol.vold.VoldemortNativeRequestHandler.handleGet(VoldemortNativeRequestHandler.java:311)
     at voldemort.server.protocol.vold.VoldemortNativeRequestHandler.handleRequest(VoldemortNativeRequestHandler.java:63)
     at voldemort.server.niosocket.AsyncRequestHandler.read(AsyncRequestHandler.java:130)
     at voldemort.utils.SelectorManagerWorker.run(SelectorManagerWorker.java:98)
     at voldemort.utils.SelectorManager.run(SelectorManager.java:194)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
     at java.lang.Thread.run(Thread.java:662)

All the other JE background threads (Cleaner, INCompressor, Checkpointer) are in TIMED_WAITING.
Maybe I am misconfiguring or misusing the cursor? This is very reproducible with my dataset, so I will be glad to do more incremental debugging to figure out what's going on.

Thanks
Vinoth
  • 1. Re: DiskOrderedScan hangs under concurrent load
    greybird Expert
    Hi Vinoth,
    I have been testing DiskOrderedScan with live traffic simultaneously hitting the database. At a high enough throughput (~700 ops/s), the scan simply stalls. I have multiple threads serving live traffic (gets + updates) and one thread scanning the keys with the following config. For each key it obtains from the scan, it does a get().
    In your post I see the JE DiskOrderedCursor producer thread, but I don't see your thread that is using the DiskOrderedCursor and calling getNext. As far as I can tell, the scan is stalled and other operations are blocked because DiskOrderedCursor.getNext is not being called: the producer thread is blocked because its queue is full, and the queue is full because it is not being emptied by getNext.

    From the call to processResidentChild I know that the in-cache portion of the Btree is being scanned. During this initial "seeding" period, all other access to the Btree is blocked as described in the doc. This is why the other threads are blocked.
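    To make the failure shape concrete, here is a minimal standalone sketch (plain java.util.concurrent, not JE code) of how a bounded queue parks its producer once its consumer stops draining it, which is exactly what the ArrayBlockingQueue.offer frame in the producer stack above shows:

    import java.util.concurrent.ArrayBlockingQueue;

    public class QueueStallDemo {
        public static void main(String[] args) throws InterruptedException {
            final ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(2);
            Thread producer = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; ; i++) {
                        try {
                            queue.put(i); // parks here once the queue is full
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
            });
            producer.start();
            Thread.sleep(1000);
            // Nothing ever calls queue.take(), so the producer is parked in put(),
            // just as the DO producer parks in offer() when getNext() is not called.
            System.out.println("producer state: " + producer.getState());
            producer.interrupt();
        }
    }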

    --mark
  • 2. Re: DiskOrderedScan hangs under concurrent load
    vinothchandar Newbie
    Hi Mark,

    The voldemort-niosocket-server2 thread right below the DO producer thread is actually the scanning thread. Both the scan thread and the online threads are blocked on db.get().
    I suspected it was deadlocking because of the way we have implemented our scan:

    DatabaseEntry k = new DatabaseEntry();
    DatabaseEntry v = new DatabaseEntry();
    v.setPartial(0, 0, true); // bring in the key alone, not the value
    while (docursor.getNext(k, v, null) == OperationStatus.SUCCESS) {
        if (filter(k)) {
            db.get(null, k, new DatabaseEntry(), null); // fetch the value only for interesting keys
        }
    }

    This was basically done so that we fetch the value only when it's an interesting key. I think in this case the DO producer is blocked on the queue, as you mentioned, while holding a shared lock on the Btree. But the scanning thread is also blocked on db.get(), unable to come back and call docursor.getNext(). Do you think the DO producer should release the lock on the Btree when it is blocked on the queue? It has no work to do anyway until the queue is drained by the scanning thread, right?

    Let me change this to obtain k,v for everything, avoiding the get() in the scanning thread, and see how it goes. Will keep you posted.

    Thanks
    Vinoth
  • 3. Re: DiskOrderedScan hangs under concurrent load
    greybird Expert
    I understand. As one of the first users of DiskOrderedCursor you've uncovered an important restriction, and one we need to document better.
    Do you think the DO producer should release the lock on the Btree when it is blocked on the queue?
    Access to the Btree is blocked until the seeding phase is finished. This is very unlikely to change.

    There are two approaches I can recommend.

    1) Call DiskOrderedCursorConfig.setMaxSeedNodes(0). This will reduce initial performance because only the Btree root node will be used to seed the scan. However, if most of the scan does not use the in-cache Btree anyway, the performance impact will be small. This is the simplest approach.

    2) Call DiskOrderedCursorConfig.setMaxSeedNodes and DiskOrderedCursorConfig.setQueueSize to ensure that the queue will not be filled by the initial seeding process. For example, let's say you're using the default DatabaseConfig nodeMaxEntries, 128. This means there can be at most 128 keys queued for each Btree node that is used to seed the scan. Let's say the value you pass to setMaxSeedNodes is N. Then you can call setQueueSize((N + 1) * 128) to guarantee that the queue won't fill during seeding.
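
    For concreteness, here is a sketch of the two configurations (illustrative values only; it assumes the default nodeMaxEntries of 128):

    DiskOrderedCursorConfig docc = new DiskOrderedCursorConfig();

    // Approach 1: seed from the Btree root only, so seeding cannot fill the queue.
    docc.setMaxSeedNodes(0);

    // Approach 2 (alternative): allow N seed nodes, and size the queue so that
    // seeding can never fill it.
    int n = 10; // hypothetical value passed to setMaxSeedNodes
    docc.setMaxSeedNodes(n);
    docc.setQueueSize((n + 1) * 128); // (N + 1) * nodeMaxEntries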

    Does that make sense?
    --mark
  • 4. Re: DiskOrderedScan hangs under concurrent load
    greybird Expert
    This was basically done so that we fetch the value only when it's an interesting key.
    If the ratio of values-fetched to keys-scanned is small, then this can be a big performance win. But if the ratio is close to 1, it will have a big negative performance impact, since the value fetches will be random IOs.

    --mark
  • 5. Re: DiskOrderedScan hangs under concurrent load
    greybird Expert
    v.setPartial(0, 0, true); // bring in the key alone, not the value
    To get this behavior you have to call DiskOrderedCursorConfig.setKeysOnly(true).
  • 6. Re: DiskOrderedScan hangs under concurrent load
    vinothchandar Newbie
    Hi Mark,

    Yes, setPartialKeys() is what I am using, and the ratio is 50% at best, so there is leverage in bringing in the key alone.

    (1) appears simpler and more desirable (I am actually okay with not touching the in-memory cache at all, at the expense of slower scans).

    Let me try that and update you.

    Thanks
    Vinoth
  • 7. Re: DiskOrderedScan hangs under concurrent load
    greybird Expert
    Yes, setPartialKeys() is what I am using
    So we don't confuse other readers, the method name is setKeysOnly.
    --mark
  • 8. Re: DiskOrderedScan hangs under concurrent load
    vinothchandar Newbie
    Hi Mark,

    The following is the DO config:

    keysOnly=true
    lsnBatchSize=9223372036854775807 (Long.MAX_VALUE)
    internalMemoryLimit=67108864
    queueSize=1000
    maxSeedNodes=0
    maxSeedMillisecs=5

    I still see the scan thread and producer thread both blocked similarly. Should I reduce lsnBatchSize too?

    Thanks
    Vinoth
  • 9. Re: DiskOrderedScan hangs under concurrent load
    vinothchandar Newbie
    Okay, I tried setting lsnBatchSize to 1000, and still no luck.
  • 10. Re: DiskOrderedScan hangs under concurrent load
    greybird Expert
    That's unexpected, so I'll have to investigate.
    --mark
  • 11. Re: DiskOrderedScan hangs under concurrent load
    vinothchandar Newbie
    Please let me know if you want me to try something out.
  • 12. Re: DiskOrderedScan hangs under concurrent load
    greybird Expert
    Vinoth and I worked off-forum on this. The bug that caused the latch deadlock has been fixed, and the fix will be part of the next JE release. To work around this problem in the meantime, I suggest that anyone using DiskOrderedCursor be careful not to access the database being scanned from the consumer thread (the thread calling DiskOrderedCursor.getNext).
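
    For illustration, here is a sketch of that workaround (reusing the docc config and filter() from earlier in the thread; imports from com.sleepycat.je and java.util are assumed). The consumer buffers the interesting keys during the scan and defers the db.get() calls until after the cursor is closed:

    List<byte[]> keys = new ArrayList<byte[]>();
    DiskOrderedCursor cursor = db.openCursor(docc);
    try {
        DatabaseEntry k = new DatabaseEntry();
        DatabaseEntry v = new DatabaseEntry();
        while (cursor.getNext(k, v, null) == OperationStatus.SUCCESS) {
            if (filter(k)) {
                keys.add(k.getData().clone()); // no db.get() in the consumer thread
            }
        }
    } finally {
        cursor.close(); // end the scan before touching the database again
    }
    for (byte[] keyBytes : keys) {
        db.get(null, new DatabaseEntry(keyBytes), new DatabaseEntry(), null);
    }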

    Thank you, Vinoth, for helping us to resolve this!

    --mark
