3 Replies Latest reply: Jul 20, 2012 8:54 AM by Greybird-Oracle

    Secondary database performance and CacheMode

    943066
      This is somewhat of a follow-on thread, related to: Lock/isolation with secondary databases

      In the same environment, I'm noticing fairly low performance numbers on my queries, which are essentially a series of key range scans on my secondary index.

      Example output:
      -----
      08:07:37.803 BDB - Retrieved 177 entries out of index (177 ranges, 177 iters, 1.000 iters/range) in: 87ms
      08:07:38.835 BDB - Retrieved 855 entries out of index (885 ranges, 857 iters, 0.968 iters/range) in: 346ms
      08:07:40.838 BDB - Retrieved 281 entries out of index (283 ranges, 282 iters, 0.996 iters/range) in: 101ms
      08:07:41.944 BDB - Retrieved 418 entries out of index (439 ranges, 419 iters, 0.954 iters/range) in: 160ms
      08:07:44.285 BDB - Retrieved 2807 entries out of index (2939 ranges, 2816 iters, 0.958 iters/range) in: 1033ms
      08:07:50.422 BDB - Retrieved 253 entries out of index (266 ranges, 262 iters, 0.985 iters/range) in: 117ms
      08:07:52.095 BDB - Retrieved 2838 entries out of index (3021 ranges, 2852 iters, 0.944 iters/range) in: 835ms
      08:07:58.253 BDB - Retrieved 598 entries out of index (644 ranges, 598 iters, 0.929 iters/range) in: 193ms
      08:07:59.912 BDB - Retrieved 143 entries out of index (156 ranges, 145 iters, 0.929 iters/range) in: 32ms
      08:08:00.788 BDB - Retrieved 913 entries out of index (954 ranges, 919 iters, 0.963 iters/range) in: 326ms
      08:08:03.087 BDB - Retrieved 325 entries out of index (332 ranges, 326 iters, 0.982 iters/range) in: 103ms
      -----

      To explain those numbers: a "range" corresponds to a sortedMap.subMap() call (i.e. a range scan between a start/end key), and "iters" is the number of iterations over the subMap results needed to find the entry we were after (an implementation detail).

      In most cases, iters/range is close to 1, which means that only one key is traversed per subMap() call - so, in essence, 500 entries means 500 ostensibly random range scans, taking only the first item out of each range scan.
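
      Roughly, each of those range scans boils down to something like the following (a minimal sketch only - the StoredSortedMap view over the secondary index, the String/byte[] types and the helper name are illustrative assumptions, not my actual code):

      -----
      import java.util.Iterator;
      import java.util.Map;
      import java.util.SortedMap;

      import com.sleepycat.collections.StoredIterator;
      import com.sleepycat.collections.StoredSortedMap;

      public class RangeScanSketch {

          // Returns the value of the first entry in [startKey, endKey), or null.
          static byte[] firstInRange(StoredSortedMap<String, byte[]> index,
                                     String startKey, String endKey) {
              SortedMap<String, byte[]> range = index.subMap(startKey, endKey);
              Iterator<Map.Entry<String, byte[]>> it = range.entrySet().iterator();
              try {
                  // iters/range ~ 1: usually the first entry is the one we want.
                  return it.hasNext() ? it.next().getValue() : null;
              } finally {
                  StoredIterator.close(it); // release the underlying JE cursor
              }
          }
      }
      -----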

      However, it seems kind of slow - 2816 entries take 1033ms, which means we're really seeing a key/query rate of only ~2700 keys/sec.

      Here's performance profile output of this process happening (via jvisualvm): https://img.skitch.com/20120718-rbrbgu13b5x5atxegfdes8wwdx.jpg

      Here's the stats output after it has been running for a few minutes:
      -----
      I/O: Log file opens, fsyncs, reads, writes, cache misses.
      bufferBytes=3,145,728
      endOfLog=0x143b/0xd5b1a4
      nBytesReadFromWriteQueue=0
      nBytesWrittenFromWriteQueue=0
      nCacheMiss=1,954,580
      nFSyncRequests=11
      nFSyncTime=12,055
      nFSyncTimeouts=0
      nFSyncs=11
      nFileOpens=602,386
      nLogBuffers=3
      nLogFSyncs=96
      nNotResident=1,954,650
      nOpenFiles=100
      nRandomReadBytes=6,946,009,825
      nRandomReads=2,577,442
      nRandomWriteBytes=1,846,577,783
      nRandomWrites=1,961
      nReadsFromWriteQueue=0
      nRepeatFaultReads=317,585
      nSequentialReadBytes=2,361,120,318
      nSequentialReads=653,138
      nSequentialWriteBytes=262,075,923
      nSequentialWrites=257
      nTempBufferWrites=0
      nWriteQueueOverflow=0
      nWriteQueueOverflowFailures=0
      nWritesFromWriteQueue=0
      Cache: Current size, allocations, and eviction activity.
      adminBytes=248,252
      avgBatchCACHEMODE=0
      avgBatchCRITICAL=0
      avgBatchDAEMON=0
      avgBatchEVICTORTHREAD=0
      avgBatchMANUAL=0
      cacheTotalBytes=2,234,217,972
      dataBytes=2,230,823,768
      lockBytes=224
      nBINsEvictedCACHEMODE=0
      nBINsEvictedCRITICAL=0
      nBINsEvictedDAEMON=0
      nBINsEvictedEVICTORTHREAD=0
      nBINsEvictedMANUAL=0
      nBINsFetch=7,104,094
      nBINsFetchMiss=575,490
      nBINsStripped=0
      nBatchesCACHEMODE=0
      nBatchesCRITICAL=0
      nBatchesDAEMON=0
      nBatchesEVICTORTHREAD=0
      nBatchesMANUAL=0
      nCachedBINs=575,857
      nCachedUpperINs=8,018
      nEvictPasses=0
      nINCompactKey=268,311
      nINNoTarget=107,602
      nINSparseTarget=468,257
      nLNsFetch=1,771,930
      nLNsFetchMiss=914,516
      nNodesEvicted=0
      nNodesScanned=0
      nNodesSelected=0
      nRootNodesEvicted=0
      nThreadUnavailable=0
      nUpperINsEvictedCACHEMODE=0
      nUpperINsEvictedCRITICAL=0
      nUpperINsEvictedDAEMON=0
      nUpperINsEvictedEVICTORTHREAD=0
      nUpperINsEvictedMANUAL=0
      nUpperINsFetch=11,797,499
      nUpperINsFetchMiss=8,280
      requiredEvictBytes=0
      sharedCacheTotalBytes=0
      Cleaning: Frequency and extent of log file cleaning activity.
      cleanerBackLog=0
      correctedAvgLNSize=87.11789
      estimatedAvgLNSize=82.74727
      fileDeletionBacklog=0
      nBINDeltasCleaned=2,393,935
      nBINDeltasDead=239,276
      nBINDeltasMigrated=2,154,659
      nBINDeltasObsolete=35,516,504
      nCleanerDeletions=96
      nCleanerEntriesRead=9,257,406
      nCleanerProbeRuns=0
      nCleanerRuns=96
      nClusterLNsProcessed=0
      nINsCleaned=299,195
      nINsDead=2,651
      nINsMigrated=296,544
      nINsObsolete=247,703
      nLNQueueHits=2,683,648
      nLNsCleaned=5,856,844
      nLNsDead=88,852
      nLNsLocked=29
      nLNsMarked=5,767,969
      nLNsMigrated=23
      nLNsObsolete=641,166
      nMarkLNsProcessed=0
      nPendingLNsLocked=1,386
      nPendingLNsProcessed=1,415
      nRepeatIteratorReads=0
      nToBeCleanedLNsProcessed=0
      totalLogSize=10,088,795,476
      Node Compression: Removal and compression of internal btree nodes.
      cursorsBins=0
      dbClosedBins=0
      inCompQueueSize=0
      nonEmptyBins=0
      processedBins=22
      splitBins=0
      Checkpoints: Frequency and extent of checkpointing activity.
      lastCheckpointEnd=0x143b/0xaf23b3
      lastCheckpointId=850
      lastCheckpointStart=0x143a/0xf604ef
      nCheckpoints=11
      nDeltaINFlush=1,718,813
      nFullBINFlush=398,326
      nFullINFlush=483,103
      Environment: General environment wide statistics.
      btreeRelatchesRequired=205,758
      Locks: Locks held by data operations, latching contention on lock table.
      nLatchAcquireNoWaitUnsuccessful=0
      nLatchAcquiresNoWaitSuccessful=0
      nLatchAcquiresNoWaiters=0
      nLatchAcquiresSelfOwned=0
      nLatchAcquiresWithContention=0
      nLatchReleases=0
      nOwners=2
      nReadLocks=2
      nRequests=10,571,692
      nTotalLocks=2
      nWaiters=0
      nWaits=0
      nWriteLocks=0
      -----

      My database(s) are sizeable, but they're on an SSD in a machine with more RAM than DB size (16GB vs 10GB). I have CacheMode.EVICT_LN turned on; however, I'm thinking this may be harmful. I have tried turning it off, but it doesn't seem to make a dramatic difference.

      Really, I only want the secondary DB cached (as this is where all the read queries happen); however, I'm not sure whether it's (meaningfully) possible to cache only a secondary DB, as presumably it needs to look up the primary DB's leaf nodes to return data anyway.

      Additionally, the updates to the DB(s) tend to be fairly large - i.e. potentially modifying ~500,000 entries at a time (about 2.3% of the DB) - which I worry tends to blow out the secondary DB cache (though I don't know how to prove that one way or the other).

      I understand different CacheModes can be set on separate databases (and even at the cursor level); however, it's somewhat opaque how this works in practice.
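
      For reference, my understanding of how the scoping would look is roughly the following (just a sketch - the database names, the choice of cache modes and the placeholder key creator are illustrative, not my real setup):

      -----
      import com.sleepycat.je.CacheMode;
      import com.sleepycat.je.Cursor;
      import com.sleepycat.je.Database;
      import com.sleepycat.je.DatabaseConfig;
      import com.sleepycat.je.DatabaseEntry;
      import com.sleepycat.je.Environment;
      import com.sleepycat.je.SecondaryConfig;
      import com.sleepycat.je.SecondaryDatabase;
      import com.sleepycat.je.SecondaryKeyCreator;

      public class CacheModeScopingSketch {

          static void open(Environment env) {
              // Per-database default: evict the primary's LNs as soon as each
              // operation completes, so the primary mostly costs internal nodes.
              DatabaseConfig primaryConfig = new DatabaseConfig();
              primaryConfig.setAllowCreate(true);
              primaryConfig.setCacheMode(CacheMode.EVICT_LN);
              Database primary = env.openDatabase(null, "primary", primaryConfig);

              // The secondary keeps the default cache mode, so its nodes stay
              // resident under normal eviction pressure.
              SecondaryConfig secConfig = new SecondaryConfig();
              secConfig.setAllowCreate(true);
              secConfig.setSortedDuplicates(true);
              secConfig.setKeyCreator(new SecondaryKeyCreator() {
                  public boolean createSecondaryKey(SecondaryDatabase sdb,
                                                    DatabaseEntry key,
                                                    DatabaseEntry data,
                                                    DatabaseEntry result) {
                      result.setData(data.getData()); // placeholder key creator
                      return true;
                  }
              });
              SecondaryDatabase secondary =
                  env.openSecondaryDatabase(null, "secondary", primary, secConfig);

              // Per-cursor override for an individual scan, if needed.
              Cursor cursor = secondary.openCursor(null, null);
              cursor.setCacheMode(CacheMode.UNCHANGED);
              cursor.close();
          }
      }
      -----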

      I've tried running DbCacheSize, but the combination of variable-length keys and key prefixing being enabled makes it almost impossible to get meaningful numbers out of it (or at the very least, rather confusing :)

      So, my questions are:

      - Is this actually slow in the first place (ie: 2700 random keys/sec)?
      - Can I speed this up with caching? (I've failed so far)
      - Is it possible (or useful) to cache a secondary DB in preference to the primary?
      - Would switching from using a StoredSortedMap to raw (possibly reusable) cursors give me a significant advantage?

      Thanks so much in advance,

      fb.
        • 1. Re: Secondary database performance and CacheMode
          Greybird-Oracle
          nBINsFetchMiss=575,490
          The first step in tuning the JE cache, as related to performance, is to ensure that nBINsFetchMiss goes to zero. That tells you that you've sized your cache large enough to hold all internal nodes (I know you have lots of memory, but we need to prove that by looking at the stats).

          If all your internal nodes are in cache, that means your entire secondary DB is in cache, because you've configured duplicates (right?). A dup DB does not keep its LNs in cache, so it consists of nothing but internal nodes in cache.

          If you're using EVICT_LN (please do!), you also want to make sure that nEvictPasses=0, and I see that it is.

          Here are some random hints:

          + In general, always use getStats(new StatsConfig().setClear(true)). If you don't clear the stats at every time interval, they are cumulative and it's almost impossible to correlate them with what's going on in a given interval. (A small sketch of this follows these hints.)

          + If you're starting with a non-empty env, first load the entire data set and clear the stats, so the fetches for populating the cache don't show up in subsequent stats.

          + If you're having trouble using DbCacheSize, you may want to find out experimentally how much cache is needed to hold the internal nodes, for a given data set in your app. You can do this simply by reading your data set into cache. When nEvictPasses becomes non-zero, the cache has overflowed. This is going to be much more accurate than DbCacheSize anyway.

          + When you measure performance, you need to collect the JE stats (as you have) plus all app performance info (txn rate, etc) for the same time interval. They need to be correlated. The full set of JE environment settings, database settings, and JVM params is also needed.
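
          To illustrate the stats-clearing hint, something along these lines works (a sketch only - the counters printed are just the ones most relevant here, read via the corresponding EnvironmentStats getters):

          -----
          import com.sleepycat.je.Environment;
          import com.sleepycat.je.EnvironmentStats;
          import com.sleepycat.je.StatsConfig;

          public class StatsSampler {

              // Call once per measurement interval; clearing makes each sample
              // cover only that interval instead of being cumulative.
              static void sample(Environment env) {
                  StatsConfig config = new StatsConfig();
                  config.setClear(true);
                  EnvironmentStats stats = env.getStats(config);

                  // Cache large enough for all internal nodes => stays near 0.
                  System.out.println("nBINsFetchMiss = " + stats.getNBINsFetchMiss());
                  // With EVICT_LN this should stay 0; non-zero means the cache
                  // has overflowed (the experimental sizing approach above).
                  System.out.println("nEvictPasses   = " + stats.getNEvictPasses());
                  // Useful for the LN-fetch-per-range-lookup ratio discussed below.
                  System.out.println("nLNsFetchMiss  = " + stats.getNLNsFetchMiss());
              }
          }
          -----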

          On the question of using StoredSortedMap.subMap vs. a Cursor directly: there may be an optimization you can make if your LNs are not in cache - and they're not if you're using EVICT_LN, or if you're not using EVICT_LN but not all LNs fit. However, I think you can make the same optimization using StoredSortedMap.

          Namely when using a key range (whatever API is used), it is necessary to read one key past the range you want, because that's the only way to find out whether there are more keys in the range. If you use subMap or the Cursor API in the most obvious way, this will not only have to find the next key outside the range but will also fetch its LN. I'm guessing this is part of the reason you're seeing a lower operation rate than you might expect. (However, note that you're actually getting double the rate you mention from a JE perspective, because each secondary read is actually two JE reads, counting the secondary DB and primary DB.)

          Before I write a bunch more about how to do that optimization, I think it's worth confirming that the extra LN is being fetched. If you do the measurements as I described, and you're using EVICT_LN, you should be able to get the ratio of LNs fetched (nLNsFetchMiss) to the number of range lookups. So if there is only one key in each range, and I'm right about reading one key beyond it, you'll see roughly twice as many LNs fetched as operations.

          --mark
          • 2. Re: Secondary database performance and CacheMode
            943066
            Hi Mark,

            Brilliant/useful reply, as per usual :)

            The one-key-past-the-range-end issue is definitely something I hadn't considered - I've ended up switching to the Cursor API, which seems to be slightly faster, and consequently wrote all the when-to-stop-iterating code myself, so I should have realised this :)

            I'm wondering if there's a more efficient way of deferring LN loading on a per-cursor-move basis, i.e.:

            - Position cursor at start of range
            - Move cursor forward
            - If key is within range, fetch LN (otherwise, bail)
            - Repeat

            I know you can avoid LN loading via setPartial(), however, it's unclear from the doc whether a call to cursor.getNext() with a setPartial(0, 0, true), then a subsequent call to cursor.getCurrent() to retrieve the LN causes the key to be re-read (I guess I could test that and eyeball the numbers).

            The difference in my particular query pattern is that it's quite possible the ratio of cursor.getNext()'s vs actual data I want is only ~50%.

            If it's possible to not read the key twice, this could be a much more efficient approach with my particular workload.

            Any insight/opinions you might have as to whether that's a vaguely feasible approach would be very welcome :)
            • 3. Re: Secondary database performance and CacheMode
              Greybird-Oracle
              I know you can avoid LN loading via setPartial(), however, it's unclear from the doc whether a call to cursor.getNext() with a setPartial(0, 0, true), then a subsequent call to cursor.getCurrent() to retrieve the LN causes the key to be re-read (I guess I could test that and eyeball the numbers).
              That's what I was going to suggest (using setPartial(0, 0, true) for the data param), so you figured it out on your own. If you call getCurrent with a plain DatabaseEntry (no setPartial(0, 0, true)), then it will return the data, so it has to read the LN (if it's not in cache).
              The difference in my particular query pattern is that it's quite possible the ratio of cursor.getNext()'s vs actual data I want is only ~50%.
              Yes, but it would be good to prove that with stats.
              If it's possible to not read the key twice, this could be a much more efficient approach with my particular workload.
              The key will be returned by getCurrent but it is in cache and this just involves a byte array copy.
              Any insight/opinions you might have as to whether that's a vaguely feasible approach would be very welcome
              Yes, it is. Also, if you use setPartial(0, 0, true) on the original data param when calling getSearchKey, you can avoid reading an LN when there are no keys in the range. Of course, if the first key found is in range, you'll have to call getCurrent to get the data.
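
              Putting that together, a minimal sketch of the whole pattern might look like this (using getSearchKeyRange to position at the start of the range; the key comparison and the process step are placeholders, and the comparison assumes the default unsigned-byte key ordering):

              -----
              import com.sleepycat.je.Cursor;
              import com.sleepycat.je.DatabaseEntry;
              import com.sleepycat.je.LockMode;
              import com.sleepycat.je.OperationStatus;
              import com.sleepycat.je.SecondaryDatabase;

              public class DeferredLnRangeScan {

                  static void scan(SecondaryDatabase secondary,
                                   byte[] startKey, byte[] endKeyExcl) {
                      DatabaseEntry key = new DatabaseEntry(startKey);
                      DatabaseEntry noData = new DatabaseEntry();
                      noData.setPartial(0, 0, true); // position without fetching the LN

                      Cursor cursor = secondary.openCursor(null, null);
                      try {
                          OperationStatus status =
                              cursor.getSearchKeyRange(key, noData, LockMode.DEFAULT);
                          while (status == OperationStatus.SUCCESS) {
                              if (compare(key.getData(), endKeyExcl) >= 0) {
                                  break; // one key past the range: stop, LN untouched
                              }
                              // In range: fetch the data (primary LN) for this entry.
                              DatabaseEntry data = new DatabaseEntry();
                              if (cursor.getCurrent(key, data, LockMode.DEFAULT)
                                      == OperationStatus.SUCCESS) {
                                  process(key.getData(), data.getData());
                              }
                              status = cursor.getNext(key, noData, LockMode.DEFAULT);
                          }
                      } finally {
                          cursor.close();
                      }
                  }

                  // Placeholder: must match the DB's key ordering (the default is
                  // unsigned lexicographic byte comparison).
                  static int compare(byte[] a, byte[] b) {
                      int n = Math.min(a.length, b.length);
                      for (int i = 0; i < n; i++) {
                          int diff = (a[i] & 0xff) - (b[i] & 0xff);
                          if (diff != 0) {
                              return diff;
                          }
                      }
                      return a.length - b.length;
                  }

                  // Placeholder for whatever the app does with each entry.
                  static void process(byte[] secondaryKey, byte[] primaryData) {
                  }
              }
              -----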

              --mark