This discussion is archived
10 Replies Latest reply: Apr 11, 2012 9:40 AM by greybird

Real world application with large data sets - memory issues with Berkeley DB

928588 Newbie
So I am trying to use Berkeley DB to manipulate large sets of data (we are not talking terabytes, but about 30-100 GB of data, cumulative).

The basic process goes like this:
- get updates for the data (for the sake of simplicity, imagine big logs from some server)
- get the portion of existing data that is relevant to the update section from Berkeley DB
- apply the change to that portion and store the results in a new temporary Berkeley DB
- repeat for each update section, slowly growing the new tree
- once all is done, walk the new tree, injecting it into the main Berkeley DB (removing old keys first).

No other process accesses the DB during the update cycle.
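
The cycle described above can be sketched as follows. This is only an illustration under loud assumptions: in-memory `TreeMap`s stand in for the main and temporary stores, no Berkeley DB API is used, and the class name, method names, key layout and the string-append "change" are all hypothetical.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for the update cycle: TreeMaps play the role of the
// main store and the temporary store. No Berkeley DB API is used here.
final class UpdateCycleSketch {

    // Applies one update section: reads the relevant key range from the main
    // store, applies the change, and writes the result into the temp store.
    static void applySection(SortedMap<String, String> main,
                             SortedMap<String, String> temp,
                             String fromKey, String toKey, String change) {
        for (Map.Entry<String, String> e : main.subMap(fromKey, toKey).entrySet()) {
            temp.put(e.getKey(), e.getValue() + change); // illustrative "change"
        }
    }

    // Walks the temp store and injects it into the main store,
    // removing the old key before writing the new value.
    static void mergeInto(SortedMap<String, String> main,
                          SortedMap<String, String> temp) {
        for (Map.Entry<String, String> e : temp.entrySet()) {
            main.remove(e.getKey());
            main.put(e.getKey(), e.getValue());
        }
        temp.clear();
    }

    public static void main(String[] args) {
        SortedMap<String, String> main = new TreeMap<>();
        main.put("a1", "x");
        main.put("a2", "y");
        main.put("b1", "z");
        SortedMap<String, String> temp = new TreeMap<>();

        applySection(main, temp, "a", "b", "+u1"); // touches a1 and a2 only
        mergeInto(main, temp);
        System.out.println(main);
    }
}
```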

Memory is about a 40 GB heap, CMS is running, JDK 1.7. Log files are 1 GB each. The box is Linux with the latest kernel and (alas) ext3 on it.

The shared cache is at the 30 GB mark (recommended by the app), but once the file gets to about 15-25 GB and then to 30 GB, the system becomes gradually slower and basically drops dead, with CMS going into nearly permanent failure and full GCs. A single change cycle goes from under a minute all the way to 45+ minutes.

It is almost as if Berkeley DB can't control its memory very well or clear out its caches. If I restart, it is fine for a few cycles, and then after about a dozen it goes into half-dead mode again.

Is there any way to help the situation (a spiffy way to merge two trees, or a way to purge Berkeley DB's caches more efficiently)? Or shall I give up and move to MySQL Cluster or some other B-tree implementation?

Thank you.
  • 1. Re: Real world application with large data sets - memory issues with Berkeley DB
    Linda Lee Journeyer
    BDB JE is used in production to manipulate some very large datasets, including those in the terabytes.

    To figure out your performance characteristics, you'll want to take two approaches. You need to look at JE environment stats to see what the system is doing -- is it incurring a lot of cache misses, for example? If some stat looks interesting, how does it correlate with what your application is doing over time, and with the data set size? On the other side, you'll want to use utilities like DbCacheSize to estimate your optimal cache size. Although you've described your hardware characteristics and given a general application overview, that doesn't really give us any idea of what BDB JE itself is doing, and we are really only able to help if you ask a pretty specific question.

    The FAQ performance section is a good place to start. Also, there are some fairly lengthy conversations in the forum about cache setting and tuning that can be helpful.
  • 2. Re: Real world application with large data sets - memory issues with Berkeley DB
    greybird Expert
    Your data set is not too large for a single machine with the amount of memory you've given. GC cost is a common problem and what we're recommending is to use JE 5 with CacheMode.EVICT_LN, and size your cache such that all BINs fit in memory, plus say 30% for GC working room, and leave the remainder for LNs in the file system cache (not the Java heap). Also use compressed oops and a heap size < 32 GB.

    See the javadoc for DbCacheSize.

    There are several other threads on the forum discussing this. Search for EVICT_LN.
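
The sizing advice above can be turned into back-of-the-envelope arithmetic. The sketch below follows one reading of it (JE cache sized to hold the BINs, heap equal to cache plus 30% for GC, heap kept under 32 GB for compressed oops, remaining RAM left to the file system cache). The BIN byte count and RAM size are hypothetical placeholders; the real BIN figure would come from DbCacheSize. Class and method names are illustrative only.

```java
// Back-of-the-envelope sizing under one reading of the advice above:
// JE cache = internal-node (BIN) bytes, heap = cache + 30% GC headroom,
// heap stays under 32 GB so compressed oops remain usable.
final class CacheBudgetSketch {
    static final long GB = 1024L * 1024L * 1024L;
    static final long COMPRESSED_OOPS_LIMIT = 32L * GB;

    // Heap needed for a given JE cache size with 30% extra for GC.
    static long heapFor(long cacheBytes) {
        return cacheBytes + (cacheBytes * 30) / 100;
    }

    // True if the heap is small enough for compressed oops.
    static boolean fitsCompressedOops(long heapBytes) {
        return heapBytes < COMPRESSED_OOPS_LIMIT;
    }

    // RAM left over for the OS file system cache (where LNs would live).
    static long fileSystemCacheBytes(long ramBytes, long heapBytes) {
        return Math.max(0L, ramBytes - heapBytes);
    }

    public static void main(String[] args) {
        long binBytes = 10L * GB; // hypothetical; take the real figure from DbCacheSize
        long ramBytes = 48L * GB; // hypothetical machine size
        long heap = heapFor(binBytes);
        System.out.println("heap GB: " + (double) heap / GB);
        System.out.println("compressed oops ok: " + fitsCompressedOops(heap));
        System.out.println("left for FS cache GB: "
                + (double) fileSystemCacheBytes(ramBytes, heap) / GB);
    }
}
```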

    --mark
  • 3. Re: Real world application with large data sets - memory issues with Berkeley DB
    928588 Newbie
    Thank you for the responses. I tried all the suggestions.
    So let's see.
    31 GB heap, compressed oops, CMS, initiating occupancy 55%

    So far.

    - Berkeley DB kindly ignores any limit I impose on cache sizes. Not only does it keep the percentage setting at 60 even when I specifically set the size to 10 GB, 20 GB or 30 GB (as seen in the env config), it also keeps growing past this without releasing memory. Survivor space goes all the way to the heap limit and blows up Java memory. I am using two stores (one is about 35 GB on disk now, the other is 4 GB) with a shared cache set to 10/20/30 GB. Log file size is 1 GB.
    - There is something fishy about nodes/leaves (once a node's data is processed, I don't really need it again until the next processing cycle). I tried both BIN and LN eviction; neither had any effect at all.

    OOMing :(

    I did look through old threads, and it looks like no one ever really made progress on this; people just wandered off after getting suggestions on tuning and using LN eviction, without actually reporting that they got things running.
  • 4. Re: Real world application with large data sets - memory issues with Berkeley DB
    greybird Expert
    I appreciate that you are having a frustrating experience. In our tests, the cache size is not exceeded in this way, and setting the cache size does actually work.

    If you can send me a standalone test program (hopefully a fairly small one) that reproduces the problem, I'll debug it and either let you know what's wrong with the program or fix JE (if there is a bug). Let's resolve this.

    --mark
  • 5. Re: Real world application with large data sets - memory issues with Berkeley DB
    928588 Newbie
    Thank you, Mark. I will try to write a short test that reproduces it outside of the whole system.

    It's really odd, because the process runs absolutely fine when it gets log feeds every few hours. I mean, I see it running for a week or two without leaking or exploding.
    But if I try to process feeds that have accumulated for a while in a single run (and yes, purging and cleaning are enabled), it gets into that state where the cache just grows until it pops memory.
  • 6. Re: Real world application with large data sets - memory issues with Berkeley DB
    928588 Newbie
    Well, after more attempts and using JE 5 (we started this with JE), we got memory under control, but completely lost speed. Here are the stats from processing a single log.

    I/O: Log file opens, fsyncs, reads, writes, cache misses.
         bufferBytes=104,857,600
              Total memory currently consumed by log buffers, in bytes.
         endOfLog=0x6888/0x34529d4b
              The location of the next entry to be written to the log.
         nBytesReadFromWriteQueue=0
              Number of bytes read to fulfill file read operations by reading out of the pending write queue.
         nBytesWrittenFromWriteQueue=2,345,426,495
              Number of bytes written from the pending write queue.
         nCacheMiss=36,145
              Total number of requests for database objects which were not in memory.
         nFSyncRequests=1,888
              Number of fsyncs requested through the group commit manager for actions such as transaction commits and checkpoints.
         nFSyncTime=67,929
              Total fsync time in msstat
         nFSyncTimeouts=0
              Number of fsyncs requests submitted to the group commit manager for actions such as transaction commmits and checkpoints which timed out.
         nFSyncs=1,888
              Number of fsyncs issued through the group commit manager for actions such as transaction commits and checkpoints. A subset of nLogFsyncs.
         nFileOpens=690
              Number of times a log file has been opened.
         nLogBuffers=100
              Number of log buffers currently instantiated.
         nLogFSyncs=1,938
              Total number of fsyncs of the JE log. This includes those fsyncs recorded under the nFsyncs stat
         nNotResident=36,562
              Number of request for database objects not contained within the in memory data structure.
         nOpenFiles=100
              Number of files currently open in the file cache.
         nRandomReadBytes=41,856,511,577
              Number of bytes read which required respositioning the disk head more than 1MB from the previous file position.
         nRandomReads=39,500
              Number of disk reads which required respositioning the disk head more than 1MB from the previous file position.
         nRandomWriteBytes=16,360,342,696
              Number of bytes written which required respositioning the disk head more than 1MB from the previous file position.
         nRandomWrites=10,399
              Number of disk writes which required respositioning the disk head by more than 1MB from the previous file position.
         nReadsFromWriteQueue=0
              Number of file read operations which were fulfilled by reading out of the pending write queue.
         nRepeatFaultReads=4,220
              Number of reads which had to be repeated when faulting in an object from disk because the read chunk size controlled by je.log.faultReadSize is too small.
         nSequentialReadBytes=30,042,407,393
              Number of bytes read which did not require respositioning the disk head more than 1MB from the previous file position.
         nSequentialReads=4,588
              Number of disk reads which did not require respositioning the disk head more than 1MB from the previous file position.
         nSequentialWriteBytes=37,368,736,877
              Number of bytes written which did not require respositioning the disk head more than 1MB from the previous file position.
         nSequentialWrites=19,826
              Number of disk writes which did not require respositioning the disk head by more than 1MB from the previous file position.
         nTempBufferWrites=8,387
              Number of writes which had to be completed using the temporary marshalling buffer because the fixed size log buffers specified by je.log.totalBufferBytes and je.log.numBuffers were not large enough.
         nWriteQueueOverflow=924
              Number of write operations which would overflow the Write Queue.
         nWriteQueueOverflowFailures=0
              Number of write operations which would overflow the Write Queue and could not be queued.
         nWritesFromWriteQueue=2,506
              Number of file write operations executed from the pending write queue.
    Cache: Current size, allocations, and eviction activity.
         adminBytes=103,760
              Number of bytes of JE cache used for log cleaning metadata and other administrative structure, in bytes.
         avgBatchCACHEMODE=0
              Average units of work done by one eviction pass. Along with the number of batch size, it serves as an indicator of what part of the system is doing eviction work.
         avgBatchCRITICAL=2
              Average units of work done by one eviction pass. Along with the number of batch size, it serves as an indicator of what part of the system is doing eviction work.
         avgBatchDAEMON=2
              Average units of work done by one eviction pass. Along with the number of batch size, it serves as an indicator of what part of the system is doing eviction work.
         avgBatchEVICTORTHREAD=2
              Average units of work done by one eviction pass. Along with the number of batch size, it serves as an indicator of what part of the system is doing eviction work.
         avgBatchMANUAL=0
              Average units of work done by one eviction pass. Along with the number of batch size, it serves as an indicator of what part of the system is doing eviction work.
         cacheTotalBytes=2,814,379,522
              Total amount of JE cache in use, in bytes.
         dataBytes=2,709,417,694
              Amount of JE cache used for holding data, keys and internal Btree nodes, in bytes.
         lockBytes=468
              Number of bytes of JE cache used for holding locks and transactions, in bytes.
         nBINsEvictedCACHEMODE=0
              Number of BINs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nBINsEvictedCRITICAL=1,566
              Number of BINs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nBINsEvictedDAEMON=1,346
              Number of BINs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nBINsEvictedEVICTORTHREAD=16,883
              Number of BINs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nBINsEvictedMANUAL=0
              Number of BINs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nBINsFetch=298,763
              Number of BINs (bottom internal nodes) requested by btree operations. Can be used to gauge cache hit/miss ratios.
         nBINsFetchMiss=14,718
              Number of BINs (bottom internal nodes) requested by btree operations that were not in cache. Can be used to gauge cache hit/miss ratios.
         nBINsStripped=36
              The number of BINs for which the child LNs have been removed (stripped) and are no longer in the cache. BIN stripping is the most efficient form of eviction.
         nBatchesCACHEMODE=0
              Number of attempts to evict, by type of evictor. Along with average batch size, it serves as an indicator of what part of the system is doing eviction work.
         nBatchesCRITICAL=303
              Number of attempts to evict, by type of evictor. Along with average batch size, it serves as an indicator of what part of the system is doing eviction work.
         nBatchesDAEMON=241
              Number of attempts to evict, by type of evictor. Along with average batch size, it serves as an indicator of what part of the system is doing eviction work.
         nBatchesEVICTORTHREAD=3,253
              Number of attempts to evict, by type of evictor. Along with average batch size, it serves as an indicator of what part of the system is doing eviction work.
         nBatchesMANUAL=0
              Number of attempts to evict, by type of evictor. Along with average batch size, it serves as an indicator of what part of the system is doing eviction work.
         nCachedBINs=1,282
              Number of BINs (bottom internal nodes) in cache. The cache holds INs and BINS, so this indicates the proportion used by each type of node. When used on shared environment caches, will only be visible via StatConfig.setFast(false)
         nCachedUpperINs=458
              Number of upper INs (non-bottom internal nodes) in cache. The cache holds INs and BINS, so this indicates the proportion used by each type of node. When used on shared environment caches, will only be visible via StatConfig.setFast(false)
         nEvictPasses=6,460
              Number of eviction passes, an indicator of the eviction activity level.
         nINCompactKey=451
              Number of INs that use a compact key representation to minimize the key object representation overhead.
         nINNoTarget=211
              Number of INs that use a compact representation when none of its child nodes arein the cache.
         nINSparseTarget=1,117
              Number of INs that use a compact sparse array representation to point to child nodes in the cache.
         nLNsFetch=64,384
              Number of LNs (data records) requested by btree operations. Can be used to gauge cache hit/miss ratios.
         nLNsFetchMiss=10,437
              Number of LNs (data records) requested by btree operations that were not in cache. Can be used to gauge cache hit/miss ratios.
         nNodesEvicted=19,791
              Number of nodes selected and removed from the cache.
         nNodesScanned=2,207,754
              Number of nodes scanned in order to select the eviction set, an indicator of eviction overhead.
         nNodesSelected=20,064
              Number of nodes which pass the first criteria for eviction, an indicator of eviction efficiency. nNodesExplicitlyEvicted plus nBINsStripped will roughly equal nNodesSelected. nNodesSelected will be somewhat larger than the sum because some selected nodes don't pass a final screening.
         nRootNodesEvicted=0
              Number of database root nodes evicted.
         nSharedCacheEnvironments=2
              Number of Environments sharing the cache.
         nThreadUnavailable=40,533
              Number of eviction tasks that were submitted to the background evictor pool, but were refused because all eviction threads were busy. The user may want to change the size of the evictor pool through the je.evictor.*threads properties.
         nUpperINsEvictedCACHEMODE=0
              Number of upper INs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nUpperINsEvictedCRITICAL=0
              Number of upper INs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nUpperINsEvictedDAEMON=0
              Number of upper INs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nUpperINsEvictedEVICTORTHREAD=0
              Number of upper INs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nUpperINsEvictedMANUAL=0
              Number of upper INs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nUpperINsFetch=292,496
              Number of Upper INs (non-bottom internal nodes) requested by btree operations. Can be used to gauge cache hit/miss ratios.
         nUpperINsFetchMiss=15
              Number of Upper INs (non-bottom internal nodes) requested by btree operations that were not in cache. Can be used to gauge cache hit/miss ratios.
         requiredEvictBytes=0
              Number of bytes we need to evict in order to get under budget.
         sharedCacheTotalBytes=2,975,769,549
              Total amount of the shared JE cache in use, in bytes.
    Cleaning: Frequency and extent of log file cleaning activity.
         cleanerBackLog=83
              Number of files to be cleaned to reach the target utilization.
         correctedAvgLNSize=2173.706
              The corrected average LN size, for LNs whose obsolete size is not determined. Used to calculate true utilization.
         estimatedAvgLNSize=44273.105
              The estimated, or uncorrected, average LN size, for LNs whose obsolete size is not determined. Compare to correctedAvgLNSize.
         fileDeletionBacklog=0
              Number of files that are ready to be deleted.
         nBINDeltasCleaned=14,024
              Accumulated number of BINDeltas cleaned.
         nBINDeltasDead=10,486
              Accumulated number of BINDeltas that were not found in the tree anymore (deleted).
         nBINDeltasMigrated=3,538
              Accumulated number of BINDeltas migrated.
         nBINDeltasObsolete=819,964
              Accumulated number of BINDeltas obsolete.
         nCleanerDeletions=36
              Number of cleaner file deletions this session.
         nCleanerEntriesRead=185,914
              Accumulated number of log entries read by the cleaner.
         nCleanerProbeRuns=0
              Number of cleaner runs for probing utilization.
         nCleanerRuns=36
              Number of cleaner runs, including probe runs.
         nClusterLNsProcessed=0
              Accumulated number of LNs processed because they qualify for clustering.
         nINsCleaned=368
              Accumulated number of INs cleaned.
         nINsDead=186
              Accumulated number of INs that were not found in the tree anymore (deleted).
         nINsMigrated=182
              Accumulated number of INs migrated.
         nINsObsolete=21,099
              Accumulated number of INs obsolete.
         nLNQueueHits=10,386
              Accumulated number of LNs processed without a tree lookup.
         nLNsCleaned=41,811
              Accumulated number of LNs cleaned.
         nLNsDead=3,950
              Accumulated number of LNs that were not found in the tree anymore (deleted).
         nLNsLocked=0
              Accumulated number of LNs encountered that were locked.
         nLNsMarked=37,861
              Accumulated number of LNs that were marked for migration during cleaning.
         nLNsMigrated=5
              Accumulated number of LNs that were marked for migration during cleaning.
         nLNsObsolete=93,645
              Accumulated number of LNs obsolete.
         nMarkLNsProcessed=0
              Accumulated number of LNs processed because they were previously marked for migration.
         nPendingLNsLocked=0
              Accumulated number of pending LNs that could not be locked for migration because of a long duration application lock.
         nPendingLNsProcessed=0
              Accumulated number of LNs processed because they were previously locked.
         nRepeatIteratorReads=36
              Number of attempts to read a log entry larger than the read buffer size during which the log buffer couldn't be grown enough to accommodate the object.
         nToBeCleanedLNsProcessed=5
              Accumulated number of LNs processed because they are soon to be cleaned.
         totalLogSize=132,620,780,210
              Approximation of the total log size in bytes.
    Node Compression: Removal and compression of internal btree nodes.
         cursorsBins=6
              Number of BINs encountered by the INComprssor that had cursors referring to them when the compresor ran.
         dbClosedBins=0
              Number of BINs encountered by the INCompressor that had their database closed between the time they were put on the compressor queue and when the compressor ran.
         inCompQueueSize=1
              Number of entries in the INCompressor queue when the getStats() call was made.
         nonEmptyBins=0
              Number of BINs encountered by the INCompressor that were not actually empty when the compressor ran.
         processedBins=159
              Number of BINs that were successfully processed by the INCompressor.
         splitBins=0
              Number of BINs encountered by the INCompressor that were split between the time they were put on the comprssor queue and when the compressor ran.
    Checkpoints: Frequency and extent of checkpointing activity.
         lastCheckpointEnd=0x6888/0x340e8283
              Location in the log of the last checkpoint end.
         lastCheckpointId=815,157
              Id of the last checkpoint.
         lastCheckpointStart=0x6888/0x33333916
              Location in the log of the last checkpont start.
         nCheckpoints=1,887
              Total number of checkpints run so far.
         nDeltaINFlush=37,318
              Accumulated number of Delta INs flushed to the log.
         nFullBINFlush=7,994
              Accumulated number of full BINs flushed to the log.
         nFullINFlush=29,815
              Accumulated number of full INs flushed to the log.
    Environment: General environment wide statistics.
         btreeRelatchesRequired=8,208
              Returns the number of btree latch upgrades required while operating on this Environment. A measurement of contention.
    Locks: Locks held by data operations, latching contention on lock table.
         nLatchAcquireNoWaitUnsuccessful=0
              Number of unsuccessful acquireNoWait() calls.
         nLatchAcquiresNoWaitSuccessful=0
              Number of times acquireNoWait() was called when the latch was successfully acquired.
         nLatchAcquiresNoWaiters=0
              Number of times the latch was acquired without contention.
         nLatchAcquiresSelfOwned=0
              Number of times the latch was acquired it was already owned by the caller.
         nLatchAcquiresWithContention=0
              Number of times the latch was acquired when it was already owned by another thread.
         nLatchReleases=0
              Number of latch releases.
         nOwners=3
              Number of lock owners in lock table.
         nReadLocks=3
              Number of read locks currently held.
         nRequests=406,558
              Number of times a lock request was made.
         nTotalLocks=3
              Number of locks current in lock table.
         nWaiters=0
              Number of transactions waiting for a lock.
         nWaits=4
              Number of times a lock request blocked.
         nWriteLocks=0
              Number of write locks currently held.
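
The fetch and miss counters in the listing above are described as a way to gauge cache hit/miss ratios. A minimal sketch of that arithmetic, using values copied from the listing (the class and method names are illustrative, not part of JE):

```java
// Miss ratios computed from the fetch/miss counters in the stats listing.
final class MissRatioSketch {

    // Fraction of fetches that missed the cache; 0 when nothing was fetched.
    static double missRatio(long fetches, long misses) {
        return fetches == 0 ? 0.0 : (double) misses / fetches;
    }

    public static void main(String[] args) {
        // Counter values copied from the stats listing above.
        System.out.printf("upper IN miss ratio: %.4f%%%n",
                100 * missRatio(292_496, 15));      // nUpperINsFetch, nUpperINsFetchMiss
        System.out.printf("BIN miss ratio:      %.2f%%%n",
                100 * missRatio(298_763, 14_718));  // nBINsFetch, nBINsFetchMiss
        System.out.printf("LN miss ratio:       %.2f%%%n",
                100 * missRatio(64_384, 10_437));   // nLNsFetch, nLNsFetchMiss
    }
}
```

With these numbers, about 4.9% of BIN fetches and about 16.2% of LN fetches missed the cache, while upper IN misses are negligible. A single snapshot like this says little on its own; the same ratios computed per processing cycle would show whether they move with the slowdown.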
  • 7. Re: Real world application with large data sets - memory issues with Berkeley DB
    greybird Expert
    To find a performance issue, you'll need to look at trends in the stats (not just one set), and correlate them with the measured performance of your app. We won't be able to help you with just the information you've given. If you do some thorough analysis, and ask very specific questions about specific correlations, we will try to answer.
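
One way to look at trends rather than a single snapshot is to sample a counter after every processing cycle and compare per-interval deltas with the cycle time measured by the application. The sketch below assumes the counter is cumulative (not cleared on each read); the sample values, class names and field names are hypothetical and only illustrate the bookkeeping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Turns cumulative counter samples into per-cycle deltas so they can be
// lined up against application-measured cycle times.
final class StatTrendSketch {

    // One sample taken at the end of a processing cycle.
    static final class Sample {
        final long nCacheMiss;     // cumulative counter value at sample time
        final double cycleSeconds; // app-measured duration of the cycle just finished

        Sample(long nCacheMiss, double cycleSeconds) {
            this.nCacheMiss = nCacheMiss;
            this.cycleSeconds = cycleSeconds;
        }
    }

    // Converts cumulative counter values into per-interval deltas.
    static List<Long> deltas(List<Sample> samples) {
        List<Long> out = new ArrayList<>();
        for (int i = 1; i < samples.size(); i++) {
            out.add(samples.get(i).nCacheMiss - samples.get(i - 1).nCacheMiss);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical samples, not taken from this thread.
        List<Sample> samples = Arrays.asList(
                new Sample(1_000, 40.0),
                new Sample(2_100, 45.0),
                new Sample(9_500, 600.0));
        List<Long> d = deltas(samples);
        for (int i = 0; i < d.size(); i++) {
            System.out.println("cycle " + (i + 1)
                    + ": cache misses=" + d.get(i)
                    + ", seconds=" + samples.get(i + 1).cycleSeconds);
        }
    }
}
```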

    --mark
  • 8. Re: Real world application with large data sets - memory issues with Berkeley DB
    928588 Newbie
    I understand :) That's why here is a second log, taken after some data was processed in "catch-up mode".

    JBoss      
    JMX MBean Operation Result showEnvironmentStats()

    I/O: Log file opens, fsyncs, reads, writes, cache misses.
         bufferBytes=104,857,600
              Total memory currently consumed by log buffers, in bytes.
         endOfLog=0x7889/0x2beeb05e
              The location of the next entry to be written to the log.
         nBytesReadFromWriteQueue=0
              Number of bytes read to fulfill file read operations by reading out of the pending write queue.
         nBytesWrittenFromWriteQueue=0
              Number of bytes written from the pending write queue.
         nCacheMiss=3,390
              Total number of requests for database objects which were not in memory.
         nFSyncRequests=2
              Number of fsyncs requested through the group commit manager for actions such as transaction commits and checkpoints.
         nFSyncTime=38
              Total fsync time in msstat
         nFSyncTimeouts=0
              Number of fsyncs requests submitted to the group commit manager for actions such as transaction commmits and checkpoints which timed out.
         nFSyncs=2
              Number of fsyncs issued through the group commit manager for actions such as transaction commits and checkpoints. A subset of nLogFsyncs.
         nFileOpens=677
              Number of times a log file has been opened.
         nLogBuffers=100
              Number of log buffers currently instantiated.
         nLogFSyncs=3
              Total number of fsyncs of the JE log. This includes those fsyncs recorded under the nFsyncs stat
         nNotResident=3,391
              Number of request for database objects not contained within the in memory data structure.
         nOpenFiles=100
              Number of files currently open in the file cache.
         nRandomReadBytes=422,061,056
              Number of bytes read which required respositioning the disk head more than 1MB from the previous file position.
         nRandomReads=4,005
              Number of disk reads which required respositioning the disk head more than 1MB from the previous file position.
         nRandomWriteBytes=48,593
              Number of bytes written which required respositioning the disk head more than 1MB from the previous file position.
         nRandomWrites=2
              Number of disk writes which required respositioning the disk head by more than 1MB from the previous file position.
         nReadsFromWriteQueue=0
              Number of file read operations which were fulfilled by reading out of the pending write queue.
         nRepeatFaultReads=23
              Number of reads which had to be repeated when faulting in an object from disk because the read chunk size controlled by je.log.faultReadSize is too small.
         nSequentialReadBytes=2,749,205,070
              Number of bytes read which did not require repositioning the disk head more than 1MB from the previous file position.
         nSequentialReads=636
              Number of disk reads which did not require repositioning the disk head more than 1MB from the previous file position.
         nSequentialWriteBytes=25,890,292
              Number of bytes written which did not require repositioning the disk head more than 1MB from the previous file position.
         nSequentialWrites=5
              Number of disk writes which did not require repositioning the disk head by more than 1MB from the previous file position.
         nTempBufferWrites=3
              Number of writes which had to be completed using the temporary marshalling buffer because the fixed size log buffers specified by je.log.totalBufferBytes and je.log.numBuffers were not large enough.
         nWriteQueueOverflow=0
              Number of write operations which would overflow the Write Queue.
         nWriteQueueOverflowFailures=0
              Number of write operations which would overflow the Write Queue and could not be queued.
         nWritesFromWriteQueue=0
              Number of file write operations executed from the pending write queue.
    Cache: Current size, allocations, and eviction activity.
         adminBytes=180,469
              Number of bytes of JE cache used for log cleaning metadata and other administrative structure, in bytes.
         avgBatchCACHEMODE=0
              Average units of work done by one eviction pass. Along with the number of batches, it serves as an indicator of what part of the system is doing eviction work.
         avgBatchCRITICAL=0
              Average units of work done by one eviction pass. Along with the number of batches, it serves as an indicator of what part of the system is doing eviction work.
         avgBatchDAEMON=0
              Average units of work done by one eviction pass. Along with the number of batches, it serves as an indicator of what part of the system is doing eviction work.
         avgBatchEVICTORTHREAD=0
              Average units of work done by one eviction pass. Along with the number of batches, it serves as an indicator of what part of the system is doing eviction work.
         avgBatchMANUAL=0
              Average units of work done by one eviction pass. Along with the number of batches, it serves as an indicator of what part of the system is doing eviction work.
         cacheTotalBytes=227,242,347
              Total amount of JE cache in use, in bytes.
         dataBytes=122,203,810
              Amount of JE cache used for holding data, keys and internal Btree nodes, in bytes.
         lockBytes=468
              Number of bytes of JE cache used for holding locks and transactions, in bytes.
         nBINsEvictedCACHEMODE=0
              Number of BINs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nBINsEvictedCRITICAL=0
              Number of BINs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nBINsEvictedDAEMON=0
              Number of BINs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nBINsEvictedEVICTORTHREAD=0
              Number of BINs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nBINsEvictedMANUAL=0
              Number of BINs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nBINsFetch=16,621
              Number of BINs (bottom internal nodes) requested by btree operations. Can be used to gauge cache hit/miss ratios.
         nBINsFetchMiss=7,253
              Number of BINs (bottom internal nodes) requested by btree operations that were not in cache. Can be used to gauge cache hit/miss ratios.
         nBINsStripped=0
              The number of BINs for which the child LNs have been removed (stripped) and are no longer in the cache. BIN stripping is the most efficient form of eviction.
         nBatchesCACHEMODE=0
              Number of attempts to evict, by type of evictor. Along with average batch size, it serves as an indicator of what part of the system is doing eviction work.
         nBatchesCRITICAL=0
              Number of attempts to evict, by type of evictor. Along with average batch size, it serves as an indicator of what part of the system is doing eviction work.
         nBatchesDAEMON=0
              Number of attempts to evict, by type of evictor. Along with average batch size, it serves as an indicator of what part of the system is doing eviction work.
         nBatchesEVICTORTHREAD=0
              Number of attempts to evict, by type of evictor. Along with average batch size, it serves as an indicator of what part of the system is doing eviction work.
         nBatchesMANUAL=0
              Number of attempts to evict, by type of evictor. Along with average batch size, it serves as an indicator of what part of the system is doing eviction work.
         nCachedBINs=7,254
              Number of BINs (bottom internal nodes) in cache. The cache holds INs and BINs, so this indicates the proportion used by each type of node. When used on shared environment caches, will only be visible via StatConfig.setFast(false)
         nCachedUpperINs=418
              Number of upper INs (non-bottom internal nodes) in cache. The cache holds INs and BINs, so this indicates the proportion used by each type of node. When used on shared environment caches, will only be visible via StatConfig.setFast(false)
         nEvictPasses=0
              Number of eviction passes, an indicator of the eviction activity level.
         nINCompactKey=2,223
              Number of INs that use a compact key representation to minimize the key object representation overhead.
         nINNoTarget=830
              Number of INs that use a compact representation when none of their child nodes are in the cache.
         nINSparseTarget=6,551
              Number of INs that use a compact sparse array representation to point to child nodes in the cache.
         nLNsFetch=5,165
              Number of LNs (data records) requested by btree operations. Can be used to gauge cache hit/miss ratios.
         nLNsFetchMiss=2,249
              Number of LNs (data records) requested by btree operations that were not in cache. Can be used to gauge cache hit/miss ratios.
         nNodesEvicted=0
              Number of nodes selected and removed from the cache.
         nNodesScanned=0
              Number of nodes scanned in order to select the eviction set, an indicator of eviction overhead.
         nNodesSelected=0
              Number of nodes which pass the first criteria for eviction, an indicator of eviction efficiency. nNodesExplicitlyEvicted plus nBINsStripped will roughly equal nNodesSelected. nNodesSelected will be somewhat larger than the sum because some selected nodes don't pass a final screening.
         nRootNodesEvicted=0
              Number of database root nodes evicted.
         nSharedCacheEnvironments=2
              Number of Environments sharing the cache.
         nThreadUnavailable=0
              Number of eviction tasks that were submitted to the background evictor pool, but were refused because all eviction threads were busy. The user may want to change the size of the evictor pool through the je.evictor.*threads properties.
         nUpperINsEvictedCACHEMODE=0
              Number of upper INs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nUpperINsEvictedCRITICAL=0
              Number of upper INs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nUpperINsEvictedDAEMON=0
              Number of upper INs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nUpperINsEvictedEVICTORTHREAD=0
              Number of upper INs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nUpperINsEvictedMANUAL=0
              Number of upper INs evicted from the cache, using the specified eviction source. As a subset of nNodesEvicted, it is an indicator of what eviction is targeting and the activity that is instigating eviction
         nUpperINsFetch=13,589
              Number of Upper INs (non-bottom internal nodes) requested by btree operations. Can be used to gauge cache hit/miss ratios.
         nUpperINsFetchMiss=351
              Number of Upper INs (non-bottom internal nodes) requested by btree operations that were not in cache. Can be used to gauge cache hit/miss ratios.
         requiredEvictBytes=0
              Number of bytes we need to evict in order to get under budget.
         sharedCacheTotalBytes=619,821,291
              Total amount of the shared JE cache in use, in bytes.
    Cleaning: Frequency and extent of log file cleaning activity.
         cleanerBackLog=477
              Number of files to be cleaned to reach the target utilization.
         correctedAvgLNSize=222.83202
              The corrected average LN size, for LNs whose obsolete size is not determined. Used to calculate true utilization.
         estimatedAvgLNSize=2496.3906
              The estimated, or uncorrected, average LN size, for LNs whose obsolete size is not determined. Compare to correctedAvgLNSize.
         fileDeletionBacklog=0
              Number of files that are ready to be deleted.
         nBINDeltasCleaned=0
              Accumulated number of BINDeltas cleaned.
         nBINDeltasDead=0
              Accumulated number of BINDeltas that were not found in the tree anymore (deleted).
         nBINDeltasMigrated=0
              Accumulated number of BINDeltas migrated.
         nBINDeltasObsolete=1,412
              Accumulated number of BINDeltas obsolete.
         nCleanerDeletions=0
              Number of cleaner file deletions this session.
         nCleanerEntriesRead=15,092
              Accumulated number of log entries read by the cleaner.
         nCleanerProbeRuns=0
              Number of cleaner runs for probing utilization.
         nCleanerRuns=2
              Number of cleaner runs, including probe runs.
         nClusterLNsProcessed=0
              Accumulated number of LNs processed because they qualify for clustering.
         nINsCleaned=0
              Accumulated number of INs cleaned.
         nINsDead=0
              Accumulated number of INs that were not found in the tree anymore (deleted).
         nINsMigrated=0
              Accumulated number of INs migrated.
         nINsObsolete=696
              Accumulated number of INs obsolete.
         nLNQueueHits=15
              Accumulated number of LNs processed without a tree lookup.
         nLNsCleaned=57
              Accumulated number of LNs cleaned.
         nLNsDead=8
              Accumulated number of LNs that were not found in the tree anymore (deleted).
         nLNsLocked=0
              Accumulated number of LNs encountered that were locked.
         nLNsMarked=49
              Accumulated number of LNs that were marked for migration during cleaning.
         nLNsMigrated=0
              Accumulated number of LNs migrated.
         nLNsObsolete=5,071
              Accumulated number of LNs obsolete.
         nMarkLNsProcessed=0
              Accumulated number of LNs processed because they were previously marked for migration.
         nPendingLNsLocked=0
              Accumulated number of pending LNs that could not be locked for migration because of a long duration application lock.
         nPendingLNsProcessed=0
              Accumulated number of LNs processed because they were previously locked.
         nRepeatIteratorReads=0
              Number of attempts to read a log entry larger than the read buffer size during which the log buffer couldn't be grown enough to accommodate the object.
         nToBeCleanedLNsProcessed=0
              Accumulated number of LNs processed because they are soon to be cleaned.
         totalLogSize=560,296,188,988
              Approximation of the total log size in bytes.
    Node Compression: Removal and compression of internal btree nodes.
         cursorsBins=0
              Number of BINs encountered by the INCompressor that had cursors referring to them when the compressor ran.
         dbClosedBins=0
              Number of BINs encountered by the INCompressor that had their database closed between the time they were put on the compressor queue and when the compressor ran.
         inCompQueueSize=0
              Number of entries in the INCompressor queue when the getStats() call was made.
         nonEmptyBins=0
              Number of BINs encountered by the INCompressor that were not actually empty when the compressor ran.
         processedBins=0
              Number of BINs that were successfully processed by the INCompressor.
         splitBins=0
              Number of BINs encountered by the INCompressor that were split between the time they were put on the compressor queue and when the compressor ran.
    Checkpoints: Frequency and extent of checkpointing activity.
         lastCheckpointEnd=0x7889/0x2be858f4
              Location in the log of the last checkpoint end.
         lastCheckpointId=967,412
              Id of the last checkpoint.
         lastCheckpointStart=0x7889/0x2be7ad9a
              Location in the log of the last checkpoint start.
         nCheckpoints=2
              Total number of checkpoints run so far.
         nDeltaINFlush=31
              Accumulated number of Delta INs flushed to the log.
         nFullBINFlush=3
              Accumulated number of full BINs flushed to the log.
         nFullINFlush=25
              Accumulated number of full INs flushed to the log.
    Environment: General environment wide statistics.
         btreeRelatchesRequired=40
              Returns the number of btree latch upgrades required while operating on this Environment. A measurement of contention.
    Locks: Locks held by data operations, latching contention on lock table.
         nLatchAcquireNoWaitUnsuccessful=0
              Number of unsuccessful acquireNoWait() calls.
         nLatchAcquiresNoWaitSuccessful=0
              Number of times acquireNoWait() was called when the latch was successfully acquired.
         nLatchAcquiresNoWaiters=0
              Number of times the latch was acquired without contention.
         nLatchAcquiresSelfOwned=0
              Number of times the latch was acquired when it was already owned by the caller.
         nLatchAcquiresWithContention=0
              Number of times the latch was acquired when it was already owned by another thread.
         nLatchReleases=0
              Number of latch releases.
         nOwners=3
              Number of lock owners in lock table.
         nReadLocks=3
              Number of read locks currently held.
         nRequests=915
              Number of times a lock request was made.
         nTotalLocks=3
              Number of locks currently in the lock table.
         nWaiters=0
              Number of transactions waiting for a lock.
         nWaits=0
              Number of times a lock request blocked.
         nWriteLocks=0
              Number of write locks currently held.
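
The fetch and miss counters in the Cache group above can be turned into miss ratios, which is what the stat descriptions mean by gauging cache hit/miss ratios. A minimal sketch in plain Java, with the values copied by hand from this dump. The class and method names are made up for illustration, and in a live application the same numbers would come from the environment's stats object rather than from constants:

```java
public class CacheMissRatio {

    /** Fraction of fetches that missed the cache; 0.0 when nothing was fetched. */
    public static double missRatio(long fetches, long misses) {
        if (fetches <= 0) {
            return 0.0;
        }
        return (double) misses / (double) fetches;
    }

    public static void main(String[] args) {
        // Values copied from the stats dump above (not read from a live environment).
        long nBINsFetch = 16_621, nBINsFetchMiss = 7_253;
        long nLNsFetch = 5_165, nLNsFetchMiss = 2_249;
        long nUpperINsFetch = 13_589, nUpperINsFetchMiss = 351;

        System.out.printf("BIN miss ratio:      %.1f%%%n",
                100 * missRatio(nBINsFetch, nBINsFetchMiss));
        System.out.printf("LN miss ratio:       %.1f%%%n",
                100 * missRatio(nLNsFetch, nLNsFetchMiss));
        System.out.printf("Upper IN miss ratio: %.1f%%%n",
                100 * missRatio(nUpperINsFetch, nUpperINsFetchMiss));
    }
}
```

On these numbers roughly 44% of BIN and LN fetches miss, while upper INs almost always hit. The absolute counts are small, so the ratios are indicative only.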




    -------------

    Where behaviour is very different with JE5 vs JE4: the cleaner backlog grows like crazy (whereas with JE4 it was memory that grew).

    cleanerBackLog=477

    It's almost as if, because the runs are back to back, the cleaner can't kick in at all even though it is enabled. If I stop processing and restart the whole application, it slowly starts to engage the cleaner threads and release disk space.
    Does this mean I need to engage the cleaner from code manually? Or is there some setting that I am overlooking completely?

    <prop name="je.env.runCleaner" value="true"/>
    Utilization: tried 70/50 (je.cleaner.minUtilization/je.cleaner.minFileUtilization) but dropped it back to the defaults as it had no effect at all, so now it is supposedly 50/5 (the defaults).
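
On the question of engaging the cleaner from code: JE does expose `Environment.cleanLog()` and `Environment.checkpoint(CheckpointConfig)` for this, and the usual batch pattern is to call `cleanLog()` until it returns 0 and then force a checkpoint, since cleaned files are only deleted after a checkpoint. The sketch below shows only that control flow. `LogCleaner` is a hypothetical stand-in interface so the example compiles without je.jar, and whether manual cleaning actually helps this workload is an open question:

```java
public class ManualCleaning {

    /** Hypothetical stand-in for the two JE Environment calls used below. */
    public interface LogCleaner {
        /** Mirrors Environment.cleanLog(): number of log files cleaned in this pass. */
        int cleanLog();

        /** Mirrors Environment.checkpoint(config) with CheckpointConfig.setForce(true). */
        void forceCheckpoint();
    }

    /** Cleans until a pass cleans nothing; returns the total number of files cleaned. */
    public static int cleanUntilDone(LogCleaner env) {
        int total = 0;
        int cleaned;
        while ((cleaned = env.cleanLog()) > 0) {
            total += cleaned;
        }
        // Checkpoint only if something was cleaned, so the files can be reclaimed.
        if (total > 0) {
            env.forceCheckpoint();
        }
        return total;
    }
}
```

A natural place to call something like `cleanUntilDone` would be between update cycles, when no other work is touching the environment.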
  • 9. Re: Real world application with large data sets - memory issues with Berkley DB
    928588 Newbie
    I also did some profiling runs with YJP to see where the time goes now:
    flushDirtyNodes - 26% (split almost equally between logSiblings and flushIN)
    and fillFromFile (reading) - 36-46% of total time.
    Both come down to the bottleneck of RandomAccessFile read/write operations.

    The flow of the process now is that we are basically expanding leaves while the nodes remain the same (geography points --> records per geo point; obviously there can only be so many geo points, so the data just gets expanded in the leaves, which are merely lists of records). We took out the whole part where it was taking a node out and reinjecting it.

    Edited by: 925585 on Apr 11, 2012 6:41 AM
  • 10. Re: Real world application with large data sets - memory issues with Berkley DB
    greybird Expert
    Take a look at items [#20588] and [#18633] in the change log. Because of these changes, you may need to increase the number of cleaner threads.

    If the cleaner isn't running at all, and your app is writing (the cleaner is woken by writes), then run DbSpace to look at the utilization.

    --mark
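
A sketch of what raising the cleaner thread count could look like. `je.cleaner.threads` and `je.env.runCleaner` are JE parameter names, but the value 4 is an arbitrary placeholder rather than a recommendation from this thread, and the class and method names are illustrative. The code only builds a `java.util.Properties` object so that it runs without je.jar; in the application it would be passed to `new EnvironmentConfig(props)` or expressed as `<prop>` entries like the one quoted earlier:

```java
import java.util.Properties;

public class CleanerSettings {

    /** Builds JE-style properties that enable the cleaner with the given thread count. */
    public static Properties cleanerProperties(int cleanerThreads) {
        if (cleanerThreads < 1) {
            throw new IllegalArgumentException("cleanerThreads must be at least 1");
        }
        Properties props = new Properties();
        props.setProperty("je.env.runCleaner", "true");
        props.setProperty("je.cleaner.threads", Integer.toString(cleanerThreads));
        return props;
    }

    public static void main(String[] args) {
        // 4 is a placeholder; the right value depends on the write rate and the disk.
        Properties props = cleanerProperties(4);
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + "=" + props.getProperty(name));
        }
    }
}
```

For the utilization check, DbSpace ships in je.jar and is typically run from the command line with the environment home passed via `-h`.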
