Forum Stats

  • 3,741,762 Users
  • 2,248,473 Discussions
  • 7,861,978 Comments

Discussions

[BDB JE 7.4.5] BDB storage utilization increases continuously due to unexpected data in utility data

User_BZQZ5
User_BZQZ5 Member Posts: 14 Blue Ribbon
edited Aug 3, 2020 1:25PM in Berkeley DB Java Edition

Hi,

We have an application which uses BDB JE 7.4.5 for storage purposes. The application has a spiky load pattern. Periodically, on peak loads the amount of business data stored in BDB JE reaches 12GB (or even more), and, than it shrinks back to zero (though the store is constantly updated with a new data which quickly gets removed ). After, a number of pick load iterations (usually 10-20 during the day), the store continues to grow in size even when business data size is around zero. Eventually, it consume the disk space and the application crashes. On restart, the BDB,JE  store cleans all the files and shrink back to nearly zero size.

The BDB JE is configured with CacheMode.EVICT_LN and a relatively small cache size (under 100MB). We cache business data  internally on application level. Thus, the extra caching on BDB JE level looks as being unnecessary for our use cases.

Here are the BDB JE settings we use

EnvironmentConfig.LOCK_N_LOCK_TABLES: "7"

EnvironmentConfig.STATS_COLLECT: "false"

EnvironmentConfig.FILE_LOGGING_LEVEL: "OFF"

EnvironmentConfig.CONSOLE_LOGGING_LEVEL: "OFF"

EnvironmentConfig.CLEANER_UPGRADE_TO_LOG_VERSION: "-1"

The rest of BDB JE settings are left unchanged including the number of cleaner thread being set to 1.

After several full disk consumption events caused by BDB JE, we changed the number of cleaner threads to 5. It looks like that the increase the number of  cleaner threads mitigated the issue with the disk utilization, but, it caused the performance degradation on BDB JE level. The business threads got blocked by the locks hold inside of BDB JE.

Here are some examples of such threads

"Checkpointer" #212 daemon prio=5 os_prio=0 tid=0x00007fb430362000 nid=0x3d52 runnable [0x00007fb25dede000]

   java.lang.Thread.State: RUNNABLE

at com.sleepycat.je.log.LogUtils.writePackedLong(LogUtils.java:273)

at com.sleepycat.je.dbi.DatabaseImpl.writeToLog(DatabaseImpl.java:2477)

at com.sleepycat.je.tree.MapLN.writeToLog(MapLN.java:315)

at com.sleepycat.je.log.entry.LNLogEntry.writeBaseLNEntry(LNLogEntry.java:865)

at com.sleepycat.je.log.entry.LNLogEntry.writeEntry(LNLogEntry.java:757)

at com.sleepycat.je.log.entry.BaseReplicableEntry.writeEntry(BaseReplicableEntry.java:54)

at com.sleepycat.je.log.entry.LNLogEntry.writeEntry(LNLogEntry.java:123)

at com.sleepycat.je.log.LogManager.marshallIntoBuffer(LogManager.java:740)

at com.sleepycat.je.log.LogManager.serialLogWork(LogManager.java:632)

at com.sleepycat.je.log.LogManager.serialLog(LogManager.java:476)

- locked <0x00000004a8f9b018> (a java.lang.Object)

at com.sleepycat.je.log.LogManager.logItem(LogManager.java:423)

at com.sleepycat.je.log.LogManager.log(LogManager.java:338)

at com.sleepycat.je.tree.LN.logInternal(LN.java:716)

at com.sleepycat.je.tree.LN.optionalLog(LN.java:419)

at com.sleepycat.je.dbi.CursorImpl.updateRecordInternal(CursorImpl.java:1677)

at com.sleepycat.je.dbi.CursorImpl.updateCurrentRecord(CursorImpl.java:1085)

at com.sleepycat.je.dbi.DbTree$RewriteMapLN.doWork(DbTree.java:847)

at com.sleepycat.je.tree.Tree.withRootLatchedExclusive(Tree.java:513)

at com.sleepycat.je.dbi.DbTree.modifyDbRoot(DbTree.java:815)

at com.sleepycat.je.dbi.DbTree.modifyDbRoot(DbTree.java:737)

at com.sleepycat.je.cleaner.UtilizationProfile.removePerDbMetadata(UtilizationProfile.java:355)

at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:893)

- locked <0x00000004a8f29258> (a com.sleepycat.je.cleaner.Cleaner)

at com.sleepycat.je.cleaner.Cleaner.updateFilesAtCheckpointEnd(Cleaner.java:1022)

at com.sleepycat.je.recovery.Checkpointer.doCheckpoint(Checkpointer.java:910)

- locked <0x00000004a8c6a108> (a com.sleepycat.je.recovery.Checkpointer)

at com.sleepycat.je.recovery.Checkpointer.onWakeup(Checkpointer.java:559)

at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:

- <0x00000004a92077f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

- <0x00000004b067b988> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)

"Cleaner-5" #211 daemon prio=5 os_prio=0 tid=0x00007fb430360000 nid=0x3d51 waiting for monitor entry [0x00007fb25dfe0000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:669)

- waiting to lock <0x00000004a8f29258> (a com.sleepycat.je.cleaner.Cleaner)

at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:361)

- locked <0x00000004a8ceb978> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)

- locked <0x00000004a8ceb978> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:

- None

"Cleaner-4" #210 daemon prio=5 os_prio=0 tid=0x00007fb43035e800 nid=0x3d50 waiting for monitor entry [0x00007fb25e0e1000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:669)

- waiting to lock <0x00000004a8f29258> (a com.sleepycat.je.cleaner.Cleaner)

at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:361)

- locked <0x00000004a8cdc4b8> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)

- locked <0x00000004a8cdc4b8> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:

- None

"Cleaner-3" #209 daemon prio=5 os_prio=0 tid=0x00007fb43035c800 nid=0x3d4f waiting on condition [0x00007fb25e1e2000]

   java.lang.Thread.State: TIMED_WAITING (parking)

at sun.misc.Unsafe.park(Native Method)

- parking to wait for  <0x00000004a92077f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)

at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)

at com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:61)

at com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)

at com.sleepycat.je.tree.IN.latchShared(IN.java:560)

at com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)

at com.sleepycat.je.tree.Tree.search(Tree.java:2275)

at com.sleepycat.je.tree.Tree.search(Tree.java:2172)

at com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951)

at com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1914)

at com.sleepycat.je.dbi.DbTree.getDb(DbTree.java:1628)

at com.sleepycat.je.dbi.DbTree.getDb(DbTree.java:1578)

at com.sleepycat.je.cleaner.FileProcessor.processFile(FileProcessor.java:1022)

at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:478)

- locked <0x00000004a8d15a38> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)

- locked <0x00000004a8d15a38> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:

- None

"Cleaner-2" #208 daemon prio=5 os_prio=0 tid=0x00007fb430371000 nid=0x3d4e waiting for monitor entry [0x00007fb25e2e3000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:669)

- waiting to lock <0x00000004a8f29258> (a com.sleepycat.je.cleaner.Cleaner)

at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:361)

- locked <0x00000004a8d16fc0> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)

- locked <0x00000004a8d16fc0> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:

- None

"Cleaner-1" #207 daemon prio=5 os_prio=0 tid=0x00007fb43036f000 nid=0x3d4d waiting for monitor entry [0x00007fb25e3e4000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:669)

- waiting to lock <0x00000004a8f29258> (a com.sleepycat.je.cleaner.Cleaner)

at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:361)

- locked <0x00000004a8d19be0> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)

- locked <0x00000004a8d19be0> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:

- None

"IO-1" #203 prio=5 os_prio=0 tid=0x00007fb430355800 nid=0x3d14 waiting on condition [0x00007fb25e9e6000]

   java.lang.Thread.State: TIMED_WAITING (parking)

at sun.misc.Unsafe.park(Native Method)

- parking to wait for  <0x000000049a4293e8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)

at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)

at com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:61)

at com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)

at com.sleepycat.je.tree.IN.latchShared(IN.java:560)

at com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)

at com.sleepycat.je.tree.Tree.search(Tree.java:2275)

at com.sleepycat.je.tree.Tree.search(Tree.java:2172)

at com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951)

at com.sleepycat.je.Cursor.searchExact(Cursor.java:4195)

at com.sleepycat.je.Cursor.searchNoDups(Cursor.java:4056)

at com.sleepycat.je.Cursor.search(Cursor.java:3858)

- locked <0x000000047f04cf48> (a com.sleepycat.je.Transaction)

at com.sleepycat.je.Database.deleteInternal(Database.java:1161)

at com.sleepycat.je.Database.delete(Database.java:1128)

at com.sleepycat.je.Database.delete(Database.java:1230)

    ...

   

"IO-2" #200 prio=5 os_prio=0 tid=0x00007fb43034f800 nid=0x3d11 waiting on condition [0x00007fb25ece8000]

   java.lang.Thread.State: TIMED_WAITING (parking)

at sun.misc.Unsafe.park(Native Method)

- parking to wait for  <0x000000049a427418> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)

at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)

at com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:61)

at com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)

at com.sleepycat.je.tree.IN.latchShared(IN.java:560)

at com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)

at com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1817)

at com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1748)

at com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1666)

at com.sleepycat.je.tree.Tree.findBinForInsert(Tree.java:1637)

at com.sleepycat.je.dbi.CursorImpl.insertRecordInternal(CursorImpl.java:1337)

at com.sleepycat.je.dbi.CursorImpl.insertOrUpdateRecord(CursorImpl.java:1221)

at com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2962)

at com.sleepycat.je.Cursor.putNotify(Cursor.java:2800)

at com.sleepycat.je.Cursor.putNoDups(Cursor.java:2647)

at com.sleepycat.je.Cursor.putInternal(Cursor.java:2478)

- locked <0x000000047fe3c400> (a com.sleepycat.je.Transaction)

at com.sleepycat.je.Cursor.putInternal(Cursor.java:830)

at com.sleepycat.je.Database.put(Database.java:1574)

at com.sleepycat.je.Database.put(Database.java:1627)

    ...

"IO-3" #197 prio=5 os_prio=0 tid=0x00007fb430349800 nid=0x3d0e waiting for monitor entry [0x00007fb25efec000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at com.sleepycat.je.log.LogManager.serialLog(LogManager.java:470)

- waiting to lock <0x00000004a8f9b018> (a java.lang.Object)

at com.sleepycat.je.log.LogManager.logItem(LogManager.java:423)

at com.sleepycat.je.log.LogManager.log(LogManager.java:338)

at com.sleepycat.je.tree.LN.logInternal(LN.java:713)

- locked <0x000000046d6e2388> (a com.sleepycat.je.txn.Txn)

at com.sleepycat.je.tree.LN.optionalLog(LN.java:419)

at com.sleepycat.je.dbi.CursorImpl.deleteCurrentRecord(CursorImpl.java:962)

at com.sleepycat.je.Cursor.deleteNoNotify(Cursor.java:2412)

- locked <0x000000046d6e24e8> (a com.sleepycat.je.Transaction)

at com.sleepycat.je.Cursor.deleteInternal(Cursor.java:2353)

at com.sleepycat.je.Database.deleteInternal(Database.java:1169)

at com.sleepycat.je.Database.delete(Database.java:1128)

at com.sleepycat.je.Database.delete(Database.java:1230)

    ...

"IO-4" #190 prio=5 os_prio=0 tid=0x00007fb43033b800 nid=0x3d07 waiting for monitor entry [0x00007fb25f6f3000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at com.sleepycat.je.log.LogManager.serialLog(LogManager.java:470)

- waiting to lock <0x00000004a8f9b018> (a java.lang.Object)

at com.sleepycat.je.log.LogManager.logItem(LogManager.java:423)

at com.sleepycat.je.log.LogManager.log(LogManager.java:338)

at com.sleepycat.je.tree.LN.logInternal(LN.java:713)

- locked <0x000000047f748c90> (a com.sleepycat.je.txn.Txn)

at com.sleepycat.je.tree.LN.optionalLog(LN.java:419)

at com.sleepycat.je.dbi.CursorImpl.deleteCurrentRecord(CursorImpl.java:962)

at com.sleepycat.je.Cursor.deleteNoNotify(Cursor.java:2412)

- locked <0x000000047f748d80> (a com.sleepycat.je.Transaction)

at com.sleepycat.je.Cursor.deleteInternal(Cursor.java:2353)

at com.sleepycat.je.Database.deleteInternal(Database.java:1169)

at com.sleepycat.je.Database.delete(Database.java:1128)

at com.sleepycat.je.Database.delete(Database.java:1230)

    ...

"IO-5" #187 prio=5 os_prio=0 tid=0x00007fb430335800 nid=0x3d04 waiting on condition [0x00007fb25f9f6000]

   java.lang.Thread.State: TIMED_WAITING (parking)

at sun.misc.Unsafe.park(Native Method)

- parking to wait for  <0x000000049a427418> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)

at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)

at com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:61)

at com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)

at com.sleepycat.je.tree.IN.latchShared(IN.java:560)

at com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)

at com.sleepycat.je.tree.Tree.search(Tree.java:2275)

at com.sleepycat.je.tree.Tree.search(Tree.java:2172)

at com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951)

at com.sleepycat.je.Cursor.searchExact(Cursor.java:4195)

at com.sleepycat.je.Cursor.searchNoDups(Cursor.java:4056)

at com.sleepycat.je.Cursor.search(Cursor.java:3858)

- locked <0x000000047793d958> (a com.sleepycat.je.Transaction)

at com.sleepycat.je.Database.deleteInternal(Database.java:1161)

at com.sleepycat.je.Database.delete(Database.java:1128)

at com.sleepycat.je.Database.delete(Database.java:1230)

...

The heap dumps collected for the issues were dominated by instances of com.sleepycat.je.cleaner.DbFileSummary referred from com.sleepycat.je.cleaner.DbFileSummaryMap in DatabaseImpl for _jeUtilization, _jeExpiration and our sequence database.

What we noticed, that regardless a number of cleaner threads, the amount of instances of com.sleepycat.je.cleaner.DbFileSummary is constantly growing (for  _jeUtilization, _jeExpiration and our sequence databases). The databases for storing business data do not exhibit such behavior.

For example, on some point we had 188,574 instances of DbFileSummary referred from _jeUtilization, 173585  instances created for _jeExpiration and 202091 instances created for sequence database. In a couple of hours, the number of DbFileSummary instances has instanced for databases _jeUtilization, _jeExpiration and sequence db to 188734, 173752 and 202104 accordingly. On other hand, the number of com.sleepycat.je.cleaner.FileSummary instances was 33 and 21 for the above points of time.

We are currently using BDB JE version 7.4.5. We have not tested the latest version 7.5.11 due to other issues, and, I am not sure whether upgrade to version 7.5.11 could solve our problem with disk usage and performance degradation.

At the moment it is not completely clear for me why storage consumption grows in size even when there is no much data stored in the business databases.

How can I evaluate an optimal number of cleaner threads? Perhaps, using more cleaner threads can solve our problem with disk utilization and performance degradation.

We can collect and provide more details about the reported issue, as it manifests quite often.

Kind Regards,

Alex

Answers

  • Aditya Tripathi-Oracle
    Aditya Tripathi-Oracle Member Posts: 4
    edited Aug 3, 2020 1:25PM

    Hi Alex,

    It will be difficult to say anything conclusively without logs. I have gone through your post once and will try to explain some of your questions:

    #1: "What we noticed, that regardless a number of cleaner threads, the amount of instances of com.sleepycat.je.cleaner.DbFileSummary is constantly growing (for  _jeUtilization, _jeExpiration and our sequence databases)"

    Reason:The JE internal databases _jeUtilization, _jeExpiration are storing tracked utilization data per file (Tracking utilization data involves counting every entry which gets obsoleted and every new entry written) and expiration data per file. This tracking is constantly happening. For eg Utilization Tracking involves tracking any CRUD operation on any database - internal or external, and apart from that there's some metadata information which is written to JDB Files which is being tracked in these databases.  Utilization and Expiration tracking is constantly done in short. These database grow at checkpoint time majorly. And shrink when Cleaner has cleaned some files and after that checkpointing is completed. Note that checkpointer thread should complete for any shrinking of data.

    #2: About 5 cleaner threads ending up interfering with your application threads. What is happening in the thread dump shown is that Checkpointer thread has locked the single Cleaner object and all other Cleaner threads are waiting for that to finish. Meanwhile checkpointer is trying to modify the root of all databases and thus does not permit any operations on any database - that is where your application threads are getting blocked. The reason it needs to take a lock on the root database is because the older JE versions stores per file Utilization in database. Hence to update that this root lock is required and this interferes with your application. This is no more the case in recent JE versions. I will have to find out exactly when was this perFileUtilization was removed from database.

    However, one more possible problem with 5 cleaner thread is that they will all try to lock the single Cleaner object and that will make Checkpointer wait. That is not a good scenario. Checkpointer thread should finish else cleaning does not result in any disk size shrinkage. Also the size of DbFileSummary will not reduce without checkpointing.

    Also in case the cache size is small, it would result in more obsoletion data because the cache tree will be writing many internal nodes to the JDB files which results in more obsoletion - that is more work for cleaner.

    #3: At the moment it is not completely clear for me why storage consumption grows in size even when there is no much data stored in the business databases.

    Reason: We will need log files to say anything on this. Note that data expiration would cause some data to be written but if cleaner and checkpointer finish it should also shrink some data.

    #4:How can I evaluate an optimal number of cleaner threads? Perhaps, using more cleaner threads can solve our problem with disk utilization and performance degradation.

    Difficult to recommend anything. But as explained you can try reducing it to two and check. However, checking logs is important - especially at some time before the point when disk limit is being hit.

Sign In or Register to comment.