space utilization with 5.0.103

user4173612 Member Posts: 14
edited Mar 24, 2014 2:55PM in Berkeley DB Java Edition

We recently upgraded from 4.1.27 to 5.0.103 and are seeing some space utilization problems.  We have a DB environment that contains 7 Databases.  This data is updated once an hour by a batch process that runs for a few minutes and is then read-only for the rest of the time.  We have a staging environment running 5.0.103 and a production environment still running 4.1.27.  The data in both is the same.  The 4.1.27 environment occupies 51G of disk space.  When I run the DbSpace utility it shows:

  File    Size (KB)  % Used
--------  ---------  ------
TOTALS     52486001      49

In contrast the 5.0.103 environment occupies 163G and the DbSpace utility shows:

  File    Size (KB)  % Used
--------  ---------  ------
TOTALS   169471834      23

To gather some statistics I stopped the 5.0.103 environment, restarted it, and ran a task to capture some EnvironmentStats (a sketch of such a task appears after this post).  No other processes were running, so there were no updates or reads.  Here are some of the stats:
cleanerBackLog 120

lastKnownUtilization 60

nBINDeltasCleaned 20058

nCleanerDeletions 28

nCleanerEntriesRead 383665

nCleanerProbeRuns 0

nCleanerRuns 31

An hour later:

cleanerBackLog 122

lastKnownUtilization 60

nBINDeltasCleaned 138440

nCleanerDeletions 184

nCleanerEntriesRead 2388887

nCleanerProbeRuns 0

nCleanerRuns 187

The disk used is now 165G and DbSpace shows:

  File    Size (KB)  % Used
--------  ---------  ------
TOTALS   173304217      22

So after the cleaner ran by itself for an hour, utilization actually went down.  Here are my environment settings:

CacheMode: null

CacheModeStrategy: null

CachePercent: 30

CacheSize: 9019431321

EnvironmentConfig: allowCreate=true

cacheSize=9019431321

txnNoSync=false

txnWriteNoSync=false

{je.env.runCleaner=true, je.log.fileMax=1073741824, je.evictor.nodesPerScan=100, je.log.totalBufferBytes=4194304, je.evictor.lruOnly=false, je.cleaner.minUtilization=60, je.cleaner.threads=3, je.env.isTransactional=false, je.env.isReadOnly=false, je.maxMemoryPercent=30, je.log.faultReadSize=786432, je.log.bufferSize=1048576, je.lock.nLockTables=19, je.log.numBuffers=4, je.log.iteratorReadSize=32768}

Have I misconfigured something so badly as to cause this, or is it an issue with 5.0.103 that did not exist in 4.1.27?

Thanks,

Chris
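
For reference, here is a minimal sketch of a stats-capture task like the one described in the post (hypothetical code, not the poster's actual task; it assumes the Environment has already been opened with the settings shown):

    import com.sleepycat.je.Environment;
    import com.sleepycat.je.EnvironmentStats;
    import com.sleepycat.je.StatsConfig;

    // Print the cleaner counters quoted in this thread for an open Environment.
    static void printCleanerStats(Environment env) {
        StatsConfig sc = new StatsConfig();
        sc.setClear(false); // report cumulative values since the environment was opened
        EnvironmentStats stats = env.getStats(sc);
        System.out.println("cleanerBackLog " + stats.getCleanerBacklog());
        System.out.println("nBINDeltasCleaned " + stats.getNBINDeltasCleaned());
        System.out.println("nCleanerDeletions " + stats.getNCleanerDeletions());
        System.out.println("nCleanerEntriesRead " + stats.getNCleanerEntriesRead());
        System.out.println("nCleanerRuns " + stats.getNCleanerRuns());
    }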

Answers

  • I left the server running idle for the rest of the weekend and checked on it this morning.

    cleanerBackLog 132

    lastKnownUtilization 60

    nBINDeltasCleaned 2658242

    nCleanerDeletions 8263

    nCleanerEntriesRead 86085270

    nCleanerProbeRuns 0

    nCleanerRuns 8262

    However the disk space used is 176G and utilization is still only:

      File    Size (KB)  % Used
    --------  ---------  ------
    TOTALS   183752395      22
    (LN size correction factor: 0.017403398)

    Looks like the cleaner threads are just churning and not making up any ground.

  • Hi,

    You just posted the info I was going to ask for: the LN size correction factor.  There is a known issue in JE 5 where this factor can become incorrect.  The fact that the correction factor is such a small number (0.017403398) indicates it may be the problem.  I suggest that you try disabling it by setting EnvironmentConfig.CLEANER_ADJUST_UTILIZATION to false (a sketch follows below).  For details on why this may be necessary, see the javadoc for this parameter.

    --mark
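
    A minimal sketch of the suggested setting, applied before opening the environment (the rest of the EnvironmentConfig stays as posted above):

        import com.sleepycat.je.EnvironmentConfig;

        EnvironmentConfig envConfig = new EnvironmentConfig();
        // Disable the utilization adjustment (je.cleaner.adjustUtilization) so
        // the suspect LN size correction factor is no longer applied.
        envConfig.setConfigParam(EnvironmentConfig.CLEANER_ADJUST_UTILIZATION, "false");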

  • Thanks for the reply,

    I removed my overrides for je.cleaner.threads and je.cleaner.minUtilization so they should be at the defaults of 1 and 50.

    I set je.cleaner.adjustUtilization=false and looking at the javadoc I also set je.cleaner.fetchObsoleteSize=true.

    * To guarantee that utilization will be calculated accurately for all
    * workloads and that automatic adjustments will not interfere, {@link
    * #CLEANER_FETCH_OBSOLETE_SIZE} should be set to true and {@link
    * #CLEANER_ADJUST_UTILIZATION} should be set to false.

    After running with these settings for about 45 minutes I'm not seeing any real change in the disk used.  After restarting with the new configuration I had:

    cleanerBackLog 117

    lastKnownUtilization 50

    nBINDeltasCleaned 4689

    nCleanerDeletions 14

    nCleanerEntriesRead 189650

    nCleanerRuns 15

    45 minutes later:

    cleanerBackLog 117

    lastKnownUtilization 50

    nBINDeltasCleaned 34748

    nCleanerDeletions 106

    nCleanerEntriesRead 1270360

    nCleanerRuns 107

    Disk space used is 173G and utilization is:

      File    Size (KB)  % Used
    --------  ---------  ------
    TOTALS   181576911      22

    (LN size correction factor: 0.013598419)

    Should I have not set CLEANER_FETCH_OBSOLETE_SIZE to true?

    Thanks,

    Chris

  • I suggest you increase the number of cleaner threads to a larger number (say 5), increase the cache size to the largest value you can afford, and let it run for much longer (several hours); a sketch of these settings follows below.  It may be that the backlog is so large that the cleaner needs more resources to catch up.

    If the cleaner is still unable to catch up:

    1) Are you using duplicates (Database.setSortedDuplicates or DPL many-to-x indexes)?  If so, how much of your data set uses duplicates?

    2) Please do a "DbSpace -d -r" and post the last couple hundred lines of the output.

    --mark
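
    For example, a sketch of those settings (the exact values are whatever your server can afford):

        import com.sleepycat.je.EnvironmentConfig;

        EnvironmentConfig envConfig = new EnvironmentConfig();
        // Give the cleaner more threads to work through the backlog.
        envConfig.setConfigParam(EnvironmentConfig.CLEANER_THREADS, "5");
        // Enlarge the cache; 50% of the JVM heap is illustrative only.
        envConfig.setCachePercent(50);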

  • Greybird-Oracle Member Posts: 2,690

    CLEANER_FETCH_OBSOLETE_SIZE isn't going to make a difference during this catch-up phase, while you're running the cleaner but no application operations.  When you start application operations again, setting CLEANER_FETCH_OBSOLETE_SIZE to true is likely to reduce performance if you're doing "blind" updates or deletions, i.e., ones where you don't read the record before you update or delete it.

  • user4173612 Member Posts: 14

    I set je.cleaner.threads=5 and je.maxMemoryPercent=50 (on a server with 32G of memory) and will let it run for a while.

    1) Yes, we do use duplicates for some of our Databases.  I earlier ran a task that calculates some statistics for the data, as shown below:

    showDbDataStats complete for 7 databases, duration 00:21:14,046

    DbDataStats for ControlDataStore

        records 8

        key bytes 151 min 9 max 28 avg 18

        data bytes 64 min 8 max 8 avg 8

    DbDataStats for DomesticFareRecord.pks

        records 376,356

        key bytes 4,328,094 min 10 max 13 avg 11

        data bytes 6,021,696 min 16 max 16 avg 16

    DbDataStats for DomesticFareRecord.rs

        records 188,178

        key bytes 3,010,848 min 16 max 16 avg 16

        data bytes 1,638,667,440 min 180 max 208,800 avg 8,708

    DbDataStats for InternationalArbitraryRecord.pks

        records 435,488

        key bytes 3,919,392 min 5 max 13 avg 9

        data bytes 6,967,808 min 16 max 16 avg 16

    DbDataStats for InternationalArbitraryRecord.rs

        records 217,744

        key bytes 3,483,904 min 16 max 16 avg 16

        data bytes 986,818,500 min 180 max 276,660 avg 4,532

    DbDataStats for InternationalFareRecord.pks

        records 2,642,664

        key bytes 30,390,636 min 10 max 13 avg 11

        data bytes 42,282,624 min 16 max 16 avg 16

    DbDataStats for InternationalFareRecord.rs

        records 1,321,332

        key bytes 21,141,312 min 16 max 16 avg 16

        data bytes 23,697,909,524 min 326 max 2,098,788 avg 17,934

    All of the databases were created with dbConfig.setSortedDuplicates( true ), but only the databases whose names end with .pks actually use duplicate records.  The .rs Databases do not use duplicates (even though they were created that way).  When records are updated they are deleted first to prevent duplicates in those Databases (see the sketch at the end of this post).

    The DbSpace utility threw an exception when run with -d -r.

    STG [email protected]:/home/atpdata$ java -jar je-5.0.103.jar DbSpace -h storageengine/intl-all-dom/fares -d -r
    java.lang.ArrayIndexOutOfBoundsException: 10
            at com.sleepycat.je.tree.Key.compareUnsignedBytes(Key.java:214)
            at com.sleepycat.je.dbi.DupKeyData$TwoPartKeyComparator.compare(DupKeyData.java:194)
            at com.sleepycat.je.dbi.DupKeyData$TwoPartKeyComparator.compare(DupKeyData.java:167)
            at com.sleepycat.je.tree.Key.compareKeys(Key.java:188)
            at com.sleepycat.je.tree.IN.findEntry(IN.java:2267)
            at com.sleepycat.je.tree.IN.findEntry(IN.java:2188)
            at com.sleepycat.je.tree.IN.findParent(IN.java:3102)
            at com.sleepycat.je.tree.Tree.getParentINForChildIN(Tree.java:784)
            at com.sleepycat.je.tree.Tree.getParentINForChildIN(Tree.java:715)
            at com.sleepycat.je.log.UtilizationFileReader.isINActive(UtilizationFileReader.java:214)
            at com.sleepycat.je.log.UtilizationFileReader.processEntry(UtilizationFileReader.java:144)
            at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:340)
            at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:229)
            at com.sleepycat.je.log.UtilizationFileReader.calcFileSummaryMap(UtilizationFileReader.java:287)
            at com.sleepycat.je.util.DbSpace.print(DbSpace.java:227)
            at com.sleepycat.je.util.DbSpace.main(DbSpace.java:70)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at com.sleepycat.je.utilint.JarMain.main(JarMain.java:99)
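
    For reference, the delete-then-put update mentioned above looks roughly like this (a hypothetical sketch; rsDb stands for one of the .rs Databases, and the null txn reflects the non-transactional environment):

        import com.sleepycat.je.Database;
        import com.sleepycat.je.DatabaseEntry;

        // Delete any existing record for the key before inserting the new data,
        // so no duplicate is created even though the Database was opened with
        // setSortedDuplicates(true).
        static void replaceRecord(Database rsDb, DatabaseEntry key, DatabaseEntry data) {
            rsDb.delete(null, key); // null txn: the environment is non-transactional
            rsDb.put(null, key, data);
        }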

  • Greybird-Oracle Member Posts: 2,690

    The problem is very likely due to configuring duplicates for the databases that do not contain duplicates and contain large data sizes.  This happened to be OK in JE 4.1, but will cause severe performance problems (including the cleaning problem you're experiencing) in JE 5.  This was discussed here in more detail:

    Clarification on 4.x handling of duplicates

    I'm afraid that to use JE 5 you'll need to change the .rs databases so that duplicates are not configured.  You can only change this setting when the databases do not yet exist.  To convert the existing data, one way is to run DbDump to export the .rs DBs to text files, change the duplicates line at the top of the text files, and import them into an empty environment with DbLoad (example commands below).  An empty environment is not mandatory, but be sure to at least remove those databases (Environment.removeDatabase) before the DbLoad.

    The exception you got from DbSpace may be a bug, but I'm not sure.  Did you run the DbPreUpgrade_4_1 utility before upgrading to JE 5, as described in the JE 5 change log?

    In any case, you can run DbSpace again with just -d (no -r) to avoid the exception.  However, I don't think this is necessary now, since I understand the problem -- it's due to the DBs configured for duplicates.

    --mark
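
    For reference, the conversion might look like the following (hypothetical paths and file names; repeat for each .rs database):

        # Dump the duplicate-configured database to a text file.
        java -jar je-5.0.103.jar DbDump -h <envHome> -s DomesticFareRecord.rs -f DomesticFareRecord.rs.dump

        # Edit the duplicates line at the top of the dump file to turn duplicates
        # off, remove the old database (or use an empty environment), then reload.
        java -jar je-5.0.103.jar DbLoad -h <newEnvHome> -s DomesticFareRecord.rs -f DomesticFareRecord.rs.dump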

This discussion has been closed.