2 Replies Latest reply on Jan 25, 2016 5:58 PM by keithwall

    Cleaner making no progress / lnSizeCorrectionFactor became very small

    keithwall

      Hello

       

      We have an application using JE 5.0.104.  We recently experienced a problem where a JE transaction log continued to grow in size (3GB/hour).  We eliminated the possibility of an application defect by comparing the application's own accounting records against getDatabaseStats, so we know that the application was not somehow forgetting to delete records.


      Invoking cleanLog() from JEMonitor had no effect on JDB files and returned 0.  getEnvironmentStats() revealed that lnSizeCorrectionFactor had fallen to 0.079.   Turning on logging for com.sleepycat.je.cleaner.Cleaner gave us:

       

      2016-01-18 17:10:58,128 DEBUG [Cleaner-1] logging.Log4jLoggingHandler (Log4jLoggingHandler.java:83) - [myvhost] Chose lowest utilized file for cleaning. fileChosen: 0x3ca92d lnSizeCorrectionFactor: 0.07944859 totalUtilization: 2 bestFileUtilization: 0 isProbe: true

      2016-01-18 17:10:58,183 DEBUG [Cleaner-1] logging.Log4jLoggingHandler (Log4jLoggingHandler.java:83) - [myvhost] CleanerRun 1801 ends on file 0x3ca92d probe=true invokedFromDaemon=true finished=true fileDeleted=false nEntriesRead=1902 nINsObsolete=0 nINsCleaned=0 nINsDead=0 nINsMigrated=0 nBINDeltasObsolete=0 nBINDeltasCleaned=0 nBINDeltasDead=0 nBINDeltasMigrated=0 nLNsObsolete=0 nLNsCleaned=0 nLNsDead=0 nLNsMigrated=0 nLNsMarked=0 nLNQueueHits=0 nLNsLocked=0 logSummary=<CleanerLogSummary endFileNumAtLastAdjustment="0x4c44fe" initialAdjustments="5" recentLNSizesAndCounts="Cor:9191/58-Est:6998244/1386 Cor:6407/6-Est:7969221/636 Cor:5936/24-Est:7418832/444 Cor:16683/56-Est:8912936/434 Cor:17058/89-Est:8143993/523 Cor:9191/58-Est:6998244/1386 Cor:6407/6-Est:7969221/636 Cor:5936/24-Est:7418832/444 Cor:16683/56-Est:8912936/434 Cor:17058/89-Est:8143993/523 "> inSummary=<INSummary totalINCount="402" totalINSize="160621" totalBINDeltaCount="89" totalBINDeltaSize="20532" obsoleteINCount="299" obsoleteINSize="151932" obsoleteBINDeltaCount="88" obsoleteBINDeltaSize="20435"/> estFileSummary=<summary totalCount="2192" totalSize="9850987" totalINCount="491" totalINSize="181153" totalLNCount="1409" totalLNSize="9656676" maxLNSize="4317683" obsoleteINCount="387" obsoleteLNCount="975" obsoleteLNSize="1512683" obsoleteLNSizeCounted="886" getObsoleteSize="3054503" getObsoleteINSize="142782" getObsoleteLNSize="2898563" getMaxObsoleteSize="9805672" getMaxObsoleteLNSize="9649732" getAvgObsoleteLNSizeNotCounted="15571.688"/> recalcFileSummary=<summary totalCount="2192" totalSize="9850987" totalINCount="491" totalINSize="181153" totalLNCount="1409" totalLNSize="9656676" maxLNSize="0" obsoleteINCount="387" obsoleteLNCount="975" obsoleteLNSize="1529741" obsoleteLNSizeCounted="975" getObsoleteSize="1685681" getObsoleteINSize="142782" getObsoleteLNSize="1529741" getMaxObsoleteSize="1685681" getMaxObsoleteLNSize="1529741" getAvgObsoleteLNSizeNotCounted="NaN"/> lnSizeCorrection=0.07944859 
newLnSizeCorrection=0.07944859 estimatedUtilization=69 correctedUtilization=82 recalcUtilization=83 correctionRejected=false
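For reference, the diagnostic calls described above amount to something like the following sketch (JE 5 API; `env` stands for our already-open Environment, and the method names are from the public API as we understand it — in particular we are reading the correction factor via EnvironmentStats.getLNSizeCorrectionFactor()):

```java
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentStats;
import com.sleepycat.je.StatsConfig;

public class CleanerDiagnostics {

    // Sketch of the checks we ran; assumes an open Environment.
    static void dumpCleanerState(Environment env) {
        // Force a cleaner pass; returns the number of files cleaned.
        // In our case this returned 0 and no JDB files were removed.
        int filesCleaned = env.cleanLog();

        // Read stats without clearing them, so monitoring is unaffected.
        StatsConfig sc = new StatsConfig();
        sc.setClear(false);
        EnvironmentStats stats = env.getStats(sc);

        // The factor that had collapsed to ~0.079 in production.
        System.out.println("filesCleaned=" + filesCleaned
            + " lnSizeCorrectionFactor=" + stats.getLNSizeCorrectionFactor());
    }
}
```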

       

      On restarting the application, the JE transaction log shrank back to its normal size, but the situation recurred a few hours later and continued to remanifest after subsequent restarts.  So far our attempts to understand/reproduce the exact trigger point have been unsuccessful.  The single JE environment is ultimately used by many different applications, all with different traffic profiles (frequency/size of records, etc.).

       

      We have been monitoring getEnvironmentStats() in production and see that once lnSizeCorrectionFactor starts to fall below 1, it normally continues to fall, and once it hits 0.5 cleaning ceases and we see the unbounded transaction log growth.


      The application is reasonably mature and has been running in production for many years without seeing this problem.  The appearance of the problem seems to correlate with a change that a) upgraded from JE 5.0.73 to JE 5.0.104 and b) switched to CacheMode.EVICT_LN and reduced je.maxMemory (our application caches the data itself in a higher tier, so the JE cache of leaf nodes was redundant from our perspective).
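The configuration change in b) was essentially the following (a sketch, not our exact production code; the cache size and environment path here are illustrative values, not the real ones):

```java
import java.io.File;

import com.sleepycat.je.CacheMode;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

public class EnvSetup {

    static Environment openEnv(File envHome) {
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        envConfig.setTransactional(true);

        // Evict leaf nodes from the JE cache after each operation,
        // since the application caches records itself at a higher tier.
        envConfig.setCacheMode(CacheMode.EVICT_LN);

        // Reduced je.maxMemory (the 64 MB figure is illustrative).
        envConfig.setConfigParam(EnvironmentConfig.MAX_MEMORY,
                                 String.valueOf(64L * 1024 * 1024));

        return new Environment(envHome, envConfig);
    }
}
```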


      Prompted by Re: Disk bloat in Je 5.0.73 and the changes in 5.0.97, we are testing our application with EnvironmentConfig.CLEANER_ADJUST_UTILIZATION turned off.  So far all seems well.  We realise that without je.cleaner.fetchObsoleteSize=true the cleaner will assume that all records in a log file are of average size and may therefore over-clean or under-clean, but we don't want to pay the cost of the additional read.
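Concretely, the workaround we are testing is just this configuration fragment (assuming the EnvironmentConfig constant names map to je.cleaner.adjustUtilization and je.cleaner.fetchObsoleteSize as documented):

```java
import com.sleepycat.je.EnvironmentConfig;

public class CleanerWorkaround {

    static EnvironmentConfig applyWorkaround(EnvironmentConfig envConfig) {
        // Disable the utilization adjustment introduced in JE 5.0.97.
        envConfig.setConfigParam(
            EnvironmentConfig.CLEANER_ADJUST_UTILIZATION, "false");

        // Deliberately NOT enabled, to avoid the extra read per LN:
        // envConfig.setConfigParam(
        //     EnvironmentConfig.CLEANER_FETCH_OBSOLETE_SIZE, "true");

        return envConfig;
    }
}
```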


      If you could comment on any nuances of cleaner behaviour with CacheMode.EVICT_LN, and on the drawbacks of turning off EnvironmentConfig.CLEANER_ADJUST_UTILIZATION in isolation, that would be appreciated.


      Many thanks in advance