4 Replies Latest reply: Nov 27, 2012 2:03 PM by Greybird-Oracle

    CleanerBacklog with lazy_migration=false

    vinothchandar
      Hi,

      We recently rewrote the BDB storage layer for Voldemort (as advised by the JE team) and are seeing good improvements. The idea is to move as much data as possible off the heap, and hence we attempted to turn lazy_migration off.
      However, we noticed that for some databases with a fairly low write rate (~100 puts/sec) and a particular write pattern (a small set of keys written over and over again for different periods of time), the cleaner (1 thread) does not seem to keep up.
      We have plenty of IOPS though.
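A rough illustration of why this write pattern creates cleaner work even at a low write rate: in an append-only log, every overwrite leaves the previous version of the record obsolete, so log utilization falls quickly when a small key set is rewritten repeatedly. The sketch below is only a simplified model. It assumes equal-size entries, no cleaning, and no metadata writes, and the key count of 1000 is hypothetical. For comparison, the configuration posted later in this thread sets je.cleaner.minUtilization=50.

```java
class OverwriteUtilization {
    // Percentage of log entries that are still live after totalPuts writes
    // spread over distinctKeys keys. Simplifying assumptions: equal-size
    // entries, no cleaning, no Btree metadata; only the latest version of
    // each key counts as live.
    static double utilizationPercent(long distinctKeys, long totalPuts) {
        if (distinctKeys <= 0 || totalPuts <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        long live = Math.min(distinctKeys, totalPuts);
        return 100.0 * live / totalPuts;
    }

    public static void main(String[] args) {
        long putsPerSecond = 100;   // write rate mentioned in the post
        long distinctKeys = 1000;   // hypothetical size of the hot key set
        long totalPuts = putsPerSecond * 3600;  // one hour of writes
        System.out.printf("utilization after one hour: %.2f%%%n",
                utilizationPercent(distinctKeys, totalPuts));
    }
}
```

Under these assumptions utilization is well under 1% after an hour, so nearly all of the log written in that hour becomes eligible for cleaning.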

      I have tried three things so far that seem to help.

      1. If I turn lazy_migration on, however, the backlog disappears right away. So I am wondering if there is a known cleaning issue in 4.1.x.

      2. Setting CLEANER_BACKGROUND_PROACTIVE_MIGRATION on with lazy_migration=off also helps. Some explanation of what exactly happens when this is turned on (especially its impact on foreground operations) would help.

      3. BDB5 does not seem to have this issue. Again, it would be nice to know what improvements one can expect in BDB5 in terms of log cleaning, to make sense of this.

      Some advice/clarification appreciated.

      Thanks
      Vinoth
        • 1. Re: CleanerBacklog with lazy_migration=false
          Greybird-Oracle
          Hi Vinoth,

          Welcome back from vacation.
          > We recently rewrote the BDB storage layer for Voldemort (as advised by the JE team) and seeing good improvements.. Idea is to move as much data as possible off the heap, and hence we attempted to set lazy_migration to off.
          As you know I believe lazy migration should always be off. It has no advantage and several disadvantages.
          > 1. If I turn lazy_migration on however, backlog disappears right away. So wondering if there is a known cleaning issue in 4.1.x.
          I suspect it is not a cleaning issue, it is simply that JE 4.1 is much less efficient -- there is much more metadata writing -- and therefore more cleaner threads are needed to keep up.

          With lazy migration on, you've moved much of the cleaner's work to the checkpointer, and also to the evictor and app threads if there is eviction. A single cleaner thread may be better able to keep up with lazy migration on, but that's because other threads are assisting. It is likely that overall throughput is lower and/or checkpoints are taking a long time.
          > 2. Setting CLEANER_BACKGROUND_PROACTIVE_MIGRATION on with lazy_migration=off, also helps. Some help on what exactly happens (esp its impact on foreground operations) when this is turned on would help.
          Same thing. With proactive migration, the evictor threads and app threads are assisting with cleaning during eviction. It seems very likely that you have a lot of eviction.
          > 3. BDB5 does not seem to have this issue. Again, it would be nice to know what improvements one can expect in BDB5 in terms of log cleaning, to make sense of this.
          JE 5 writes much less metadata (Btree internal nodes), so less cleaning is necessary and fewer cleaner threads are needed to keep up.
          > Some advice/clarification appreciated.
          Perhaps with a low write rate performance is not a big consideration and you'd prefer to use lazy migration in order for the cleaner to keep up. But in general it is much more efficient to increase the number of cleaner threads and turn off lazy migration.
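A minimal sketch of that advice, written with only java.util.Properties so it runs without JE on the classpath. The property names je.cleaner.lazyMigration and je.cleaner.threads are the ones that appear in the configuration posted in this thread; the thread count of 3 is a hypothetical starting point, not a value recommended here. The assumption is that the resulting Properties would then be handed to JE's EnvironmentConfig (or written to je.properties).

```java
import java.util.Properties;

class CleanerTuning {
    // Builds JE properties with lazy migration off and the given number of
    // cleaner threads. The caller chooses the thread count; nothing in the
    // thread prescribes a specific value.
    static Properties cleanerProperties(int cleanerThreads) {
        if (cleanerThreads < 1) {
            throw new IllegalArgumentException("need at least one cleaner thread");
        }
        Properties props = new Properties();
        props.setProperty("je.cleaner.lazyMigration", "false");
        props.setProperty("je.cleaner.threads", Integer.toString(cleanerThreads));
        return props;
    }

    public static void main(String[] args) {
        Properties props = cleanerProperties(3);  // 3 is a hypothetical value
        // Assumption: with JE available, these properties could be passed to
        // new com.sleepycat.je.EnvironmentConfig(props) before opening the
        // environment.
        props.list(System.out);
    }
}
```

A reasonable way to use this is to raise the thread count step by step while watching the cleaner backlog, rather than picking a large value up front.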

          --mark
          • 2. Re: CleanerBacklog with lazy_migration=false
            vinothchandar
            Hi Mark,

            Yes. Good to be back :)

            What you say makes sense. Let me try lazy_migration=off with more cleaner threads to see if it helps. On the side, I am going to scope out a way to get us to BDB5. Will keep you posted on this.

            Below are our cleaner/checkpointer configs. Please let me know if you see anything odd.

            je.maxMemory=10737418240
            je.log.faultReadSize=2048
            je.env.isTransactional=true
            je.cleaner.maxBatchFiles=0
            je.cleaner.minUtilization=50
            je.cleaner.lookAheadCacheSize=8192
            je.lock.nLockTables=1
            je.cleaner.threads=1
            je.log.iteratorReadSize=8192
            je.lock.timeout=500 MILLISECONDS
            je.checkpointer.wakeupInterval=30000000
            je.checkpointer.highPriority=false
            je.checkpointer.bytesInterval=20971520
            je.cleaner.lazyMigration=true
            je.cleaner.minFileUtilization=0
            je.txn.durability=NO_SYNC,NO_SYNC,SIMPLE_MAJORITY
            je.sharedCache=true
            je.env.fairLatches=false
            je.log.fileMax=62914560
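The raw byte values above are easier to review once converted to binary units. The sketch below does only that arithmetic on the values listed. One assumption is labeled in the code: that a je.checkpointer.wakeupInterval value given without a unit is interpreted as microseconds.

```java
class ConfigUnits {
    static final long MIB = 1024L * 1024L;
    static final long GIB = 1024L * MIB;

    static long toMiB(long bytes) { return bytes / MIB; }
    static long toGiB(long bytes) { return bytes / GIB; }

    // Assumption: a duration with no unit suffix is in microseconds.
    static long microsToSeconds(long micros) { return micros / 1_000_000L; }

    public static void main(String[] args) {
        long maxMemory = 10737418240L;      // je.maxMemory
        long fileMax = 62914560L;           // je.log.fileMax
        long bytesInterval = 20971520L;     // je.checkpointer.bytesInterval
        long wakeupInterval = 30000000L;    // je.checkpointer.wakeupInterval

        System.out.println("je.maxMemory = " + toGiB(maxMemory) + " GiB");
        System.out.println("je.log.fileMax = " + toMiB(fileMax) + " MiB");
        System.out.println("je.checkpointer.bytesInterval = "
                + toMiB(bytesInterval) + " MiB");
        System.out.println("je.checkpointer.wakeupInterval = "
                + microsToSeconds(wakeupInterval) + " s (if microseconds)");
        // Log bytes written per file divided by the checkpoint byte interval.
        System.out.println("bytesInterval fits into one log file "
                + (fileMax / bytesInterval) + " times");
    }
}
```

In these units the settings read as a 10 GiB cache, 60 MiB log files, and a 20 MiB checkpoint byte interval.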

            Thanks
            Vinoth
            • 3. Re: CleanerBacklog with lazy_migration=false
              vinothchandar
              Hi Mark,

              I am testing BDB5 cleaning now and I see a very good improvement in the number of cleaner entries read.
              For one of our stores, I see it dropping 20x.

              Thanks
              Vinoth
              • 4. Re: CleanerBacklog with lazy_migration=false
                Greybird-Oracle
                Awesome! Thanks for letting us know.
                --mark