I am trying to understand, or at least hypothesize, what's going on in a performance degradation I am currently investigating.
We have a BDB env with a single database, which was previously receiving close to 100 puts/sec and doing fine (i.e., no cleanerBacklog) with 5GB of cache.
When the store wasn't needed anymore (0 gets/sec and 0 puts/sec for 4 days), we forced the cache back down to 512MB and saw the cleanerBacklog grow to about 50, with lots of checkpointing too.
Then we reverted the memory limit; the cache grew back to 5GB and now it's happy.
This confuses me to no end.
I have tested BDB5 and it seems to perform much better.
But this still bothers me, since it seems like a very basic behavioral difference, and I want to understand what the problem is and whether it affects BDB5 as well.
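For anyone trying to reproduce this, here is a minimal sketch of how the cache could be resized on a live environment while watching the cleaner backlog and checkpoint counts. It assumes the JE jar is on the classpath; the class name CacheResizeProbe, the helper names, and the 64MB/16MB sizes are made up for illustration (the sizes discussed above were 5GB and 512MB, which need a correspondingly large JVM heap). It is not the code used in the store described above.

```java
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.EnvironmentMutableConfig;
import com.sleepycat.je.EnvironmentStats;
import com.sleepycat.je.StatsConfig;

import java.io.File;

// Hypothetical helper class, for illustration only.
public class CacheResizeProbe {

    // Change the cache size of an already-open environment (no re-open needed).
    static void resizeCache(Environment env, long bytes) {
        EnvironmentMutableConfig mutable = env.getMutableConfig();
        mutable.setCacheSize(bytes);
        env.setMutableConfig(mutable);
    }

    // Collect the counters that matter for the eviction/cleaning question.
    static String snapshot(Environment env) {
        EnvironmentStats stats = env.getStats(new StatsConfig());
        return "cacheTotalBytes=" + stats.getCacheTotalBytes()
            + " cleanerBacklog=" + stats.getCleanerBacklog()
            + " nCheckpoints=" + stats.getNCheckpoints()
            + " nEvictPasses=" + stats.getNEvictPasses();
    }

    public static void main(String[] args) {
        // args[0] is the environment home directory (must already exist).
        EnvironmentConfig config = new EnvironmentConfig();
        config.setAllowCreate(true);
        Environment env = new Environment(new File(args[0]), config);
        try {
            resizeCache(env, 64L * 1024 * 1024); // illustrative "large" cache
            System.out.println("before shrink: " + snapshot(env));

            resizeCache(env, 16L * 1024 * 1024); // illustrative "small" cache
            System.out.println("after shrink:  " + snapshot(env));
        } finally {
            env.close();
        }
    }
}
```

Logging the snapshot periodically after the shrink would show whether the backlog and checkpoint counts climb together with eviction passes.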
I'm not sure, Vinoth, but the first suspect is that recovery is doing a lot of work when re-opening the environment with the smaller cache, and this starts a cycle of eviction/cleaning.
Are you doing a normal shutdown before re-opening with the smaller cache?
In any case, perhaps the JE 5 change that affects this situation is this one:
Improvements were made to recovery (Environment open) performance by changing the behavior of checkpoints in certain cases. Recovery should always be very quick after the following types of checkpoints:
- CheckpointConfig.setMinimizeRecoveryTime(true) is used along with an explicit checkpoint performed by calling the Environment.checkpoint method.
- Environment.close is called, since it performs a final checkpoint.
In addition, a problem was fixed where periodic checkpoints (performed by the checkpointer thread or by calling Environment.checkpoint) would cause long recovery times under certain circumstances. As a part of this work, the actions invoked by ReplicatedEnvironment.shutdownGroup() were streamlined to use the setMinimizeRecoveryTime() option and to reduce spurious timeouts during the shutdown processing. [#19559]
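To make the two cases in that note concrete, here is a rough sketch of a normal shutdown that first forces an explicit checkpoint with setMinimizeRecoveryTime(true) and then closes the environment. The class and method names (CleanShutdown, checkpointAndClose) are hypothetical, and the JE jar is assumed to be on the classpath. Since Environment.close performs a final checkpoint anyway, the explicit checkpoint here is mainly to show the first case.

```java
import com.sleepycat.je.CheckpointConfig;
import com.sleepycat.je.Database;
import com.sleepycat.je.Environment;

// Hypothetical helper class, for illustration only.
public class CleanShutdown {

    // Close all databases, checkpoint for fast recovery, then close the environment.
    static void checkpointAndClose(Environment env, Database... databases) {
        // Database handles should be closed before the environment.
        for (Database db : databases) {
            db.close();
        }

        CheckpointConfig checkpoint = new CheckpointConfig();
        checkpoint.setForce(true);               // checkpoint even if little was written
        checkpoint.setMinimizeRecoveryTime(true); // favor a quick recovery on next open
        env.checkpoint(checkpoint);

        // close() also performs a final checkpoint.
        env.close();
    }
}
```

If the process is killed instead of going through a path like this, the next open with the smaller cache would have to run recovery from the last periodic checkpoint, which is the situation being asked about.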