I'm using BDB JE 4.1.17 as the data backend for the Project Voldemort 1.3.0 key-value store. I have a write-heavy application where the same key is updated several times a day. The data set is roughly 100 GB when compact and cleaned, but over the day it grows to almost 350 GB (which happens to be the disk size). When this happens, I need to shut down my application and stop Voldemort, which after a few minutes reclaims all the "extra growth" space. I guess what happens is that obsolete data is discarded and released back to the filesystem.
After googling this issue, I noticed that the recommendation for write-heavy applications with big caches (mine is roughly 30 GB) is to use the bdb.checkpointer.high.priority=true setting. After making this change, I notice that disk space is continuously kept tidy, but _only during the day of my last restart_. Come the new day (after midnight), growth starts again and rapidly consumes all available disk space until my next shutdown and restart.
Any insights / tips on how to deal with this would be highly appreciated! My Voldemort server.properties is available here: http://shorttext.com/Nr4t3. Perhaps it helps in understanding what is going on.
I'm sorry that this older message was never replied to! (We sometimes have problems with the forum where email notifications don't occur, and I suspect that's what happened, but in any case it's our fault and I apologize.)
Are you still having problems with log cleaning? Be sure to look at the JE statistics to see if checkpointing and cleaning are happening promptly. The cleaner backlog stat in particular will tell you if the cleaner is getting behind.
I noticed a few things in your properties file:
- Cleaner threads is set to 1, and should be increased when the cleaner is not keeping up.
- The checkpoint interval is 2 GB, which is too large. Log files are only deleted after a checkpoint. Try 500 MB.
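
As a rough sketch, the two changes above might look like this in server.properties. The key names bdb.cleaner.threads and bdb.checkpoint.interval.bytes are assumptions based on Voldemort's BDB settings, so please check them against the configuration reference for your Voldemort version. The thread count of 2 is only an illustrative starting point, not a measured recommendation.

```properties
# Assumed key name: number of JE log cleaner threads (was 1).
# Raise it further only if the cleaner backlog stat keeps growing.
bdb.cleaner.threads=2

# Assumed key name: bytes written between checkpoints (was 2 GB).
# 500 MB = 500 * 1024 * 1024 bytes.
bdb.checkpoint.interval.bytes=524288000

# Already set in your configuration; shown here only for context.
bdb.checkpointer.high.priority=true
```

After restarting with these values, watch the cleaner backlog stat over a full day (including past midnight) to see whether the cleaner now keeps up.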