> In practice, workloads tend to have hot sets, as mentioned in the FAQ. My understanding is that as long as the cache is big enough to hold the hot set (which should happen naturally due to LRU), we should be good?

Yes. My intention in mentioning the item in the FAQ was to discourage you from running in a mode where the Btree's internal nodes for the hot data, especially data being written, are not resident in the cache. If you're not aware of the issue in the FAQ, you might believe that as long as disk I/O is fast enough (for example, on an SSD), keeping the internal nodes in the cache is unnecessary.
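The intuition that LRU naturally keeps a hot set resident can be sketched with a small simulation (Python here for brevity; JE itself is Java). The key counts, traffic split, and cache sizes below are made-up illustration values, not anything measured from JE:

```python
from collections import OrderedDict
import random

class LRUCache:
    """Minimal LRU cache: evicts the least recently used key on overflow."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.data) >= self.capacity:
                self.data.popitem(last=False)  # evict the LRU entry
            self.data[key] = True

random.seed(42)
HOT_KEYS = 100      # hypothetical hot set size
COLD_KEYS = 10_000  # occasionally touched cold keys

def hit_rate(capacity, accesses=50_000):
    cache = LRUCache(capacity)
    for _ in range(accesses):
        if random.random() < 0.9:  # 90% of traffic goes to the hot set
            cache.access(random.randrange(HOT_KEYS))
        else:
            cache.access(HOT_KEYS + random.randrange(COLD_KEYS))
    return cache.hits / accesses

# Cache larger than the hot set: hit rate approaches the 90% hot fraction.
print(f"capacity=200: {hit_rate(200):.2f}")
# Cache much smaller than the hot set: hit rate collapses.
print(f"capacity=20:  {hit_rate(20):.2f}")
```

The same logic applies to the internal nodes: as long as the cache comfortably exceeds the hot set (data plus the Btree nodes above it), LRU keeps the hot entries near the MRU end and they are rarely evicted.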
> inputs: records=178680261 keySize=78 dataSize=-1 nodeMax=512 binMax=512 density=80% overhead=10%

This output indicates you're not running JE 5 yet; you'll see different values from DbCacheSize in JE 5. The JE 5 DbCacheSize javadoc also discusses key size and a config param that can reduce the amount of memory used by internal nodes by quite a bit. Reducing the amount of memory needed has a big impact, of course.
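For intuition about why those inputs dominate the cache requirement, here is a rough back-of-envelope sketch of the shape of the calculation. This is not JE's actual formula, and the per-entry overhead constant is an assumed placeholder; run DbCacheSize itself for real numbers:

```python
# Rough estimate of memory to keep all BINs (bottom internal nodes)
# resident. NOT JE's actual formula: per_entry_overhead is an assumed
# placeholder for slot bookkeeping; DbCacheSize is the authoritative tool.

def internal_node_estimate(records, key_size, bin_max, density=0.80,
                           per_entry_overhead=30):
    entries_per_bin = int(bin_max * density)   # 80% density: ~409 of 512 slots used
    num_bins = -(-records // entries_per_bin)  # ceiling division
    bytes_per_entry = key_size + per_entry_overhead
    return num_bins * entries_per_bin * bytes_per_entry

est = internal_node_estimate(records=178_680_261, key_size=78, bin_max=512)
print(f"~{est / 2**30:.1f} GiB for bottom internal nodes (rough)")
```

The point of the sketch: with ~179M records, memory scales as roughly (key size + per-slot overhead) per record, which is why JE 5's smaller internal-node representation and key-size reductions matter so much.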
> That said, when planning for the future (which is what I am doing right now), it's hard to predict how this hot set will grow. Workloads I deal with are almost always read-heavy (hence the BDB cleaners do not come into play), so I was trying to estimate the worst-case latency of going to disk. I am effectively trying to come up with a capacity model here, and if you add GC to the picture, wow, it's hard!

I see, that makes sense; I believe your thinking is correct. Yeah, it is hard. Sorry I don't have more suggestions.
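One way to frame the worst-case side of such a capacity model is expected read latency as a blend of cache hits and disk misses, with a multiplier for misses that also have to fault internal nodes in (the scenario the FAQ warns about). The latency constants below are assumed placeholders, not measurements:

```python
# Minimal capacity-model sketch for read latency. All latency numbers
# are assumed placeholders -- measure your own hardware.

CACHE_HIT_US = 2    # in-memory Btree lookup (assumed)
SSD_READ_US = 150   # one SSD random read (assumed)

def expected_read_latency_us(hit_rate, reads_per_miss=1):
    """reads_per_miss > 1 models misses that must also fault internal
    nodes from disk, not just the leaf record."""
    miss_cost = SSD_READ_US * reads_per_miss
    return hit_rate * CACHE_HIT_US + (1 - hit_rate) * miss_cost

# Internal nodes resident: a miss costs one disk read for the leaf.
print(f"{expected_read_latency_us(0.99):.1f} us")
# Internal nodes evicted too: each miss may need several disk reads.
print(f"{expected_read_latency_us(0.99, reads_per_miss=3):.1f} us")
```

Even at a 99% hit rate, the miss term dominates, which is why the hit rate (i.e., whether the hot set plus its internal nodes fit in cache) is the sensitive knob in the model; GC pauses would add a separate tail-latency term on top.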