5 Replies Latest reply: Dec 10, 2012 4:42 PM by greybird

Understanding the correlation between cache size and cleaner performance

vinothchandar Newbie
Hi,

I am trying to size our caches using a capacity model that takes into account the workload we run, the database size and shape, etc. (This is JE 4.1.17, but I think it should apply in principle to BDB 5 as well.)

We provide enough memory to:
A -- cache all the upper INs
B -- hold a complete log file while cleaning (#cleaners=1)
C -- hold the BINs that might be brought in by the cleaner when it migrates LNs
D -- hold the dirty BINs from online writes

(I turned off the checkpointer to make things simple).
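
As a rough illustration of the model above (a sketch only, not code from the original post), the four components can simply be summed into a per-environment cache budget. The class name, field names, and all byte values below are hypothetical; real numbers would have to come from measuring the actual database.

```java
/** Back-of-the-envelope cache budget: A + B + C + D from the list above. */
final class CacheBudget {
    final long upperInBytes;     // A: memory to cache all upper INs
    final long logFileBytes;     // B: one complete log file per cleaner thread
    final int cleanerThreads;    // number of cleaner threads (1 in the post)
    final long cleanerBinBytes;  // C: BINs fetched by the cleaner during LN migration
    final long dirtyBinBytes;    // D: dirty BINs produced by online writes

    CacheBudget(long upperInBytes, long logFileBytes, int cleanerThreads,
                long cleanerBinBytes, long dirtyBinBytes) {
        if (upperInBytes < 0 || logFileBytes < 0 || cleanerThreads < 1
                || cleanerBinBytes < 0 || dirtyBinBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative, threads >= 1");
        }
        this.upperInBytes = upperInBytes;
        this.logFileBytes = logFileBytes;
        this.cleanerThreads = cleanerThreads;
        this.cleanerBinBytes = cleanerBinBytes;
        this.dirtyBinBytes = dirtyBinBytes;
    }

    /** A + B + C: the part that does not depend on the put rate. */
    long staticBytes() {
        long b = Math.multiplyExact(logFileBytes, (long) cleanerThreads);
        return Math.addExact(Math.addExact(upperInBytes, b), cleanerBinBytes);
    }

    /** A + B + C + D: the total cache size suggested by the model. */
    long totalBytes() {
        return Math.addExact(staticBytes(), dirtyBinBytes);
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        // All values are made-up placeholders for illustration.
        CacheBudget budget = new CacheBudget(512 * MB, 64 * MB, 1, 128 * MB, 256 * MB);
        System.out.println("static part (A+B+C): " + budget.staticBytes() / MB + " MB");
        System.out.println("total (A+B+C+D):     " + budget.totalBytes() / MB + " MB");
    }
}
```

Keeping D as a separate term makes it easy to vary only the write-rate-dependent part while A, B, and C stay fixed.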

I have the following questions.

1. Given A, B, C are pretty much static (they don't depend on put rate), how much should I size D to be?

2. When I reintroduce the checkpointer, it could write out the dirty BINs (currently our bytes.interval is 20MB) from D, causing more garbage? Making it harder for Cleaners to catch up?

3. All the background threads seem to be triggered per MB of writes to log. Does this include writes coming from Cleaners/eviction/checkpointing as well? i.e are the cleaners/checkpointer triggered solely based on online traffic or does the cleaner's migration also count as a write?

Some clarification would be great.
Thanks
Vinoth
  • 1. Re: Understanding the correlation between cache size and cleaner performance
    vinothchandar Newbie
    For 3, from the code, it seems that writes from the background threads are also counted toward the thresholds that trigger cleaning and checkpointing.
  • 2. Re: Understanding the correlation between cache size and cleaner performance
    greybird Expert
    1. Given A, B, C are pretty much static (they don't depend on put rate), how much should I size D to be?
    Dirty BINs are written (made non-dirty) by checkpoints. So you'd want to estimate the number of BINs that are dirtied in between checkpoints.
    2. When I reintroduce the checkpointer, it could write out the dirty BINs (currently our bytes.interval is 20MB) from D, causing more garbage? Making it harder for Cleaners to catch up?
    I don't know what you're asking. Checkpoints are necessary to bound recovery time. Yes, they write information that needs to be cleaned later.
    3. All the background threads seem to be triggered per MB of writes to log. Does this include writes coming from Cleaners/eviction/checkpointing as well? i.e are the cleaners/checkpointer triggered solely based on online traffic or does the cleaner's migration also count as a write?
    Right, all writes count.

    --mark
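
One way to turn the advice in answer 1 into a number is sketched below. It is only an estimate under a strong assumption that is not from the thread: writes are spread uniformly at random over all BINs, so the expected number of distinct BINs dirtied by N writes over M BINs is M * (1 - (1 - 1/M)^N). Skewed workloads dirty fewer BINs. The class name, method names, and sample values are hypothetical. Since all log writes count toward the checkpoint interval, the number of online writes per interval has to be measured rather than derived from the interval alone.

```java
/** Estimates D: memory for BINs dirtied between two checkpoints. */
final class DirtyBinEstimate {

    /**
     * Expected number of distinct BINs touched by `writes` updates,
     * ASSUMING each update hits one of `totalBins` BINs uniformly at random.
     */
    static double expectedDirtyBins(long totalBins, long writes) {
        if (totalBins <= 0 || writes < 0) {
            throw new IllegalArgumentException("totalBins must be > 0 and writes >= 0");
        }
        if (writes == 0) {
            return 0.0;
        }
        // (1 - 1/M)^N computed in log space for numerical stability.
        double probUntouched = Math.exp(writes * Math.log1p(-1.0 / totalBins));
        return totalBins * (1.0 - probUntouched);
    }

    /** Bytes needed to hold the expected dirty BINs, rounded up. */
    static long dirtyBinBytes(long totalBins, long writes, long avgBinBytes) {
        return (long) Math.ceil(expectedDirtyBins(totalBins, writes) * avgBinBytes);
    }

    public static void main(String[] args) {
        long totalBins = 1_000_000L;            // hypothetical BIN count
        long writesBetweenCheckpoints = 20_000; // hypothetical, measure this
        long avgBinBytes = 8 * 1024L;           // hypothetical in-memory BIN size
        System.out.println("expected dirty BINs: "
                + expectedDirtyBins(totalBins, writesBetweenCheckpoints));
        System.out.println("D estimate (bytes):  "
                + dirtyBinBytes(totalBins, writesBetweenCheckpoints, avgBinBytes));
    }
}
```

The estimate never exceeds either the number of writes or the number of BINs, so it is a gentler bound than assuming every write dirties a new BIN.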
  • 3. Re: Understanding the correlation between cache size and cleaner performance
    vinothchandar Newbie
    Hi Mark,
    Dirty BINs are written (made non-dirty) by checkpoints. So you'd want to estimate the number of BINs that are dirtied in between checkpoints.
    Alright. If I give enough memory that D can hold all dirty BINs during a cleaning cycle, things go well. Thanks for confirming.

    About question 2, what I am basically asking is that would very frequent checkpoints hurt by writing out BINs/INs much more frequently? For example, with enough memory and long enough checkpointer interval, it seems to me that some BIN/IN dirtying by the cleaner could be amortized.

    Thanks
    Vinoth
  • 4. Re: Understanding the correlation between cache size and cleaner performance
    greybird Expert
    About question 2, what I am basically asking is that would very frequent checkpoints hurt by writing out BINs/INs much more frequently?
    Yes, the more writing, the lower performance.
    For example, with enough memory and long enough checkpointer interval, it seems to me that some BIN/IN dirtying by the cleaner could be amortized.
    Yes, the checkpoint interval should be as long as possible, but not so long that recovery time will be unacceptable (as required for your app). I suspect your next question will be: How long does recovery take per MB of log? I don't know the answer and it depends on your app, so you'll have to experiment.

    --mark
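
For readers who want to experiment with the checkpoint interval discussed above, here is a minimal sketch that builds a set of JE-style properties with plain java.util.Properties. The parameter names (je.env.runCheckpointer, je.checkpointer.bytesInterval, je.cleaner.threads, je.sharedCache) are the JE configuration names as best recalled and should be checked against the documentation for the JE version in use. The class name, method name, and the 200,000,000-byte interval are hypothetical placeholders, not recommendations.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

/** Builds je.properties-style settings for trying a longer checkpoint interval. */
final class CheckpointTuningSketch {

    static Properties jeProperties(long checkpointBytesInterval,
                                   int cleanerThreads,
                                   boolean sharedCache) {
        if (checkpointBytesInterval <= 0 || cleanerThreads < 1) {
            throw new IllegalArgumentException("interval must be > 0, threads >= 1");
        }
        Properties props = new Properties();
        // Re-enable the checkpointer (it was turned off in the experiment above).
        props.setProperty("je.env.runCheckpointer", "true");
        // Bytes of log written between checkpoints; larger values mean fewer
        // BIN/IN flushes but more log to replay during recovery.
        props.setProperty("je.checkpointer.bytesInterval",
                Long.toString(checkpointBytesInterval));
        props.setProperty("je.cleaner.threads", Integer.toString(cleanerThreads));
        props.setProperty("je.sharedCache", Boolean.toString(sharedCache));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // 200,000,000 bytes is an arbitrary value ten times the 20 MB
        // interval mentioned earlier in the thread.
        Properties props = jeProperties(200_000_000L, 1, true);
        StringWriter out = new StringWriter();
        props.store(out, "hypothetical settings for a checkpoint-interval experiment");
        System.out.print(out);
    }
}
```

The output can be saved as a je.properties file in the environment home directory. Recovery time should be measured after each change, since it depends on the application.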
  • 5. Re: Understanding the correlation between cache size and cleaner performance
    vinothchandar Newbie
    I suspect your next question will be: How long does recovery take per MB of log? I don't know the answer and it depends on your app, so you'll have to experiment.
    Not really :) For now, I am content with coming up with a model that gives predictable, stable cleaner performance.

    [Skip stuff below, if you don't want the details]
    To give you more context, this relates to our multi-tenant deployments at LinkedIn: multiple DBs/envs on a single server. The BDB storage rewrite got rid of most of our scanning/cache pollution/duplicate woes, but exposed the next bottleneck. Right now, we have all these DBs sharing the cache in an ad hoc fashion (sharedCache=true).

    On a particular cluster with a lot of DBs per box, we ran into an issue of cleaners simply doing a lot of IOPS and thus causing very frequent young gen collections.
    This was impacting our 99th percentile latencies (not that the sky fell).

    I am attempting to come up with a model such that we can run a script and generate a cache size for each DB.
    So I had to dig into all the implementation details. I think I have a fair handle on the internals now. Thanks for all the help and confirmations.

    Thanks
    Vinoth
