This discussion is archived
11 Replies Latest reply: Apr 24, 2012 10:33 AM by vinothchandar

Question on put()  cost

vinothchandar Newbie
Hi,

I am trying to determine how many IOPS a put() costs in BDB JE, including the cleaner. I have a basic model for an abstract log-structured storage engine, which I would like to adjust for BDB.

My workload always does put() for an existing key. Hence, the write should only dirty a DBIN down the tree. BINs and INs should not be modified, since the Btree structure is untouched. But when I print the log files using DbPrintLog, I can see it's 65% BIN and 35% IN nodes. Is there any reason they would be dirtied in this scenario? (PS: I have run the experiment for several hours, starting from an environment that already met the file utilization target, so the entries I see logged must come from recent writes made during the experiment.)

Thanks
Vinoth
  • 1. Re: Question on put()  cost
    greybird Expert
    Hi Vinoth,

    Are you asking about JE 4.1, 5.0, or both? They behave quite differently.

    --mark
  • 2. Re: Question on put()  cost
    greybird Expert
    Oh, you're using JE 4.1, of course, since you've mentioned DBINs.

    You probably have no DBINs in your log because you don't have any true duplicates. As I mentioned earlier, in JE 4.1, if there is a single record per key, the LN appears in a BIN. Does that explain why you see BINs in the log?

    The higher-level INs are written by checkpoints because dirtiness propagates up the tree. If a BIN is logged, its parent IN is dirtied and written during the next checkpoint, and so on up the tree. Does that explain why you see INs in the log?

    Don't you see BINDeltas in the log? When an LN is written, there should normally be a delta logged instead of a BIN. Deltas are logged by each checkpoint until the delta threshold is exceeded (see related EnvironmentConfig properties), and then a full BIN is logged.

    --mark
  • 3. Re: Question on put()  cost
    vinothchandar Newbie
    Hi Mark,

    This is what I see from DbPrintLog:

    type,total count,provisional count,total bytes,min bytes,max bytes,avg bytes,entries as % of log
    LN,3945,0,506458,58,618,128,0.8051572748026017
    MapLN,4,0,246054,52540,70489,61513,0.3911719591639965
    DupCountLN,748,0,76216,36,577,101,0.12116674404660424
    FileSummaryLN,1092,0,149940,30,24882,137,0.23837175399322766
    IN,949,294,20051668,46,138245,21129,31.877759581498434
    BIN,2131,1214,41263126,57,254394,19363,65.59933119823633
    DIN,1106,283,300534,139,1221,271,0.47778322472055945
    DBIN,1225,1225,273705,149,711,223,0.4351309919082058
    Root,2,0,32506,16244,16262,16253,0.051677419203040274
    CkptStart,1,0,31,31,31,31,4.9283209108910616E-5
    CkptEnd,1,0,70,70,70,70,1.1128466572979817E-4
    Trace,6,0,1364,51,297,227,0.002168461200792067
    FileHeader,2,0,76,38,38,38,1.2082335136378086E-4
    key bytes,,,298869,,,,0.47513623945712924
    data bytes,,,124978,,,,0.19868764219398163

    Total bytes in portion of log read: 62,901,748
    Total number of entries: 11,212

    Per checkpoint interval info:
    lnTxn,ln,mapLNTxn,mapLN,end to end,end to start,start to end,maxLNReplay,ckptEnd,
    0,974,0,2,23590746907,23570000000,0,974, 0x1141ca/0x1d5291b
    0,2971,0,2,-20746907,3517392,0,2971, 0x1141cb/0x0


    I do see DBINs since, as I said, my writes are guaranteed to be duplicates. But INs and BINs dominate the size of the log. That confuses me, since a BIN should not be dirtied at all in my case: I only change the duplicate tree rooted at the DIN. Does a change to a DBIN also propagate up to the BIN and the IN?
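
To cross-check which entry types dominate, the summary rows above can be re-aggregated by their "total bytes" column. The following is only a minimal sketch, assuming the comma-separated layout shown in this post; the class and method names (LogShare, byteShare) are made up for illustration and are not part of JE or DbPrintLog.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: sums the "total bytes" column of a DbPrintLog-style CSV summary
// and reports each entry type's share of those bytes. LogShare and byteShare
// are hypothetical names, not part of JE.
class LogShare {

    // Returns entry type -> percentage of total bytes, in input order.
    static Map<String, Double> byteShare(String csv) {
        Map<String, Long> bytes = new LinkedHashMap<>();
        long total = 0;
        for (String line : csv.split("\n")) {
            String[] f = line.trim().split(",");
            // Skip the header, blank lines, and rows with an empty count
            // column (the "key bytes" / "data bytes" rows).
            if (f.length < 4 || f[1].isEmpty() || f[0].equals("type")) continue;
            long b = Long.parseLong(f[3]);
            bytes.put(f[0], b);
            total += b;
        }
        Map<String, Double> share = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : bytes.entrySet()) {
            share.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return share;
    }

    public static void main(String[] args) {
        // A subset of the rows from the post, so the printed shares are
        // relative to these rows only, not to the whole log.
        String csv = String.join("\n",
            "type,total count,provisional count,total bytes,min bytes,max bytes,avg bytes,entries as % of log",
            "LN,3945,0,506458,58,618,128,0.8051572748026017",
            "DupCountLN,748,0,76216,36,577,101,0.12116674404660424",
            "IN,949,294,20051668,46,138245,21129,31.877759581498434",
            "BIN,2131,1214,41263126,57,254394,19363,65.59933119823633",
            "DIN,1106,283,300534,139,1221,271,0.47778322472055945",
            "DBIN,1225,1225,273705,149,711,223,0.4351309919082058");
        byteShare(csv).forEach((type, pct) ->
            System.out.printf("%-12s %6.2f%%%n", type, pct));
    }
}
```

Fed all of the per-type rows, the byte totals add up to the 62,901,748 bytes reported for the portion of the log read, so the last column of the table is effectively a share of bytes rather than of entry counts.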

    Thanks
    Vinoth
  • 4. Re: Question on put()  cost
    greybird Expert
    > I do see DBINs since, as I said, my writes are guaranteed to be duplicates. But INs and BINs dominate the size of the log. That confuses me, since a BIN should not be dirtied at all in my case: I only change the duplicate tree rooted at the DIN.
    When you said "My workload always does put() for an existing key" I thought you meant you were doing an update. Do you mean you're inserting a new record for a key that already exists?

    In JE 4.1, inserts and deletions of a duplicate record will:

    1) Write the LN
    2) Dirty the DBIN, causing checkpoints to later write a series of deltas, then a full DBIN, then a full DIN, etc.
    3) Write the DupCountLN, of which there is one per unique key, and which stores the count of records for that key.
    4) Because the DupCountLN is a child of the BIN, the BIN is dirtied, causing checkpoints to later write a series of deltas, then a full BIN, then a full IN, etc.

    (All of this is completely different and much less expensive in JE 5.)
    > Does a change to a DBIN also propagate up to the BIN and the IN?
    Yes, it does, due to both (2) and (4) above.
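
The propagation described in steps (2) and (4) can be illustrated with a toy model. This is only a sketch of the behaviour as described in this thread, not JE code: it ignores BIN deltas, provisional logging, and eviction, and the DirtyTree class is hypothetical. It assumes that logging a node at a checkpoint dirties its parent, which is then written by the next checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, not JE code. Logging a dirty node clears it and dirties
// its parent, which is then picked up by the following checkpoint.
class DirtyTree {
    static final class Node {
        final String name;
        final Node parent;
        boolean dirty;
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
    }

    final List<Node> nodes = new ArrayList<>();

    Node add(String name, Node parent) {
        Node n = new Node(name, parent);
        nodes.add(n);
        return n;
    }

    // One checkpoint: log every node that is dirty right now, then dirty the
    // parents so that they are written by the next checkpoint.
    List<String> checkpoint() {
        List<Node> toLog = new ArrayList<>();
        for (Node n : nodes) {
            if (n.dirty) toLog.add(n);
        }
        List<String> logged = new ArrayList<>();
        for (Node n : toLog) {
            n.dirty = false;
            logged.add(n.name);
            if (n.parent != null) n.parent.dirty = true;
        }
        return logged;
    }

    public static void main(String[] args) {
        DirtyTree t = new DirtyTree();
        Node in = t.add("IN", null);
        Node bin = t.add("BIN", in);
        Node din = t.add("DIN", bin);
        Node dbin = t.add("DBIN", din);

        // Inserting a duplicate: the LN write dirties the DBIN (step 2) and
        // the DupCountLN write dirties the BIN (step 4).
        dbin.dirty = true;
        bin.dirty = true;

        for (int i = 1; i <= 4; i++) {
            System.out.println("checkpoint " + i + " logs " + t.checkpoint());
        }
    }
}
```

Even in this simplified form, a single duplicate insert keeps the BIN and IN levels busy for several checkpoints, which fits the BIN and IN volume seen in the log summary.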

    --mark
  • 5. Re: Question on put()  cost
    vinothchandar Newbie
    Cool. That helps.

    Thanks Mark!
  • 6. Re: Question on put()  cost
    greybird Expert
    The fact that you're not seeing any deltas is suspicious. Checkpoints normally write deltas, unless you are doing a sync() rather than a checkpoint, or specifying CheckpointConfig.setMinimizeRecoveryTime(true). You should allow the checkpoint to write deltas, to reduce the size occupied by BINs and DBINs.

    Also, how often are you doing checkpoints?

    Also, when doing this type of test, make sure that no eviction is occurring by checking the JE stats. Eviction of INs will throw off your numbers.

    --mark
  • 7. Re: Question on put()  cost
    vinothchandar Newbie
    Hi Mark,

    We don't make any explicit JE calls to sync, checkpoint, or clean from our code. Here are our settings:

    checkpoint.bytes.interval= 20MB
    checkpoint.wakeup.interval = 30s
    checkpointer.high.priority=true (This turns off the lazy migration for us)

    IN eviction is below 2-3%. I see that it's checkpointing roughly 1.2 times per second. I have ~100 puts/sec, and the Btree has 178M entries. The cleaner runs roughly 2 times per second.
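
As a back-of-envelope check on these numbers, the checkpoint rate can be turned into an implied log write rate. This is only a sketch under two loud assumptions: that every checkpoint is triggered by the byte interval alone, and that "20MB" means 20 * 1024 * 1024 bytes. The CheckpointRate class is hypothetical, and the result covers all log writes (puts, checkpoints, cleaner migration), not just LNs.

```java
// Back-of-envelope arithmetic only. Assumes every checkpoint fires because
// the byte interval was reached, so the rate is a rough estimate at best.
class CheckpointRate {

    // Bytes logged per second implied by a byte-triggered checkpoint rate.
    static double impliedLogBytesPerSec(double checkpointsPerSec, long bytesInterval) {
        return checkpointsPerSec * bytesInterval;
    }

    // Average bytes logged per put() at that log rate.
    static double bytesPerPut(double logBytesPerSec, double putsPerSec) {
        return logBytesPerSec / putsPerSec;
    }

    public static void main(String[] args) {
        long bytesInterval = 20L * 1024 * 1024; // "20MB", assumed to mean MiB
        double rate = impliedLogBytesPerSec(1.2, bytesInterval);
        System.out.printf("implied log rate: %.1f MB/s%n", rate / (1024 * 1024));
        System.out.printf("log bytes per put: %.0f KB%n", bytesPerPut(rate, 100) / 1024);
    }
}
```

If the assumptions hold, 1.2 checkpoints per second at a 20MB interval works out to about 24 MB/s of log writes, which at ~100 puts/sec is a very large amount of logging per put.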

    The checkpointer logs the dirtied entries, whereas the cleaner logs the "live" entries from a stale file. I just want to confirm that migration by the cleaner is indeed factored in by the checkpointer when it determines when to checkpoint.
    More migration means more frequent checkpoints, and more checkpoints mean more frequent cleaner runs (not necessarily migration), right?

    Thanks
    Vinoth
  • 8. Re: Question on put()  cost
    greybird Expert
    > I see that it's checkpointing roughly 1.2 times per sec.
    That's an awful lot of checkpointing but still I don't understand why you're not seeing any BIN deltas at all. Could you post your env config?
    > The checkpointer logs the dirtied entries, whereas the cleaner logs the "live" entries from a stale file. I just want to confirm that migration by the cleaner is indeed factored in by the checkpointer when it determines when to checkpoint.
    Correct.

    Another tidbit is that BIN deltas are not logged by the checkpointer for a BIN/DBIN that was dirtied by the cleaner. This could reduce the frequency of deltas, but it still shouldn't be zero.
    > More migration means more frequent checkpoints, and more checkpoints mean more frequent cleaner runs (not necessarily migration), right?
    Correct. You may want to bump the checkpoint interval way up and turn off high priority checkpoints -- obviously you don't need this setting to avoid long checkpoints. The byte interval supersedes the wakeup interval.

    --mark
  • 9. Re: Question on put()  cost
    vinothchandar Newbie
    Hi Mark,

    BDB cache size = 104857600
    BDB je.cleaner.threads = 3
    BDB je.cleaner.minUtilization = 50
    BDB je.cleaner.minFileUtilization = 20
    BDB je.log.fileMax = 62914560
    BDB checkpoint interval 20MB
    sorted.duplicates true
    fair.latches false.
    and checkpointer_high_priority is actually false, my bad.

    Basically, my test makes a copy of an environment, brings up Voldemort on it, and runs a workload. When I run the print log on the entire set of JDB files, I do see BINDeltas. It's only in the later files (which correspond to the writes I was talking about) that I don't see them.

    Thanks
    Vinoth
  • 10. Re: Question on put()  cost
    greybird Expert
    > When I run the print log on the entire set of JDB files, I do see BINDeltas. It's only in the later files (which correspond to the writes I was talking about) that I don't see them.
    Oh, so you do see deltas. Perhaps it's just that in the last section the writes are dirtying more than 25% of the entries per BIN. (see EnvironmentConfig.TREE_BIN_DELTA)
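
The threshold rule mentioned here can be sketched as a simple predicate. This is a simplified model based only on the description in this thread (a full BIN is logged when more than the threshold percentage of its entries are dirty), not JE source code. The BinDeltaRule class, the 128-entry BIN size, and the exact behaviour at the boundary are assumptions, and any other limits JE applies to deltas are ignored.

```java
// Simplified model of the delta-vs-full decision, not JE source code.
class BinDeltaRule {

    // Returns true if a full BIN would be logged, i.e. more than
    // thresholdPercent of the BIN's entries are dirty.
    static boolean logsFullBin(int dirtyEntries, int totalEntries, int thresholdPercent) {
        return (long) dirtyEntries * 100 > (long) thresholdPercent * totalEntries;
    }

    public static void main(String[] args) {
        int threshold = 25; // percentage quoted in this thread
        int entries = 128;  // hypothetical number of entries in one BIN
        for (int dirty : new int[] {8, 32, 33, 64}) {
            System.out.println(dirty + "/" + entries + " dirty -> "
                + (logsFullBin(dirty, entries, threshold) ? "full BIN" : "BIN delta"));
        }
    }
}
```

Under this model, a workload that touches many slots of the same BIN between checkpoints would mostly produce full BINs rather than deltas, which is one possible reading of the later log files.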

    --mark
  • 11. Re: Question on put()  cost
    vinothchandar Newbie
    Yep. Thanks for the pointer, Mark.
