
Data Bloat in je 4.1.17

vinothchandar Newbie
We upgraded from 4.0.92 to 4.1.17 and we are seeing a heavy increase in the data size on disk. DbPrintLog reports massive amounts of DBINs and DINs. A cursor walk of the database, checking the number of duplicates for each key, shows that the data does not actually contain that many duplicates.
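
The duplicate check is nothing fancy -- roughly the sketch below (simplified from our actual code; db is the open Database and the classes are the usual com.sleepycat.je ones):

long keysWithDups = 0;
DatabaseEntry key = new DatabaseEntry();
DatabaseEntry data = new DatabaseEntry();
Cursor cursor = db.openCursor(null, null);
try {
    // Visit each distinct key once and ask JE how many records share it.
    while (cursor.getNextNoDup(key, data, LockMode.DEFAULT) == OperationStatus.SUCCESS) {
        if (cursor.count() > 1) {
            keysWithDups++;   // tally keys that really have duplicates
        }
    }
} finally {
    cursor.close();
}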

I know we had this conversation about duplicates in 4.x earlier (see the thread "Re: Clarification on 4.x handling of duplicates").

Is the increase because 4.1.17 creates the duplicate tree for all keys, irrespective of whether or not they actually have duplicate values?
Any idea what's going on here? Any help in debugging this is greatly appreciated.

PS: I will get the DbPrintLog output in a while.

-Vinoth
  • 1. Re: Data Bloat in je 4.1.17
    greybird Expert
    Hi Vinoth,

    There is no difference in the way duplicates are handled between JE 4.0 and 4.1. The only real change was in JE 5.0. As a starting point, I suggest looking for other things that may have changed when you upgraded.

    --mark
  • 2. Re: Data Bloat in je 4.1.17
    vinothchandar Newbie
    DIN,32665083,4671368,260458817830,775,269045,7973,30.20941307672278
    DBIN,32486471,32425816,303375994883,1418,295296,9338,35.18717785536485
    LN,37469028,0,109486004014,379,123639,2922,12.698774988441537
    I suspect it's creating the duplicate tree for every LN. The counts of LNs, DINs and DBINs roughly match up, which is what leads me to believe this.


    Full log below:

    type,total count,provisional count,total bytes,min bytes,max bytes,avg bytes,entries as % of log
    LN_TX,16377057,0,54145835690,763,123720,3306,6.280124936339035
    LN,37469028,0,109486004014,379,123639,2922,12.698774988441537
    MapLN,51509,0,10744820391,146342,619497,208600,1.2462419983752446
    NameLN,2,0,70,35,35,35,8.118976093759361E-9
    DelDupLN_TX,16299582,0,52874291725,768,123656,3243,6.132644435567621
    DupCountLN_TX,12745014,0,1605867810,125,126,125,0.18625717655896715
    DupCountLN,16599739,10405296,1726370216,103,104,103,0.20023369303831692
    FileSummaryLN,11286544,0,1058470926,30,44116,93,0.1227671449161905
    IN,1088773,1368,7108546890,46,46713,6528,0.8244874608753923
    BIN,4308737,2885551,54041786680,45,46622,12542,6.268056773128045
    DIN,32665083,4671368,260458817830,775,269045,7973,30.20941307672278
    DBIN,32486471,32425816,303375994883,1418,295296,9338,35.18717785536485
    Root,2999,0,185206615,38919,77527,61756,0.02148125827987277
    Commit,16377046,0,573196610,35,35,35,0.06648242248019869
    CkptStart,1630,0,52156,30,34,31,6.049333102087332E-6
    CkptEnd,1011,0,72790,70,74,71,8.442575283782056E-6
    BINDelta,34450106,0,4786062316,42,11582,138,0.5551132218120938
    DupBINDelta,65,0,436762,975,32168,6719,5.065800338089324E-5
    Trace,20699,0,4994926,50,305,241,5.793383540585297E-4
    FileHeader,22476,0,854088,38,38,38,9.90617150566678E-5
    key bytes,,,4361532885,,,,0.5058740175065757
    data bytes,,,157753659939,,,,18.297117053539786

    Total bytes in portion of log read: 862,177,683,388
    Total number of entries: 232,253,571
  • 3. Re: Data Bloat in je 4.1.17
    vinothchandar Newbie
    Hi Mark,

    The changes contain just the new BDB version plus some monitoring code, so I am sure nothing else has changed.

    Thanks
    Vinoth
  • 4. Re: Data Bloat in je 4.1.17
    greybird Expert
    It looks like around 2/3 of the LNs have a duplicate set. I thought you had said that was normal.

    If you can create identical data sets with 4.0 and 4.1, and do the printlog summary for both, it would be useful to compare them. Alongside this, it would be good to know how many true duplicates there are, for both data sets.
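
    (For reference, the summary above is what DbPrintLog produces with its -S option -- something like: java -cp je.jar com.sleepycat.je.util.DbPrintLog -h <envHome> -S. I'm quoting the flags from memory, so double-check them against the usage output for your version.)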

    --mark
  • 5. Re: Data Bloat in je 4.1.17
    vinothchandar Newbie
    Hi mark,

    The ratio of DINs to LNs is 32665083.0 / 37469028.0 = 87%. This is way too high: the maximum duplicate percentage we usually have is around 20%. And the disk usage keeps climbing as we speak.

    Not sure.

    Thanks
    Vinoth
  • 6. Re: Data Bloat in je 4.1.17
    greybird Expert
    This isn't enough info for me to conclude anything; comparing the details of your 4.0 and 4.1 runs would tell more. I'm not saying that there is nothing wrong, only that I can't tell what is happening from this alone.
  • 7. Re: Data Bloat in je 4.1.17
    vinothchandar Newbie
    Hi Mark,

    For every Voldemort put, we do the following BDB operations:

    open_txn(null, null)
    // read + delete to delete old ones
    for k in getSearchKey():
        cursor.delete()
    // put
    cursor.put(k, v)
    close_txn()
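
    In actual JE calls that sequence looks roughly like this (a sketch, not our exact code; key/oldData/newData are the usual DatabaseEntry objects):

    Transaction txn = env.beginTransaction(null, null);
    Cursor cursor = db.openCursor(txn, null);
    try {
        // read + delete: remove any existing versions for this key
        if (cursor.getSearchKey(key, oldData, LockMode.RMW) == OperationStatus.SUCCESS) {
            do {
                cursor.delete();
            } while (cursor.getNextDup(key, oldData, LockMode.RMW) == OperationStatus.SUCCESS);
        }
        // put the new version
        cursor.put(key, newData);
    } finally {
        cursor.close();
    }
    txn.commit();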

    I did a quick test inserting 1000 keys and then updating them right away (all serially). I printed the log to make sure the server was deleting for every key.
    With 4.1.17, I see the following
    type,total count,provisional count,total bytes,min bytes,max bytes,avg bytes,entries as % of log
    LN_TX,200,0,9183,42,51,45,14.045579687977975
    MapLN,5,0,216,31,59,43,0.3303762618537779
    NameLN_TX,2,0,83,37,46,41,0.12695013765677576
    DelDupLN_TX,100,0,4867,46,51,48,7.444172529825635
    DupCountLN_TX,100,0,3108,28,33,31,4.7537473233404715
    DupCountLN,100,100,2225,20,24,22,3.4031814010400736
    FileSummaryLN,3,0,1046,76,894,348,1.5998776384215356
    IN,10,0,501,39,66,50,0.7662893851330682
    BIN,11,9,1322,32,826,120,2.022025084123585
    DIN,200,0,17804,83,95,89,27.231569287243804
    DBIN,200,200,18912,76,114,94,28.92627714897522
    Root,6,0,423,45,77,70,0.6469868461303151
    Commit,202,0,5334,25,27,26,8.15845824411135
    CkptStart,3,0,86,27,31,28,0.13153869684918937
    CkptEnd,3,0,141,46,48,47,0.21566228204343835
    Trace,1,0,91,91,91,91,0.139186295503212
    FileHeader,1,0,38,38,38,38,0.05812174977057204
    key bytes,,,384,,,,0.5873355766289385
    data bytes,,,4184,,,,6.3995105536861425

    Total bytes in portion of log read: 65,380
    Total number of entries: 1,147


    With 4.0.92, I see the following

    type,total count,provisional count,total bytes,min bytes,max bytes,avg bytes,entries as % of log
    LN_TX,200,0,9345,42,52,46,40.84710201940729
    MapLN,5,0,211,31,55,42,0.922283416382551
    NameLN_TX,2,0,83,37,46,41,0.36279395052015034
    DelDupLN_TX,100,0,4867,46,51,48,21.273712737127372
    FileSummaryLN,3,0,438,73,289,146,1.914503015997902
    IN,10,0,501,39,66,50,2.1898767374770522
    BIN,11,9,1322,32,826,120,5.778477139610105
    Root,6,0,423,45,77,70,1.8489378442171518
    Commit,202,0,5334,25,27,26,23.314975085234725
    CkptStart,3,0,86,27,31,28,0.37590698487630037
    CkptEnd,3,0,139,46,47,46,0.6075705918349507
    Trace,1,0,91,91,91,91,0.3977620421365504
    FileHeader,1,0,38,38,38,38,0.16609843517790016
    key bytes,,,384,,,,1.6784683975872017
    data bytes,,,4184,,,,18.288311915377218

    Total bytes in portion of log read: 22,878
    Total number of entries: 547

    You can see that the DBINs and DINs are not present. It could also be a problem with the way we call the BDB APIs. I will be glad to provide any more information you need.

    Thanks
    Vinoth
  • 8. Re: Data Bloat in je 4.1.17
    vinothchandar Newbie
    The even more bizarre thing is that with each additional update I see one more DBIN and DIN in the printLog output.
    Yet when I iterate using searchDupKey(), it returns just one value. I have a small BDB program which reproduces this.
    I can send you the code if you want to take a look.

    Thanks a lot, Mark
  • 9. Re: Data Bloat in je 4.1.17
    vinothchandar Newbie
    More information.

    1. I don't see the DBINs and DINs when I just do a bunch of inserts. It's only after the updates.
    2. Also, if I turn off sortedDuplicates (the config flag shown below), I don't see this issue (obvious, but it helps narrow things down).
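
    For clarity, by sortedDuplicates I mean the database config flag; the test database is opened roughly like this (a sketch, other config omitted):

    DatabaseConfig dbConfig = new DatabaseConfig();
    dbConfig.setAllowCreate(true);
    dbConfig.setTransactional(true);
    dbConfig.setSortedDuplicates(true);   // with this set to false, the DIN/DBIN growth disappears
    Database db = env.openDatabase(null, "test", dbConfig);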
  • 10. Re: Data Bloat in je 4.1.17
    greybird Expert
    Vinoth,

    Your evidence is very convincing, thanks for collecting it.

    This could be a side effect of a bug fix for slot reuse in a duplicates DB. Slot reuse occurs when you delete and then insert a key. For example, see [#17252] and [#19026] in the JE 4.1 change log. I haven't looked at these fixes yet to see if they could have such a side effect, but I will.

    It would help if you could send me the source code for the small test you mentioned, so I can see exactly what JE methods you're using.

    Thanks,
    --mark
  • 11. Re: Data Bloat in je 4.1.17
    greybird Expert
    The cause is almost certainly this fix:

    >
    Fixed a bug that causes a EnvironmentFailureException LOG_FILE_NOT_FOUND when using a particular sequence of operations with a Database configured for duplicates (DatabaseConfig.setSortedDuplicates(true)) or a DPL secondary key with a MANY_TO_ONE or MANY_TO_MANY relationship. The sequence of operations that causes the bug is:

    A single record exists (no duplicates) for key A
    New transaction T1 starts
    Key A is deleted in T1
    Two or more duplicates are inserted for key A in T1
    The T1 transaction aborts

    Later, the log file containing the original version of the record is cleaned and deleted, causing an exception when that record is subsequently accessed. An example stack trace is below.

    (JE 4.0.92) var/db/bdb_userNode fetchTarget of 0x3151/0x44b2638 parent
    IN=729921304 IN class=com.sleepycat.je.tree.DBIN
    lastFullVersion=0x3189/0x13616b0 parent.getDirty()=false state=0
    LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is
    invalid and must be closed.
    at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1241)
    at com.sleepycat.je.tree.BIN.fetchTarget(BIN.java:1300)
    ...
    Thanks to "helg" on OTN for reporting this and working with us to diagnose the problem. [#17252] (fixed in 4.0.114)
    >

    The SR # is wrong in the change log, it should be [#18937]. Note that this fix is in 4.0.114, so if you were to upgrade to the latest 4.0 release you'd have the same problem.

    Of course, the bug described above is not something you're running into right now, but the fix was necessary. The previous behavior was incorrect. For details on the internal change (if you're interested) see:
    src/com/sleepycat/je/tree/Tree.java -- search for [#18937].
    test/com/sleepycat/je/DupSlotReuseTest.java -- the test for this fix; it contains a detailed description.

    The new/correct behavior, implemented with this fix, is that a txn that does a delete followed by an insert, with the same key but different data, will add a new slot rather than reusing the slot for the deleted record, meaning that a DIN/DBIN will be created if one does not already exist. This is only true for a duplicates DB of course.

    A potential workaround is to add a duplicate comparator (DatabaseConfig.setDuplicateComparator) that compares only the Voldemort version, which I assume is embedded in the data. This works only if the version alone is enough to distinguish two records with the same key, and if a delete followed by an insert typically uses the same version for the old and new record -- i.e., I am assuming that when only one version exists for a key, version 0 (or some other constant) is always used. This is a complex and risky workaround: it requires re-loading all data if the sort order for duplicates changes (and I assume it will), and it would take more thought and a lot of testing.
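
    To make the idea concrete, the comparator would look something like the sketch below. This is only a sketch: it assumes the version occupies a fixed-length prefix of every data value (VERSION_LEN is a made-up constant), which may not match your actual record layout, and I believe JE serializes a comparator passed as an instance, hence Serializable.

    // (uses java.util.Comparator, java.io.Serializable and com.sleepycat.je.DatabaseConfig;
    //  assumes every data value is at least VERSION_LEN bytes long)
    public static class VersionOnlyComparator implements Comparator<byte[]>, Serializable {
        private static final int VERSION_LEN = 16; // hypothetical: length of the version prefix
        public int compare(byte[] d1, byte[] d2) {
            // Compare only the version bytes; records that differ outside the
            // version prefix compare as equal, so they share one duplicate slot.
            for (int i = 0; i < VERSION_LEN; i++) {
                int diff = (d1[i] & 0xFF) - (d2[i] & 0xFF);
                if (diff != 0) {
                    return diff;
                }
            }
            return 0;
        }
    }

    DatabaseConfig dbConfig = new DatabaseConfig();
    dbConfig.setSortedDuplicates(true);
    dbConfig.setDuplicateComparator(new VersionOnlyComparator());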

    I don't believe there are any simpler workarounds, but I'll keep thinking.

    When we reworked the implementation of duplicates in JE 5, one of the motivations was to remove all this complexity, and we did: DINs and DBINs no longer exist. I know I sound like a broken record, but I don't believe you're going to stop having problems with duplicates until you move to JE 5.

    --mark
  • 12. Re: Data Bloat in je 4.1.17
    greybird Expert
    On 07/03/2012 11:49 PM, Vinoth Chandar wrote:
    > The version will never be the same for a delete followed by an insert. So, I don't think this will work.
    I was afraid of that.
    > I am pretty sure the problem is related to the transaction related fix you mentioned in JE forums. I tried placing the deletes and inserts in two separate transactions and the bloat seems to be not happening. Is it possible to have a fix for this rolled out? If yes, would it possibly clean up all the extra DBINs/DINs currently in the log too? Please let us know.
    No, it is not possible to change this behavior, because it is not a bug. It is the correct behavior with duplicates in JE 4.1. If we changed this behavior, it would cause other problems, such as the problem with aborts described in the original bug fix. The incorrect behavior may have resulted in better performance, in the particular case where you are doing the delete and insert in a single txn, but it could result in complete data loss for the environment (LOG_FILE_NOT_FOUND, as in the case of the abort bug). We cannot risk data loss.

    Can you use two separate txns for the delete and insert? That's what I was just about to suggest, since it is another workaround. The drawback of course is that the update won't be atomic -- if you crash in between the two transactions, the original data will remain deleted.
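
    In other words, something along these lines (a sketch only; error handling omitted, and the non-atomicity caveat above applies):

    // Txn 1: delete all existing versions for the key (delete removes all duplicates).
    Transaction t1 = env.beginTransaction(null, null);
    db.delete(t1, key);
    t1.commit();

    // Txn 2: insert the new version. A crash between the two txns leaves the key deleted.
    Transaction t2 = env.beginTransaction(null, null);
    db.put(t2, key, newData);
    t2.commit();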

    --mark
  • 13. Re: Data Bloat in je 4.1.17
    greybird Expert
    To give you more information about this situation, I'm going to try to describe more about what is happening in the Btree when you run your test.

    Before the txn starts, you have a single LN in a BIN. Let's say this LN has key K1 and data D1. There is no DIN or DBIN for K1.

    The txn starts. You delete the LN with K1/D1. However, the slot in the BIN cannot be deleted until the txn ends (this is fundamental to JE's transaction architecture). So the BIN slot with K1 is marked deleted, but is still present.

    Then you insert another LN with the same key but different data, say K1/D2. While the txn is open, this second LN must coexist with the first LN, because the data is different. In a duplicates database, you cannot overwrite a record with different data, since the data is part of what uniquely identifies the record.

    In order for two slots with key K1 to coexist, the DIN and DBIN must be created. This is what you're seeing.
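
    In code terms, the minimal sequence that forces the DIN/DBIN to appear is just this (a sketch; k1 and d2 are DatabaseEntry values for K1 and D2):

    Transaction txn = env.beginTransaction(null, null);
    db.delete(txn, k1);   // the K1/D1 slot is marked deleted but kept until the txn ends
    db.put(txn, k1, d2);  // K1/D2 must coexist with the deleted K1/D1 slot, so a DIN/DBIN is created
    txn.commit();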

    (The bug that was fixed [#18937] was that the slot for the deleted K1/D1 record was incorrectly being reused/overwritten by the insert of K1/D2.)

    --mark
  • 14. Re: Data Bloat in je 4.1.17
    vinothchandar Newbie
    Hi Mark,

    Thanks for the explanation.
    > Can you use two separate txns for the delete and insert?
    I am afraid we cannot. The vector clock maintenance is the critical piece of Voldemort's consistency model. So, this is not really an option.
    > No, it is not possible to change this behavior, because it is not a bug.
    I am sorry to disagree. Your explanation described why a DIN and DBIN must be created even if K does not have duplicates. But I still don't understand why DBIN and DINs must be continuously created for every such update and never cleaned up. In other words, N transactions like these produce N DINs/DBINs. This will just fill up the disk.

    However, I am now working on handling duplicates completely in Voldemort code, transparent to JE (there went my July 4th).
    I would like to know about the following.

    -- How can we make the DINs and DBINs go away? Would switching back to 4.0.92 automatically fix the bloat we currently have?
    -- Can we get a patch to make the bloat go away?

    Thanks
    Vinoth