24 Replies. Latest reply: Jul 7, 2012 4:22 PM by 909806

    Data Bloat in je 4.1.17

    vinothchandar
      We upgraded from 4.0.92 to 4.1.17 and are seeing a heavy increase in the data size on disk. DbPrintLog reports massive amounts of DBINs and DINs. A cursor walk of the database, checking the number of duplicates for each key, shows that the data does not actually contain that many duplicates.

      I know we had this conversation about duplicates in 4.x before, in the earlier thread "Re: Clarification on 4.x handling of duplicates".

      Is the increase because we now have a duplicate tree for all keys in 4.1.17, regardless of whether they actually have duplicate values?
      Any idea what's going on here? Any help with debugging this is greatly appreciated.

      PS : I will get the DBPrintLog output in a while.

      -Vinoth
        • 1. Re: Data Bloat in je 4.1.17
          Greybird-Oracle
          Hi Vinoth,

          There is no difference in the way duplicates are handled between JE 4.0 and 4.1. The only real change was in JE 5.0. As a starting point, I suggest looking for other things that may have changed when you upgraded.

          --mark
          • 2. Re: Data Bloat in je 4.1.17
            vinothchandar
            DIN,32665083,4671368,260458817830,775,269045,7973,30.20941307672278
            DBIN,32486471,32425816,303375994883,1418,295296,9338,35.18717785536485
            LN,37469028,0,109486004014,379,123639,2922,12.698774988441537
            I kind of suspect it's creating the duplicate tree for every LN. The counts of LNs, DINs, and DBINs roughly match up, which leads me to believe this.


            Full log below:

            type,total count,provisional count,total bytes,min bytes,max bytes,avg bytes,entries as % of log
            LN_TX,16377057,0,54145835690,763,123720,3306,6.280124936339035
            LN,37469028,0,109486004014,379,123639,2922,12.698774988441537
            MapLN,51509,0,10744820391,146342,619497,208600,1.2462419983752446
            NameLN,2,0,70,35,35,35,8.118976093759361E-9
            DelDupLN_TX,16299582,0,52874291725,768,123656,3243,6.132644435567621
            DupCountLN_TX,12745014,0,1605867810,125,126,125,0.18625717655896715
            DupCountLN,16599739,10405296,1726370216,103,104,103,0.20023369303831692
            FileSummaryLN,11286544,0,1058470926,30,44116,93,0.1227671449161905
            IN,1088773,1368,7108546890,46,46713,6528,0.8244874608753923
            BIN,4308737,2885551,54041786680,45,46622,12542,6.268056773128045
            DIN,32665083,4671368,260458817830,775,269045,7973,30.20941307672278
            DBIN,32486471,32425816,303375994883,1418,295296,9338,35.18717785536485
            Root,2999,0,185206615,38919,77527,61756,0.02148125827987277
            Commit,16377046,0,573196610,35,35,35,0.06648242248019869
            CkptStart,1630,0,52156,30,34,31,6.049333102087332E-6
            CkptEnd,1011,0,72790,70,74,71,8.442575283782056E-6
            BINDelta,34450106,0,4786062316,42,11582,138,0.5551132218120938
            DupBINDelta,65,0,436762,975,32168,6719,5.065800338089324E-5
            Trace,20699,0,4994926,50,305,241,5.793383540585297E-4
            FileHeader,22476,0,854088,38,38,38,9.90617150566678E-5
            key bytes,,,4361532885,,,,0.5058740175065757
            data bytes,,,157753659939,,,,18.297117053539786

            Total bytes in portion of log read: 862,177,683,388
            Total number of entries: 232,253,571
            • 3. Re: Data Bloat in je 4.1.17
              vinothchandar
              Hi Mark,

              The changes contain just the new BDB version plus some monitoring code, so I am sure nothing else has changed.

              Thanks
              Vinoth
              • 4. Re: Data Bloat in je 4.1.17
                Greybird-Oracle
                It looks like around 2/3 of the LNs have a duplicate set. I thought you had said that was normal.

                If you can create identical data sets with 4.0 and 4.1 and run the printlog summary for both, it would be useful to compare them. Alongside this, it would be good to know how many true duplicates there are in each data set.

                --mark
                • 5. Re: Data Bloat in je 4.1.17
                  vinothchandar
                  Hi mark,

                  The ratio of DINs to LNs is 32665083.0 / 37469028.0 = 87%. This is way too high; the maximum duplicate percentage we usually see is around 20%. And the disk usage keeps climbing as we speak.

                  Not sure what's going on here.

                  Thanks
                  Vinoth
                  • 6. Re: Data Bloat in je 4.1.17
                    Greybird-Oracle
                    This isn't enough info for me to conclude anything. Comparing the details for your 4.0 and 4.1 run would tell more. I'm not saying that there is nothing wrong, only that this is not enough info to figure out what is happening.
                    • 7. Re: Data Bloat in je 4.1.17
                      vinothchandar
                      Hi Mark,

                      For every Voldemort put, we do the following BDB operations:

                      open_txn(null, null)
                      // read + delete the old versions for this key
                      for each duplicate found via cursor.getSearchKey(k):
                          cursor.delete()
                      // put the new version
                      cursor.put(k, v)
                      close_txn
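                      For concreteness, here is my reading of that pseudocode against the JE 4.x API. The Environment/Database setup is assumed (the Database opened with setSortedDuplicates(true)), so treat this as an illustrative sketch rather than our exact code:

```java
import com.sleepycat.je.Cursor;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;
import com.sleepycat.je.Transaction;

// Sketch of the per-put sequence: delete all existing versions for the
// key, then insert the new version, all in one transaction.
public class VoldemortStylePut {
    static void put(Environment env, Database db,
                    DatabaseEntry key, DatabaseEntry newData)
            throws DatabaseException {
        Transaction txn = env.beginTransaction(null, null);
        Cursor cursor = db.openCursor(txn, null);
        try {
            // read + delete: remove every existing duplicate for this key
            DatabaseEntry oldData = new DatabaseEntry();
            OperationStatus status =
                cursor.getSearchKey(key, oldData, LockMode.RMW);
            while (status == OperationStatus.SUCCESS) {
                cursor.delete();
                status = cursor.getNextDup(key, oldData, LockMode.RMW);
            }
            // put: insert the new version within the same transaction
            cursor.put(key, newData);
        } finally {
            cursor.close();
        }
        txn.commit();
    }
}
```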

                      I did a quick test inserting 1000 keys and then updating them right away (all serially). I checked the log to make sure the server was issuing a delete for every key.
                      With 4.1.17, I see the following:
                      type,total count,provisional count,total bytes,min bytes,max bytes,avg bytes,entries as % of log
                      LN_TX,200,0,9183,42,51,45,14.045579687977975
                      MapLN,5,0,216,31,59,43,0.3303762618537779
                      NameLN_TX,2,0,83,37,46,41,0.12695013765677576
                      DelDupLN_TX,100,0,4867,46,51,48,7.444172529825635
                      DupCountLN_TX,100,0,3108,28,33,31,4.7537473233404715
                      DupCountLN,100,100,2225,20,24,22,3.4031814010400736
                      FileSummaryLN,3,0,1046,76,894,348,1.5998776384215356
                      IN,10,0,501,39,66,50,0.7662893851330682
                      BIN,11,9,1322,32,826,120,2.022025084123585
                      DIN,200,0,17804,83,95,89,27.231569287243804
                      DBIN,200,200,18912,76,114,94,28.92627714897522
                      Root,6,0,423,45,77,70,0.6469868461303151
                      Commit,202,0,5334,25,27,26,8.15845824411135
                      CkptStart,3,0,86,27,31,28,0.13153869684918937
                      CkptEnd,3,0,141,46,48,47,0.21566228204343835
                      Trace,1,0,91,91,91,91,0.139186295503212
                      FileHeader,1,0,38,38,38,38,0.05812174977057204
                      key bytes,,,384,,,,0.5873355766289385
                      data bytes,,,4184,,,,6.3995105536861425

                      Total bytes in portion of log read: 65,380
                      Total number of entries: 1,147


                      With 4.0.92, I see the following

                      type,total count,provisional count,total bytes,min bytes,max bytes,avg bytes,entries as % of log
                      LN_TX,200,0,9345,42,52,46,40.84710201940729
                      MapLN,5,0,211,31,55,42,0.922283416382551
                      NameLN_TX,2,0,83,37,46,41,0.36279395052015034
                      DelDupLN_TX,100,0,4867,46,51,48,21.273712737127372
                      FileSummaryLN,3,0,438,73,289,146,1.914503015997902
                      IN,10,0,501,39,66,50,2.1898767374770522
                      BIN,11,9,1322,32,826,120,5.778477139610105
                      Root,6,0,423,45,77,70,1.8489378442171518
                      Commit,202,0,5334,25,27,26,23.314975085234725
                      CkptStart,3,0,86,27,31,28,0.37590698487630037
                      CkptEnd,3,0,139,46,47,46,0.6075705918349507
                      Trace,1,0,91,91,91,91,0.3977620421365504
                      FileHeader,1,0,38,38,38,38,0.16609843517790016
                      key bytes,,,384,,,,1.6784683975872017
                      data bytes,,,4184,,,,18.288311915377218

                      Total bytes in portion of log read: 22,878
                      Total number of entries: 547

                      Notice that the DBINs and DINs are not present in the 4.0.92 log. It could also be a problem with the way we call the BDB APIs. I will be really glad to provide any more information you need.

                      Thanks
                      Vinoth
                      • 8. Re: Data Bloat in je 4.1.17
                        vinothchandar
                        The even more bizarre thing is -- with each additional update, I see one more DBIN and DIN in the printLog output.
                        When I iterate using searchDupKey(), it just returns one value. I have a small BDB program that reproduces this.
                        I can send you the code if you want to take a look.

                        Thanks a lot mark
                        • 9. Re: Data Bloat in je 4.1.17
                          vinothchandar
                          More information:

                          1. I don't see the DBINs and DINs when I just do a bunch of inserts. It's only after the updates.
                          2. Also, if I turn off sortedDuplicates, I don't see this issue (obvious, but it helps narrow it down).
                          • 10. Re: Data Bloat in je 4.1.17
                            Greybird-Oracle
                            Vinoth,

                            Your evidence is very convincing, thanks for collecting it.

                            This could be a side effect of a bug fix for slot reuse in a duplicates DB. Slot reuse occurs when you delete and then insert a key. For example, see [#17252] and [#19026] in the JE 4.1 change log. I haven't looked at these fixes yet to see if they could have such a side effect, but I will.

                            It would help if you could send me the source code for the small test you mentioned, so I can see exactly what JE methods you're using.

                            Thanks,
                            --mark
                            • 11. Re: Data Bloat in je 4.1.17
                              Greybird-Oracle
                              The cause is almost certainly this fix:

                              >
                              Fixed a bug that causes a EnvironmentFailureException LOG_FILE_NOT_FOUND when using a particular sequence of operations with a Database configured for duplicates (DatabaseConfig.setSortedDuplicates(true)) or a DPL secondary key with a MANY_TO_ONE or MANY_TO_MANY relationship. The sequence of operations that causes the bug is:

                              A single record exists (no duplicates) for key A
                              New transaction T1 starts
                              Key A is deleted in T1
                              Two or more duplicates are inserted for key A in T1
                              The T1 transaction aborts

                              Later, the log file containing the original version of the record is cleaned and deleted, causing an exception when that record is subsequently accessed. An example stack trace is below.

                              (JE 4.0.92) var/db/bdb_userNode fetchTarget of 0x3151/0x44b2638 parent
                              IN=729921304 IN class=com.sleepycat.je.tree.DBIN
                              lastFullVersion=0x3189/0x13616b0 parent.getDirty()=false state=0
                              LOG_FILE_NOT_FOUND: Log file missing, log is likely invalid. Environment is
                              invalid and must be closed.
                              at com.sleepycat.je.tree.IN.fetchTarget(IN.java:1241)
                              at com.sleepycat.je.tree.BIN.fetchTarget(BIN.java:1300)
                              ...
                              Thanks to "helg" on OTN for reporting this and working with us to diagnose the problem. [#17252] (fixed in 4.0.114)
                              >

                              The SR # is wrong in the change log, it should be [#18937]. Note that this fix is in 4.0.114, so if you were to upgrade to the latest 4.0 release you'd have the same problem.

                              Of course, the bug described above is not something you're running into right now, but the fix was necessary; the previous behavior was incorrect. For details on the internal change (if you're interested) see:
                              src/com/sleepycat/je/tree/Tree.java -- search for [#18937].
                              test/com/sleepycat/je/DupSlotReuseTest.java -- the test for this fix; it contains a detailed description.
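                              To make the failing sequence quoted from the change log concrete, here is a sketch of the five steps in JE API terms. The 'env' and 'db' setup is assumed (a duplicates database), and this is an illustration of the sequence rather than a tested repro:

```java
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.Transaction;

// Illustrative sketch of the sequence from the change-log entry above.
// 'env' is an open Environment; 'db' is a Database configured with
// DatabaseConfig.setSortedDuplicates(true). Both are assumed to exist.
public class DupSlotReuseSketch {
    static void run(Environment env, Database db) throws Exception {
        DatabaseEntry keyA = new DatabaseEntry("A".getBytes("UTF-8"));

        // 1. A single record exists (no duplicates) for key A
        db.put(null, keyA, new DatabaseEntry("v0".getBytes("UTF-8")));

        // 2. New transaction T1 starts
        Transaction t1 = env.beginTransaction(null, null);

        // 3. Key A is deleted in T1
        db.delete(t1, keyA);

        // 4. Two or more duplicates are inserted for key A in T1
        db.put(t1, keyA, new DatabaseEntry("v1".getBytes("UTF-8")));
        db.put(t1, keyA, new DatabaseEntry("v2".getBytes("UTF-8")));

        // 5. The T1 transaction aborts
        t1.abort();

        // Before the fix, the log file holding the original record could be
        // cleaned away, and a later access raised LOG_FILE_NOT_FOUND.
    }
}
```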

                              The new/correct behavior, implemented with this fix, is that a txn that does a delete followed by an insert, with the same key but different data, will add a new slot rather than reusing the slot for the deleted record, meaning that a DIN/DBIN will be created if one does not already exist. This is only true for a duplicates DB of course.

                              A potential workaround is to add a duplicate comparator (DatabaseConfig.setDuplicateComparator) that compares only the Voldemort version, which I assume is embedded in the data. This works only if the version alone is enough to distinguish two records with the same key, and if a delete followed by an insert typically uses the same version for the old and new record -- i.e., I am assuming that when only one version exists for a key, a constant version such as 0 is always used. This workaround is complex and risky: it will require re-loading all data if the sort order for duplicates changes (and I assume it will). It will take more thought and much testing.
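                              As a very rough illustration of the comparator idea: the real Voldemort value layout is unknown to me, so the 8-byte version prefix below is purely an assumption that would have to be replaced with the actual encoding.

```java
import java.io.Serializable;
import java.util.Comparator;

// Hypothetical sketch: orders duplicates only by an assumed 8-byte
// version prefix at the start of the data, ignoring the payload. Records
// whose prefixes match compare equal regardless of the rest of the data.
public class VersionOnlyComparator implements Comparator<byte[]>, Serializable {
    private static final int VERSION_BYTES = 8; // assumption, not Voldemort's real layout

    @Override
    public int compare(byte[] a, byte[] b) {
        int n = Math.min(VERSION_BYTES, Math.min(a.length, b.length));
        for (int i = 0; i < n; i++) {
            // compare bytes as unsigned values
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return 0; // same version prefix => same record
    }
}
```

                              It would be installed via DatabaseConfig.setDuplicateComparator(new VersionOnlyComparator()) before opening the database; as noted above, changing the duplicate sort order means reloading all existing data.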

                              I don't believe there are any simpler workarounds, but I'll keep thinking.

                              When we reworked the implementation of duplicates in JE 5, one of the motivations was to remove all this complexity, and we did: DINs and DBINs no longer exist. I know I sound like a broken record, but I don't believe you're going to stop having problems with duplicates until you move to JE 5.

                              --mark
                              • 12. Re: Data Bloat in je 4.1.17
                                Greybird-Oracle
                                On 07/03/2012 11:49 PM, Vinoth Chandar wrote:
                                > The version will never be the same for a delete followed by an insert. So, I don't think this will work.

                                I was afraid of that.

                                > I am pretty sure the problem is related to the transaction-related fix you mentioned in the JE forums. I tried placing the deletes and inserts in two separate transactions and the bloat does not seem to happen. Is it possible to have a fix for this rolled out? If yes, would it also clean up all the extra DBINs/DINs currently in the log? Please let us know.
                                No, it is not possible to change this behavior, because it is not a bug. It is the correct behavior with duplicates in JE 4.1. If we changed this behavior, it would cause other problems, such as the problem with aborts described in the original bug fix. The incorrect behavior may have resulted in better performance, in the particular case where you are doing the delete and insert in a single txn, but it could result in complete data loss for the environment (LOG_FILE_NOT_FOUND, as in the case of the abort bug). We cannot risk data loss.

                                Can you use two separate txns for the delete and insert? That's what I was just about to suggest, since it is another workaround. The drawback of course is that the update won't be atomic -- if you crash in between the two transactions, the original data will remain deleted.
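                                A sketch of that two-transaction shape, assuming 'env', 'db', 'key', and 'newData' are already set up (illustrative only, given the atomicity caveat above):

```java
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.Transaction;

// Sketch of the two-txn workaround: the delete commits before the insert
// begins, so no delete+insert pair shares a transaction and no duplicate
// slot needs to coexist with a deleted one.
public class TwoTxnUpdate {
    static void update(Environment env, Database db,
                       DatabaseEntry key, DatabaseEntry newData)
            throws Exception {
        Transaction t1 = env.beginTransaction(null, null);
        db.delete(t1, key);       // txn 1: delete the old record
        t1.commit();
        // A crash here leaves the key deleted (the non-atomic window).
        Transaction t2 = env.beginTransaction(null, null);
        db.put(t2, key, newData); // txn 2: insert the new record
        t2.commit();
    }
}
```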

                                --mark
                                • 13. Re: Data Bloat in je 4.1.17
                                  Greybird-Oracle
                                  To give you more information about this situation, I'm going to try to describe more about what is happening in the Btree when you run your test.

                                  Before the txn starts, you have a single LN in a BIN. Let's say this LN has key K1 and data D1. There is no DIN or DBIN for K1.

                                  The txn starts. You delete the LN with K1/D1. However, the slot in the BIN cannot be deleted until the txn ends (this is fundamental to JE's transaction architecture). So the BIN slot with K1 is marked deleted, but is still present.

                                  Then you insert another LN with the same key but different data, say K1/D2. While the txn is open, this second LN must coexist with the first LN, because the data is different. In a duplicates database, you cannot overwrite a record with different data, since the data is part of what uniquely identifies the record.

                                  In order for two slots with key K1 to coexist, the DIN and DBIN must be created. This is what you're seeing.

                                  (The bug that was fixed [#18937] was that the deleted K1/D1 slot was incorrectly being reused/overwritten by the K1/D2 insert.)

                                  --mark
                                  • 14. Re: Data Bloat in je 4.1.17
                                    vinothchandar
                                    Hi Mark,

                                    Thanks for the explanation.
                                    > Can you use two separate txns for the delete and insert?

                                    I am afraid we cannot. Vector clock maintenance is a critical piece of Voldemort's consistency model, so this is not really an option.

                                    > No, it is not possible to change this behavior, because it is not a bug.
                                    I am sorry to disagree. Your explanation describes why a DIN and DBIN must be created even when K does not have duplicates. But I still don't understand why DBINs and DINs must be continuously created for every such update and never cleaned up. In other words, N such transactions produce N DINs/DBINs. This will just fill up the disk.

                                    However, I am working on handling duplicates entirely in Voldemort code, transparent to JE (there went my July 4th).
                                    I would like to know the following.

                                    -- How can we make the DINs and DBINs go away? Would switching back to 4.0.92 automatically fix the bloat we currently have?
                                    -- Can we get a patch to make the bloat go away?

                                    Thanks
                                    Vinoth