1 Reply Latest reply on Jun 28, 2012 5:06 PM by "Andrei Costache, Oracle-Oracle"

    Space utilization of BDB's b-tree access methord.


      Recently I've been doing a little test on BDB(version 5.3.15). I put 5,000,000 ordered records into an environment-based db using b-tree access methord. The keys are 1-5,000,000, and the values are random integers.

      If I am right, these 5,000,000 records should take about 76M space, with each record sizing two ints plusing one pointer(under x64 windows 7). But the size of the database file ends up to be 157M.

      And db_stat shows that the Number of bytes free in tree leaf pages is 63M(65%).

      I noticed that there is a statement in the Berkeley DB Programmer's Reference Guide says "Inserting unordered data into a Btree can result in pages that are only half-full. DB makes ordered (or inverse ordered) insertion the best case, resulting in nearly full-page space utilization.", which doesn't match my test result.

      I wonder if my test is wrong or that statement is not true.

        • 1. Re: Space utilization of BDB's b-tree access methord.
          "Andrei Costache, Oracle-Oracle"
          Hi Spider,

          Well, things are not so simple :) and it would have been great to see the entire output of db_stat.

          So, you have 5M records, the keys are integers, 1 to 5M, with values which are random integers.
          The Btree access method is actually an implementation of B+ tree, that is, data is stored only on leaf pages (or on overflow/duplicate pages), whereas keys are stored on internal pages (we optimize here storing prefixes for these keys) and on leaf pages (where we store the entire keys). Some information on the way we store data, the padding, the slots needed etc, for the Btree access method is here.

          Since it's Windows 7 x64 (64-bit) and you haven't mentioned whether you explicitly set the Btree database page size, I will assume you don't. In this case, when you do not explicitly set it, BDB will use a page size that is equal to the filesystem's block size. The default for NTFS is 4KB, if I'm correct.
          Hence, let's say we have a 4KB page size. If we subtract the page overhead, which is 26 bytes for the Btree access method, we have about 4070 useful bytes on the page.
          The overhead for storing an item on a Btree page is 5 bytes. So, to store a key/value pair we would need (4B + 5B) * 2, which yields 18B. So, you could store about 226 key/data pairs on a Btree leaf page (226 is [4070/18]). Therefore, you would need about 22,124 Btree leaf pages ([5M/226] + 1) to store your 5M records. We optimize the way we store keys on Btree internal pages, but let's be pessimistic and assume we don't extract the prefixes from the keys, so with about 452 keys that we could store on an internal page ([4070/9]) we will need at most 11,062 internal pages [5M/452] -- which again is an exaggerated number (in real world applications we will use about a quarter of these at most). Add the metadata page (page 0 in the Btree, where we store some configuration information, the free page list, the file id etc), and you end up with about 33,187 pages which means a file size of about 130MB. Of course this would be a realistic size if the page fill factor would be around 100%, which most likely will not (you have a 65% fill factor on the leaf pages), so the 157MB file size you see is not surprising.

          I am assuming you specified your own user-defined key comparison function using the DB->set_bt_compare() method in order to have the integer keys properly sorted; see the Btree comparison documentation section, and this FAQ entry. If not, then please do so, as the default lexicographical key comparison function that BDB uses will not sort too well integer keys.

          I hope the above explanations will help you understand the disk space requirements for a Btree database. If you want to obtain a better page fill factor examine the db_stat output and try a different Btree page size.