This content has been marked as final. Show 1 reply
Well, things are not so simple :) and it would have been great to see the entire output of db_stat.
So, you have 5M records, the keys are integers, 1 to 5M, with values which are random integers.
The Btree access method is actually an implementation of B+ tree, that is, data is stored only on leaf pages (or on overflow/duplicate pages), whereas keys are stored on internal pages (we optimize here storing prefixes for these keys) and on leaf pages (where we store the entire keys). Some information on the way we store data, the padding, the slots needed etc, for the Btree access method is here.
Since it's Windows 7 x64 (64-bit) and you haven't mentioned whether you explicitly set the Btree database page size, I will assume you don't. In this case, when you do not explicitly set it, BDB will use a page size that is equal to the filesystem's block size. The default for NTFS is 4KB, if I'm correct.
Hence, let's say we have a 4KB page size. If we subtract the page overhead, which is 26 bytes for the Btree access method, we have about 4070 useful bytes on the page.
The overhead for storing an item on a Btree page is 5 bytes. So, to store a key/value pair we would need (4B + 5B) * 2, which yields 18B. So, you could store about 226 key/data pairs on a Btree leaf page (226 is [4070/18]). Therefore, you would need about 22,124 Btree leaf pages ([5M/226] + 1) to store your 5M records. We optimize the way we store keys on Btree internal pages, but let's be pessimistic and assume we don't extract the prefixes from the keys, so with about 452 keys that we could store on an internal page ([4070/9]) we will need at most 11,062 internal pages [5M/452] -- which again is an exaggerated number (in real world applications we will use about a quarter of these at most). Add the metadata page (page 0 in the Btree, where we store some configuration information, the free page list, the file id etc), and you end up with about 33,187 pages which means a file size of about 130MB. Of course this would be a realistic size if the page fill factor would be around 100%, which most likely will not (you have a 65% fill factor on the leaf pages), so the 157MB file size you see is not surprising.
I am assuming you specified your own user-defined key comparison function using the DB->set_bt_compare() method in order to have the integer keys properly sorted; see the Btree comparison documentation section, and this FAQ entry. If not, then please do so, as the default lexicographical key comparison function that BDB uses will not sort too well integer keys.
I hope the above explanations will help you understand the disk space requirements for a Btree database. If you want to obtain a better page fill factor examine the db_stat output and try a different Btree page size.