
Working with large data

529154 Member Posts: 2
edited Aug 21, 2006 10:26PM in Berkeley DB
I have to create a database with a specific distribution of key sizes and data
sizes. Keys are a few bytes each, while data items vary over a wide range, from
a few bytes to several megabytes, with an average size near 64 KB. The overall
size of the database, once filled with key/data pairs, is several gigabytes.
You can imagine our key/data pairs forming a context index (inverted file) for
a large set of Russian/English texts.

Could you recommend the best way to configure such a data store? I am looking
for the best random-read speed for key/data pairs in the database, together
with acceptable write speed (for updating the context index).


  • 526060
    526060 Member Posts: 386

    The most important configuration item in your case will be the cache size. The larger the cache, the better the performance, especially for a random-read-oriented workload. Documentation about the cache configuration API is here:
    An article about tuning the cache size is available here:

    The access method you select for the database will also have an impact. Given your description, Hash is likely the best fit, since data access will be random, but you should test with both Hash and Btree. An article describing the benefits and drawbacks of each is here:
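    A sketch of opening the database with either access method so the two can be benchmarked side by side (the helper name and file name are hypothetical):

```c
#include <db.h>

/* Open (or create) a database file with the given access method.
 * Pass DB_HASH for random point lookups, DB_BTREE to compare. */
int open_index(DB **dbp, DBTYPE type, const char *file) {
    int ret;

    if ((ret = db_create(dbp, NULL, 0)) != 0)
        return ret;

    ret = (*dbp)->open(*dbp, NULL, file, NULL, type, DB_CREATE, 0664);
    if (ret != 0) {
        (*dbp)->close(*dbp, 0);
        *dbp = NULL;
    }
    return ret;
}
```

    Run the same random-read workload against a `DB_HASH` and a `DB_BTREE` build of the index before committing to one.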

    Then you might want to adjust the page size: given that your data items are generally large, a bigger page size will probably result in better performance. The API is here:
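    The page size must be set before the database file is created. A sketch (64 KB is Berkeley DB's maximum page size and happens to match your average data size, but treat the exact value as something to benchmark):

```c
#include <db.h>

/* Create a database with an explicit page size.
 * Valid sizes are powers of two from 512 bytes to 64 KB, and the
 * call must precede the DB->open that creates the file. */
int create_with_pagesize(DB **dbp, const char *file, u_int32_t pagesize) {
    int ret;

    if ((ret = db_create(dbp, NULL, 0)) != 0)
        return ret;

    if ((ret = (*dbp)->set_pagesize(*dbp, pagesize)) != 0)
        return ret;

    return (*dbp)->open(*dbp, NULL, file, NULL, DB_BTREE, DB_CREATE, 0664);
}
```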

    The db_stat utility is a very useful tool for tuning your database. Documentation can be found here:
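    For example, to inspect cache effectiveness and per-database layout (the ./dbhome environment directory and index.db file name are illustrative):

```shell
# Cache (memory pool) statistics: watch the hit rate under your workload.
db_stat -h ./dbhome -m

# Per-database statistics: page size, fill factor, page counts.
db_stat -h ./dbhome -d index.db
```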

    If you have any specific questions I will be glad to help.

This discussion has been closed.