Working with large data

529154 Member Posts: 2
edited Aug 21, 2006 10:26PM in Berkeley DB
I have to create a database with a specific distribution of key and data sizes. Keys are a few bytes long, while data items vary widely, from a few bytes to several megabytes, with an average size of about 64 KB. The total size of the database, once filled with key/data pairs, is several gigabytes. Our key/data pairs can be thought of as a full-text ("context") index, i.e. an inverted file, over a large set of Russian/English texts.

Could you recommend the best way to configure such a data store? Specifically, I need the best possible random read speed for key/data pairs in a given database, and reasonably good write speed (for updating the "context index").

Comments

  • 526060 Member Posts: 386
    Hi,

    The most important configuration item in your case will be the cache size. The larger the cache, the better the performance will be, especially for random-read workloads. Documentation for the cache configuration API is here:
    http://www.sleepycat.com/docs/api_c/db_set_cachesize.html
    An article about tuning cache size is available here:
    http://www.sleepycat.com/newsletters/0511/a31_Perf_Size.html
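
    As a minimal sketch of the cache call: DB_ENV->set_cachesize (and DB->set_cachesize for a database opened outside an environment) takes the size as whole gigabytes plus an extra byte count. The helper below, split_cache_size, is hypothetical and only converts a single byte total into those two arguments; the actual Berkeley DB call appears in a comment because it needs db.h and libdb to compile.

    ```c
    #include <stdint.h>

    /* One gigabyte as Berkeley DB counts it for set_cachesize: 2^30 bytes. */
    #define GIGABYTE ((uint64_t)1 << 30)

    /* Hypothetical helper: split a total cache size in bytes into the
     * (gbytes, bytes) pair that set_cachesize expects. */
    static void split_cache_size(uint64_t total_bytes,
                                 uint32_t *gbytes, uint32_t *bytes)
    {
        *gbytes = (uint32_t)(total_bytes / GIGABYTE);
        *bytes  = (uint32_t)(total_bytes % GIGABYTE);
    }

    /* Usage against a real environment handle (not compiled here):
     *   uint32_t g, b;
     *   split_cache_size(cache_total, &g, &b);
     *   dbenv->set_cachesize(dbenv, g, b, 1);  // ncache = 1: one contiguous cache
     */
    ```

    For example, a 1.5 GB cache becomes gbytes = 1 and bytes = 536870912.
    
    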

    The choice of access method will also have an impact. Given your description, hash is likely the best choice, since the data access will be random. Still, you should test with both hash and btree. An article describing the benefits and drawbacks of each is here:
    http://www.sleepycat.com/docs/ref/am_conf/select.html
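
    When testing both access methods, it helps to drive the hash and btree databases with exactly the same random read sequence. A minimal sketch, assuming keys can be numbered 0..n-1 (a hypothetical scheme): random_read_order builds a reproducible random permutation with a Fisher-Yates shuffle and a seeded xorshift64 generator. Each index would then be turned into a key and looked up with DB->get in both databases.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Small deterministic PRNG (xorshift64, shifts 13/7/17) so the same seed
     * yields the same read order on every platform. State must be non-zero. */
    static uint64_t xorshift64(uint64_t *state)
    {
        uint64_t x = *state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        *state = x;
        return x;
    }

    /* Fill order[0..n-1] with a random permutation of 0..n-1 (Fisher-Yates).
     * Each value stands for a hypothetical key number to read next. */
    static void random_read_order(size_t *order, size_t n, uint64_t seed)
    {
        uint64_t state = seed ? seed : 1;  /* xorshift must not start at 0 */
        for (size_t i = 0; i < n; i++)
            order[i] = i;
        for (size_t i = n; i > 1; i--) {
            size_t j = (size_t)(xorshift64(&state) % i);  /* 0 <= j <= i-1 */
            size_t tmp = order[i - 1];
            order[i - 1] = order[j];
            order[j] = tmp;
        }
    }
    ```

    Using a fixed seed means that timing differences between the two runs come from the access method and cache behavior, not from a different read order.
    
    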

    Next, you might want to adjust the page size: given that your data items are generally large, a bigger page size will probably result in better performance. The API is documented here:
    http://www.sleepycat.com/docs/api_c/db_set_pagesize.html
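
    Berkeley DB accepts page sizes that are a power of two from 512 bytes to 64 KB, and DB->set_pagesize has to be called before DB->open. The hypothetical helpers below check that constraint and give a rough page count per item, ignoring page header overhead (an assumption, so real counts are a little higher).

    ```c
    #include <stdint.h>

    /* Valid Berkeley DB page size: a power of two in [512, 65536]. */
    static int is_valid_pagesize(uint32_t pagesize)
    {
        return pagesize >= 512 && pagesize <= 65536 &&
               (pagesize & (pagesize - 1)) == 0;
    }

    /* Rough number of pages needed for one data item.
     * ASSUMPTION: ignores per-page header overhead. */
    static uint32_t pages_per_item(uint64_t item_size, uint32_t pagesize)
    {
        return (uint32_t)((item_size + pagesize - 1) / pagesize);
    }
    ```

    For a 64 KB item this gives 16 pages at a 4 KB page size versus 1 page at 64 KB, which illustrates why larger pages tend to help with large data items. In practice, large items are stored on overflow pages, so the exact numbers will differ.
    
    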

    The db_stat utility is a very useful tool for tuning your database. Documentation can be found here:
    http://www.sleepycat.com/docs/utility/db_stat.html
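
    When using db_stat -m to size the cache, the number to watch is the fraction of page requests found in the cache. db_stat prints this directly; the sketch below only shows the arithmetic. Its arguments are modeled on the st_cache_hit and st_cache_miss counters reported by DB_ENV->memp_stat (field types vary between releases, so check your db.h).

    ```c
    #include <stdint.h>

    /* Fraction of page requests satisfied from the cache.
     * Returns 0.0 when no requests have been made yet. */
    static double cache_hit_ratio(uint64_t hits, uint64_t misses)
    {
        uint64_t total = hits + misses;
        return total == 0 ? 0.0 : (double)hits / (double)total;
    }
    ```

    If this ratio stays well below 1.0 during a random-read workload, increasing the cache size is the first thing to try.
    
    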

    If you have any specific questions I will be glad to help.

    Regards,
    Alex