
Best caching policies for almost-FIFO database?


We are trying to use Berkeley DB to support the queues of a crawling process. The number of records is in the 100M-1B range. They are arranged in a number of FIFO queues: each queue is identified by a key (a host), and the elements of a queue (multiple values) are URIs prefixed with an increasing timestamp, so that they are retrieved in FIFO order.
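To make the layout concrete, this is roughly the encoding we have in mind (Java, since we use Berkeley DB JE; the class and method names are just for illustration). The fixed-width big-endian timestamp prefix is what gives us FIFO order among the entries of one host:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative encoding (names are hypothetical):
//   key  = host bytes
//   data = 8-byte big-endian timestamp ++ URI bytes
// With sorted duplicates, the entries of one host then sort in insertion order.
final class QueueRecord {

    static byte[] hostKey(String host) {
        return host.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] queueEntry(long timestampMillis, String uri) {
        byte[] uriBytes = uri.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + uriBytes.length)
                .putLong(timestampMillis)   // big-endian, so byte order matches numeric order
                .put(uriBytes)
                .array();
    }
}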

 

The access pattern to the database is peculiar: we are steadily adding, to all queues, elements that will be accessed *much* later (in the sense that the queues tend to be long, and dequeuing takes time). Once in a while, we take a key and quickly read and remove a burst of, say, 10-20 values (this access is somewhat random).
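For concreteness, the bursty dequeue side looks roughly like the following sketch (a JE cursor over the duplicates of one host; transactions and error handling omitted):

import com.sleepycat.je.Cursor;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BurstDequeue {

    /** Read and delete up to maxItems of the oldest entries queued for one host. */
    static List<byte[]> dequeueBurst(Database db, String host, int maxItems) {
        List<byte[]> out = new ArrayList<>();
        DatabaseEntry key = new DatabaseEntry(host.getBytes(StandardCharsets.UTF_8));
        DatabaseEntry data = new DatabaseEntry();
        Cursor cursor = db.openCursor(null, null);   // pass a Transaction if the db is transactional
        try {
            // Position on the first (oldest, thanks to the timestamp prefix) duplicate.
            OperationStatus status = cursor.getSearchKey(key, data, LockMode.RMW);
            while (status == OperationStatus.SUCCESS && out.size() < maxItems) {
                out.add(data.getData());
                cursor.delete();                     // dequeue as we go
                status = cursor.getNextDup(key, data, LockMode.RMW);
            }
        } finally {
            cursor.close();
        }
        return out;
    }
}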

 

We were wondering whether there is any obvious optimization for this kind of access pattern (we are not experts). We have a large cache, but we can barely keep the internal nodes in memory.

 

For instance, maybe it would be a good idea to use EVICT_LN as the cache mode when appending, since those records will only be re-read much later.
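This is a minimal sketch of what we mean, using JE's per-cursor CacheMode (the mode can also be set database-wide via DatabaseConfig.setCacheMode); whether this actually helps is exactly what we are asking:

import com.sleepycat.je.CacheMode;
import com.sleepycat.je.Cursor;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;

final class QueueAppender {

    /** Append one timestamped entry for a host, asking JE to evict the leaf
     *  record from cache right after the write (it will only be read much
     *  later), while the internal nodes stay cached. */
    static void append(Database db, byte[] hostKey, byte[] entry) {
        Cursor cursor = db.openCursor(null, null);   // pass a Transaction if needed
        try {
            cursor.setCacheMode(CacheMode.EVICT_LN); // applies only to this cursor's operations
            cursor.put(new DatabaseEntry(hostKey), new DatabaseEntry(entry));
        } finally {
            cursor.close();
        }
    }
}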

 

Thank you for any suggestion!
