We are trying to use Berkeley DB to back the queues of a crawling process. The number of records is in the 100M-1B range, and they are arranged in a number of FIFO queues. Each queue is identified by a key (a host), and the elements of a queue (multiple values under that key) are URIs prefixed with an increasing timestamp, so that they are retrieved in FIFO order.
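To make the value layout concrete, here is a minimal sketch of how such a timestamp-prefixed value could be encoded. It assumes the store orders values by unsigned byte-wise comparison and that timestamps are non-negative; the class and method names (`QueueValueCodec`, `encode`, and so on) are hypothetical and not part of Berkeley DB.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Berkeley DB: builds queue values of the
// form [8-byte big-endian timestamp][UTF-8 URI].
final class QueueValueCodec {

    // Encodes a value so that unsigned byte-wise ordering matches timestamp
    // ordering (assumes timestamp >= 0).
    static byte[] encode(long timestamp, String uri) {
        byte[] uriBytes = uri.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + uriBytes.length)
                .putLong(timestamp) // ByteBuffer is big-endian by default
                .put(uriBytes)
                .array();
    }

    // Extracts the timestamp prefix from an encoded value.
    static long timestamp(byte[] value) {
        return ByteBuffer.wrap(value).getLong();
    }

    // Extracts the URI that follows the timestamp prefix.
    static String uri(byte[] value) {
        return new String(value, Long.BYTES, value.length - Long.BYTES,
                StandardCharsets.UTF_8);
    }

    // Unsigned lexicographic comparison, standing in for the store's ordering.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }
}
```

With this layout, a value with a smaller timestamp always sorts before one with a larger timestamp, whatever the URIs are, which is what gives FIFO retrieval within a host.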
The access pattern to the database is peculiar: we are steadily adding elements to all queues, and these elements will be accessed *much* later (the queues tend to be long, and dequeuing takes time). Once in a while, we pick a key and quickly read and remove a burst of, say, 10-20 values (this access is somewhat random).
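For readers who prefer code, the access pattern can be modelled in memory as below. This is only an illustrative stand-in for the database, using plain Java collections; the names `HostQueuesModel`, `append` and `dequeueBurst` are hypothetical, and a simple counter plays the role of the increasing timestamp.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the workload: one FIFO queue per host,
// steady appends everywhere, occasional small bursts of removals on one host.
final class HostQueuesModel {
    private final Map<String, TreeMap<Long, String>> queues = new HashMap<>();
    private long nextTimestamp = 0; // stand-in for the increasing timestamp

    // Steady-state operation: add a URI at the tail of the host's queue.
    void append(String host, String uri) {
        queues.computeIfAbsent(host, h -> new TreeMap<>())
              .put(nextTimestamp++, uri);
    }

    // Occasional operation: read and remove up to max URIs, oldest first.
    List<String> dequeueBurst(String host, int max) {
        List<String> burst = new ArrayList<>();
        TreeMap<Long, String> queue = queues.get(host);
        while (queue != null && burst.size() < max && !queue.isEmpty()) {
            burst.add(queue.pollFirstEntry().getValue());
        }
        return burst;
    }
}
```

In the real workload `append` is called continuously across all hosts, while `dequeueBurst(host, 20)` is called rarely and on a host chosen more or less at random, so the records it touches were written long before.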
We were wondering whether there are any obvious optimizations for this kind of access pattern (we are not experts). We have a large cache, but we can barely keep the internal nodes in memory.
For instance, it might help to use EVICT_LN as the cache mode when appending, since the records will only be re-read much later.