There are no limits on key size, value size, number of records or database size. However, these sizes, along with your access pattern, do affect what resources your application will need.
- Key and data size affect how much of the database can fit into cache. We have seen applications with a full range of key and record sizes, from a few bytes to thousands of bytes to megabytes. The larger the key, the smaller the portion of the internal btree that can fit into cache.
- How much of the database is "hot" matters more than the size of the database itself. We have seen applications that use databases that range from quite small to terabytes.
You may want to look at com.sleepycat.je.util.DbCacheSize to get some guidelines on cache sizing, and to do some estimates of your own. Please also read the getting started guide, in particular this section: http://www.oracle.com/technology/documentation/berkeley-db/je/GettingStartedGuide/logfilesrevealed.html
Presumably there is some internal id for a database that isn't its name, and that id has a maximum (or is a variable length encoding?). While I will only have a few thousand databases at any one time, I'll be regularly creating and removing them. May I assume that over the lifetime of an environment I should be able to create and destroy more than 2^32 databases?
Just to be clear, that 2^31 limit is over the lifetime of an environment, not a limit for the number that exist at any one time?
Knowing this, I can still leverage the benefits of truncateDatabase by recycling databases after emptying them. Good to know.
Assuming it could be optimized, a ranged delete operation would be a great addition. I want to delete 100 to 10k consecutive records at a time, on a continual basis. For now I'll proceed with a single database and individual record deletes. Thanks.
I'm trying to read between the lines and have concluded that you're using key ranges instead of databases, because you can't create enough databases in total over the lifetime of your app. Correct?
Would support for 2^63 databases solve your problem? Not promising anything, just curious.
An optimized range deletion is a nice thing, and we should probably do it in the future. But because of JE's architecture I don't think it will ever be nearly as fast as a Database removal or truncation, which is already optimized.
Also, what is the average size -- number of records, key/data sizes -- of each data set (what you'd like to store in each Database)? If the average size of a database is extremely small, the per-Database overhead may be a big factor.
Yes, 2^63 databases would work.
Given a 3-year hardware lifecycle and a not unreasonable expected write rate (and therefore expected database turnover), a 2^31 limit would make using truncateDatabase too risky: we would be within a safety factor of 2x of the limit. Alternatively, if the max database id were exposed in stats or something, we could know when it was getting close and initiate an automated re-provisioning process (where the host is wiped and data re-replicated back in).
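To make the lifecycle arithmetic concrete, here is a rough sketch. It assumes the 2^31 ceiling mentioned in this thread and a 3-year lifetime of 3 x 365 days; the class and method names are made up for illustration and are not part of JE. Under those assumptions the ids last 3 years at a sustained rate of roughly 22.7 database creations (or truncations) per second, or about 11 per second with a 2x safety factor.

```java
import java.time.Duration;

/** Back-of-the-envelope budget for database ids over a host's lifetime (illustrative only). */
final class DbIdBudget {
    /** Id ceiling as discussed in this thread; treated here as an assumption. */
    static final long ID_LIMIT = 1L << 31;

    /** Highest sustained rate (ids consumed per second) that stays within the limit. */
    static double maxCreatesPerSecond(Duration lifetime, double safetyFactor) {
        return ID_LIMIT / (lifetime.getSeconds() * safetyFactor);
    }

    /** Days until the ids run out at the given rate, starting from the current max id. */
    static double daysRemaining(long currentMaxId, double createsPerSecond) {
        double secondsLeft = (ID_LIMIT - currentMaxId) / createsPerSecond;
        return secondsLeft / 86_400.0;
    }

    public static void main(String[] args) {
        Duration threeYears = Duration.ofDays(3 * 365);
        System.out.printf("no safety margin: %.1f ids/s%n", maxCreatesPerSecond(threeYears, 1.0));
        System.out.printf("2x safety margin: %.1f ids/s%n", maxCreatesPerSecond(threeYears, 2.0));
        // Hypothetical monitoring input: a max id of 1e9 observed at 20 ids/s.
        System.out.printf("days left: %.0f%n", daysRemaining(1_000_000_000L, 20.0));
    }
}
```

If the max database id were available from stats, a check like daysRemaining could feed the automated re-provisioning trigger.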
For the purposes of discussion our records are keyed by "writer id" + "per writer sequence number". After accumulating so much data per writer, it gets moved elsewhere (out of BDB) and deleted, while additional writes happen at the tail. Deleting that as efficiently as possible is preferred. Given the current apis, that translates to using optimally 2 databases per "writer id" (the one we just overflowed and will soon delete, and the one we are now writing to).
However, another concern was the FAQ entry about checkpoint overhead (more than an order of magnitude worse) when using multiple databases. Our use case would have multiple writes to each db, so wouldn't be as pathological. I was going to write some test code for that scenario to see what it looks like.
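For reference, a minimal sketch of the "writer id" + "per writer sequence number" key layout described above. It assumes both parts are non-negative 64-bit values and that keys are compared as unsigned bytes, which is my understanding of the default ordering when no custom comparator is configured. The class and method names are hypothetical. Big-endian encoding keeps each writer's records contiguous and in sequence order, which is what makes deleting a consecutive run of records a simple range walk.

```java
import java.nio.ByteBuffer;

/** Composite key: 8-byte writer id followed by 8-byte sequence number (illustrative only). */
final class WriterKey {
    /** ByteBuffer is big-endian by default, so byte order matches numeric order for non-negative values. */
    static byte[] encode(long writerId, long sequence) {
        return ByteBuffer.allocate(2 * Long.BYTES).putLong(writerId).putLong(sequence).array();
    }

    static long writerId(byte[] key) {
        return ByteBuffer.wrap(key).getLong(0);
    }

    static long sequence(byte[] key) {
        return ByteBuffer.wrap(key).getLong(Long.BYTES);
    }

    /** Unsigned byte-by-byte comparison, assumed to match the store's default key ordering. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    /** True if key falls in the half-open range [fromSeq, toSeq) for the given writer. */
    static boolean inRange(byte[] key, long writer, long fromSeq, long toSeq) {
        return compare(key, encode(writer, fromSeq)) >= 0
            && compare(key, encode(writer, toSeq)) < 0;
    }
}
```

A range delete would position a cursor at encode(writer, fromSeq) and delete records until inRange returns false.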
We could easily expose the max database id in the stats. I will also give you a quick and dirty hack to obtain it from the Environment, although we would not guarantee that it would be a supported api in future releases.
Environment env = ...'
should do the job for you.
I'm not sure whether the small size of your databases, and the per-database overhead including checkpointing, will outweigh the advantages of database removal over record removal. You're wise to test this for your particular parameters.
Nice. That will let us monitor it, and engineer a workaround if need be. I'm passing this info around internally, as I know at least one other team was considering doing something similar.
Is the checkpoint overhead correlated with the number of databases that ever existed, that exist now, or that had activity since the last checkpoint?