I'm trying to optimize our storage backend once more. We can currently switch between a simple RandomAccessFile backend and a BerkeleyDB backend, but I don't think we need a B-tree at all. I'm working on a tree-based storage system, and thanks to our simple node-to-NodePage mapping we know in advance, without any index structure, where each node resides (in which in-memory NodePage). For the file backend we simply use offset values and "equip" references with the offset of the NodePage on external storage. For the BerkeleyDB backend, instead of an offset we use an integer counter, incremented after each stored DatabaseEntry, as the key referencing the on-disk data. A B-tree seems like overkill here, and maybe another solution is much better: we store several revisions of the data as full, incremental, or differential dumps, and storing an additional B-tree per revision for referencing the on-disk data multiplies the cost (since it is most probably not a persistent B-tree, in the sense of persistent functional data structures; the latter would be very, very cool, however :-)). In the file backend we simply append new data at the end and, as of now, never delete any data (similar to copy-on-write in ZFS/Btrfs). In the BerkeleyDB backend we simply increment the counter every time we store new data and reference it.
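To make the append-only file backend concrete, here is a minimal sketch of the offset-as-reference idea: a page is appended with a length prefix, and the offset where it was written becomes its reference key, so no index structure is needed. Class and method names are illustrative, not from the actual project.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Append-only page store: the file offset of each written page
// doubles as its reference key, so lookups need no index.
public final class AppendOnlyPageStore implements AutoCloseable {
    private final RandomAccessFile file;

    public AppendOnlyPageStore(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    /** Appends a serialized NodePage and returns its offset as the reference. */
    public long append(byte[] page) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.writeInt(page.length);   // length prefix so we can read it back
        file.write(page);
        return offset;
    }

    /** Reads the page stored at the given reference offset. */
    public byte[] read(long offset) throws IOException {
        file.seek(offset);
        byte[] page = new byte[file.readInt()];
        file.readFully(page);
        return page;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Since the file is never rewritten, previously handed-out offsets stay valid forever, which is exactly what makes this work without a B-tree.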
I want to provide ACID properties. We have developed a database/session/transaction layer that uses a BerkeleyDB transaction log for persistent storage of uncommitted changes; it also serves as a second-level cache when not enough main memory is available or the simple Google Guava cache (which the file backend uses as well) is full. Maybe a simple log-file would be sufficient instead.
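As a sketch of what the "simple log-file" alternative could look like (all names are illustrative, this is not the project's actual API): uncommitted changes are appended to a log, a commit appends a marker, and on recovery only entries up to the last commit marker are replayed, so partially written transactions are ignored.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Minimal transaction log: ENTRY records carry a change payload,
// a COMMIT record makes all preceding pending entries durable.
public final class SimpleTxnLog {
    private static final int ENTRY = 0;
    private static final int COMMIT = 1;
    private final File logFile;

    public SimpleTxnLog(File logFile) {
        this.logFile = logFile;
    }

    public void logChange(byte[] change) throws IOException {
        try (DataOutputStream out = open()) {
            out.writeByte(ENTRY);
            out.writeInt(change.length);
            out.write(change);
        }
    }

    public void commit() throws IOException {
        try (DataOutputStream out = open()) {
            out.writeByte(COMMIT);
        }
    }

    /** Returns only the changes that belong to committed transactions. */
    public List<byte[]> recover() throws IOException {
        List<byte[]> committed = new ArrayList<>();
        List<byte[]> pending = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new FileInputStream(logFile))) {
            while (true) {
                int tag = in.read();
                if (tag == -1) break;          // clean end of log
                if (tag == COMMIT) {
                    committed.addAll(pending); // commit marker seals pending entries
                    pending.clear();
                } else {
                    byte[] change = new byte[in.readInt()];
                    in.readFully(change);
                    pending.add(change);
                }
            }
        } catch (EOFException torn) {
            // torn write at the tail: pending entries are simply discarded
        }
        return committed;
    }

    private DataOutputStream open() throws IOException {
        return new DataOutputStream(new FileOutputStream(logFile, true));
    }
}
```

A production version would also need to fsync on commit; this only shows the recovery logic.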
However, I think it's complex to guarantee rollback of partially committed transactions after a power failure, and this is just a small open-source project started at the university with another focus, plus I'm the only committer, working on some simple index structures ;-) I think it's necessary to guarantee consistency of the storage even if something goes wrong during a transaction commit; that's probably why we provide the Java BerkeleyDB backend in the first place. (The original project, Treetank, was started by a Ph.D. student and is now maintained by another one, but I've changed so many things that I thought I'd fork it, plus we don't agree on all modifications ;-))
I'm also not sure how to implement a fail-safe commit. At first I thought about a simple lock-file which is deleted after a successful commit, plus a checkpoint mark in the data file: on startup, if the lock-file still exists, search for the mark and remove everything from there to the end of the file. This would also make the separate transaction log unnecessary :-)
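The lock-file idea above could be sketched like this (names are illustrative): before a commit, the current end-of-file is recorded in the lock file; if the lock file still exists at startup, the commit did not finish and the data file is truncated back to the recorded checkpoint.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

// Fail-safe commit via lock file + checkpoint: appended data only
// "counts" once the lock file has been deleted again.
public final class CheckpointCommit {
    private final File dataFile;
    private final File lockFile;

    public CheckpointCommit(File dataFile, File lockFile) {
        this.dataFile = dataFile;
        this.lockFile = lockFile;
    }

    /** Records the checkpoint (current file length) and creates the lock file. */
    public void beginCommit() throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(lockFile))) {
            out.writeLong(dataFile.length());
            // a real implementation would also fsync here
        }
    }

    /** Deleting the lock file is the "commit done" step. */
    public void endCommit() throws IOException {
        if (!lockFile.delete()) {
            throw new IOException("could not delete lock file");
        }
    }

    /** On startup: if the lock file survived, roll back the torn commit. */
    public void recover() throws IOException {
        if (!lockFile.exists()) return;
        long checkpoint;
        try (DataInputStream in = new DataInputStream(new FileInputStream(lockFile))) {
            checkpoint = in.readLong();
        }
        try (RandomAccessFile raf = new RandomAccessFile(dataFile, "rw")) {
            raf.setLength(checkpoint);   // drop everything the torn commit appended
        }
        lockFile.delete();
    }
}
```

This works precisely because the backend is append-only: rollback is just a truncate, and no committed data is ever touched.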
Other than that (though I assume it's not possible) I'd like to omit the B-tree from BerkeleyDB ;-) I think BerkeleyDB simply uses RandomAccessFiles, too, so it would be great if one could directly reference stored data by its offset in the file. But I assume I'll have to implement something like that for our file-based backend myself.
Ok, if you need transactions, that's a good reason to use a database. The converse question is: What's wrong with your current use of BDB, what problem are you trying to solve with it? I think the answer is that you're trying to improve performance, and you think if BDB could provide an access method where the direct file offset is passed into the access methods, it would be faster. That's not practical, because the physical address of records on disk changes over time, for example, when the log cleaner migrates records forward.
To improve performance with BDB, a better approach is to read the FAQ performance section and (as recommended there) start by running DbCacheSize to see if your cache is large enough to hold all BINs. And if you're not on JE 5 yet, upgrade to JE 5 to get its performance improvements.