I want to provide ACID properties. We have developed a database/session/transaction layer that uses a BerkeleyDB transaction log for persistent storage of uncommitted changes. It serves as a second-level cache when there is not enough main memory, or currently when a simple Google Guava cache (which is also used by the file backend) is full. Maybe a simple log file would also be sufficient.
However, I think it's complex to guarantee the rollback of partially committed transactions in case of a power failure, and this is just a small open source project started at the university with a different focus; plus I'm the only committer, working on some simple index structures ;-) I think it's necessary to guarantee the consistency of the storage even if something happens during a transaction commit, and that's probably why we provide a Java BerkeleyDB backend. (The original project, Treetank, was started by a Ph.D. student and is now maintained by another one, but I've changed so many things that I thought I'd fork it; plus we don't agree on all modifications ;-))
I'm also not sure how to implement a fail-safe commit. At first I thought about a simple lock file that is created before the commit and deleted afterwards, plus a checkpoint mark in the data file: on startup, if the lock file still exists, search for the mark and remove everything from there to the end of the file. This would also make the separate transaction log unnecessary :-)
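A minimal Java sketch of the lock-file idea, assuming a single writer and an append-only data file. One variation from the scheme as described: instead of searching the data file for a checkpoint mark (a mark could collide with stored bytes), it writes the checkpoint offset, i.e. the file length before the commit, into the lock file itself, together with its bitwise complement as a cheap integrity check. `LockFileCommit`, `commit` and `recover` are hypothetical names. It forces the lock file and the data to disk but does not fsync the directory, so it is not a complete durability guarantee on every filesystem.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical append-only store with lock-file based commit recovery (sketch only). */
public final class LockFileCommit {
    private final Path dataFile;
    private final Path lockFile;

    public LockFileCommit(Path dataFile) {
        this.dataFile = dataFile;
        this.lockFile = dataFile.resolveSibling(dataFile.getFileName() + ".lock");
    }

    /** Appends one transaction's bytes; an existing lock file means a commit is in progress. */
    public void commit(byte[] changes) throws IOException {
        try (FileChannel data = FileChannel.open(dataFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            long checkpoint = data.size(); // offset where this commit starts
            // Step 1: durably record the checkpoint (and its complement) in the lock file.
            // CREATE_NEW fails if a stale lock exists, so recover() must run first.
            try (FileChannel lock = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
                ByteBuffer header = ByteBuffer.allocate(2 * Long.BYTES)
                        .putLong(checkpoint).putLong(~checkpoint);
                header.flip();
                while (header.hasRemaining()) lock.write(header);
                lock.force(true);
            }
            // Step 2: append the changes and force them to disk.
            ByteBuffer body = ByteBuffer.wrap(changes);
            data.position(checkpoint);
            while (body.hasRemaining()) data.write(body);
            data.force(true);
        }
        // Step 3: deleting the lock file marks the commit as complete.
        Files.delete(lockFile);
    }

    /** On startup: if a lock file survived, truncate the data file back to the checkpoint. */
    public boolean recover() throws IOException {
        if (!Files.exists(lockFile)) return false;
        byte[] raw = Files.readAllBytes(lockFile);
        if (raw.length == 2 * Long.BYTES) {
            ByteBuffer header = ByteBuffer.wrap(raw);
            long checkpoint = header.getLong();
            if (header.getLong() == ~checkpoint) {
                try (FileChannel data = FileChannel.open(dataFile,
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                    data.truncate(checkpoint); // drop the partial commit
                    data.force(true);
                }
            }
        }
        // A short or inconsistent lock file means the crash hit step 1,
        // before any data was appended, so there is nothing to truncate.
        Files.delete(lockFile);
        return true;
    }
}
```

Because the data is only appended after the lock file has been forced, a surviving lock file always points at the start of the unfinished commit.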
Other than that, I'm trying to bypass the Btree in BerkeleyDB, though I assume that's not possible ;-) I think BerkeleyDB simply uses RandomAccessFiles, too, so it would be great if one could directly reference stored data by its offset in the file. But I assume I'll have to implement something like that myself for our file-based backend.
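A minimal sketch of offset-based references with `RandomAccessFile`, assuming an append-only file of length-prefixed records and no concurrent writers. `OffsetStore`, `append` and `read` are hypothetical names; there are no checksums or transactions here, and the returned offsets only stay valid as long as records are never moved or compacted.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical append-only record file addressed by byte offset (sketch only). */
public final class OffsetStore implements AutoCloseable {
    private final RandomAccessFile file;

    public OffsetStore(String path) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
    }

    /** Appends a length-prefixed record and returns its offset, usable as a direct reference. */
    public long append(byte[] record) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.writeInt(record.length); // 4-byte length prefix
        file.write(record);
        return offset;
    }

    /** Reads the record that starts at the given offset. */
    public byte[] read(long offset) throws IOException {
        file.seek(offset);
        byte[] record = new byte[file.readInt()];
        file.readFully(record);
        return record;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Appending a 1-byte record and then a 2-byte record returns offsets 0 and 5, since each record carries a 4-byte length prefix.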
OK, if you need transactions, that's a good reason to use a database. The converse question is: what's wrong with your current use of BDB, and what problem are you trying to solve with it? I think the answer is that you're trying to improve performance, and you think that if BDB provided an access method to which the direct file offset is passed, it would be faster. That's not practical, because the physical address of records on disk changes over time, for example when the log cleaner migrates records forward.
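To illustrate why a stored offset goes stale, here is a toy in-memory log in Java; it is not JE's actual implementation. Updates append new versions, a hypothetical `clean()` copies only the live records into a fresh log, and the surviving records end up at new offsets. Lookups keep working only because they go through the key index (roughly the role the Btree plays in JE), which is updated as records migrate. `ToyCleanedLog` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy log: updates append new versions; clean() compacts live records (illustration only). */
public final class ToyCleanedLog {
    private byte[] log = new byte[0];
    // key -> {offset, length} of the live version; this index is the indirection layer
    private final Map<String, int[]> index = new LinkedHashMap<>();

    /** Appends a new version and returns its offset; any older version becomes garbage. */
    public int put(String key, byte[] value) {
        int offset = log.length;
        log = Arrays.copyOf(log, offset + value.length);
        System.arraycopy(value, 0, log, offset, value.length);
        index.put(key, new int[] {offset, value.length});
        return offset;
    }

    /** Looks a record up through the index, so it survives cleaning. */
    public byte[] get(String key) {
        int[] loc = index.get(key);
        return loc == null ? null : Arrays.copyOfRange(log, loc[0], loc[0] + loc[1]);
    }

    /** Current physical offset of a key, or -1 if absent. */
    public int offsetOf(String key) {
        int[] loc = index.get(key);
        return loc == null ? -1 : loc[0];
    }

    /** Copies live records into a fresh log; surviving records may get new offsets. */
    public void clean() {
        int live = 0;
        for (int[] loc : index.values()) live += loc[1];
        byte[] fresh = new byte[live];
        int pos = 0;
        for (int[] loc : index.values()) {
            System.arraycopy(log, loc[0], fresh, pos, loc[1]);
            loc[0] = pos; // record migrated: its physical address changed
            pos += loc[1];
        }
        log = fresh;
    }
}
```

For example, after putting "a" twice and then "b", "b" sits at offset 4; after `clean()` the obsolete first version of "a" is gone and "b" moves to offset 2, while `get("b")` still returns the same bytes. Any raw offset an application had kept for "b" would now point at the wrong data.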
To improve performance with BDB, a better approach is to read the performance section of the FAQ and (as recommended there) start by running DbCacheSize to see whether your cache is large enough to hold all BINs (the bottom internal nodes of the Btree). And if you're not on JE 5 yet, upgrade to JE 5 to get its performance improvements.