I am writing an application in Java using Berkeley DB XML (2.5.16). I have to run a test against the application in which a large number of XQuery Update operations are executed. All of the update operations are insert statements. At the beginning the test runs fine. However, as the number of updates grows (beyond 1000 XQuery Updates), the execution time increases sharply. How can I improve the performance of the updates?
1) The environment configuration is the following:
EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
envConfig.setInitializeCache(true);
envConfig.setCacheSize(512*1024*1024);
envConfig.setInitializeLocking(false);
envConfig.setInitializeLogging(false);
envConfig.setTransactional(false);
2) No transactions are used
3) The container is a node container
4) Auto-indexing is on (I need that)
5) The container includes only one XML document, 2.5 MB in size.
6) Some insert statements insert 1000 nodes each.
7) I am running Windows Vista on a PC with 4 GB of RAM.
Thank you in advance,
I believe the problem you are experiencing is because you're trying to put all of your nodes into one single XML document. In my experience, you'll get the performance you're looking for if you break up your document into as many "smaller" nodes/documents as you can and do your node insertion that way; conversely, if you need the entire document out of the database, you can use XQuery to stitch it back together - think of it as XML database "normalization" :)
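To make the splitting idea concrete, here is a minimal sketch that uses only the JDK's DOM and transformer classes, not the Berkeley DB XML API. The class name SplitLargeDocument, the methods split and stitch, and the sample <orders> data are hypothetical names for illustration. In a real setup each returned string would be stored as its own document in the container, and the stitching would be done by an XQuery over the container rather than by string concatenation as here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SplitLargeDocument {

    /** Serializes each element child of the root as its own small XML document. */
    public static List<String> split(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Identity transformer, used here only to turn a DOM node back into text.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> parts = new ArrayList<>();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text, comments, etc.
            }
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(n), new StreamResult(out));
            parts.add(out.toString());
        }
        return parts;
    }

    /** Stand-in for the XQuery "stitching" step: wraps the parts in a new root element. */
    public static String stitch(String rootName, List<String> parts) {
        StringBuilder sb = new StringBuilder("<" + rootName + ">");
        for (String part : parts) {
            sb.append(part);
        }
        return sb.append("</").append(rootName).append(">").toString();
    }

    public static void main(String[] args) throws Exception {
        String big = "<orders>"
                + "<order id=\"1\"><item>a</item></order>"
                + "<order id=\"2\"><item>b</item></order>"
                + "</orders>";
        List<String> parts = split(big);
        System.out.println(parts.size() + " small documents");
        System.out.println(stitch("orders", parts));
    }
}
```

Note that this stitch drops any attributes the original root element carried; it is only meant to show the shape of the approach. With many small documents, each insert touches one small document instead of a single document that keeps growing.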
Hope that helps,
Thanks a lot for your reply! However, I have a few more questions.
Given that the XML document is only about 2.5 MB, is that really considered big enough for Berkeley DB that it needs to be broken down into smaller documents?
Also, after executing ~4000 XQuery Update insert operations, which doubled the size of the XML document, the performance gets really bad. A single insertion may take as long as 1.5 min, whereas each of the first 3000 insertions needed less than 0.5 sec. Is there something I am missing in the configuration of Berkeley DB? If I turn auto-indexing off and maintain fewer indexes, could I see significantly better performance? So far I just get the following error when I set my own indexes and try to execute consecutive insertions over the same node:
Exception in thread "main" com.sleepycat.dbxml.XmlException: Error: Sequence does not match type item() - the sequence does not contain items [err:XUDY0027], <query>:1:1, errcode = QUERY_EVALUATION_ERROR+
at com.sleepycat.dbxml.dbxml_javaJNI.XmlQueryExpression_execute__SWIG_1(Native Method)
at com.sleepycat.dbxml.XmlQueryExpression.execute(XmlQueryExpression.java:85)
It's not that the file is too big; I think it has more to do with your intent, that is, what you're trying to do with the document. If you need to perform thousands of inserts/updates in a database that contains only a single document, you are not leveraging the benefits of using the database to begin with. Your mileage may vary, and others may disagree.
That said, a couple of other things you might want to look at given your existing architecture:
- make sure all statistics are turned off
- use the WholeDocContainer type instead of the node type
- turn off auto indexing (which you've already done) and make sure you're only indexing the nodes/attributes/metadata you need to be indexing
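On the XUDY0027 error quoted above: in the XQuery Update Facility, that code is raised when the target expression of an insert, replace, or rename evaluates to an empty sequence, which matches the "sequence does not contain items" message. So it may be worth checking that the target path of the failing insert still selects a node after the earlier inserts have run. Below is a minimal sketch of such a check using the JDK's XPath 1.0 engine rather than Berkeley DB XML itself; the class InsertTargetCheck, the method countTargets, and the sample document and paths are hypothetical, and a real XQuery target may not be expressible in XPath 1.0.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InsertTargetCheck {

    /** Returns how many nodes the given target path selects in the document. */
    public static int countTargets(String xml, String targetPath) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(
                targetPath, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        return hits.getLength();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<root><list id=\"a\"/></root>";
        // A target that exists: an insert into it would have somewhere to go.
        System.out.println(countTargets(doc, "/root/list[@id='a']")); // prints 1
        // A target that selects nothing: the case that corresponds to XUDY0027.
        System.out.println(countTargets(doc, "/root/list[@id='b']")); // prints 0
    }
}
```

If the count for your real target path comes back as 0, the problem is in the path (or in what the previous inserts did to the document), not in the index configuration as such.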