0 Replies Latest reply: Feb 15, 2012 11:44 AM by 917772 RSS

Berkeley XML DB Performance

917772 Newbie
We need help with maximizing performance of our use of Berkeley DB XML.

*1. Describe the Performance area that you are measuring? What is the current performance? What are your performance goals you hope to achieve?*

I'm using Berkeley DB XML to insert and query XML documents in a stream processing system that I'm developing. Multiple processes access the same container concurrently.
However, inserting XML documents into the database takes too long. I wrote a test program that sums the time spent in calls to the method "putDocument" while inserting 20,000 XML documents (320,000 bytes), and the total was 81 s. Saving them to an XML file on disk took only 0.035 s. How can I reduce the insertion time?

*2. What Berkeley DB XML Version? Any optional configuration flags specified? Are you running with any special patches? Please specify?*

Version dbxml-2.5.16
No special patches.

*3. What Berkeley DB Version? Any optional configuration flags specified? Are you running with any special patches? Please Specify.*

Version db-4.8.26
No special patches.

*4. Processor name, speed and chipset?*

Intel(R) Core(TM)2 Duo CPU E7500 @ 2.93GHz

*5. Operating System and Version?*

Ubuntu 10.04 LTS (Lucid Lynx).
     
*6. Disk Drive Type and speed?*

Don't have that information

*7. File System Type? (such as EXT2, NTFS, Reiser)*

EXT3

*8. Physical Memory Available?*

3 GB

*9. Are you using Replication (HA) with Berkeley DB XML? If so, please describe the network you are using, and the number of replicas.*

No.

*10. Are you using a Remote Filesystem (NFS) ? If so, for which Berkeley DB XML/DB files?*

No.

*11. What type of mutexes do you have configured? Did you specify --with-mutex=? Specify what you find in your config.log (search for db_cv_mutex).*

None.

*12. Which API are you using (C++, Java, Perl, PHP, Python, other) ? Which compiler and version?*

C++
Compiler: g++ Version: 4:4.4.3-1ubunt


*13. If you are using an Application Server or Web Server, please provide the name and version?*

No

*14. Please provide your exact Environment Configuration Flags (include anything specified in your DB_CONFIG file).*

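// Non-transactional environment: only the shared memory pool (cache) is enabled.
u_int32_t envFlags = DB_CREATE|DB_INIT_MPOOL;
u_int32_t envCacheSize = 64*1024*1024;  // 64 MB cache
int dberr;
DB_ENV *dbEnv = 0;
dberr = db_env_create(&dbEnv, 0);
if (dberr == 0) {
     // 0 GB + envCacheSize bytes, in a single cache region
     dbEnv->set_cachesize(dbEnv, 0, envCacheSize, 1);
     dberr = dbEnv->open(dbEnv, path2DbEnv.c_str(), envFlags, 0);
}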

*15. Please provide your Container Configuration Flags?*

XmlManager db(dbEnv, DBXML_ADOPT_DBENV | DBXML_ALLOW_EXTERNAL_ACCESS);
XmlContainerConfig config;
config.setAllowValidation(true);
config.setContainerType(XmlContainer::WholedocContainer);
XmlContainer container = db.createContainer(theContainer, config);
XmlUpdateContext updateContext = db.createUpdateContext();
XmlIndexSpecification idxSpec = container.getIndexSpecification();
idxSpec.setAutoIndexing(false);
container.setIndexSpecification(idxSpec, updateContext);

*16. How many XML Containers do you have? For each one please specify:*
One.

     *1. The Container Configuration Flags*

     Same as the answer to question 15.

     *2. How many documents?*
     Many documents.

     *3. What type (node or wholedoc)?*
     Wholedoc.

*18. What is the rate of document insertion/update required or expected? Are you doing partial node updates (via XmlModify) or replacing the document?*

Since we are using the database to persist a stream, the access pattern should be bursty, alternating between periods of high and low demand. We are not doing partial node updates or replacing documents.


*21. Are you running with Transactions? If so please provide any transactions flags you specify with any API calls.*
No


*23. Do you use AUTO_COMMIT?*
No.


*26. Please include a paragraph describing the performance measurements you have made. Please specifically list any Berkeley DB operations where the performance is currently insufficient.*

I wrote a test program that sums the time spent in calls to the method "putDocument" while inserting 20,000 XML documents. Code below:
...
double temp = 0;  // total time spent inside putDocument, in seconds
for(int j = 0; j < 20000; j++){
     myXMLDoc.setContent( document1);
     struct timeval t1;
     struct timeval t2;
     // Time only the putDocument call, not setContent
     gettimeofday(&t1, NULL);
     container.putDocument(myXMLDoc, updateContext, DBXML_GEN_NAME);
     gettimeofday(&t2, NULL);
     temp += (t2.tv_sec - t1.tv_sec) + (t2.tv_usec - t1.tv_usec) * 0.000001;
}
...

*29. Are there any other significant applications running on this system? Are you using Berkeley DB outside of Berkeley DB XML? Please describe the application?*
No. No.


Thanks,
Ana Paula
