A key design decision when building an application that must process large amounts of XML data is whether to use an API that supports random access. APIs that offer only sequential access to the XML data (i.e., the XML infoset) are referred to as "streaming APIs". If an application does require random access to the infoset, two of the most popular options are a DOM API (such as the W3C DOM API) and a binding API (such as JAXB).
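
To make the random-access vs. sequential-access distinction concrete, here is a minimal sketch using the DOM and StAX (javax.xml.stream) implementations bundled with the JDK, not the Xerces or JAXB builds benchmarked in this post. The `<item>` document and the class and method names are hypothetical, chosen only for illustration. Both methods return the text of the last `<item>`, but the DOM version can jump straight to it in the materialized tree, while the forward-only StAX cursor has to read past every earlier element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AccessStyles {
    // Random access: the whole infoset is materialized as a DOM tree,
    // so any node can be visited in any order, as often as needed.
    static String lastItemViaDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList items = doc.getElementsByTagName("item");
        return items.item(items.getLength() - 1).getTextContent();
    }

    // Sequential access: a StAX cursor only moves forward, so to find the
    // last item we must read past every earlier one and remember the latest.
    static String lastItemViaStax(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        String last = null;
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && r.getLocalName().equals("item")) {
                last = r.getElementText(); // leaves cursor on </item>
            }
        }
        r.close();
        return last;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<list><item>a</item><item>b</item><item>c</item></list>";
        System.out.println(lastItemViaDom(xml));  // c
        System.out.println(lastItemViaStax(xml)); // c
    }
}
```

The trade-off is memory and up-front work: the DOM version pays to build the whole tree before answering, while the streaming version keeps only one string but cannot revisit earlier content.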

So how does the performance of the latest W3C DOM API implementation from Apache Xerces compare with that of the latest JAXB 2.0 RI? To answer this question, I selected 4 different XML schemas, including 3 standard ones: UBL, FpML and GAML. In addition, I picked about 20 XML instances with sizes ranging from 1K to 924K. All the XML data is read from and written to memory buffers to eliminate I/O. Thus, the inner loop of the test builds a tree (or object model) from a memory buffer and then writes it back into a second memory buffer.
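
For readers who want to reproduce the shape of that inner loop, here is a minimal sketch of one DOM iteration: parse from a byte array, then serialize the tree into a second byte array, with no file or socket I/O. It is not the actual Japex driver used for these measurements. It assumes the JDK's bundled JAXP DOM parser (derived from Xerces, but not the same as the standalone Apache Xerces release) and uses an identity `Transformer` as the serializer. The JAXB counterpart would call `Unmarshaller.unmarshal(InputStream)` and `Marshaller.marshal(Object, OutputStream)` on schema-generated classes, but JAXB is no longer bundled with JDK 11+, so the sketch sticks to DOM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class DomRoundTrip {
    private final DocumentBuilder builder;
    private final Transformer serializer;

    DomRoundTrip() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        builder = dbf.newDocumentBuilder();
        // An identity Transformer (no stylesheet) writes a DOM tree back out as XML.
        serializer = TransformerFactory.newInstance().newTransformer();
    }

    // One iteration of the inner loop: memory buffer -> tree -> memory buffer.
    // The builder and serializer are reused, so this is single-threaded only.
    byte[] roundTrip(byte[] input) throws Exception {
        Document doc = builder.parse(new ByteArrayInputStream(input));
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical tiny instance; the real tests used UBL, FpML and GAML documents.
        byte[] in = "<order><id>42</id></order>".getBytes(StandardCharsets.UTF_8);
        DomRoundTrip rt = new DomRoundTrip();
        for (int i = 0; i < 3; i++) {
            byte[] out = rt.roundTrip(in);
            System.out.println(new String(out, StandardCharsets.UTF_8));
        }
    }
}
```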

I used the Japex framework with the parameter japex.resultUnit set to "mbps" (megabits per second) to measure XML throughput, which also makes it easy to compare the results with the peak throughput of I/O buses, network cards, etc. So, here are the resulting means:



I haven't verified this, but I suspect these results are at, or very close to, the peak throughput achievable on this machine with these implementations. Although Xerces DOM is on average 20% faster, JAXB's performance is quite impressive, especially considering all the work the JAXB unmarshaller has to do: work that many applications would need to do anyway after a DOM tree is produced (e.g., converting numeric data into a binary representation).
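
To illustrate the extra work that DOM leaves to the application, here is a minimal sketch in which a DOM tree only yields strings, so the application must still convert them into typed values itself. A JAXB unmarshaller performs this kind of conversion as part of unmarshalling. The `<lineItem>` vocabulary and the `LineItem` class are hypothetical stand-ins for a schema-derived type, not taken from UBL or any schema used in the tests.

```java
import java.io.StringReader;
import java.math.BigDecimal;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomToTyped {
    // Hypothetical typed model, similar in spirit to a JAXB-generated class.
    static final class LineItem {
        final int quantity;
        final BigDecimal price;

        LineItem(int quantity, BigDecimal price) {
            this.quantity = quantity;
            this.price = price;
        }
    }

    static LineItem parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        // DOM exposes element content only as text...
        String qty = root.getElementsByTagName("quantity").item(0).getTextContent().trim();
        String price = root.getElementsByTagName("price").item(0).getTextContent().trim();
        // ...so converting to binary numeric types is a second pass done by the application.
        return new LineItem(Integer.parseInt(qty), new BigDecimal(price));
    }

    public static void main(String[] args) throws Exception {
        LineItem li = parse("<lineItem><quantity>3</quantity><price>19.99</price></lineItem>");
        System.out.println(li.quantity * 2);                                     // 6
        System.out.println(li.price.multiply(BigDecimal.valueOf(li.quantity)));  // 59.97
    }
}
```

A fair DOM-vs-JAXB comparison for such an application would therefore include this conversion step on the DOM side.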

Another interesting observation is that the raw XML processing throughput of this server (in this case running 2 software threads) is between one fourth and one fifth of its 1 Gbps network capacity. Thus, at least when not using a streaming API, it appears very difficult to process XML at "wire speed". In future installments, I will add Japex drivers for the most popular streaming APIs and also look at the performance of binary XML encodings.
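
The wire-speed comparison is simple arithmetic: one fourth to one fifth of a 1 Gbps link is roughly 200 to 250 Mbps of XML throughput. The sketch below shows how a bytes-per-second measurement maps onto that scale. It assumes 1 Mbit = 1,000,000 bits (the convention used for rating network links); Japex's own unit conversion may differ, and the 25 MB-per-second figure is a hypothetical input, not a measured result from these tests.

```java
public class WireSpeed {
    // Assumption: 1 Mbit = 1,000,000 bits, so 1 Gbps = 1000 Mbps.
    static double mbps(long bytesProcessed, double seconds) {
        return bytesProcessed * 8.0 / seconds / 1_000_000.0;
    }

    // Fraction of a link's capacity that a given XML throughput could fill.
    static double fractionOfLink(double xmlMbps, double linkMbps) {
        return xmlMbps / linkMbps;
    }

    public static void main(String[] args) {
        double link = 1000.0; // 1 Gbps network interface, in Mbps
        // One fifth to one fourth of the link, as observed in the post.
        System.out.println(link / 5 + " to " + link / 4 + " Mbps"); // 200.0 to 250.0 Mbps
        // Hypothetical measurement: 25,000,000 bytes processed in 1 second.
        double measured = mbps(25_000_000L, 1.0);
        System.out.println(measured);                       // 200.0
        System.out.println(fractionOfLink(measured, link)); // 0.2
    }
}
```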