This answer will sound really vague and trite, but in the case of XML processing it actually carries more weight than usual: it depends entirely on what you want to do with your XML.
Currently I deal with a fair amount of XML in my work, but rarely write any Java XML code at all; it's all done with XSLT and binding frameworks. That doesn't make it the right approach for every task, by a long way.
Well, I'm reading a Java XML guide that was written a few years back. It said JDOM lets you access any part of the document tree at any time (unlike SAX) and that it's much simpler than DOM. So I was wondering whether there are any other APIs out there now that do the same, and maybe more. Also, what's the most popular XML API Java people use? (I guess that could give me an indication of "best".)
Package org.w3c.dom is built around language-agnostic guidelines, so it deliberately avoids Java idioms.
JDOM is a replacement for the DOM API that uses Java's own features, like the Collections framework, where applicable, as opposed to an API that provides counts and element-getters but no iterators. That makes JDOM more convenient, and it fits better into a framework with, say, Velocity.
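To make the "counts and element-getters" point concrete, here's a minimal sketch of W3C DOM traversal using only the JDK; the document and element names are made up for illustration, and the JDOM alternative in the comment assumes the jdom2 API (not runnable without that jar):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class DomExample {
    public static void main(String[] args) throws Exception {
        String xml = "<books><book>Effective Java</book><book>Thinking in Java</book></books>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // DOM gives you getLength() and item(i), not an Iterable collection:
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            System.out.println(books.item(i).getTextContent());
        }
        // With JDOM you'd write instead (needs the jdom2 jar):
        //   for (Element book : root.getChildren("book")) { ... }
    }
}
```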
But DOM is only a part of the XML picture.
DOMs are only useful for small (say <= 1 MB) XML documents, because DOM is very memory hungry.
The rule of thumb is: minimumDomRam = sizeOfXmlFileOnDisk * 3... and that's a minimum; I've seen it go as high as 6 * sizeOfXmlFile. So a 10 MB file can need 30 to 60 MB of heap just to hold the tree.
DOM keeps the whole parsed document tree in memory, which is especially problematic in a Java server: a server must by its nature satisfy many concurrent client requests, so no single user or process can be allowed to "hog resources". So on the server side we mainly use SAX parsers, or a "higher-level" XML binding technology like XMLBeans (or JAXB apparently, though I've never personally used it, because most of our stuff predates JAXB's release, and why swap horses now).
<snip>I was writing a treatise on virtual memory, thrashing, and how to avoid it... but you can google those terms and find more instructive articles than anything I could write.</snip>
DOM's strength is that it facilitates modifying the contents of the document directly in code. So you can build a new DOM document (or deserialize and then modify an existing one) and then just automagically serialize it. Marvelous! (So long as that process only ever handles small XML documents!)
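That build-then-serialize round trip looks roughly like this with JDK classes only (the element names are invented for the example):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import java.io.StringWriter;

public class BuildDom {
    public static void main(String[] args) throws Exception {
        // build a new document tree entirely in code
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("order");
        Element item = doc.createElement("item");
        item.setTextContent("widget");
        root.appendChild(item);
        doc.appendChild(root);
        // ...then "automagically" serialize it back to XML text
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        System.out.println(out); // <order><item>widget</item></order>
    }
}
```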
JDOM, as previously stated, is a Java-native DOM... In my mind it sits between the "real", language-agnostic W3C DOM and XML binders like XMLBeans. It gives you much of the functionality of a DOM with a significantly smaller memory footprint and much faster (though still not brilliant) processing primitives. But it's still suitable only for small-ish XML documents (say < 2 or maybe 3 MB, depending on all sorts of stuff).
SAX parsers process a "stream of elements", meaning that you only keep each XML element's contents in memory for as long as you need to process that element... so (if you're sane) you never have the whole document in memory at once, hence you use a ship-load less RAM than you would for the equivalent DOM. SAX's "callbacks" are very fast (as fast as can be). So SAX is the only way to go when you handle even-sometimes-big (say 5 MB plus) documents.
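A minimal SAX sketch, JDK-only; the document and the warning-counting handler are made up for illustration. Note how the handler keeps only a single counter, not the document:

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class SaxExample {
    public static void main(String[] args) throws Exception {
        String xml = "<log><entry level=\"WARN\"/><entry level=\"INFO\"/>"
                   + "<entry level=\"WARN\"/></log>";
        final int[] warnings = {0};
        // the parser "pushes" callbacks at us as each element streams past;
        // nothing else is retained in memory
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                if ("entry".equals(qName) && "WARN".equals(atts.getValue("level"))) {
                    warnings[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                       handler);
        System.out.println("warnings: " + warnings[0]); // warnings: 2
    }
}
```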
SAX doesn't facilitate modifying the contents of the document directly. You can use XSLT transforms to modify existing models, but that process is very cumbersome and technically terribly confusterpating (ergo: basically pretty sh1tty), and therefore (IMHO) it should almost always be avoided.
StAX sort-of sits in between DOM and SAX.
StAX also processes XML as a stream of elements. Conceptually speaking, StAX gives you a forward-only element iterator over the document tree. It differs from SAX in that your program "pulls" the elements as you require them, whereas SAX logically "pushes" the elements to you to deal with as they occur in the document. In some circumstances, "pulling" the data as you need it can dramatically reduce the "statefulness" (the number/size of elements you need to remember at any one time) inherent in the process.
StAX excels at processing large (not huge) documents with relatively complex (i.e. inherently stateful) schemas. The downside of StAX is that (IMHO) your parse code is more complex than the SAX equivalent. StAX operations are also a tad slower than their SAX equivalents.
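Here's what that pull-loop looks like, JDK-only; the document is invented for illustration. The program asks for the next event when it's ready, rather than receiving callbacks:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class StaxExample {
    public static void main(String[] args) throws Exception {
        String xml = "<prices><price>10</price><price>32</price></prices>";
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int total = 0;
        // forward-only: we "pull" each event with next() as we're ready for it
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT
                    && "price".equals(r.getLocalName())) {
                total += Integer.parseInt(r.getElementText());
            }
        }
        System.out.println("total: " + total); // total: 42
    }
}
```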
If I wrote my own XML-binding-code-generator it would use StAX under the hood.
XML binders allow you to construct and/or modify an "object graph" (a tree-like data structure of Java objects which represent the same "model" as the XML). This graph is logically equivalent to a DOM. You still have the whole object graph in memory at one time, but because they're "native objects" (as opposed to an abstract model of those objects) there's a lot less overhead; meaning that (depending on the nature of the data therein) an object graph tends to be much, much smaller in memory than its equivalent XML document (GOOD!), and the operations are much, much faster because native objects offer native accessors and mutators (GOOD! GOOD!).
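A hand-rolled binder along those lines (StAX under the hood, as suggested above) might look like this; the Person class and element names are made up for illustration, and a real binder like XMLBeans or JAXB would generate this kind of code for you:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

public class BindingSketch {
    // the "native object" side of the object graph: plain fields, no DOM nodes
    static class Person {
        String name;
        int age;
    }

    // unmarshal: stream the XML once and populate the object graph
    static Person unmarshal(String xml) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Person p = new Person();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT) {
                switch (r.getLocalName()) {
                    case "name": p.name = r.getElementText(); break;
                    case "age":  p.age = Integer.parseInt(r.getElementText()); break;
                }
            }
        }
        return p;
    }

    public static void main(String[] args) throws Exception {
        Person p = unmarshal("<person><name>Fred</name><age>42</age></person>");
        // native accessors from here on: no node traversal, far less overhead
        System.out.println(p.name + ", " + p.age); // Fred, 42
    }
}
```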
If anyone strongly disagrees with anything I've said I'm all ears... This is just my opinion, and I stand (or sit, as the case may be) to be corrected.
Edited by: corlettk on 21/03/2009 10:15 ~~ Can't spell, can't type, can't code. Darn!