2 Replies. Latest reply: Jun 4, 2007 9:56 AM by 807606

    Specialized XML parsing

    807606
      I'm writing an app that takes an XML file as input, typically a large document (roughly 20 MB).

      Normally, a SAX parser would be a no-brainer over a DOM parser because of the file size, but in this case, for reasons I'll omit for simplicity, a SAX parser is out of the question.

      I recently pulled something off in Ruby (using REXML) that gave me DOM-style access to portions of the document as they were parsed, without the whole document being loaded into memory at once. That worked pretty well, but the Ruby app was only a prototype, and I can't use it in production.

      Does anyone know enough about the Java XML parsing API to tell me whether there is a way to take a DOM-ish approach without giving up a ton of memory to an enormous Document object? For example, I see that there's a type of Node called DocumentFragment that sounds promising, but I can't easily determine whether it can be used without first paying the memory cost of building the full Document.

      Thanks in advance.
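
To make the question concrete, here is a rough sketch of the kind of thing I'm after. It assumes a pull parser (StAX, javax.xml.stream) is acceptable where SAX is not, which may or may not hold for my case. The class name FragmentReader, the helper names, and the idea of a repeating "record" element are all made up for illustration, and namespaces are ignored for brevity. Only one record's subtree exists as DOM nodes at any time.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: streams through the input and hands each matching
// element to a callback as a small, free-standing DOM subtree.
final class FragmentReader {

    // Builds a DOM Element from the reader, which must be positioned on a
    // START_ELEMENT. Returns with the reader on the matching END_ELEMENT.
    // Simplification: namespaces, comments and processing instructions are dropped.
    static Element buildSubtree(XMLStreamReader r, Document doc) throws Exception {
        Element el = doc.createElement(r.getLocalName());
        for (int i = 0; i < r.getAttributeCount(); i++) {
            el.setAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
        }
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                el.appendChild(buildSubtree(r, doc)); // recurse into the child element
            } else if (event == XMLStreamConstants.CHARACTERS
                    || event == XMLStreamConstants.CDATA) {
                el.appendChild(doc.createTextNode(r.getText()));
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                return el; // end of this element
            }
        }
        throw new IllegalStateException("document ended inside an element");
    }

    // Calls handler once per element named recordName and returns how many were seen.
    // The empty Document is only a node factory; subtrees are never attached to it,
    // so each one should become collectable once the handler is done with it.
    static int forEachFragment(Reader in, String recordName,
                               Consumer<Element> handler) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && recordName.equals(r.getLocalName())) {
                    handler.accept(buildSubtree(r, doc));
                    count++;
                }
            }
        } finally {
            r.close();
        }
        return count;
    }
}
```

Usage would be something like FragmentReader.forEachFragment(new FileReader("big.xml"), "record", el -> process(el)), where big.xml, "record" and process are placeholders. Whether this really keeps memory flat for a 20 MB file depends on the StAX implementation and on the handler not holding on to the elements, so treat it as a starting point rather than a confirmed answer.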