SAX (xerces) problem
805006Apr 14 2009 — edited Jun 8 2009I have a big problem with Apache Xerces2 Java.
I have to parse and get data from very large xml files (100 MB to 20 GB). Because the files are very large I have to use SAX parser.
If I use internal xerces in any update of jdk/jre 1.6 then whole document gets into memory. I have found a bug report related at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6536111 . I am not sure that fix will solve my problem and fix has not delivered yet. According to the bug report it is going to be delivered with jdk6 update 14 in the mid May 2009.
I thougt maybe the problem is with the internal SAX parser. So I started to use source of xerces. (I use the last version - 2.9.1). At this point I have discovered that parse takes more time and need 24 byte for each node. Sometimes xml files have 80.000.000 nodes. It will take 1,5 - 2 GB of RAM which I don't have. Even if I have RAM that size I can not use it at windows 32 platform. (OS limits)
Has anyone got idea, solution?
Thanks..