Java EE (Java Enterprise Edition) General Discussion

SAX (xerces) problem

805006, Apr 14 2009 (edited Jun 8 2009)
I have a big problem with Apache Xerces2 Java.

I have to parse and extract data from very large XML files (100 MB to 20 GB). Because the files are so large, I have to use a SAX parser.
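As a minimal sketch of the streaming style meant here, assuming a handler that only counts elements and keeps no references to parsed data (the names `StreamingCount` and `CountingHandler` are hypothetical and not taken from the actual application), a baseline like this can help separate parser memory growth from memory held by the handler itself:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingCount {

    // Hypothetical handler: it keeps only a counter, so any heap growth
    // seen while parsing should come from the parser, not from this class.
    static class CountingHandler extends DefaultHandler {
        long elements = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            elements++;
        }
    }

    // Parses the stream with the default JAXP SAX parser and returns
    // the number of start-element events seen.
    static long countElements(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        CountingHandler handler = new CountingHandler();
        parser.parse(in, handler);
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        // A tiny in-memory document stands in for the real multi-GB file.
        InputStream in = new ByteArrayInputStream(
                "<root><item/><item/></root>".getBytes("UTF-8"));
        System.out.println(countElements(in));
    }
}
```

For a real file, a `java.io.FileInputStream` would replace the in-memory stream; which parser implementation `SAXParserFactory.newInstance()` picks depends on the classpath and JAXP configuration.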

If I use the internal Xerces bundled with any update of JDK/JRE 1.6, the whole document ends up in memory. I found a related bug report at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6536111 . I am not sure the fix will solve my problem, and it has not been delivered yet. According to the bug report, it is going to be delivered with JDK 6 update 14 in mid-May 2009.

I thought the problem might be with the internal SAX parser, so I started using the Xerces source distribution (the latest version, 2.9.1). At this point I discovered that parsing takes more time and needs 24 bytes for each node. Some XML files have 80,000,000 nodes, so this would take 1.5 to 2 GB of RAM, which I don't have. Even if I had that much RAM, I could not use it on a 32-bit Windows platform (OS limits).
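The memory figure can be checked with simple arithmetic. This is a rough sketch that assumes the 24 bytes per node is a flat cost with no additional per-object overhead (the class name `NodeMemoryEstimate` is made up for illustration):

```java
public class NodeMemoryEstimate {

    // Total bytes retained if every node costs a fixed number of bytes.
    static long estimateBytes(long nodes, long bytesPerNode) {
        return nodes * bytesPerNode;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(80000000L, 24L);
        // 80,000,000 nodes * 24 bytes = 1,920,000,000 bytes
        System.out.println(bytes + " bytes");
        // About 1.79 GiB when divided by 1024^3
        System.out.println(bytes / (1024.0 * 1024.0 * 1024.0) + " GiB");
    }
}
```

The result, about 1.92 billion bytes, falls inside the 1.5 to 2 GB range quoted above.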

Does anyone have an idea or a solution?

Thanks.

Post Details

Locked on Jul 6 2009
Added on Apr 14 2009
20 comments
299 views