4 Replies Latest reply: Aug 25, 2013 11:19 AM by user5548762 RSS

    SQL*Loader problem - not efficient, parsing error for big xml files

    user5548762

      Hi Experts,

       

      First of all, I would like to store xml files in object relation way. Therefore I created a schema and a table for it (see above).

      I wants to propagate it (by using generated xml files), hence I created a control file for sql loader (see above).

       

      I have two problems for it.

      1, It takes a lot of time. It means I can upload a ~80MB file in 2 hours and a half.

      2, At bigger files, I got the following error messages (OCI-31011: XML parsing failed OCI-19202: Error occurred in XML processing LPX-00243: element attribute value must be enclosed in quotes). It is quite interesting because my xml file is generated and I could generated and uploaded the first and second half of the file.

       

      Can you help me to solve these problems?

       

      Thanks,

      Adam

       

      Control file

      UNRECOVERABLE

      LOAD DATA

      CHARACTERSET UTF8

      INFILE *

      APPEND

      INTO TABLE coll_xml_objrel

      XMLTYPE(xml)

      FIELDS

      (

      ident constant 2

      ,file_name filler char(100)

      ,xml LOBFILE (file_name) TERMINATED BY EOF

      )

      BEGINDATA

      generated1000x10000.xml

      Sql Loader command

      sqlldr.exe username/password@//localhost:1521/SID control='loader.ctl' log='loadr.log' direct=true

      Schema

      <?xml version="1.0" encoding="UTF-8"?>

      <schema targetNamespace="http://www.something.com/shema/simple_searches" elementFormDefault="qualified" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:tns="http://www.something.com/shema/simple_searches">

       

          <element name="searches" type="tns:searches_type"></element>

       

          <element name="search" type="tns:search_type"></element>

       

          <element name="results" type="tns:results_type"></element>

         

          <element name="result" type="tns:result_type"></element>

       

          <complexType name="searches_type">

              <sequence>

                  <element ref="tns:search" maxOccurs="unbounded"></element>

              </sequence>

          </complexType>

       

          <complexType name="search_type">

              <sequence>

                  <element ref="tns:results"></element>

              </sequence>

              <attribute ref="tns:id" use="required"></attribute>

              <attribute ref="tns:type" use="required"></attribute>

          </complexType>

       

          <complexType name="results_type">

              <sequence maxOccurs="unbounded">

                  <element ref="tns:result"></element>

              </sequence>

          </complexType>

         

          <complexType name="result_type">

              <attribute ref="tns:id" use="required"></attribute>

          </complexType>

         

          <simpleType name="type_type">

              <restriction base="string">

                  <enumeration value="value1"></enumeration>

                  <enumeration value="value2"></enumeration>

              </restriction>

          </simpleType>

       

          <attribute name="type" type="tns:type_type"></attribute>

         

          <attribute name="id" type="string"></attribute>

      </schema>

      Create table

      create table coll_xml_objrel

      (

      ident Number(20) primary key,

      xml xmltype)

      Xmltype column xml

      store as object relational

      xmlschema "http://www.something.com/schema/simple_searches.xsd"

      Element "searches";

        • 1. Re: SQL*Loader problem - not efficient, parsing error for big xml files
          Richard Harrison .

          Hi,

          It looks like you are running everything on one machine (PC?) perhaps you are just short of memory which is making everything slow and could possibly be causing the parsing error?

           

          ALso i notice you are using UTF8 - is the file saved in utf8/unicode format and is the database set up with utf8 as the default characterset?

           

          Cheers,

          Harry

          • 2. Re: SQL*Loader problem - not efficient, parsing error for big xml files
            odie_63

            Hi,

             

            First of all, you should have posted in the XML DB forum.

             

            Can you help me to solve these problems?

             

            Could you help us helping you ?

             

            • Exact database version ?
            • Sample file, or way to generate one sufficient to reproduce the issue ?
            • How did you register the schema ? Give exact command please.
            • 3. Re: SQL*Loader problem - not efficient, parsing error for big xml files
              user5548762

              Hi Richard,

               

              Thanks for your answer.

               

              Yes, you are right, I am working on a PC and perhaps I ran out of memory (XML input can not be uploaded to memory, therefore the database (or the sqlloader) truncated it, hence the parser gave error messages because the piece of XML was not valid). But I have not read anything about limitation of size of XML files on documentation of XML DB (or sqlloader). And it is a little bit strange for me that a database can not deal with huge data files.

               

              About character sets:

              The documents are in UTF-8 and I am using AL32UTF8 as default Oracle character set.

               

              Cheers,

              Adam

              • 4. Re: SQL*Loader problem - not efficient, parsing error for big xml files
                user5548762

                Hi Odie_63,

                 

                Thanks for your answer.

                 

                I will post this question in the XML DB forum too (edit: I realized that you have done it. Thanks for it).

                 

                1, Version: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

                2, see above

                3, I have registered my schema with using dbms_xmlschema.registerSchema function.

                 

                Cheers,

                Adam

                 

                XML generator:

                 

                import java.io.FileNotFoundException;

                import java.io.FileOutputStream;

                 

                import javax.xml.stream.XMLOutputFactory;

                import javax.xml.stream.XMLStreamException;

                import javax.xml.stream.XMLStreamWriter;

                 

                public class mainGenerator {

                 

                    public static void main(String[] args) throws FileNotFoundException, XMLStreamException {

                        // TODO Auto-generated method stub

                 

                        final long numberOfSearches = 500;

                        final long numberOfResults = 10000;

                      

                        XMLOutputFactory xof = XMLOutputFactory.newFactory();

                        XMLStreamWriter writer = xof.createXMLStreamWriter(new FileOutputStream("C:\\Working\\generated500x10000.xml"));

                      

                        writer.writeStartDocument();

                      

                        writer.writeStartElement("tns","searches", "http://www.something.com/schema/simple_searches");

                        writer.writeNamespace("tns", "http://www.something.com/schema/simple_searches");

                        for (long i = 0; i < numberOfSearches; i++){

                            Long help = new Long(i);

                            writer.writeStartElement("tns","search", "http://www.something.com/schema/simple_searches);

                            writer.writeAttribute("tns", "http://www.something.com/schema/simple_searches", "type", "value1");

                            writer.writeAttribute("tns", "http://www.something.com/schema/simple_searches", "id", help.toString());

                          

                            writer.writeStartElement("tns","results", "http://www.something.com/schema/simple_searches");

                          

                            for (long j = 0; j < numberOfResults; j++){

                                writer.writeStartElement("tns","result", "http://www.something.com/schema/simple_searches");

                                Long helper = new Long(i*numberOfResults+j);

                                writer.writeAttribute("tns", "http://www.something.com/schema/simple_searches", "id", helper.toString());

                                writer.writeEndElement();

                            }

                          

                            writer.writeEndElement();

                          

                            writer.writeEndElement();

                        }

                        writer.writeEndElement();

                      

                        writer.writeEndDocument();

                      

                        writer.close();

                    }

                 

                }

                 

                 

                registerSchema:

                 

                begin

                dbms_xmlschema.registerSchema(

                'http://www.something.com/schema/simple_searches',

                '<?xml version="1.0" encoding="UTF-8"?>

                <schema targetNamespace="http://www.something.com/schema/simple_searches" elementFormDefault="qualified" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:tns="http://www.something.com/schema/simple_searches">

                 

                    <element name="searches" type="tns:searches_type"></element>

                 

                    <element name="search" type="tns:search_type"></element>

                 

                    <element name="results" type="tns:results_type"></element>

                  

                    <element name="result" type="tns:result_type"></element>

                 

                    <complexType name="searches_type">

                        <sequence>

                            <element ref="tns:search" maxOccurs="unbounded"></element>

                        </sequence>

                    </complexType>

                 

                    <complexType name="search_type">

                        <sequence>

                            <element ref="tns:results"></element>

                        </sequence>

                        <attribute ref="tns:id" use="required"></attribute>

                        <attribute ref="tns:type" use="required"></attribute>

                    </complexType>

                 

                    <complexType name="results_type">

                        <sequence maxOccurs="unbounded">

                            <element ref="tns:result"></element>

                        </sequence>

                    </complexType>

                  

                    <complexType name="result_type">

                        <attribute ref="tns:id" use="required"></attribute>

                    </complexType>

                  

                    <simpleType name="type_type">

                        <restriction base="string">

                            <enumeration value="value1"></enumeration>

                            <enumeration value="value2"></enumeration>

                        </restriction>

                    </simpleType>

                 

                    <attribute name="type" type="tns:type_type"></attribute>

                  

                    <attribute name="id" type="string"></attribute>

                </schema>',

                TRUE, TRUE, FALSE, FALSE);

                end