This discussion is archived
4 Replies Latest reply: Sep 10, 2013 11:03 AM by odie_63 RSS

slow extraction in big XML-Files with PL/SQL

MaNö Newbie
Currently Being Moderated

Hello,

 

i have a performance problem with the extraction from attributes in big XML Files. I tested with a size of ~ 30 mb.

 

The XML file is a response of a webservice. This response include some metadata of a document and the document itself. The document is inline embedded with a Base64 conversion.  Here is an example of a XML File i want to analyse:

 

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <ns2:GetDocumentByIDResponse xmlns:ns2="***">
         <ArchivedDocument>
            <ArchivedDocumentDescription version="1" currentVersion="true" documentClassName="Allgemeines Dokument" csbDocumentID="***">
               <Metadata archiveDate="2013-08-01+02:00" documentID="123">
                  <Descriptor type="Integer" name="fachlicheId">
                     <Value>123<Value>
                  </Descriptor>
                  <Descriptor type="String" name="user">
                     <Value>***</Value>
                  </Descriptor>
                  <InternalDescriptor type="Date" ID="DocumentDate">
                     <Value>2013-08-01+02:00</Value>
                  </InternalDescriptor>
             
                  <!-- Here some more InternalDescriptor Nodes -->
              
               </Metadata>
               <RepresentationDescription default="true" description="Description" documentPartCount="1" mimeType="application/octet-stream">
                  <DocumentPartDescription fileName="20mb.test" mimeType="application/octet-stream" length="20971520 " documentPartNumber="0" hashValue=""/>
               </RepresentationDescription>
            </ArchivedDocumentDescription>
            <DocumentPart mimeType="application/octet-stream" length="20971520 " documentPartNumber="0" representationNumber="0">
               <Data fileName="20mb.test">
                  <BinaryData>
                    <!-- Here is the BASE64 converted document -->
                  </BinaryData>
               </Data>
            </DocumentPart>
         </ArchivedDocument>
      </ns2:GetDocumentByIDResponse>
   </soap:Body>
</soap:Envelope>

 

Now i want to extract the filename and the Base64 converted document from this XML response.

 

For the extraction of the filename i use the following command:

v_filename := apex_web_service.parse_xml(v_xml, '//ArchivedDocument/ArchivedDocumentDescription/RepresentationDescription/DocumentPartDescription/@fileName');

 

For the extraction of the binary data i use the following command:

v_clob := apex_web_service.parse_xml_clob(v_xml, '//ArchivedDocument/DocumentPart/Data/BinaryData/text()');

 

My problem is the performance of this extraction. Here i created some summary of the start and end time for the commands:

 

Start TimeEnd TimeDifferenceCommand
10.09.13 - 15:46:11,40266800010.09.13 - 15:47:21,40789500000:01:10,005227v_filename_bcm := apex_web_service.parse_xml(v_xml, '//ArchivedDocument/ArchivedDocumentDescription/RepresentationDescription/DocumentPartDescription/@fileName');
10.09.13 - 15:47:21,40789500010.09.13 - 15:47:22,33678600000:00:00,928891v_clob := apex_web_service.parse_xml_clob(v_xml, '//ArchivedDocument/DocumentPart/Data/BinaryData/text()');

 

As you can see the extraction of the filename is slower then the document extraction. For the Extraction of the filename i need ~01

 

I wonder about it and started some tests.

 

I tried to use an exact - non dynamic - filename. So i have this commands:

v_filename := '20mb_1.test';
v_clob := apex_web_service.parse_xml_clob(v_xml, '//ArchivedDocument/DocumentPart/Data/BinaryData/text()');

 

Under this Conditions the time for the document extraction soar. You can see this in the following table:

 

Start TimeEnd TimeDifferenceCommand
10.09.13 - 16:02:33,21203500010.09.13 - 16:02:33,21254200000:00:00,000507v_filename_bcm := '20mb_1.test';
10.09.13 - 16:02:33,21254200010.09.13 - 16:03:40,34239600000:01:07,129854v_clob := apex_web_service.parse_xml_clob(v_xml, '//ArchivedDocument/DocumentPart/Data/BinaryData/text()');

 

So i'm looking for a faster extraction out of the xml file. Do you have any ideas? If you need more informations, please ask me.

 

Thank you,

 

Matthias

PS: I use the Oracle 11.2.0.2.0

  • 1. Re: slow extraction in big XML-Files with PL/SQL
    BluShadow Guru Moderator
    Currently Being Moderated

    Have a look at "mdrake"'s answers on this thread:

     

    Re: XML file processing into oracle

     

    You may also like to look at other answers in the XML DB forum, and also the XML DB forum FAQ:

     

    XML DB FAQ

     

    as that is the place where the XML specialists tend to hang out.

     

    Is the "apex_webs_service.parse_xml_clob" a user defined package/function? or is it something that comes with Apex?

  • 2. Re: slow extraction in big XML-Files with PL/SQL
    MaNö Newbie
    Currently Being Moderated

    Hey, first i want to thank your for the fast repost.Unfortunally i don't find an answer for my problem in the links. The "apex_webs_service.parse_xml_clob" function comes with Apex. You can read more about it here: http://docs.oracle.com/cd/E17556_01/doc/apirefs.40/e15519/apex_web_service.htm

  • 3. Re: slow extraction in big XML-Files with PL/SQL
    BluShadow Guru Moderator
    Currently Being Moderated

    Well, based on what you posted, the delay appears to be the first attempt to read contents from the XML, so in your first test the first thing you tried to read was the filename, and that was the 'slow' part, and then when you eliminated that, the document itself appeared to go slower, but that would be because it's the first thing you try to extract from the XML.  Subsequent extractions from the same XML will likely benefit from caching etc.

     

    You may also want to use explicit paths in your XQuery expression rather than starting it with "\\" which will cause it to try and scan the whole XML looking for any matching nodes to start from.  An explicit path will give it direct access to the correct node.  How much benefit that will give I'm not sure without testing... so you can test that.

     

    As for the links I provided, they do offer advice on registering an XML schema, describing the XML, on the database in order to shred the XML against, giving significant performance increases.  If you don't think that'll help then I'm not sure what will, and you'd still be better asking in the XML DB forum.

  • 4. Re: slow extraction in big XML-Files with PL/SQL
    odie_63 Guru
    Currently Being Moderated

    Although using an XML schema is a good advice for an XML-centric application, I think it's a little overkill in this situation.

     

    Here are two approaches you can test :

     

    • Using the DOM interface over your XMLType variable, for example :

     

    DECLARE

     

      v_xml    xmltype := xmltype('<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"> 

           <soap:Body> 

              <ns2:GetDocumentByIDResponse xmlns:ns2="***"> 

                 <ArchivedDocument> 

                    <ArchivedDocumentDescription version="1" currentVersion="true" documentClassName="Allgemeines Dokument" csbDocumentID="***"> 

                       <Metadata archiveDate="2013-08-01+02:00" documentID="123"> 

                          <Descriptor type="Integer" name="fachlicheId"> 

                             <Value>123</Value> 

                          </Descriptor> 

                          <Descriptor type="String" name="user"> 

                             <Value>***</Value> 

                          </Descriptor> 

                          <InternalDescriptor type="Date" ID="DocumentDate"> 

                             <Value>2013-08-01+02:00</Value> 

                          </InternalDescriptor> 

                      

                          <!-- Here some more InternalDescriptor Nodes --> 

                       

                       </Metadata> 

                       <RepresentationDescription default="true" description="Description" documentPartCount="1" mimeType="application/octet-stream"> 

                          <DocumentPartDescription fileName="20mb.test" mimeType="application/octet-stream" length="20971520 " documentPartNumber="0" hashValue=""/> 

                       </RepresentationDescription> 

                    </ArchivedDocumentDescription> 

                    <DocumentPart mimeType="application/octet-stream" length="20971520 " documentPartNumber="0" representationNumber="0"> 

                       <Data fileName="20mb.test"> 

                          <BinaryData> 

                            ABC123 

                          </BinaryData> 

                       </Data> 

                    </DocumentPart> 

                 </ArchivedDocument> 

              </ns2:GetDocumentByIDResponse> 

           </soap:Body> 

        </soap:Envelope>');

       

      domDoc    dbms_xmldom.DOMDocument;

      docNode   dbms_xmldom.DOMNode;

      node      dbms_xmldom.DOMNode;

     

      nsmap     varchar2(2000) := 'xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns2="***"';

      xpath_pfx varchar2(2000) := '/soap:Envelope/soap:Body/ns2:GetDocumentByIDResponse/';

     

      istream   sys.utl_characterinputstream;

      buf       varchar2(32767);

      numRead   pls_integer := 1;

     

      filename       varchar2(30);

      base64clob     clob;

     

    BEGIN

     

      domDoc := dbms_xmldom.newDOMDocument(v_xml);

      docNode := dbms_xmldom.makeNode(domdoc);

     

      filename := dbms_xslprocessor.valueOf(

                    docNode

                  , xpath_pfx || 'ArchivedDocument/ArchivedDocumentDescription/RepresentationDescription/DocumentPartDescription/@fileName'

                  , nsmap

                  ) ;

                 

      node := dbms_xslprocessor.selectSingleNode(

                docNode

              , xpath_pfx || 'ArchivedDocument/DocumentPart/Data/BinaryData/text()'

              , nsmap

              ) ;

     

      --create an input stream to read the node content :

      istream := dbms_xmldom.getNodeValueAsCharacterStream(node);

      dbms_lob.createtemporary(base64clob, false);

     

      -- read the content in 32k chunk and append data to the CLOB :

      loop

        istream.read(buf, numRead);

        exit when numRead = 0;

        dbms_lob.writeappend(base64clob, numRead, buf);

      end loop;

     

      -- free resources :

      istream.close();

      dbms_xmldom.freeDocument(domDoc);

     

    END;

    /

     

    • Using a temporary XMLType storage (binary XML) :

     

    create table tmp_xml of xmltype

    xmltype store as securefile binary xml;

     

    insert into tmp_xml values( v_xml );

     

    select x.*

    from tmp_xml t

       , xmltable(

           xmlnamespaces(

             'http://schemas.xmlsoap.org/soap/envelope/' as "soap"

           , '***' as "ns2"

           )

         , '/soap:Envelope/soap:Body/ns2:GetDocumentByIDResponse/ArchivedDocument/DocumentPart/Data'

           passing t.object_value

           columns filename    varchar2(30) path '@fileName'

                 , base64clob  clob         path 'BinaryData'

         ) x

    ;

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points