4 Replies Latest reply: Sep 10, 2013 1:03 PM by odie_63 RSS

    slow extraction in big XML-Files with PL/SQL

    MaNö

      Hello,

       

      i have a performance problem with the extraction from attributes in big XML Files. I tested with a size of ~ 30 mb.

       

      The XML file is a response of a webservice. This response include some metadata of a document and the document itself. The document is inline embedded with a Base64 conversion.  Here is an example of a XML File i want to analyse:

       

      <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
         <soap:Body>
            <ns2:GetDocumentByIDResponse xmlns:ns2="***">
               <ArchivedDocument>
                  <ArchivedDocumentDescription version="1" currentVersion="true" documentClassName="Allgemeines Dokument" csbDocumentID="***">
                     <Metadata archiveDate="2013-08-01+02:00" documentID="123">
                        <Descriptor type="Integer" name="fachlicheId">
                           <Value>123<Value>
                        </Descriptor>
                        <Descriptor type="String" name="user">
                           <Value>***</Value>
                        </Descriptor>
                        <InternalDescriptor type="Date" ID="DocumentDate">
                           <Value>2013-08-01+02:00</Value>
                        </InternalDescriptor>
                   
                        <!-- Here some more InternalDescriptor Nodes -->
                    
                     </Metadata>
                     <RepresentationDescription default="true" description="Description" documentPartCount="1" mimeType="application/octet-stream">
                        <DocumentPartDescription fileName="20mb.test" mimeType="application/octet-stream" length="20971520 " documentPartNumber="0" hashValue=""/>
                     </RepresentationDescription>
                  </ArchivedDocumentDescription>
                  <DocumentPart mimeType="application/octet-stream" length="20971520 " documentPartNumber="0" representationNumber="0">
                     <Data fileName="20mb.test">
                        <BinaryData>
                          <!-- Here is the BASE64 converted document -->
                        </BinaryData>
                     </Data>
                  </DocumentPart>
               </ArchivedDocument>
            </ns2:GetDocumentByIDResponse>
         </soap:Body>
      </soap:Envelope>
      

       

      Now i want to extract the filename and the Base64 converted document from this XML response.

       

      For the extraction of the filename i use the following command:

      v_filename := apex_web_service.parse_xml(v_xml, '//ArchivedDocument/ArchivedDocumentDescription/RepresentationDescription/DocumentPartDescription/@fileName');
      

       

      For the extraction of the binary data i use the following command:

      v_clob := apex_web_service.parse_xml_clob(v_xml, '//ArchivedDocument/DocumentPart/Data/BinaryData/text()');
      

       

      My problem is the performance of this extraction. Here i created some summary of the start and end time for the commands:

       

      Start TimeEnd TimeDifferenceCommand
      10.09.13 - 15:46:11,40266800010.09.13 - 15:47:21,40789500000:01:10,005227v_filename_bcm := apex_web_service.parse_xml(v_xml, '//ArchivedDocument/ArchivedDocumentDescription/RepresentationDescription/DocumentPartDescription/@fileName');
      10.09.13 - 15:47:21,40789500010.09.13 - 15:47:22,33678600000:00:00,928891v_clob := apex_web_service.parse_xml_clob(v_xml, '//ArchivedDocument/DocumentPart/Data/BinaryData/text()');

       

      As you can see the extraction of the filename is slower then the document extraction. For the Extraction of the filename i need ~01

       

      I wonder about it and started some tests.

       

      I tried to use an exact - non dynamic - filename. So i have this commands:

      v_filename := '20mb_1.test';
      v_clob := apex_web_service.parse_xml_clob(v_xml, '//ArchivedDocument/DocumentPart/Data/BinaryData/text()');
      

       

      Under this Conditions the time for the document extraction soar. You can see this in the following table:

       

      Start TimeEnd TimeDifferenceCommand
      10.09.13 - 16:02:33,21203500010.09.13 - 16:02:33,21254200000:00:00,000507v_filename_bcm := '20mb_1.test';
      10.09.13 - 16:02:33,21254200010.09.13 - 16:03:40,34239600000:01:07,129854v_clob := apex_web_service.parse_xml_clob(v_xml, '//ArchivedDocument/DocumentPart/Data/BinaryData/text()');

       

      So i'm looking for a faster extraction out of the xml file. Do you have any ideas? If you need more informations, please ask me.

       

      Thank you,

       

      Matthias

      PS: I use the Oracle 11.2.0.2.0

        • 1. Re: slow extraction in big XML-Files with PL/SQL
          BluShadow

          Have a look at "mdrake"'s answers on this thread:

           

          Re: XML file processing into oracle

           

          You may also like to look at other answers in the XML DB forum, and also the XML DB forum FAQ:

           

          XML DB FAQ

           

          as that is the place where the XML specialists tend to hang out.

           

          Is the "apex_webs_service.parse_xml_clob" a user defined package/function? or is it something that comes with Apex?

          • 2. Re: slow extraction in big XML-Files with PL/SQL
            MaNö

            Hey, first i want to thank your for the fast repost.Unfortunally i don't find an answer for my problem in the links. The "apex_webs_service.parse_xml_clob" function comes with Apex. You can read more about it here: http://docs.oracle.com/cd/E17556_01/doc/apirefs.40/e15519/apex_web_service.htm

            • 3. Re: slow extraction in big XML-Files with PL/SQL
              BluShadow

              Well, based on what you posted, the delay appears to be the first attempt to read contents from the XML, so in your first test the first thing you tried to read was the filename, and that was the 'slow' part, and then when you eliminated that, the document itself appeared to go slower, but that would be because it's the first thing you try to extract from the XML.  Subsequent extractions from the same XML will likely benefit from caching etc.

               

              You may also want to use explicit paths in your XQuery expression rather than starting it with "\\" which will cause it to try and scan the whole XML looking for any matching nodes to start from.  An explicit path will give it direct access to the correct node.  How much benefit that will give I'm not sure without testing... so you can test that.

               

              As for the links I provided, they do offer advice on registering an XML schema, describing the XML, on the database in order to shred the XML against, giving significant performance increases.  If you don't think that'll help then I'm not sure what will, and you'd still be better asking in the XML DB forum.

              • 4. Re: slow extraction in big XML-Files with PL/SQL
                odie_63

                Although using an XML schema is a good advice for an XML-centric application, I think it's a little overkill in this situation.

                 

                Here are two approaches you can test :

                 

                • Using the DOM interface over your XMLType variable, for example :

                 

                DECLARE

                 

                  v_xml    xmltype := xmltype('<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"> 

                       <soap:Body> 

                          <ns2:GetDocumentByIDResponse xmlns:ns2="***"> 

                             <ArchivedDocument> 

                                <ArchivedDocumentDescription version="1" currentVersion="true" documentClassName="Allgemeines Dokument" csbDocumentID="***"> 

                                   <Metadata archiveDate="2013-08-01+02:00" documentID="123"> 

                                      <Descriptor type="Integer" name="fachlicheId"> 

                                         <Value>123</Value> 

                                      </Descriptor> 

                                      <Descriptor type="String" name="user"> 

                                         <Value>***</Value> 

                                      </Descriptor> 

                                      <InternalDescriptor type="Date" ID="DocumentDate"> 

                                         <Value>2013-08-01+02:00</Value> 

                                      </InternalDescriptor> 

                                  

                                      <!-- Here some more InternalDescriptor Nodes --> 

                                   

                                   </Metadata> 

                                   <RepresentationDescription default="true" description="Description" documentPartCount="1" mimeType="application/octet-stream"> 

                                      <DocumentPartDescription fileName="20mb.test" mimeType="application/octet-stream" length="20971520 " documentPartNumber="0" hashValue=""/> 

                                   </RepresentationDescription> 

                                </ArchivedDocumentDescription> 

                                <DocumentPart mimeType="application/octet-stream" length="20971520 " documentPartNumber="0" representationNumber="0"> 

                                   <Data fileName="20mb.test"> 

                                      <BinaryData> 

                                        ABC123 

                                      </BinaryData> 

                                   </Data> 

                                </DocumentPart> 

                             </ArchivedDocument> 

                          </ns2:GetDocumentByIDResponse> 

                       </soap:Body> 

                    </soap:Envelope>');

                   

                  domDoc    dbms_xmldom.DOMDocument;

                  docNode   dbms_xmldom.DOMNode;

                  node      dbms_xmldom.DOMNode;

                 

                  nsmap     varchar2(2000) := 'xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns2="***"';

                  xpath_pfx varchar2(2000) := '/soap:Envelope/soap:Body/ns2:GetDocumentByIDResponse/';

                 

                  istream   sys.utl_characterinputstream;

                  buf       varchar2(32767);

                  numRead   pls_integer := 1;

                 

                  filename       varchar2(30);

                  base64clob     clob;

                 

                BEGIN

                 

                  domDoc := dbms_xmldom.newDOMDocument(v_xml);

                  docNode := dbms_xmldom.makeNode(domdoc);

                 

                  filename := dbms_xslprocessor.valueOf(

                                docNode

                              , xpath_pfx || 'ArchivedDocument/ArchivedDocumentDescription/RepresentationDescription/DocumentPartDescription/@fileName'

                              , nsmap

                              ) ;

                             

                  node := dbms_xslprocessor.selectSingleNode(

                            docNode

                          , xpath_pfx || 'ArchivedDocument/DocumentPart/Data/BinaryData/text()'

                          , nsmap

                          ) ;

                 

                  --create an input stream to read the node content :

                  istream := dbms_xmldom.getNodeValueAsCharacterStream(node);

                  dbms_lob.createtemporary(base64clob, false);

                 

                  -- read the content in 32k chunk and append data to the CLOB :

                  loop

                    istream.read(buf, numRead);

                    exit when numRead = 0;

                    dbms_lob.writeappend(base64clob, numRead, buf);

                  end loop;

                 

                  -- free resources :

                  istream.close();

                  dbms_xmldom.freeDocument(domDoc);

                 

                END;

                /

                 

                • Using a temporary XMLType storage (binary XML) :

                 

                create table tmp_xml of xmltype

                xmltype store as securefile binary xml;

                 

                insert into tmp_xml values( v_xml );

                 

                select x.*

                from tmp_xml t

                   , xmltable(

                       xmlnamespaces(

                         'http://schemas.xmlsoap.org/soap/envelope/' as "soap"

                       , '***' as "ns2"

                       )

                     , '/soap:Envelope/soap:Body/ns2:GetDocumentByIDResponse/ArchivedDocument/DocumentPart/Data'

                       passing t.object_value

                       columns filename    varchar2(30) path '@fileName'

                             , base64clob  clob         path 'BinaryData'

                     ) x

                ;