3 Replies Latest reply on May 13, 2019 12:51 PM by Feisty_Duck

    xml data to oracle table

    Feisty_Duck

      Hi all,

       

      I have a problem and I was hoping someone could advise on appropriate action

      I have this very complex XML schema and an XML document that i have to import into the database

      and then parse the document and insert the data in different tables. (XML document size >130 MB )

      The problem is the process is taking quite long ( 1,5 h ) and I am wondering if I have done all I can to make it as fast as possible.

       

      My database version is 11.2.0.4.

       

      What I have tried so far:

       

      First I registered my schema:

       

      BEGIN
        DBMS_XMLSCHEMA.registerSchema(
          SCHEMAURL => 'camt.053.001.02.xsd',
          SCHEMADOC => bfilename('DB_FOLDER','camt.053.001.02.xsd'),
          CSID => nls_charset_id('AL32UTF8'),
          LOCAL            => TRUE,
          GENTYPES          => FALSE,
          GENBEAN          => FALSE,
          GENTABLES          => FALSE,
          FORCE            => FALSE,
          OPTIONS          => DBMS_XMLSCHEMA.REGISTER_BINARYXML,
          OWNER=> USER);
      END;
      

       

      If I try to put GENTYPES =>TRUE and GENTABLES=>TRUE then I get the ORA-01792 error.

       

       

      Then created a table with XMLtype column:

       

      CREATE TABLE ROBI_XML_DATA
          (id number, BAG xmltype)
               xmltype COLUMN BAG STORE AS BINARY XML
            XMLSCHEMA "camt.053.001.02.xsd" ELEMENT "Document";
      

       

      I then inserted the XML document:

       

      INSERT INTO ROBI_XML_DATA (id, BAG) 
      VALUES (  1,
      xmltype( 
          bfilename('DATABASE_DIR', 'example.xml') 
        , nls_charset_id('AL32UTF8')   
        ) 
      );
      

       

      With the query below I then go and parse the XML to get the data:

       

           SELECT /*+ all_rows */
                 testt ,
                             test2,
                             test3,
                             test4,
                             test5,
                             test6,
                              test7,
                               test8,
                                test9
            FROM ROBI_XML_DATA  x,
                 XMLTABLE (
                     XMLNAMESPACES (
                         DEFAULT 'urn:iso:std:iso:20022:tech:xsd:camt.053.001.02'),
                     '/Document/BkToCstmrStmt/Stmt/Ntry'
                     PASSING x.bag
                     COLUMNS testt VARCHAR2 (100) PATH 'Amt/@Ccy',
                             test2 VARCHAR2 (100) PATH 'CdtDbtInd',
                             test3 VARCHAR2 (100) PATH 'BookgDt/Dt',
                             test4 VARCHAR2 (100) PATH 'ValDt/Dt',
                             test5 VARCHAR2 (100) PATH 'AcctSvcrRef',
                             test6 varchar2(100)  PATH 'NtryDtls/TxDtls/Refs/InstrId',
                              test7 varchar2(100)  PATH 'NtryDtls/TxDtls/Refs/EndToEndId',
                               test8 varchar2(100)  PATH 'NtryDtls/TxDtls/Refs/TxId',
                                test9 varchar2(100)  PATH 'NtryDtls/TxDtls/RltdPties/DbtrAcct/Id/IBAN'
                                                                    ) t
           WHERE x.id = 1;
      

       

      I have also tried to create some XMLindexes to speed things up but it did not make any difference:

       

      CREATE INDEX XIDX_BAG_01
          ON ROBI_XML_DATA(BAG)
          INDEXTYPE IS XDB.XMLINDEX
          PARAMETERS ('PATHS (INCLUDE (/Document/BkToCstmrStmt/Stmt/Ntry//*))'
                   )
                   ;
      

       

      I created an index with specific paths in XML that I query, but then the indexing took about an hour and a half so there was no use in that:

       

      CREATE INDEX XIDX_BAG_01
          ON ROBI_XML_DATA(BAG)
          INDEXTYPE IS XDB.XMLINDEX
          PARAMETERS ('GROUP BAG_VIEW_GROUP
                           XMLTABLE CONTENT_BAG_SXI_TABLE
                           XMLNAMESPACES(default ''urn:iso:std:iso:20022:tech:xsd:camt.053.001.02'')
                          , ''/Document/BkToCstmrStmt/Stmt/Ntry''
                       COLUMNS testt VARCHAR2(100) PATH ''Amt'',
                        test2 VARCHAR2(100) PATH ''CdtDbtInd'',
                        test3 VARCHAR2(100) PATH ''BookgDt/Dt'',
                        test4 VARCHAR2(100) PATH ''ValDt/Dt'',
                        test5 VARCHAR2(100) PATH ''AcctSvcrRef''
                   '
                   )
                   ;
      

       

      The partial example of the XML document is as follows (there are multiple Ntry tags in my case -  >120.000 ):

       

      <?xml version="1.0" encoding="UTF-8"?>
      <Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">
      <BkToCstmrStmt>
      <GrpHdr>
      <MsgId>TRK18327680</MsgId>
      <CreDtTm>2019-04-18T07:14:38</CreDtTm>
      </GrpHdr>
      <Stmt>
      <Id>SI56XXXXXXXXX-EUR-20190417</Id>
      <LglSeqNb>74</LglSeqNb>
      <CreDtTm>2019-04-17T00:00:00</CreDtTm>
      <Acct>
      <Id>
      <IBAN>SI56XXXXXXXXX</IBAN>
      </Id>
      <Ccy>EUR</Ccy>
      <Ownr>
      <Nm>TEST</Nm>
      <PstlAdr>
      <Ctry>SI</Ctry>
      <AdrLine>TEST 3</AdrLine>
      <AdrLine>2319 Test</AdrLine>
      </PstlAdr>
      </Ownr>
      </Acct>
      <Bal>
      <Tp>
      <CdOrPrtry>
      <Cd>OPBD</Cd>
      </CdOrPrtry>
      </Tp>
      <Amt Ccy="EUR">531407.79</Amt>
      <CdtDbtInd>CRDT</CdtDbtInd>
      <Dt>
      <Dt>2019-04-17</Dt>
      </Dt>
      </Bal>
      <Bal>
      <Tp>
      <CdOrPrtry>
      <Cd>CLBD</Cd>
      </CdOrPrtry>
      </Tp>
      <Amt Ccy="EUR">535886.8</Amt>
      <CdtDbtInd>CRDT</CdtDbtInd>
      <Dt>
      <Dt>2019-04-17</Dt>
      </Dt>
      </Bal>
      <TxsSummry>
      <TtlCdtNtries>
      <NbOfNtries>134</NbOfNtries>
      <Sum>7402.64</Sum>
      </TtlCdtNtries>
      <TtlDbtNtries>
      <NbOfNtries>9</NbOfNtries>
      <Sum>2923.63</Sum>
      </TtlDbtNtries>
      </TxsSummry>
      <Ntry>
      <Amt Ccy="EUR">6.54</Amt>
      <CdtDbtInd>DBIT</CdtDbtInd>
      <Sts>BOOK</Sts>
      <BookgDt>
      <Dt>2019-04-17</Dt>
      </BookgDt>
      <ValDt>
      <Dt>2019-04-17</Dt>
      </ValDt>
      <AcctSvcrRef>A89110000560951</AcctSvcrRef>
      <BkTxCd>
      <Prtry>
      <Cd>NOTPROVIDED</Cd>
      </Prtry>
      </BkTxCd>
      <NtryDtls>
      <TxDtls>
      <Refs>
      <InstrId>GH54678303768543</InstrId>
      <EndToEndId>HG6845554870457847568</EndToEndId>
      <TxId>98756875496754fG</TxId>
      </Refs>
      <RltdPties>
      <Dbtr>
      <Nm>TEST</Nm>
      <PstlAdr>
      <Ctry>SI</Ctry>
      <AdrLine>TEST 3</AdrLine>
      <AdrLine>2319 Test</AdrLine>
      </PstlAdr>
      </Dbtr>
      <DbtrAcct>
      <Id>
      <IBAN>GH89508345795706</IBAN>
      </Id>
      </DbtrAcct>
      <Cdtr>
      <Nm>TEST.</Nm>
      <PstlAdr>
      <Ctry>SI</Ctry>
      <AdrLine>TEST 3</AdrLine>
      <AdrLine>2319 Test</AdrLine>
      </PstlAdr>
      </Cdtr>
      <CdtrAcct>
      <Id>
      <IBAN>GH89508345795706</IBAN>
      </Id>
      </CdtrAcct>
      </RltdPties>
      <RltdAgts>
      <DbtrAgt>
      <FinInstnId>
      <BIC>XXXTEST</BIC>
      </FinInstnId>
      </DbtrAgt>
      <CdtrAgt>
      <FinInstnId>
      <BIC>XXXTEST</BIC>
      </FinInstnId>
      </CdtrAgt>
      </RltdAgts>
      <Purp>
      <Cd>OTHR</Cd>
      </Purp>
      <RmtInf>
      <Strd>
      <CdtrRefInf>
      <Ref>GH4983-5867</Ref>
      </CdtrRefInf>
      <AddtlRmtInf>Naer tuieh</AddtlRmtInf>
      </Strd>
      </RmtInf>
      </TxDtls>
      </NtryDtls>
      </Ntry>
      </Stmt>
      </BkToCstmrStmt>
      </Document>
      
        • 1. Re: xml data to oracle table
          Jason_(A_Non)

          I don't have an answer but a couple of suggestions.  What's the execution plan for the SQL statement without the indexes existing?  A way to check is via

          SELECT * FROM table (

            DBMS_XPLAN.DISPLAY_CURSOR('atfwcg8anrykp', NULL, 'ALLSTATS LAST');

          using the sql_id for your statement.

           

          I would suggest also trying to use the hint NO_XML_QUERY_REWRITE to see if that makes a difference as well in both time and execution plan.

           

          The only advantage the index might give you is if this same data is read multiple times.  Reading the index might be faster than parsing the original XML each time.

          • 2. Re: xml data to oracle table
            Feisty_Duck

            Hi, thanks for the advice.

            It makes sense that an index will not make any difference here because I have to go through the whole document anyway. And I do not have to read it multiple times, just once

            For now, since I am not at work,  I can only tell you that the explain plan says it will do a full table scan and that's all it says actually.

            I will get back to you tomorrow with the execution plan and the results of the hint test...

            • 3. Re: xml data to oracle table
              Feisty_Duck

              So this is the execution plan for the query without the suggested hint. It executes fast but only for the first few results. If i want all the data then as I have said it takes about an hour and a half.

               

              SQL_ID  dkydp47hmv69k, child number 0
              -------------------------------------
              SELECT /*+ all_rows gather_plan_statistics*/            testt ,         
                             test2,                        test3,                     
                 test4,                        test5,-- createDT VARCHAR2(100) PATH 
              '/CreDtTm'                        test6,                        test7,  
                                    test8,                        test9       FROM 
              ROBI_XML_DATA  x,            XMLTABLE (                XMLNAMESPACES (  
                                DEFAULT 'urn:iso:std:iso:20022:tech:xsd:camt.053.001.0
              2'),                '/Document/BkToCstmrStmt/Stmt/Ntry'                
              PASSING x.bag                COLUMNS testt VARCHAR2 (100) PATH 
              'Amt/@Ccy',                        test2 VARCHAR2 (100) PATH 
              'CdtDbtInd',                        test3 VARCHAR2 (100) PATH 
              'BookgDt/Dt',                        test4 VARCHAR2 (100) PATH 
              'ValDt/Dt',                        test5 VARCHAR2 (100) PATH 
              'AcctSvcrRef',-- createDT VARCHAR2(100) PATH '/CreDtTm'                 
                     test6 varchar2(100)  PATH 'NtryDtls/TxDtl
              Plan hash value: 2138523162
              -------------------------------------------------------------------------------------------------------
              | Id  | Operation          | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
              -------------------------------------------------------------------------------------------------------
              |   0 | SELECT STATEMENT   |               |      1 |        |    501 |00:00:00.02 |     156 |      1 |
              |   1 |  NESTED LOOPS      |               |      1 |   8168 |    501 |00:00:00.02 |     156 |      1 |
              |*  2 |   TABLE ACCESS FULL| ROBI_XML_DATA |      1 |      1 |      1 |00:00:00.01 |       5 |      0 |
              |   3 |   XPATH EVALUATION |               |      1 |        |    501 |00:00:00.02 |     151 |      1 |
              -------------------------------------------------------------------------------------------------------
              Predicate Information (identified by operation id):
              ---------------------------------------------------
              
                 2 - filter("X"."ID"=1)
              

               

              The second plan is with the suggested hint. I think it's a bit faster, it takes about an hour to load all the results

               

              SQL_ID  78mh1yn62yjb4, child number 0
              -------------------------------------
              SELECT /*+ all_rows NO_XML_QUERY_REWRITE gather_plan_statistics*/       
                   testt ,                        test2,                        
              test3,                        test4,                        test5,-- 
              createDT VARCHAR2(100) PATH '/CreDtTm'                        test6,    
                                  test7,                        test8,                
                      test9       FROM ROBI_XML_DATA  x,            XMLTABLE (        
                      XMLNAMESPACES (                    DEFAULT 
              'urn:iso:std:iso:20022:tech:xsd:camt.053.001.02'),                
              '/Document/BkToCstmrStmt/Stmt/Ntry'                PASSING x.bag        
                      COLUMNS testt VARCHAR2 (100) PATH 'Amt/@Ccy',                   
                   test2 VARCHAR2 (100) PATH 'CdtDbtInd',                        
              test3 VARCHAR2 (100) PATH 'BookgDt/Dt',                        test4 
              VARCHAR2 (100) PATH 'ValDt/Dt',                        test5 VARCHAR2 
              (100) PATH 'AcctSvcrRef',-- createDT VARCHAR2(100) PATH '/CreDtTm'      
                                test6 varchar2(100)
              
              Plan hash value: 3965559185
              
              ----------------------------------------------------------------------------------------------------------
              | Id  | Operation             | Name          | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
              ----------------------------------------------------------------------------------------------------------
              |   0 | SELECT STATEMENT      |               |      1 |        |    121K|00:59:22.22 |   21362 |   7088 |
              |   1 |  NESTED LOOPS         |               |      1 |   8168 |    121K|00:59:22.22 |   21362 |   7088 |
              |*  2 |   TABLE ACCESS FULL   | ROBI_XML_DATA |      1 |      1 |      1 |00:00:00.02 |       6 |      3 |
              |   3 |   XMLTABLE EVALUATION |               |      1 |        |    121K|00:59:22.20 |   21356 |   7085 |
              ----------------------------------------------------------------------------------------------------------
              
              Predicate Information (identified by operation id):
              ---------------------------------------------------
              
                 2 - filter("X"."ID"=1)
              

               

              So thanks for your advice. I will mark your answer as a possible solution.