2 Replies Latest reply: Jul 22, 2013 8:52 AM by Marwim RSS

    Performance problem with XMLTABLE

    Marwim

      Management Summary

       

      • Users load XML-files into a table
      • Programs reads data via a View using XMLTABLE
      • Performace issues with with 20.000 records within one file, while expecting up to 500.000

       

      Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

      With the Partitioning, OLAP, Data Mining and Real Application Testing options

       

      Hello,

       

      I need to process transaction informations for SEPA (http://en.wikipedia.org/wiki/Single_Euro_Payments_Area)

      My program works for smaller transaction files, but users complain about performance when there are several thousands of transfers within a file.

       

      Users upload files into a table

      CREATE TABLE pfile_xml (
           px_dateiname       VARCHAR2(2000)
          ,px_namespace       VARCHAR2(200)
          ,px_data            XMLTYPE --Binary XML storage
          )
      
      

       

      Since there is more than one XML-schema involved, the namespace is extracted and stored in a column to make my life easier.

      Usually there are not more than 10 records in this table.

      Programs access the XML via different views, in this case it is defined as

       

      CREATE OR REPLACE VIEW v_load_sepa_auszahlung_txinf AS
      SELECT  pfile_xml.px_dateiname
             ,paymentInformation.PaymentIdentifier
             ,transactionInformation.EndToEndId
             ,transactionInformation.PaymentPurpose
             ,transactionInformation.Currency
             ,transactionInformation.Amount
             ,transactionInformation.CreditorAgentBIC
             ,transactionInformation.CreditorName
             ,transactionInformation.CreditorAccountIBAN
             ,transactionInformation.RemittanceInformationU
             ,transactionInformation.TrfTxInf
      FROM    pfile_xml
             ,XMLTABLE(
                  XMLNAMESPACES(DEFAULT 'urn:iso:std:iso:20022:tech:xsd:pain.001.002.03'),
                  '/Document/CstmrCdtTrfInitn'
                  PASSING pfile_xml.px_data
                  COLUMNS PaymentIdentifier       VARCHAR2(35)    PATH 'PmtInf/PmtInfId'
                  ) paymentInformation
             ,XMLTABLE(
                  XMLNAMESPACES(DEFAULT 'urn:iso:std:iso:20022:tech:xsd:pain.001.002.03'),
                  'Document/CstmrCdtTrfInitn/PmtInf/CdtTrfTxInf'
                  PASSING pfile_xml.px_data
                  COLUMNS EndToEndId              VARCHAR2(35)    PATH 'PmtId/EndToEndId'
                         ,PaymentPurpose          VARCHAR2(5)     PATH 'PmtTpInf/CtgyPurp/Cd'
                         ,Currency                VARCHAR2(3)     PATH 'Amt/InstdAmt/@Ccy'
                         ,Amount                  NUMBER          PATH 'Amt/InstdAmt'
                         ,CreditorAgentBIC        VARCHAR2(11)    PATH 'CdtrAgt/FinInstnId/BIC'
                         ,CreditorName            VARCHAR2(70)    PATH 'Cdtr/Nm'
                         ,CreditorAccountIBAN     VARCHAR2(35)    PATH 'CdtrAcct/Id/IBAN'
                         ,RemittanceInformationU  VARCHAR2(140)   PATH 'RmtInf/Ustrd'--Unstructured
                         ,TrfTxInf                XMLTYPE         PATH '/'
                  ) transactionInformation;
      


      My program uses a cursor

      SELECT  *
      FROM    v_load_sepa_auszahlung_txinf
      WHERE   px_dateiname = ...;
      

       

      Explain plan

       

      SQL_ID  b3a24kq8dwpp3, child number 0

      -------------------------------------

      SELECT  /*+ gather_plan_statistics */ * FROM   

      v_load_sepa_auszahlung_txinf WHERE   px_dateiname =

      'H:\zv_clearing\zvc\g\zv2\sepa-eur-SRA_A_BKxml_RE_1_test'

       

      Plan hash value: 351014842

       

      ----------------------------------------------------------------------------------------------------------------------------------------------

      | Id  | Operation                     | Name      | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time   | A-Rows |   A-Time   | Buffers | Reads  |

      ----------------------------------------------------------------------------------------------------------------------------------------------

      |   0 | SELECT STATEMENT              |           |      1 |        |       | 89650 (100)|          |  21702 |00:24:10.80 |   21719 |     22M|

      |   1 |  NESTED LOOPS                 |           |      1 |     66M|   187G| 89650   (5)| 00:05:11 |  21702 |00:24:10.80 |   21719 |     22M|

      |   2 |   NESTED LOOPS                |           |      1 |   8168 |    23M|    14   (8)| 00:00:01 |  21702 |00:00:01.99 |      10 |   3081 |

      |   3 |    TABLE ACCESS BY INDEX ROWID| PFILE_XML |      1 |      1 |  3004 |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       2 |      0 |

      |*  4 |     INDEX UNIQUE SCAN         | PX_PK     |      1 |      1 |       |     1   (0)| 00:00:01 |      1 |00:00:00.01 |       1 |      0 |

      |   5 |    XPATH EVALUATION           |           |      1 |        |       |            |          |  21702 |00:00:01.97 |       8 |   3081 |

      |   6 |   XPATH EVALUATION            |           |  21702 |        |       |            |          |  21702 |00:24:08.68 |   21709 |     22M|

      ----------------------------------------------------------------------------------------------------------------------------------------------

       

      Query Block Name / Object Alias (identified by operation id):

      -------------------------------------------------------------

       

         1 - SEL$769200B6

         3 - SEL$769200B6 / PFILE_XML@SEL$2

         4 - SEL$769200B6 / PFILE_XML@SEL$2

       

      Outline Data

      -------------

       

        /*+

        BEGIN_OUTLINE_DATA
        IGNORE_OPTIM_EMBEDDED_HINTS
        OPTIMIZER_FEATURES_ENABLE('11.2.0.2')
        DB_VERSION('11.2.0.2')
        OPT_PARAM('_optim_peek_user_binds' 'false')
        OPT_PARAM('_optimizer_join_elimination_enabled' 'false')
        OPT_PARAM('optimizer_index_cost_adj' 50)
        OPT_PARAM('optimizer_index_caching' 90)
        ALL_ROWS
        FORCE_XML_QUERY_REWRITE
        XML_DML_RWT_STMT
        XMLINDEX_REWRITE
        XMLINDEX_REWRITE_IN_SELECT
        NO_COST_XML_QUERY_REWRITE
        OUTLINE_LEAF(@"SEL$769200B6")
        MERGE(@"SEL$499A68DA")
        MERGE(@"SEL$847EB518")
        OUTLINE(@"SEL$94CAFA11")
        MERGE(@"SEL$3")
        MERGE(@"SEL$4")
        OUTLINE(@"SEL$499A68DA")
        OUTLINE(@"SEL$847EB518")
        OUTLINE(@"SEL$DA52E8C8")
        MERGE(@"SEL$960CCFB6")
        OUTLINE(@"SEL$3")
        OUTLINE(@"SEL$4")
        OUTLINE(@"SEL$750E64C1")
        OUTLINE(@"SEL$9B0FF836")
        OUTLINE(@"SEL$1")
        OUTLINE(@"SEL$960CCFB6")
        MERGE(@"SEL$56C2D102")
        MERGE(@"SEL$CDCAF675")
        OUTLINE(@"SEL$2")
        OUTLINE(@"SEL$56C2D102")
        OUTLINE(@"SEL$CDCAF675")
        OUTLINE(@"SEL$FF3F23C7")
        OUTLINE(@"SEL$A6CC16C1")
        INDEX_RS_ASC(@"SEL$769200B6" "PFILE_XML"@"SEL$2" ("PFILE_XML"."PX_DATEINAME"))
        FULL(@"SEL$769200B6" "P1"@"SEL$9B0FF836")
        FULL(@"SEL$769200B6" "P"@"SEL$750E64C1")
        LEADING(@"SEL$769200B6" "PFILE_XML"@"SEL$2" "P1"@"SEL$9B0FF836" "P"@"SEL$750E64C1")
        USE_NL(@"SEL$769200B6" "P1"@"SEL$9B0FF836")
        USE_NL(@"SEL$769200B6" "P"@"SEL$750E64C1")
        END_OUTLINE_DATA

        */

       

      Predicate Information (identified by operation id):

      ---------------------------------------------------

       

         4 - access("PFILE_XML"."PX_DATEINAME"='H:\zv_clearing\zvc\g\zv2\sepa-eur-SRA_A_BKxml_RE_1_test')

       

      Column Projection Information (identified by operation id):

      -----------------------------------------------------------

       

         1 - "PFILE_XML".ROWID[ROWID,10], "PFILE_XML"."PX_DATEINAME"[VARCHAR2,2000], "PFILE_XML"."SYS_NC00005$"[LOB,4000], VALUE(A0)[64],

             VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001],

             VALUE(A0)[4001], VALUE(A0)[4001]

         2 - "PFILE_XML".ROWID[ROWID,10], "PFILE_XML"."PX_DATEINAME"[VARCHAR2,2000], "PFILE_XML"."SYS_NC00005$"[LOB,4000], VALUE(A0)[64],

             VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001]

         3 - "PFILE_XML".ROWID[ROWID,10], "PFILE_XML"."PX_DATEINAME"[VARCHAR2,2000], "PFILE_XML"."SYS_NC00005$"[LOB,4000]

         4 - "PFILE_XML".ROWID[ROWID,10], "PFILE_XML"."PX_DATEINAME"[VARCHAR2,2000]

         5 - VALUE(A0)[64], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001], VALUE(A0)[4001],

             VALUE(A0)[4001], VALUE(A0)[4001]

         6 - VALUE(A0)[4001]

       

       

      It takes 24 minutes to read 21.702 transactions. In production we expect files with up to 500.000 transactions. The data is read only once, stored in relational table and then the row in pfile_xml is deleted.

       

      Any idea how I can speed up the processing?

       

       

      Regards

      Marcus

        • 1. Re: Performance problem with XMLTABLE
          odie_63

          Hi Marcus,

           

          The data is read only once, stored in relational table and then the row in pfile_xml is deleted.

          When I saw the view, I immediately thought about creating a structured XMLIndex.

          But given the above piece of information, I guess the time spent to load the internal structures of the index will equal the time to run the query alone, resulting in no apparent gain.

           

          Do you have some test files you can share?

           

          I remember some of your older posts where you were using structured storage (via an XML schema). Did you try that as well?

           

           

          Also,

          ,TrfTxInf                XMLTYPE         PATH '/'

          Is that part really necessary ?


          You can also try to correlate the two XMLTable scans instead of reading both independently from the root node.

          Something along those lines :

          FROM pfile_xml 

             , XMLTABLE( 

                 XMLNAMESPACES(DEFAULT 'urn:iso:std:iso:20022:tech:xsd:pain.001.002.03'), 

                   '/Document/CstmrCdtTrfInitn/PmtInf' 

                   PASSING pfile_xml.px_data 

                   COLUMNS PaymentIdentifier       VARCHAR2(35)    PATH 'PmtInfId'

                         , TrfTxInf                XMLType         PATH 'CdtTrfTxInf'

               ) paymentInformation 

             , XMLTABLE( 

                 XMLNAMESPACES(DEFAULT 'urn:iso:std:iso:20022:tech:xsd:pain.001.002.03'), 

                   '/CdtTrfTxInf' 

                   PASSING paymentInformation.TrfTxInf 

                   COLUMNS EndToEndId              VARCHAR2(35)    PATH 'PmtId/EndToEndId' 

                          ,PaymentPurpose          VARCHAR2(5)     PATH 'PmtTpInf/CtgyPurp/Cd' 

                          ,Currency                VARCHAR2(3)     PATH 'Amt/InstdAmt/@Ccy' 

                          ,Amount                  NUMBER          PATH 'Amt/InstdAmt' 

                          ,CreditorAgentBIC        VARCHAR2(11)    PATH 'CdtrAgt/FinInstnId/BIC' 

                          ,CreditorName            VARCHAR2(70)    PATH 'Cdtr/Nm' 

                          ,CreditorAccountIBAN     VARCHAR2(35)    PATH 'CdtrAcct/Id/IBAN' 

                          ,RemittanceInformationU  VARCHAR2(140)   PATH 'RmtInf/Ustrd'

                ) transactionInformation

           

           

          Message was edited by: odie_63

          • 2. Re: Performance problem with XMLTABLE
            Marwim
            ,TrfTxInf                XMLTYPE         PATH '/'

            Is that part really necessary ?

            Yes, the system is something like a "Clearing Agency". It receives transactions from inhouse applications and distributes them to the appropriate banks and vice versa. The TrfTxInf fragments will not be touched but only rebundled into other files.

             

            You can also try to correlate the two XMLTable scans instead of reading both independently from the root node.

             

            This sentence made me think about the prerequisites. One document mentioned that, while according to the XSD the node PmtInf can appear multiple times, I can safely assume, there will be only one PmtInf node per document.So I don't need the first XMLTABLE statement in my view. This reduces the runtime of the select to less than one minute. I will test whether this is enough to satisfy the users.

             

            Sometimes it is not a technical problem, just a lack of gray matter usage

             

            Regards

            Marcus