
    XML processing approach for DWH environment

    978225

      Hello all,

       

      I have a performance problem with my 3-level XML document processing approach, and I am trying to figure out whether I just need to find a way to optimize or whether the approach is wrong from the get-go.

       

      I am working in a DWH scenario on Oracle 11.2.0.2.

       

      I have XMLs with a 3-level hierarchical structure coming in from the source system:

      • Event (currently about 600,000 documents)
        • Action (avg. 50 per Event)
          • Param (avg. 2 per Action)

       

      The frontend (QlikView) is supposed to display the data contained in the XML documents.

      So now I am trying to get the values out of the XML documents into 3 relational tables (one per document level) for QlikView to work with.
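
      For illustration, the three target tables look roughly like this. This is a simplified sketch, not my actual DDL; all table and column names here are placeholders:

      CREATE TABLE dwh_event (
        event_id   NUMBER PRIMARY KEY,   -- maps to the numeric PK of the XML staging table
        event_type VARCHAR2(100)
      );

      CREATE TABLE dwh_action (
        event_id   NUMBER REFERENCES dwh_event,
        action_seq NUMBER,               -- position of the Action within its Event
        action_cd  VARCHAR2(100),
        PRIMARY KEY (event_id, action_seq)
      );

      CREATE TABLE dwh_param (
        event_id   NUMBER,
        action_seq NUMBER,
        param_name VARCHAR2(100),
        param_val  VARCHAR2(4000),
        FOREIGN KEY (event_id, action_seq) REFERENCES dwh_action
      );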

       

      I created a table with a schema-based, object-relational XMLType column and loaded the XMLs into it. That worked OK; the approximate runtime was 7 hours. I then use nested XMLTABLE constructs to extract the data from the XMLType column into the 3 normalized relational target tables.
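
      Simplified, the staging table looks like this (again only a sketch; the schema URL and element name are placeholders for the registered XML schema):

      CREATE TABLE xml_stage (
        doc_id  NUMBER PRIMARY KEY,      -- the numeric PK I mention below
        load_dt DATE DEFAULT SYSDATE,    -- load timestamp
        doc     XMLTYPE
      )
      XMLTYPE COLUMN doc
        STORE AS OBJECT RELATIONAL
        XMLSCHEMA "http://example.com/event.xsd" ELEMENT "Event";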

      For the initial load, I am using an INSERT /*+ APPEND */ ... SELECT statement.
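
      Stripped down to placeholder element/attribute names, the 3rd-level load looks something like this (the real query extracts more columns, but the nesting is the same):

      INSERT /*+ APPEND */ INTO dwh_param (event_id, action_seq, param_name, param_val)
      SELECT s.doc_id,
             act.action_seq,
             p.param_name,
             p.param_val
      FROM   xml_stage s,
             -- one row per Action; keep the Action fragment for the next level
             XMLTABLE('/Event/Action' PASSING s.doc
               COLUMNS action_seq FOR ORDINALITY,
                       action_xml XMLTYPE PATH '.') act,
             -- one row per Param inside the current Action
             XMLTABLE('Param' PASSING act.action_xml
               COLUMNS param_name VARCHAR2(100)  PATH '@name',
                       param_val  VARCHAR2(4000) PATH '.') p;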

      For the incremental load later on, I am hoping to be able to use a MERGE statement (performance permitting).
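
      What I have in mind for the incremental load is something like the following sketch; element/attribute names are again placeholders, and the delta predicate on load_dt just stands in for however the new documents end up being identified:

      MERGE INTO dwh_event tgt
      USING (
        SELECT s.doc_id AS event_id,
               x.event_type
        FROM   xml_stage s,
               XMLTABLE('/Event' PASSING s.doc
                 COLUMNS event_type VARCHAR2(100) PATH '@type') x
        WHERE  s.load_dt >= TRUNC(SYSDATE)   -- placeholder delta criterion
      ) src
      ON (tgt.event_id = src.event_id)
      WHEN MATCHED THEN
        UPDATE SET tgt.event_type = src.event_type
      WHEN NOT MATCHED THEN
        INSERT (event_id, event_type)
        VALUES (src.event_id, src.event_type);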

       

      The initial loads for the first two levels are going fine, with runtimes of about 30 seconds for the 1st level and about half an hour for the 2nd level.

      However, when loading the 3rd level into its target table, the DB runs out of ORATEMP or ORAUNDO space (sometimes one, sometimes the other).

      So I tried splitting the load up by looping over the numeric PK of the XML table, essentially processing only 1000 records at a time. But the estimated query cost is still almost the same, none of the B*-tree indexes I created on the OCT tables are being used, and I am still crashing because of insufficient ORATEMP/ORAUNDO.
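
      The chunked variant looks roughly like this (assuming a reasonably dense numeric PK; the commit per slice is required in any case, since a direct-path insert must be committed before the next one):

      DECLARE
        c_chunk CONSTANT PLS_INTEGER := 1000;
        v_min   NUMBER;
        v_max   NUMBER;
      BEGIN
        SELECT MIN(doc_id), MAX(doc_id) INTO v_min, v_max FROM xml_stage;

        FOR i IN 0 .. CEIL((v_max - v_min + 1) / c_chunk) - 1 LOOP
          INSERT /*+ APPEND */ INTO dwh_param (event_id, action_seq, param_name, param_val)
          SELECT s.doc_id, act.action_seq, p.param_name, p.param_val
          FROM   xml_stage s,
                 XMLTABLE('/Event/Action' PASSING s.doc
                   COLUMNS action_seq FOR ORDINALITY,
                           action_xml XMLTYPE PATH '.') act,
                 XMLTABLE('Param' PASSING act.action_xml
                   COLUMNS param_name VARCHAR2(100)  PATH '@name',
                           param_val  VARCHAR2(4000) PATH '.') p
          WHERE  s.doc_id BETWEEN v_min + i * c_chunk
                              AND v_min + (i + 1) * c_chunk - 1;
          COMMIT;   -- releases UNDO per slice; required after INSERT /*+ APPEND */
        END LOOP;
      END;
      /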

       

      The next thing I am going to try is using table partitioning to partition the table with the XMLType column (and thus automatically also the OCT tables) into one partition per day; the data collected so far goes back 3 months, so I figure partitioning by month would still leave me with too big a partition.
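
      Roughly, the partitioned version of the staging table would look like this sketch (partition names and dates are illustrative; the real table would get one partition per day of the 3-month backlog, and the schema URL / element name are placeholders again):

      CREATE TABLE xml_stage_part (
        doc_id  NUMBER PRIMARY KEY,
        load_dt DATE NOT NULL,
        doc     XMLTYPE
      )
      XMLTYPE COLUMN doc
        STORE AS OBJECT RELATIONAL
        XMLSCHEMA "http://example.com/event.xsd" ELEMENT "Event"
      PARTITION BY RANGE (load_dt) (
        PARTITION p20130901 VALUES LESS THAN (DATE '2013-09-02'),
        PARTITION p20130902 VALUES LESS THAN (DATE '2013-09-03')
        -- ... and so on, one partition per day
      );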

       

      But before I go into the details of the 3rd-level extraction query and try to find an index that improves the nested XMLTABLE joins on the OCT tables, I would like to evaluate whether this whole approach is flawed to begin with.

       

      Any comments or ideas?

       

      Thank you in advance...

       

       

      P.S.: I know that making ORATEMP/ORAUNDO "big enough" (whatever that means) would work, and I have put in a request with the admin, but that takes a few weeks (seriously... :-( ), and it sounds like a brute-force approach.