9 Replies Latest reply: Oct 10, 2013 3:03 PM by 878475 RSS

    extracting data from CLOB's using Materialized views

    878475

      Hello,

      We have xml data in clob's which I have a requirement to extract out ( ~50 attributes ) on a daily basis , So we decided to use materialized views with complete refreshes ( open for suggestions though )

       

      A small snippet of the code

       

      CREATE MATERIALIZED VIEW MWMRPT.TASK_INBOUND

      BUILD IMMEDIATE

      REFRESH COMPLETE ON DEMAND

      WITH ROWID

      AS

      SELECT   M.TASK_ID, M.BO_STATUS_CD, b.*

        FROM   CISADM.M1_TASK m,

               XMLTABLE ('/a' PASSING XMLPARSE (

                         CONTENT '<a>' || M.BO_DATA_AREA || '</a>'

                         ) COLUMNS

                         serviceDeliverySiteId varchar2 (15) PATH

            'cmPCGeneralInfo/serviceDeliverySiteId',

      serviceSequenceId varchar2 (3) PATH 'cmPCGeneralInfo/serviceSequenceId',

      completedByAssignmentId varchar2 (50) PATH 'completedByAssignmentId',

      CUST_ID varchar2 (10) PATH 'cmPCCustomerInformation/customerId',

      ACCT_SEQ varchar2 (5) PATH 'customerInformation/accountId',

      AGRMT_SEQ varchar2 (5) PATH 'cmPCCustomerAgreement/agreementId',

      COLL_SEQ varchar2 (5) PATH 'cmPCGeneralInfo/accountCollectionSeq',

      REVENUE_CLASS varchar2 (10) PATH 'cmPCCustomerAgreement/revenueClassCode',

      REQUESTED_BY varchar2 (50) PATH 'customerInformation/contactName',....~50 attributes

       

       

      this  ddl ran > 20hrs and no materialized view created. there are some limitations that we have

      • cannot create a materialized view log
      • cannot alter the source table as its vendor defined
      • cannot do an ETL

       

      DB is 11g R2

       

      Any thoughts/suggestions are highly appreciated

        • 1. Re: extracting data from CLOB's using Materialized views
          odie_63

          Hi,

           

          How large are your CLOBs ?

          How many rows in the base table ?

           

          Does the SELECT alone run fine ? Post an explain plan please.

           

          And most likely you don't need to wrap the content into additional <a> elements.

          I don't know if it will solve the main problem but you can simplify the query like this :

          XMLTABLE ('.'

            PASSING XMLPARSE ( CONTENT M.BO_DATA_AREA )

            COLUMNS

              serviceDeliverySiteId varchar2 (15) PATH '/cmPCGeneralInfo/serviceDeliverySiteId',

              serviceSequenceId     varchar2 (3)  PATH '/cmPCGeneralInfo/serviceSequenceId',

              ...

          )

          • 2. Re: extracting data from CLOB's using Materialized views
            878475

            thanks for responding Odie_63.. here are responses and hope it helps

             

            # rows in the table ~15K

            Size of Clob ~4K average

             

            select runs fine and below is the plan

             

            Plan

            SELECT STATEMENT  ALL_ROWSCost: 103  Bytes: 1,010,413  Cardinality: 5,129 

              1 TABLE ACCESS FULL TABLE CISADM.M1_TASK Cost: 103  Bytes: 1,010,413  Cardinality: 5,129 

             

            the reason why i do the content and wrap the <a> element is because , the xml element inside the clob does not have the root node so in order to use XPATH , i need it .

             

            Again thanks for looking

            • 3. Re: extracting data from CLOB's using Materialized views
              odie_63

              the reason why i do the content and wrap the <a> element is because , the xml element inside the clob does not have the root node so in order to use XPATH , i need it .

              You don't need it. See my example.

               

              And are you sure you've posted the complete plan ? I don't see the portion related to the XMLTABLE evaluation.

              • 4. Re: extracting data from CLOB's using Materialized views
                878475

                Sorry posted the incorrect plan ...

                 

                Plan

                SELECT STATEMENT  ALL_ROWSCost: 15,002  Bytes: 854,160,977  Cardinality: 4,472,047 

                  3 NESTED LOOPS  Cost: 15,002  Bytes: 854,160,977  Cardinality: 4,472,047 

                  1 TABLE ACCESS FULL TABLE CISADM.M1_TASK Cost: 103  Bytes: 103,572  Cardinality: 548 

                  2 COLLECTION ITERATOR PICKLER FETCH PROCEDURE SYS.XQSEQUENCEFROMXMLTYPE Cost: 27  Bytes: 16,336  Cardinality: 8,168 

                • 5. Re: extracting data from CLOB's using Materialized views
                  878475

                  XMLTABLE ('.'

                    PASSING XMLPARSE ( CONTENT M.BO_DATA_AREA )

                    COLUMNS

                      serviceDeliverySiteId varchar2 (15) PATH '/cmPCGeneralInfo/serviceDeliverySiteId',

                      serviceSequenceId     varchar2 (3)  PATH '/cmPCGeneralInfo/serviceSequenceId',

                      ...

                  )

                  without the element <a> worked , but no change in performance...Thanks though

                  • 6. Re: extracting data from CLOB's using Materialized views
                    Jason_(A_Non)

                    What about creating a new table defined as XMLTYPE using storage method SECUREFILE BINARY XML (default in 11.2.0.2 and later) and writing code to truncate that table and then simply

                    INSERT INTO new_table (task_id, BO_STATUS_CD, BO_DATA_AREA)
                    SELECT task_id, BO_STATUS_CD, BO_DATA_AREA
                    FROM CISADM.M1_TASK
                    

                    and then creating a regular view on top of that?  You could do a materialized view, but between the performance boost you will see using SECUREFILE BINARY XML, you may not need the materialized view.  You could also create a structured/unstructured index into the XML as well if needed to avoid using a materialized view.

                     

                    Just a thought and question to see what options are open for working around the limitations for dealing with XML stored in a CLOB.

                    • 7. Re: extracting data from CLOB's using Materialized views
                      odie_63

                      I've been exploring a similar approach, using the following test case.

                      It creates a table "MASTER_TABLE" containing 20,000 rows, and a CLOB containing an XML fragment like this :

                      <ThisIsElement1>09HOLVUF3T6VX5QUN8UBV9BRW3FHRB9JFO4TSV79R6J87QWVGN</ThisIsElement1>

                      <ThisIsElement2>UUL47WDW6C63YIIBOP1X4FEEJ2Z7NCR9BDFHGSLA5YZ5SAH8Y8</ThisIsElement2>

                      <ThisIsElement3>O1BU1EXLBU945HQLLFB3LUO03XPWMHBN8Y7SO8YRCQXRSWKKL4</ThisIsElement3>

                      ...

                      <ThisIsElement49>1HT88050QIGOPGUHGS9RKK54YP7W6OOI6NXVM107GM47R5LUNC</ThisIsElement49>

                      <ThisIsElement50>9FJ1JZ615EOUIX6EKBIVOWFDYCPQZM2HBQQ8HDP3ABVJ5N1OJA</ThisIsElement50>

                      then an intermediate table "MASTER_TABLE_XML" with the same columns except for the CLOB which is converted to XMLType, and finally a MVIEW :

                      SQL> create table master_table as

                        2  select level as id

                        3       , cast('ROW'||to_char(level) as varchar2(30)) as name

                        4       , (

                        5           select xmlserialize(content

                        6                    xmlagg(

                        7                      xmlelement(evalname('ThisIsElement'||to_char(level)), dbms_random.string('X',50))

                        8                    )

                        9                    as clob indent

                      10                  )

                      11           from dual

                      12           connect by level <= 50

                      13         ) as xmlcontent

                      14  from dual

                      15  connect by level <= 20000 ;

                       

                      Table created.

                       

                      SQL> call dbms_stats.gather_table_stats(user, 'MASTER_TABLE');

                       

                      Call completed.

                       

                      SQL> create table master_table_xml (

                        2    id         number

                        3  , name       varchar2(30)

                        4  , xmlcontent xmltype

                        5  )

                        6  xmltype column xmlcontent store as securefile binary xml

                        7  ;

                       

                      Table created.

                       

                      SQL> create materialized view master_table_mv

                        2  build deferred

                        3  refresh complete on demand

                        4  as

                        5  select t.id

                        6       , t.name

                        7       , x.*

                        8  from master_table_xml t

                        9     , xmltable('/r' passing t.xmlcontent

                      10         columns

                      11           ThisIsElement1  varchar2(50) path 'ThisIsElement1'

                      12         , ThisIsElement2  varchar2(50) path 'ThisIsElement2'

                      13         , ThisIsElement3  varchar2(50) path 'ThisIsElement3'

                      14         , ThisIsElement4  varchar2(50) path 'ThisIsElement4'

                      15         , ThisIsElement5  varchar2(50) path 'ThisIsElement5'

                      16         , ThisIsElement6  varchar2(50) path 'ThisIsElement6'

                      17         , ThisIsElement7  varchar2(50) path 'ThisIsElement7'

                      18         , ThisIsElement8  varchar2(50) path 'ThisIsElement8'

                      19         , ThisIsElement9  varchar2(50) path 'ThisIsElement9'

                      20         , ThisIsElement10 varchar2(50) path 'ThisIsElement10'

                      21         , ThisIsElement11 varchar2(50) path 'ThisIsElement11'

                      22         , ThisIsElement12 varchar2(50) path 'ThisIsElement12'

                      23         , ThisIsElement13 varchar2(50) path 'ThisIsElement13'

                      24         , ThisIsElement14 varchar2(50) path 'ThisIsElement14'

                      25         , ThisIsElement15 varchar2(50) path 'ThisIsElement15'

                      26         , ThisIsElement16 varchar2(50) path 'ThisIsElement16'

                      27         , ThisIsElement17 varchar2(50) path 'ThisIsElement17'

                      28         , ThisIsElement18 varchar2(50) path 'ThisIsElement18'

                      29         , ThisIsElement19 varchar2(50) path 'ThisIsElement19'

                      30         , ThisIsElement20 varchar2(50) path 'ThisIsElement20'

                      31         , ThisIsElement21 varchar2(50) path 'ThisIsElement21'

                      32         , ThisIsElement22 varchar2(50) path 'ThisIsElement22'

                      33         , ThisIsElement23 varchar2(50) path 'ThisIsElement23'

                      34         , ThisIsElement24 varchar2(50) path 'ThisIsElement24'

                      35         , ThisIsElement25 varchar2(50) path 'ThisIsElement25'

                      36         , ThisIsElement26 varchar2(50) path 'ThisIsElement26'

                      37         , ThisIsElement27 varchar2(50) path 'ThisIsElement27'

                      38         , ThisIsElement28 varchar2(50) path 'ThisIsElement28'

                      39         , ThisIsElement29 varchar2(50) path 'ThisIsElement29'

                      40         , ThisIsElement30 varchar2(50) path 'ThisIsElement30'

                      41         , ThisIsElement31 varchar2(50) path 'ThisIsElement31'

                      42         , ThisIsElement32 varchar2(50) path 'ThisIsElement32'

                      43         , ThisIsElement33 varchar2(50) path 'ThisIsElement33'

                      44         , ThisIsElement34 varchar2(50) path 'ThisIsElement34'

                      45         , ThisIsElement35 varchar2(50) path 'ThisIsElement35'

                      46         , ThisIsElement36 varchar2(50) path 'ThisIsElement36'

                      47         , ThisIsElement37 varchar2(50) path 'ThisIsElement37'

                      48         , ThisIsElement38 varchar2(50) path 'ThisIsElement38'

                      49         , ThisIsElement39 varchar2(50) path 'ThisIsElement39'

                      50         , ThisIsElement40 varchar2(50) path 'ThisIsElement40'

                      51         , ThisIsElement41 varchar2(50) path 'ThisIsElement41'

                      52         , ThisIsElement42 varchar2(50) path 'ThisIsElement42'

                      53         , ThisIsElement43 varchar2(50) path 'ThisIsElement43'

                      54         , ThisIsElement44 varchar2(50) path 'ThisIsElement44'

                      55         , ThisIsElement45 varchar2(50) path 'ThisIsElement45'

                      56         , ThisIsElement46 varchar2(50) path 'ThisIsElement46'

                      57         , ThisIsElement47 varchar2(50) path 'ThisIsElement47'

                      58         , ThisIsElement48 varchar2(50) path 'ThisIsElement48'

                      59         , ThisIsElement49 varchar2(50) path 'ThisIsElement49'

                      60         , ThisIsElement50 varchar2(50) path 'ThisIsElement50'

                      61  ) x ;

                       

                      Materialized view created.

                       

                       

                      The refresh is then performed in two steps :

                      1. INSERT INTO master_table_xml
                      2. MVIEW refresh

                      (note : since we insert into an XMLType column, we do need an XML document (single-rooted) this time)

                      SQL> set timing on

                      SQL>

                      SQL> truncate table master_table_xml;

                       

                      Table truncated.

                       

                      Elapsed: 00:00:00.27

                      SQL>

                      SQL> insert into master_table_xml

                        2  select id

                        3       , name

                        4       , xmlparse(document '<r>'||xmlcontent||'</r>')

                        5  from master_table ;

                       

                      20000 rows created.

                       

                      Elapsed: 00:04:38.72

                      SQL>

                      SQL> call dbms_mview.refresh('MASTER_TABLE_MV');

                       

                      Call completed.

                       

                      Elapsed: 00:00:22.42

                      SQL>

                      SQL> select count(*) from master_table_mv;

                       

                        COUNT(*)

                      ----------

                           20000

                       

                      Elapsed: 00:00:01.38

                      SQL> truncate table master_table_xml;

                       

                      Table truncated.

                       

                      Elapsed: 00:00:00.41

                       

                      • 8. Re: extracting data from CLOB's using Materialized views
                        878475

                        Thanks Jason and Odie_63... I tried an normal secure file and it did not quite help ... trying the approach as a securefile with Binary xml

                        • 9. Re: extracting data from CLOB's using Materialized views
                          878475

                          secure file with binary xml seems the best thanks guys