This discussion is archived
9 Replies Latest reply: Oct 10, 2013 1:03 PM by 878475 RSS

extracting data from CLOB's using Materialized views

878475 Newbie
Currently Being Moderated

Hello,

We have xml data in clob's which I have a requirement to extract out ( ~50 attributes ) on a daily basis , So we decided to use materialized views with complete refreshes ( open for suggestions though )

 

A small snippet of the code

 

CREATE MATERIALIZED VIEW MWMRPT.TASK_INBOUND

BUILD IMMEDIATE

REFRESH COMPLETE ON DEMAND

WITH ROWID

AS

SELECT   M.TASK_ID, M.BO_STATUS_CD, b.*

  FROM   CISADM.M1_TASK m,

         XMLTABLE ('/a' PASSING XMLPARSE (

                   CONTENT '<a>' || M.BO_DATA_AREA || '</a>'

                   ) COLUMNS

                   serviceDeliverySiteId varchar2 (15) PATH

      'cmPCGeneralInfo/serviceDeliverySiteId',

serviceSequenceId varchar2 (3) PATH 'cmPCGeneralInfo/serviceSequenceId',

completedByAssignmentId varchar2 (50) PATH 'completedByAssignmentId',

CUST_ID varchar2 (10) PATH 'cmPCCustomerInformation/customerId',

ACCT_SEQ varchar2 (5) PATH 'customerInformation/accountId',

AGRMT_SEQ varchar2 (5) PATH 'cmPCCustomerAgreement/agreementId',

COLL_SEQ varchar2 (5) PATH 'cmPCGeneralInfo/accountCollectionSeq',

REVENUE_CLASS varchar2 (10) PATH 'cmPCCustomerAgreement/revenueClassCode',

REQUESTED_BY varchar2 (50) PATH 'customerInformation/contactName',....~50 attributes

 

 

this  ddl ran > 20hrs and no materialized view created. there are some limitations that we have

  • cannot create a materialized view log
  • cannot alter the source table as its vendor defined
  • cannot do an ETL

 

DB is 11g R2

 

Any thoughts/suggestions are highly appreciated

  • 1. Re: extracting data from CLOB's using Materialized views
    odie_63 Guru
    Currently Being Moderated

    Hi,

     

    How large are your CLOBs ?

    How many rows in the base table ?

     

    Does the SELECT alone run fine ? Post an explain plan please.

     

    And most likely you don't need to wrap the content into additional <a> elements.

    I don't know if it will solve the main problem but you can simplify the query like this :

    XMLTABLE ('.'

      PASSING XMLPARSE ( CONTENT M.BO_DATA_AREA )

      COLUMNS

        serviceDeliverySiteId varchar2 (15) PATH '/cmPCGeneralInfo/serviceDeliverySiteId',

        serviceSequenceId     varchar2 (3)  PATH '/cmPCGeneralInfo/serviceSequenceId',

        ...

    )

  • 2. Re: extracting data from CLOB's using Materialized views
    878475 Newbie
    Currently Being Moderated

    thanks for responding Odie_63.. here are responses and hope it helps

     

    # rows in the table ~15K

    Size of Clob ~4K average

     

    select runs fine and below is the plan

     

    Plan

    SELECT STATEMENT  ALL_ROWSCost: 103  Bytes: 1,010,413  Cardinality: 5,129 

      1 TABLE ACCESS FULL TABLE CISADM.M1_TASK Cost: 103  Bytes: 1,010,413  Cardinality: 5,129 

     

    the reason why i do the content and wrap the <a> element is because , the xml element inside the clob does not have the root node so in order to use XPATH , i need it .

     

    Again thanks for looking

  • 3. Re: extracting data from CLOB's using Materialized views
    odie_63 Guru
    Currently Being Moderated

    the reason why i do the content and wrap the <a> element is because , the xml element inside the clob does not have the root node so in order to use XPATH , i need it .

    You don't need it. See my example.

     

    And are you sure you've posted the complete plan ? I don't see the portion related to the XMLTABLE evaluation.

  • 4. Re: extracting data from CLOB's using Materialized views
    878475 Newbie
    Currently Being Moderated

    Sorry posted the incorrect plan ...

     

    Plan

    SELECT STATEMENT  ALL_ROWSCost: 15,002  Bytes: 854,160,977  Cardinality: 4,472,047 

      3 NESTED LOOPS  Cost: 15,002  Bytes: 854,160,977  Cardinality: 4,472,047 

      1 TABLE ACCESS FULL TABLE CISADM.M1_TASK Cost: 103  Bytes: 103,572  Cardinality: 548 

      2 COLLECTION ITERATOR PICKLER FETCH PROCEDURE SYS.XQSEQUENCEFROMXMLTYPE Cost: 27  Bytes: 16,336  Cardinality: 8,168 

  • 5. Re: extracting data from CLOB's using Materialized views
    878475 Newbie
    Currently Being Moderated

    XMLTABLE ('.'

      PASSING XMLPARSE ( CONTENT M.BO_DATA_AREA )

      COLUMNS

        serviceDeliverySiteId varchar2 (15) PATH '/cmPCGeneralInfo/serviceDeliverySiteId',

        serviceSequenceId     varchar2 (3)  PATH '/cmPCGeneralInfo/serviceSequenceId',

        ...

    )

    without the element <a> worked , but no change in performance...Thanks though

  • 6. Re: extracting data from CLOB's using Materialized views
    Jason_(A_Non) Expert
    Currently Being Moderated

    What about creating a new table defined as XMLTYPE using storage method SECUREFILE BINARY XML (default in 11.2.0.2 and later) and writing code to truncate that table and then simply

    INSERT INTO new_table (task_id, BO_STATUS_CD, BO_DATA_AREA)
    SELECT task_id, BO_STATUS_CD, BO_DATA_AREA
    FROM CISADM.M1_TASK

    and then creating a regular view on top of that?  You could do a materialized view, but between the performance boost you will see using SECUREFILE BINARY XML, you may not need the materialized view.  You could also create a structured/unstructured index into the XML as well if needed to avoid using a materialized view.

     

    Just a thought and question to see what options are open for working around the limitations for dealing with XML stored in a CLOB.

  • 7. Re: extracting data from CLOB's using Materialized views
    odie_63 Guru
    Currently Being Moderated

    I've been exploring a similar approach, using the following test case.

    It creates a table "MASTER_TABLE" containing 20,000 rows, and a CLOB containing an XML fragment like this :

    <ThisIsElement1>09HOLVUF3T6VX5QUN8UBV9BRW3FHRB9JFO4TSV79R6J87QWVGN</ThisIsElement1>

    <ThisIsElement2>UUL47WDW6C63YIIBOP1X4FEEJ2Z7NCR9BDFHGSLA5YZ5SAH8Y8</ThisIsElement2>

    <ThisIsElement3>O1BU1EXLBU945HQLLFB3LUO03XPWMHBN8Y7SO8YRCQXRSWKKL4</ThisIsElement3>

    ...

    <ThisIsElement49>1HT88050QIGOPGUHGS9RKK54YP7W6OOI6NXVM107GM47R5LUNC</ThisIsElement49>

    <ThisIsElement50>9FJ1JZ615EOUIX6EKBIVOWFDYCPQZM2HBQQ8HDP3ABVJ5N1OJA</ThisIsElement50>

    then an intermediate table "MASTER_TABLE_XML" with the same columns except for the CLOB which is converted to XMLType, and finally a MVIEW :

    SQL> create table master_table as

      2  select level as id

      3       , cast('ROW'||to_char(level) as varchar2(30)) as name

      4       , (

      5           select xmlserialize(content

      6                    xmlagg(

      7                      xmlelement(evalname('ThisIsElement'||to_char(level)), dbms_random.string('X',50))

      8                    )

      9                    as clob indent

    10                  )

    11           from dual

    12           connect by level <= 50

    13         ) as xmlcontent

    14  from dual

    15  connect by level <= 20000 ;

     

    Table created.

     

    SQL> call dbms_stats.gather_table_stats(user, 'MASTER_TABLE');

     

    Call completed.

     

    SQL> create table master_table_xml (

      2    id         number

      3  , name       varchar2(30)

      4  , xmlcontent xmltype

      5  )

      6  xmltype column xmlcontent store as securefile binary xml

      7  ;

     

    Table created.

     

    SQL> create materialized view master_table_mv

      2  build deferred

      3  refresh complete on demand

      4  as

      5  select t.id

      6       , t.name

      7       , x.*

      8  from master_table_xml t

      9     , xmltable('/r' passing t.xmlcontent

    10         columns

    11           ThisIsElement1  varchar2(50) path 'ThisIsElement1'

    12         , ThisIsElement2  varchar2(50) path 'ThisIsElement2'

    13         , ThisIsElement3  varchar2(50) path 'ThisIsElement3'

    14         , ThisIsElement4  varchar2(50) path 'ThisIsElement4'

    15         , ThisIsElement5  varchar2(50) path 'ThisIsElement5'

    16         , ThisIsElement6  varchar2(50) path 'ThisIsElement6'

    17         , ThisIsElement7  varchar2(50) path 'ThisIsElement7'

    18         , ThisIsElement8  varchar2(50) path 'ThisIsElement8'

    19         , ThisIsElement9  varchar2(50) path 'ThisIsElement9'

    20         , ThisIsElement10 varchar2(50) path 'ThisIsElement10'

    21         , ThisIsElement11 varchar2(50) path 'ThisIsElement11'

    22         , ThisIsElement12 varchar2(50) path 'ThisIsElement12'

    23         , ThisIsElement13 varchar2(50) path 'ThisIsElement13'

    24         , ThisIsElement14 varchar2(50) path 'ThisIsElement14'

    25         , ThisIsElement15 varchar2(50) path 'ThisIsElement15'

    26         , ThisIsElement16 varchar2(50) path 'ThisIsElement16'

    27         , ThisIsElement17 varchar2(50) path 'ThisIsElement17'

    28         , ThisIsElement18 varchar2(50) path 'ThisIsElement18'

    29         , ThisIsElement19 varchar2(50) path 'ThisIsElement19'

    30         , ThisIsElement20 varchar2(50) path 'ThisIsElement20'

    31         , ThisIsElement21 varchar2(50) path 'ThisIsElement21'

    32         , ThisIsElement22 varchar2(50) path 'ThisIsElement22'

    33         , ThisIsElement23 varchar2(50) path 'ThisIsElement23'

    34         , ThisIsElement24 varchar2(50) path 'ThisIsElement24'

    35         , ThisIsElement25 varchar2(50) path 'ThisIsElement25'

    36         , ThisIsElement26 varchar2(50) path 'ThisIsElement26'

    37         , ThisIsElement27 varchar2(50) path 'ThisIsElement27'

    38         , ThisIsElement28 varchar2(50) path 'ThisIsElement28'

    39         , ThisIsElement29 varchar2(50) path 'ThisIsElement29'

    40         , ThisIsElement30 varchar2(50) path 'ThisIsElement30'

    41         , ThisIsElement31 varchar2(50) path 'ThisIsElement31'

    42         , ThisIsElement32 varchar2(50) path 'ThisIsElement32'

    43         , ThisIsElement33 varchar2(50) path 'ThisIsElement33'

    44         , ThisIsElement34 varchar2(50) path 'ThisIsElement34'

    45         , ThisIsElement35 varchar2(50) path 'ThisIsElement35'

    46         , ThisIsElement36 varchar2(50) path 'ThisIsElement36'

    47         , ThisIsElement37 varchar2(50) path 'ThisIsElement37'

    48         , ThisIsElement38 varchar2(50) path 'ThisIsElement38'

    49         , ThisIsElement39 varchar2(50) path 'ThisIsElement39'

    50         , ThisIsElement40 varchar2(50) path 'ThisIsElement40'

    51         , ThisIsElement41 varchar2(50) path 'ThisIsElement41'

    52         , ThisIsElement42 varchar2(50) path 'ThisIsElement42'

    53         , ThisIsElement43 varchar2(50) path 'ThisIsElement43'

    54         , ThisIsElement44 varchar2(50) path 'ThisIsElement44'

    55         , ThisIsElement45 varchar2(50) path 'ThisIsElement45'

    56         , ThisIsElement46 varchar2(50) path 'ThisIsElement46'

    57         , ThisIsElement47 varchar2(50) path 'ThisIsElement47'

    58         , ThisIsElement48 varchar2(50) path 'ThisIsElement48'

    59         , ThisIsElement49 varchar2(50) path 'ThisIsElement49'

    60         , ThisIsElement50 varchar2(50) path 'ThisIsElement50'

    61  ) x ;

     

    Materialized view created.

     

     

    The refresh is then performed in two steps :

    1. INSERT INTO master_table_xml
    2. MVIEW refresh

    (note : since we insert into an XMLType column, we do need an XML document (single-rooted) this time)

    SQL> set timing on

    SQL>

    SQL> truncate table master_table_xml;

     

    Table truncated.

     

    Elapsed: 00:00:00.27

    SQL>

    SQL> insert into master_table_xml

      2  select id

      3       , name

      4       , xmlparse(document '<r>'||xmlcontent||'</r>')

      5  from master_table ;

     

    20000 rows created.

     

    Elapsed: 00:04:38.72

    SQL>

    SQL> call dbms_mview.refresh('MASTER_TABLE_MV');

     

    Call completed.

     

    Elapsed: 00:00:22.42

    SQL>

    SQL> select count(*) from master_table_mv;

     

      COUNT(*)

    ----------

         20000

     

    Elapsed: 00:00:01.38

    SQL> truncate table master_table_xml;

     

    Table truncated.

     

    Elapsed: 00:00:00.41

     

  • 8. Re: extracting data from CLOB's using Materialized views
    878475 Newbie
    Currently Being Moderated

    Thanks Jason and Odie_63... I tried an normal secure file and it did not quite help ... trying the approach as a securefile with Binary xml

  • 9. Re: extracting data from CLOB's using Materialized views
    878475 Newbie
    Currently Being Moderated

    secure file with binary xml seems the best thanks guys

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points