This discussion is archived
5 Replies Latest reply: Jun 1, 2006 12:54 PM by Arie Geller RSS

Indexing a XMLType column record

Arie Geller Guru
Currently Being Moderated
Hello,

I'm in a situation where every day I'm receiving a XML file, which contains 2000-3000 records, in the following structure:
<top>
   <data>
        <rec  item1="abc" item2="def" item3="ghij" ... />
        <rec  item1="123" item2="456" item3="7890" ... />
        <rec  item1="a1b" item2="2a1" item3="5f4"  ... />
         ...
         ...
   </data>
</top>
I'm loading this XML file into a table with XMLType column:
CREATE TABLE  "DC_XML" 
   (     "DC_FILE" VARCHAR2(15), 
     "XML_DOC"  "XMLTYPE", 
     "LAST_UPLOAD" DATE, 
      PRIMARY KEY ("DC_FILE") ENABLE
   )
/
I need to extract only several records from this document, into a "regular" table, according to the attribute value of item1.

I'm trying to reduce the extract time for each element, so I thought about indexing the XML document – now in XML_DOC column – using item1 attribute value.

This is part of an ApEx application. I tried to use the SQL Workshop Object Browser wizard, and it created the following index:
CREATE INDEX  "DC_XML_CTX1" ON  "DC_XML" (SYS_MAKEXML("SYS_NC00003$")) 
   INDEXTYPE IS "CTXSYS"."CONTEXT"
/
Unfortunately, I don't see any improvement on the extract time, and I'm not sure this is an effective index.

Is it possible to create an effective index on an attribute in the XML document?
If so how?

Appreciate any help,
Arie.
  • 1. Re: Indexing a XMLType column record
    Marco Gralike Oracle ACE Director
    Currently Being Moderated
    Did you try to create an function based index based on your xpath selection to fetch the attributes. Probably this won't help you - I guess.

    But anyway look at the section "Indexing XMLType Columns" (http://download-uk.oracle.com/docs/cd/B19306_01/appdev.102/b14259/xdb04cre.htm#sthref596). One of the solutions would be to make use of XMLType OR storage so query rewrite could help you in the situation and you will be able to build better indexes. The other way (for now) is the usage of Oracle Text indexing (as you did - but also see the example from the manual) and/or CTXXPATH indexing (which will be depricated).

    Don't forget to generate statistics (so the optimzer is better aware of the situation) and enable query rewrite (if not already so - automatically set on system level).

    Sorry, to say that i can't help you with an example. Don't have a database at hand, to check my syntax.

    Which database version, OS are you on?
  • 2. Re: Indexing a XMLType column record
    mdrake Expert
    Currently Being Moderated
    Here's an example using a registered XML Schema that will let you create an index the value of item1... It uses the 10.2.x XMLTable feature. If you are on earlier version of the database you would have to write the query using nested table(xmlsequence(extract())) constructs
    SQL> var schemaURL varchar2(256)
    SQL> var schemaPath varchar2(256)
    SQL> --
    SQL> begin
      2    :schemaURL := 'testcase.xsd';
      3    :schemaPath := '/public/testcase.xsd';
      4  end;
      5  /
    
    PL/SQL procedure successfully completed.
    
    SQL> call dbms_xmlSchema.deleteSchema(:schemaURL,4)
      2  /
    
    Call completed.
    
    SQL> declare
      2    res boolean;
      3    xmlSchema xmlType := xmlType(
      4  '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      5  <!--W3C Schema generated by XMLSpy v2006 sp2 U (http://www.altova.com)-->
      6  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xdb="http://xmlns.oracle.com/xdb" elementFormDefault="qualified" xdb:sto
    VarrayAsTable="true">
      7     <xs:complexType name="recType" xdb:SQLType="REC_T">
      8             <xs:attribute name="item1" type="xs:string" use="required"/>
      9             <xs:attribute name="item2" type="xs:string" use="required"/>
     10             <xs:attribute name="item3" type="xs:string" use="required"/>
     11     </xs:complexType>
     12     <xs:complexType name="dataType" xdb:SQLType="DATA_T">
     13             <xs:sequence>
     14                     <xs:element name="rec" type="recType" maxOccurs="unbounded" xdb:SQLCollType="REC_V"/>
     15             </xs:sequence>
     16     </xs:complexType>
     17     <xs:element name="top" xdb:defaultTable="XML_TABLE">
     18             <xs:complexType xdb:SQLType="TOP_T">
     19                     <xs:sequence>
     20                             <xs:element name="data" type="dataType"  maxOccurs="unbounded" xdb:SQLCollType="DATA_V"/>
     21                     </xs:sequence>
     22             </xs:complexType>
     23     </xs:element>
     24  </xs:schema>');
     25  begin
     26    if (dbms_xdb.existsResource(:schemaPath)) then
     27      dbms_xdb.deleteResource(:schemaPath);
     28    end if;
     29    res := dbms_xdb.createResource(:schemaPath,xmlSchema);
     30  end;
     31  /
    
    PL/SQL procedure successfully completed.
    
    SQL> begin
      2    dbms_xmlschema.registerSchema
      3    (
      4      :schemaURL,
      5      xdbURIType(:schemaPath).getClob(),
      6      TRUE,TRUE,FALSE,TRUE
      7    );
      8  end;
      9  /
    
    PL/SQL procedure successfully completed.
    
    SQL> insert into XML_TABLE values ( xmltype(
      2  '<top>
      3     <data>
      4          <rec  item1="abc" item2="def" item3="ghij" />
      5          <rec  item1="123" item2="456" item3="7890"  />
      6          <rec  item1="a1b" item2="2a1" item3="5f4" />
      7     </data>
      8  </top>
      9  '))
     10  /
    
    1 row created.
    
    
    Execution Plan
    ----------------------------------------------------------
    
    -------------------------------------------------------------------------
    | Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------
    |   0 | INSERT STATEMENT |      |     1 |   100 |     1   (0)| 00:00:01 |
    -------------------------------------------------------------------------
    
    SQL> set long 100000
    SQL> --
    SQL> select * from XML_TABLE
      2  /
    
    SYS_NC_ROWINFO$
    --------------------------------------------------------------------------------
    <top>
      <data>
        <rec item1="abc" item2="def" item3="ghij"/>
        <rec item1="123" item2="456" item3="7890"/>
        <rec item1="a1b" item2="2a1" item3="5f4"/>
      </data>
    </top>
    
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1028460904
    
    -------------------------------------------------------------------------------
    | Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |           |     1 |  5668 |     3   (0)| 00:00:01 |
    |*  1 |  TABLE ACCESS FULL| XML_TABLE |     1 |  5668 |     3   (0)| 00:00:01 |
    -------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter(SYS_CHECKACL("ACLOID","OWNERID",xmltype('<privilege
                  xmlns="http://xmlns.oracle.com/xdb/acl.xsd"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:schemaLocation="http://xmlns.oracle.com/xdb/acl.xsd
                  http://xmlns.oracle.com/xdb/acl.xsd
                  DAV:http://xmlns.oracle.com/xdb/dav.xsd"><read-properties/><read-c
    ontent
    
                  s/></privilege>'))=1)
    
    Note
    -----
       - dynamic sampling used for this statement
    
    SQL> var nested_table_name varchar2(32)
    SQL> /
    
    SYS_NC_ROWINFO$
    --------------------------------------------------------------------------------
    <top>
      <data>
        <rec item1="abc" item2="def" item3="ghij"/>
        <rec item1="123" item2="456" item3="7890"/>
        <rec item1="a1b" item2="2a1" item3="5f4"/>
      </data>
    </top>
    
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1028460904
    
    -------------------------------------------------------------------------------
    | Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |           |     1 |  5668 |     3   (0)| 00:00:01 |
    |*  1 |  TABLE ACCESS FULL| XML_TABLE |     1 |  5668 |     3   (0)| 00:00:01 |
    -------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter(SYS_CHECKACL("ACLOID","OWNERID",xmltype('<privilege
                  xmlns="http://xmlns.oracle.com/xdb/acl.xsd"
                  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                  xsi:schemaLocation="http://xmlns.oracle.com/xdb/acl.xsd
                  http://xmlns.oracle.com/xdb/acl.xsd
                  DAV:http://xmlns.oracle.com/xdb/dav.xsd"><read-properties/><read-c
    ontent
    
                  s/></privilege>'))=1)
    
    Note
    -----
       - dynamic sampling used for this statement
    
    SQL> begin
      2    select table_name
      3      into :nested_table_name
      4      from user_nested_tables
      5     where table_type_name = 'DATA_V'
      6       and parent_table_name = 'XML_TABLE';
      7
      8     execute immediate 'rename "'|| :nested_table_name ||'" to DATA_TABLE';
      9
     10    select table_name
     11      into :nested_table_name
     12      from user_nested_tables
     13     where table_type_name = 'REC_V'
     14       and parent_table_name = 'DATA_TABLE';
     15
     16     execute immediate 'rename "'|| :nested_table_name ||'" to REC_TABLE';
     17  end;
     18  /
    
    PL/SQL procedure successfully completed.
    
    SQL> desc TOP_T
     Name                                      Null?    Type
     ----------------------------------------- -------- ----------------------------
     SYS_XDBPD$                                         XDB.XDB$RAW_LIST_T
     data                                               DATA_V
    
    SQL> --
    SQL> desc DATA_V
     DATA_V VARRAY(2147483647) OF DATA_T
     DATA_T is NOT FINAL
     Name                                      Null?    Type
     ----------------------------------------- -------- ----------------------------
     SYS_XDBPD$                                         XDB.XDB$RAW_LIST_T
     rec                                                REC_V
    
    SQL> --
    SQL> desc REC_V
     REC_V VARRAY(2147483647) OF REC_T
     REC_T is NOT FINAL
     Name                                      Null?    Type
     ----------------------------------------- -------- ----------------------------
     SYS_XDBPD$                                         XDB.XDB$RAW_LIST_T
     item1                                              VARCHAR2(4000 CHAR)
     item2                                              VARCHAR2(4000 CHAR)
     item3                                              VARCHAR2(4000 CHAR)
    
    SQL> --
    SQL> desc DATA_TABLE
     Name                                      Null?    Type
     ----------------------------------------- -------- ----------------------------
     SYS_XDBPD$                                         XDB.XDB$RAW_LIST_T
     rec                                                REC_V
    
    SQL> --
    SQL> desc REC_TABLE
     Name                                      Null?    Type
     ----------------------------------------- -------- ----------------------------
     SYS_XDBPD$                                         XDB.XDB$RAW_LIST_T
     item1                                              VARCHAR2(4000 CHAR)
     item2                                              VARCHAR2(4000 CHAR)
     item3                                              VARCHAR2(4000 CHAR)
    
    SQL> --
    SQL> create index ITEM1_INDEX on REC_TABLE("item1")
      2  /
    
    Index created.
    
    SQL> set lines 150
    SQL> --
    SQL> set autotrace on explain
    SQL> --
    SQL> select i.*
      2    from XML_TABLE,
      3         XMLTABLE
      4         (
      5            '/top'
      6            passing object_value
      7            columns
      8            DATA_FRAGMENTS xmltype path 'data'
      9         ) t,
     10         XMLTABLE
     11         (
     12            '/data'
     13            passing t.DATA_FRAGMENTS
     14            columns
     15            REC_FRAGMENTS xmltype path 'rec'
     16         ) d,
     17         XMLTABLE
     18         (
     19            '/rec[@item1="123"]'
     20            passing d.REC_FRAGMENTS
     21            columns
     22            item1 varchar2(20) path '@item1',
     23            item2 varchar2(20) path '@item2',
     24            item3 varchar2(20) path '@item3'
     25         ) i
     26  /
    
    ITEM1                ITEM2                ITEM3
    -------------------- -------------------- --------------------
    123                  456                  7890
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1991437087
    
    --------------------------------------------------------------------------------------------------
    | Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT             |                   |     1 |  9846 |     4   (0)| 00:00:01 |
    |   1 |  NESTED LOOPS                |                   |     1 |  9846 |     4   (0)| 00:00:01 |
    |   2 |   NESTED LOOPS               |                   |     1 |  6056 |     3   (0)| 00:00:01 |
    |*  3 |    INDEX UNIQUE SCAN         | SYS_IOT_TOP_53725 |     1 |  6026 |     1   (0)| 00:00:01 |
    |*  4 |     INDEX RANGE SCAN         | ITEM1_INDEX       |     1 |       |     1   (0)| 00:00:01 |
    |*  5 |    INDEX UNIQUE SCAN         | SYS_IOT_TOP_53723 |     1 |    30 |     2   (0)| 00:00:01 |
    |*  6 |     INDEX RANGE SCAN         | SYS_C005603       |     1 |       |     0   (0)| 00:00:01 |
    |*  7 |   TABLE ACCESS BY INDEX ROWID| XML_TABLE         |     1 |  3790 |     1   (0)| 00:00:01 |
    |*  8 |    INDEX UNIQUE SCAN         | SYS_C005604       |     1 |       |     0   (0)| 00:00:01 |
    --------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       3 - access("item1"='123')
           filter("SYS_NC_TYPEID$" IS NOT NULL)
       4 - access("item1"='123')
       5 - access("NESTED_TABLE_ID"="DATA_TABLE"."SYS_NC0000600007$")
           filter("SYS_NC_TYPEID$" IS NOT NULL)
       6 - access("NESTED_TABLE_ID"="DATA_TABLE"."SYS_NC0000600007$")
       7 - filter(SYS_OP_NOEXPAND("XML_TABLE"."XMLDATA") IS NOT NULL AND
                  SYS_CHECKACL("ACLOID","OWNERID",xmltype('<privilege
                  xmlns="http://xmlns.oracle.com/xdb/acl.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-in
                  stance" xsi:schemaLocation="http://xmlns.oracle.com/xdb/acl.xsd
                  http://xmlns.oracle.com/xdb/acl.xsd DAV:http://xmlns.oracle.com/xdb/dav.xsd"><read-propert
                  ies/><read-contents/></privilege>'))=1)
       8 - access("DATA_TABLE"."NESTED_TABLE_ID"="XML_TABLE"."SYS_NC0000800009$")
    
    Note
    -----
       - dynamic sampling used for this statement
    
    SQL>
  • 3. Re: Indexing a XMLType column record
    Arie Geller Guru
    Currently Being Moderated
    Hello,

    Thanks Marco and Mark for your replies.

    Mark – I have two preliminary questions, following your very detailed example (I really appreciate it).

    - I'm trying to maintain a compatibility with Oracle 10g EX. Is that mean I need to use the xmlsequence(extract()) format for my statements?

    - Can I create/use the index on a non-registered XML schema? (I'm just not sure that XML schema is part of the vocabulary of the one who's producing the XML files I need to work with).

    Thanks so far,
    Arie.
  • 4. Re: Indexing a XMLType column record
    mdrake Expert
    Currently Being Moderated
    For the moment table(xmlsequence(extract())) is required if XE is a target platform. We hope to address this issue the next time XE is revised. However I have no visablilty as to when this might happen.....

    The Index requires Schema Based Storage since this the way we force the each of the rec elements to be stored in a table were the item1 values can be indexed. In the current archtiecture it is difficult to index elements / attributes that occur more than once in a document wihtout using Schema Based Storage
  • 5. Re: Indexing a XMLType column record
    Arie Geller Guru
    Currently Being Moderated
    Hi Mark,

    Well, it seems I need to learn about nested tables and registered XML schemas …

    Thanks a lot,
    Arie.