8 Replies. Latest reply: Mar 11, 2013 6:46 AM by pl_sequel

FBI not used in xmlquery

pl_sequel Newbie
G'day all,

Running the following set-up: Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

Getting a full table scan on a table with an FBI.

Here is a sample set-up:
create table a (id number, filepath varchar2(256 char));
insert into a values(1,'/dir/subdir/file1.xml');
insert into a values(2,'/dir/subdir/file2.xml');
insert into a values(3,'/dir/subdir/file3.xml');
insert into a values(4,'/dir/subdir/file4.xml');
insert into a values(5,'/dir/subdir/file5.xml');
commit;

create index filepath_substr_fbi on a(substr(filepath,instr(filepath,'/',-1)+1));
create unique index filepath_uidx on a(filepath);
So the requirement is to retrieve the ids of the rows whose filepath ends with a particular filename. The filenames to match are sourced from an XMLType. The approach I've taken is the following:
select * from a , (XMLTable('for $j in /task/related_files/ora:tokenize(text(),"\|")
where ora:matches($j,"\d+\.xml") return normalize-space($j)'
    PASSING xmltype(
  '<task><related_files>File 1|file1.xml|file 2|file2.xml|file 3|file3.xml</related_files></task>'
  ))) x  where substr(a.filepath,instr(a.filepath,'/',-1)+1) = x.column_value.getstringval();
which returns the following results:
"ID"     "FILEPATH"     "COLUMN_VALUE"
1     "/dir/subdir/file1.xml"     file1.xml
2     "/dir/subdir/file2.xml"     file2.xml
3     "/dir/subdir/file3.xml"     file3.xml
Now, after running this sample, and doing an explain plan on it, I did notice it used the index!
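For reference, here is a minimal way to reproduce the plan output shown below (just a sketch, assuming SQL*Plus; EXPLAIN PLAN followed by DBMS_XPLAN.DISPLAY gives similar output):
set autotrace traceonly explain

select * from a , (XMLTable('for $j in /task/related_files/ora:tokenize(text(),"\|")
where ora:matches($j,"\d+\.xml") return normalize-space($j)'
    PASSING xmltype(
  '<task><related_files>File 1|file1.xml|file 2|file2.xml|file 3|file3.xml</related_files></task>'
  ))) x  where substr(a.filepath,instr(a.filepath,'/',-1)+1) = x.column_value.getstringval();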
Execution Plan
----------------------------------------------------------
Plan hash value: 107833345

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                       |   408 |   109K|    32   (4)| 00:00:01 |
|   1 |  MERGE JOIN                         |                       |   408 |   109K|    32   (4)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID       | A                     |     5 |  1365 |     2   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN                  | FILEPATH_SUBSTR_FBI   |     5 |       |     1   (0)| 00:00:01 |
|*  4 |   SORT JOIN                         |                       |   408 |   816 |    30   (4)| 00:00:01 |
|*  5 |    COLLECTION ITERATOR PICKLER FETCH| XQSEQUENCEFROMXMLTYPE |   408 |   816 |    29   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access(SUBSTR("FILEPATH",INSTR("FILEPATH",'/',(-1))+1)="XMLTYPE"."GETSTRINGVAL"(XMLCDATA(SYS_
              XQNORMSPACE(SYS_XQ_UPKXML2SQL(VALUE(KOKBF$),2,1,0)),1)))
       filter(SUBSTR("FILEPATH",INSTR("FILEPATH",'/',(-1))+1)="XMLTYPE"."GETSTRINGVAL"(XMLCDATA(SYS_
              XQNORMSPACE(SYS_XQ_UPKXML2SQL(VALUE(KOKBF$),2,1,0)),1)))
   5 - filter( REGEXP_LIKE (SYS_XQ_UPKXML2SQL(VALUE(KOKBF$),2,1,0),'\d+\.xml'))

Note
-----
   - dynamic sampling used for this statement (level=2)
Now for some reason, my actual code (which is not much different from the sample above) gets the following plan:
Execution Plan
----------------------------------------------------------
Plan hash value: 1696443977

----------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                       |   449 | 53431 |    33   (4)| 00:00:01 |
|*  1 |  HASH JOIN                         |                       |   449 | 53431 |    33   (4)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                | DATA_CAPTURE_RECORD   |    11 |  1287 |     3   (0)| 00:00:01 |
|*  3 |   COLLECTION ITERATOR PICKLER FETCH| XQSEQUENCEFROMXMLTYPE |   408 |   816 |    29   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access(SUBSTR("DRAFT_VPATH",INSTR("DRAFT_VPATH",'/',(-1))+1)="XMLTYPE"."GETSTRINGVAL"(XMLCDAT
              A(SYS_XQNORMSPACE(SYS_XQ_UPKXML2SQL(VALUE(KOKBF$),2,1,0)),1)))
   3 - filter( REGEXP_LIKE (SYS_XQ_UPKXML2SQL(VALUE(KOKBF$),2,1,0),'\d+\.xml'))
Here is the SQL query for the above plan:
select * from data_capture_record x, (XMLTable('for $j in /task/related_tasks/ora:tokenize(text(),"\|")
where ora:matches($j,"\d+\.xml") return normalize-space($j)'
    PASSING xmltype(
  '<task><related_tasks>Actioning a Reminder Work Item or a BFd Action Item Work Item for Jury Service - clean up this Edit - another Edit | 17.xml| ROE Correct WI Received for a Regular ROE s | 24.xml| Sending benefit statement copies that are currently archived . | 10.xml| Test of TINY MCE Configurations - Test Edit. | 46.xml|</related_tasks> </task>'
  ))) a  where substr(x.DRAFT_VPATH,instr(x.DRAFT_VPATH,'/',-1)+1) = a.column_value.getstringval();
Any ideas? A quick note: I just started getting into XQuery this week, so if you spot something, or see a better solution to my problem, please let me know...

Thanks for any help or advice.

Stephane
  • 1. Re: FBI not used in xmlquery
    damorgan Oracle ACE Director
    What, precisely, is wrong with a full table scan?

    When I look at your explain plan report I see:
    Time
    ---------
    00:00:01 
    How much faster do you want this to run?
  • 2. Re: FBI not used in xmlquery
    pl_sequel Newbie
    Hahaha... yeah, honestly I hadn't even looked at the time; the "full scan" caught my attention. Thing is, the table right now only has about 10 rows in it, but it's expected to grow to thousands, so I would think the time would get much worse with a full scan of the table, unless I'm missing something...

    To further add: since I am querying the column using a substr/instr expression, and I'm only interested in a very small subset of rows (a task could have perhaps 5 related files out of a couple of thousand, for example), I figured this would be a good case for an FBI, to avoid having to apply substr/instr to each row to find a match.
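    As a quick sanity check (just a sketch), the index can also be forced with a hint to confirm the expression really matches the FBI definition; if the hinted plan uses the index, it's purely a costing decision:

    select /*+ index(a filepath_substr_fbi) */ id, filepath
    from   a
    where  substr(filepath, instr(filepath, '/', -1) + 1) = 'file1.xml';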

  • 3. Re: FBI not used in xmlquery
    damorgan Oracle ACE Director
    There is absolutely nothing wrong with a full table scan. I have plenty of examples I use for teaching where forcing index usage makes performance worse.

    But let's take a second to look at your situation with only 10 rows. Those 10 rows are in one 8K block. Your index is in how many blocks? Which is more efficient for Oracle ... to read the index blocks and then the table block, or just to read the one 8K block holding all of your data?
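    A quick way to compare the actual footprints, just as a sketch (user_segments reports allocated blocks, not used ones):

    select segment_name, segment_type, blocks, bytes
    from   user_segments
    where  segment_name in ('A', 'FILEPATH_SUBSTR_FBI', 'FILEPATH_UIDX');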

    The optimizer is making a very intelligent choice. What you need to do is either:

    1. Load a realistic data set and retest your query
    2. Fake table and index statistics and see how the optimizer responds (how to do it here: http://www.morganslibrary.org/reference/tuning.html) ... look for the heading "Setting CBO Statistics." A DBMS_STATS sketch follows below.

    Faking it takes a lot less effort and usually gives very good answers.
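    A minimal sketch of option 2 using DBMS_STATS directly (the figures below are made up; pick values that reflect the expected production volume):

    begin
      -- pretend the table holds ~100,000 rows in ~1,500 blocks (made-up figures)
      dbms_stats.set_table_stats(user, 'A', numrows => 100000, numblks => 1500, avgrlen => 60);
      -- give the FBI matching, equally made-up statistics
      dbms_stats.set_index_stats(user, 'FILEPATH_SUBSTR_FBI',
                                 numrows => 100000, numlblks => 400,
                                 numdist => 100000, clstfct => 100000, indlevel => 2);
    end;
    /

    Then re-run the explain plan and see whether the CBO now prefers the index.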
  • 4. Re: FBI not used in xmlquery
    pl_sequel Newbie
    Thanks! I have to admit, I need to work on tuning and optimization... this is making sense. I'll read up on that article.
  • 5. Re: FBI not used in xmlquery
    odie_63 Guru
    Indeed, cardinality and selectivity are essential in this case.

    Consider the following test case with a 1,000,000-row table, where the CBO picks up the index:
    Connected to:
    Oracle Database 11g Express Edition Release 11.2.0.2.0 - Production
    
    SQL> create table test2 as
      2  select level as path_id
      3      , cast('/dir/subdir/file'||to_char(level,'fm0999999')||'.xml' as varchar2(256)) as filepath
      4  from dual
      5  connect by level <= 1000000 ;
    
    Table created.
    
    SQL> select count(*) from test2;
    
      COUNT(*)
    ----------
       1000000
    
    SQL> create index test2_ix on test2 ( substr(filepath, instr(filepath, '/', -1) + 1) );
    
    Index created.
    
    SQL> exec dbms_stats.gather_table_stats(user, 'TEST2');
    
    PL/SQL procedure successfully completed.
    
    SQL> set lines 120
    SQL> set autotrace on explain
    SQL> select *
      2  from test2 t
      3  where substr(filepath, instr(filepath, '/', -1) + 1) in (
      4    select filename
      5    from xmltable(
      6           'for $i in ora:tokenize(/task/related_files, "\|")
      7            where ora:matches($i, "\d+\.xml")
      8            return normalize-space($i)'
       9           passing xmlparse(document '<task><related_files>File 1| file0000001.xml|file 2|file0999999.xml |file 3|file0007773.xml</related_files></task>')
     10           columns filename varchar2(30) path '.'
     11         ) x
     12  ) ;
    
       PATH_ID FILEPATH
    ---------- ------------------------------
             1 /dir/subdir/file0000001.xml
          7773 /dir/subdir/file0007773.xml
        999999 /dir/subdir/file0999999.xml
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 420041503
    
    -------------------------------------------------------------------------------------------------------------
    | Id  | Operation                           | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                    |                       |   257 | 13107 |   643   (1)| 00:00:08 |
    |   1 |  NESTED LOOPS                       |                       |   257 | 13107 |   643   (1)| 00:00:08 |
    |   2 |   SORT UNIQUE                       |                       |   408 |   816 |    29   (0)| 00:00:01 |
    |*  3 |    COLLECTION ITERATOR PICKLER FETCH| XQSEQUENCEFROMXMLTYPE |   408 |   816 |    29   (0)| 00:00:01 |
    |   4 |   TABLE ACCESS BY INDEX ROWID       | TEST2                 |     1 |    49 |     3   (0)| 00:00:01 |
    |*  5 |    INDEX RANGE SCAN                 | TEST2_IX              |     1 |       |     2   (0)| 00:00:01 |
    -------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       3 - filter( REGEXP_LIKE (SYS_XQ_UPKXML2SQL(VALUE(KOKBF$),2,1,0),'\d+\.xml'))
       5 - access(SUBSTR("FILEPATH",INSTR("FILEPATH",'/',-1)+1)=CAST(SYS_XQ_UPKXML2SQL(SYS_XQEXVAL(SYS_XQ
                  _PKSQL2XML(SYS_XQNORMSPACE(SYS_XQ_UPKXML2SQL(VALUE(KOKBF$),2,1,0)),1,2,0),0,0,20971520,0),50,1,2) AS
                  varchar2(30) ))
    
    Note
    -----
       - Unoptimized XML construct detected (enable XMLOptimizationCheck for more information)
    
    SQL> select t.*
      2  from test2 t
      3       join xmltable(
      4         'for $i in ora:tokenize(/task/related_files, "\|")
      5          where ora:matches($i, "\d+\.xml")
      6          return normalize-space($i)'
       7         passing xmlparse(document '<task><related_files>File 1| file0000001.xml|file 2|file0999999.xml |file 3|file0007773.xml</related_files></task>')
      8         columns filename varchar2(30) path '.'
      9       ) x
     10       on substr(t.filepath, instr(t.filepath, '/', -1) + 1) = x.filename
     11  ;
    
       PATH_ID FILEPATH
    ---------- ------------------------------
             1 /dir/subdir/file0000001.xml
        999999 /dir/subdir/file0999999.xml
          7773 /dir/subdir/file0007773.xml
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1810535419
    
    ------------------------------------------------------------------------------------------------------------
    | Id  | Operation                          | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                   |                       |   411 | 20961 |  1254   (1)| 00:00:16 |
    |   1 |  NESTED LOOPS                      |                       |   411 | 20961 |  1254   (1)| 00:00:16 |
    |*  2 |   COLLECTION ITERATOR PICKLER FETCH| XQSEQUENCEFROMXMLTYPE |   408 |   816 |    29   (0)| 00:00:01 |
    |   3 |   TABLE ACCESS BY INDEX ROWID      | TEST2                 |     1 |    49 |     3   (0)| 00:00:01 |
    |*  4 |    INDEX RANGE SCAN                | TEST2_IX              |     1 |       |     2   (0)| 00:00:01 |
    ------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter( REGEXP_LIKE (SYS_XQ_UPKXML2SQL(VALUE(KOKBF$),2,1,0),'\d+\.xml'))
       4 - access(SUBSTR("FILEPATH",INSTR("FILEPATH",'/',-1)+1)=CAST(SYS_XQ_UPKXML2SQL(SYS_XQEXVAL(SYS_X
                  Q_PKSQL2XML(SYS_XQNORMSPACE(SYS_XQ_UPKXML2SQL(VALUE(KOKBF$),2,1,0)),1,2,0),0,0,20971520,0),50,1,2)
                  AS varchar2(30) ))
    
    Note
    -----
       - Unoptimized XML construct detected (enable XMLOptimizationCheck for more information)
     From the plans above we see that the CBO estimates 408 rows from XMLTable.
     This estimate could be refined with hints: CARDINALITY (not documented) or DYNAMIC_SAMPLING, or by using a wrapper function and the extensible optimizer feature (a DYNAMIC_SAMPLING sketch follows the example below).
     For example, if you expect at most 10 rows every time:
    SQL> select /*+ cardinality(x 10) */ t.*
      2  from test2 t
      3       join xmltable(
      4         'for $i in ora:tokenize(/task/related_files, "\|")
      5          where ora:matches($i, "\d+\.xml")
      6          return normalize-space($i)'
       7         passing xmlparse(document '<task><related_files>File 1| file0000001.xml|file 2|file0999999.xml |file 3|file0007773.xml</related_files></task>')
      8         columns filename varchar2(30) path '.'
      9       ) x
     10       on substr(t.filepath, instr(t.filepath, '/', -1) + 1) = x.filename
     11  ;
    
       PATH_ID FILEPATH
    ---------- ------------------------------
             1 /dir/subdir/file0000001.xml
        999999 /dir/subdir/file0999999.xml
          7773 /dir/subdir/file0007773.xml
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1810535419
    
    ------------------------------------------------------------------------------------------------------------
    | Id  | Operation                          | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                   |                       |    10 |   510 |    59   (0)| 00:00:01 |
    |   1 |  NESTED LOOPS                      |                       |    10 |   510 |    59   (0)| 00:00:01 |
    |*  2 |   COLLECTION ITERATOR PICKLER FETCH| XQSEQUENCEFROMXMLTYPE |    10 |    20 |    29   (0)| 00:00:01 |
    |   3 |   TABLE ACCESS BY INDEX ROWID      | TEST2                 |     1 |    49 |     3   (0)| 00:00:01 |
    |*  4 |    INDEX RANGE SCAN                | TEST2_IX              |     1 |       |     2   (0)| 00:00:01 |
    ------------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter( REGEXP_LIKE (SYS_XQ_UPKXML2SQL(VALUE(KOKBF$),2,1,0),'\d+\.xml'))
       4 - access(SUBSTR("FILEPATH",INSTR("FILEPATH",'/',-1)+1)=CAST(SYS_XQ_UPKXML2SQL(SYS_XQEXVAL(SYS_X
                  Q_PKSQL2XML(SYS_XQNORMSPACE(SYS_XQ_UPKXML2SQL(VALUE(KOKBF$),2,1,0)),1,2,0),0,0,20971520,0),50,1,2)
                  AS varchar2(30) ))
    
    Note
    -----
       - Unoptimized XML construct detected (enable XMLOptimizationCheck for more information)
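     As a variant, the DYNAMIC_SAMPLING route mentioned above would look something like the sketch below; whether the optimizer actually samples the collection depends on the version and patch level, so check the resulting estimate in the plan:

     select /*+ dynamic_sampling(x 2) */ t.*
     from test2 t
          join xmltable(
            'for $i in ora:tokenize(/task/related_files, "\|")
             where ora:matches($i, "\d+\.xml")
             return normalize-space($i)'
            passing xmlparse(document '<task><related_files>File 1| file0000001.xml|file 2|file0999999.xml |file 3|file0007773.xml</related_files></task>')
            columns filename varchar2(30) path '.'
          ) x
          on substr(t.filepath, instr(t.filepath, '/', -1) + 1) = x.filename;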
  • 6. Re: FBI not used in xmlquery
    MarcoGralike Oracle ACE Director
    Regarding optimization.
    Note
    -----
       - Unoptimized XML construct detected (enable XMLOptimizationCheck for more information)
    A question I would have with this setup: why the overhead and overkill of forcing checks and switching between the SQL and XQuery engines, and why rewrite varchar2 data into XMLTypes?
    I would have stored the file paths in the table or in XML and then processed them via XQuery, OR stored them in varchar2 and then processed them NOT via XQuery.
    The first thing on my mind was "why this approach?" Just the fanciness of doing something in XQuery when there is probably no need for it...?
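    For what it's worth, a plain-SQL version of the tokenising step (no XQuery at all) could look like this sketch, reusing table A from the first post (regexp_count requires 11g):

    with tokens as (
      select trim(regexp_substr(related_files, '[^|]+', 1, level)) as filename
      from  (select 'File 1|file1.xml|file 2|file2.xml|file 3|file3.xml' as related_files from dual)
      connect by level <= regexp_count(related_files, '\|') + 1
    )
    select a.id, a.filepath
    from   a
           join tokens t
             on substr(a.filepath, instr(a.filepath, '/', -1) + 1) = t.filename
    where  regexp_like(t.filename, '\d+\.xml');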
  • 7. Re: FBI not used in xmlquery
    odie_63 Guru
    The first thing on my mind was "why this approach?" Just the fanciness of doing something in XQuery when there is probably no need for it...?
    Most of the time, it's because of one of the following reasons:

    - Application developers don't know or care about the design on the database side
    - Legacy of an old design
    - So-called app "architects" think that storing data in XML or, more commonly, CSV format is cool and brings a great deal of flexibility, thus totally negating the very purpose of an RDBMS
  • 8. Re: FBI not used in xmlquery
    pl_sequel Newbie
    Hey guys,

    Thanks for the comments. Here's a bit more context.

    Our project is using the Autonomy Teamsite CMS, which uses a custom XML-based filesystem for storage and versioning. Our requirements force us to also deploy and store these XML content records in the Oracle database, so it was only fitting to leverage Oracle's XML and XQuery support...

    I had various options to consider: structured vs. unstructured storage, XML schema vs. non-schema, etc. We opted for a simpler architecture: a non-schema-based, unstructured XMLType column with binary XML storage, using an XML DB index to index every element in the content record. We're not using an XMLType object table, but a simple table with supporting columns such as audit fields, sequence-generated IDs, etc. My initial prototype used virtual columns to extract key elements from the XMLType column, but in the end we decided to keep it "simple" and less rigid.

    Teamsite XML content records are fully configurable (data capture templates are XML documents themselves, so developers can create any content type, and Teamsite captures it as an XML document). Teamsite XML records are also schemaless, so it didn't make sense for us to enforce a schema in Oracle, let alone shred them into a full-blown object-relational structure, since the Teamsite data capture templates could change at any given time and would be a pain to integrate back into the DB, I think.

    We have various querying requirements, such as producing RSS feeds (which means extracting titles, summaries, dates, etc. from each XML content record), querying on metadata values in the XML record to produce various topic listings, and supporting Oracle Text search.

    Another requirement was to enable end users to relate content records to each other, and doing so in the data capture forms in Teamsite Forms Publisher is tedious, to say the least. Our clients aren't fond of the chosen CMS and find it overly complex, especially for non-technical authors (mostly MS Office users)... so we had to strike a delicate balance between how much of the CMS UI we leverage and how much we offload to custom implementations.

    The related content records are captured in Teamsite and stored in the XML as a CSV element value. I then need to extract this info and use it to query the content records in the Oracle database to retrieve the titles, summaries, and ids of each related record, which I then insert back into the XML for storage.
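    As a rough illustration of that last "insert back into the XML" step (a sketch only; the doc_xml and record_id column names are hypothetical, and insertChildXML is just one of several ways to do this in 11g):

    -- hypothetical columns: doc_xml holds the binary XMLType content record,
    -- record_id identifies the row; one <related_meta> child is appended under /task
    update data_capture_record r
    set    r.doc_xml = insertChildXML(
             r.doc_xml,
             '/task',
             'related_meta',
             xmltype('<related_meta><id>17</id><title>looked-up title</title></related_meta>'))
    where  r.record_id = 123;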

    Now this is the content authoring component.

    We use Autonomy Livesite on the front end to render the content pages. In short, we develop Java classes that call our Oracle stored procedures to query our XML content records. The results are fed back to our Java classes as XML, which is then returned to the Livesite runtime engine, where our custom XSLT renders the final HTML.
