This discussion is archived
14 Replies Latest reply: Dec 1, 2008 7:40 AM by Jonathan Lewis RSS

Function Based Index And Selectivity

638933 Newbie
Currently Being Moderated
Hi All,

I have some doubts w.r.t FBI.I am on 10gR2 (10.2.0.4) with Solaris 5.9
I was under impression that FBI does not provide guaranteed index access and CBO choose access pattern purely on basis of available stats and query selectivity.
However, many a times i found that CBO is going for FBI even in case when FTS provides better query elapsed time.

I created following test case:

create table fbi_test (id number,flag varchar2(1));

begin
for i in 1..1000000
loop
insert into fbi_test values(i,'Y');
end loop;
end;
/
commit;
begin
for i in 1..10
loop
insert into fbi_test values(i,'N');
end loop;
end;
/
commit;
ANALYZE TABLE FBI_TEST COMPUTE STATISTICS;
CREATE INDEX fbi_test_FBI
ON fbi_test (CASE WHEN flag = 'Y' THEN 1 ELSE NULL END);

Autotrace for FBI ACCESS

SQL> select *from fbi_test where (CASE WHEN flag = 'Y' THEN 1 ELSE NULL END)=1;

1000000 rows selected.

Elapsed: 00:00:18.43

Execution Plan
----------------------------------------------------------

---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10000 | 50000 | 342 (1)|
| 1 | TABLE ACCESS BY INDEX ROWID| FBI_TEST | 10000 | 50000 | 342 (1)|
|* 2 | INDEX RANGE SCAN | FBI_TEST_FBI | 4000 | | 1958 (1)|
---------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
136812 consistent gets
0 physical reads
0 redo size
22180292 bytes sent via SQL*Net to client
733814 bytes received via SQL*Net from client
66668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000000 rows processed

Autotrace for FTS

SQL> select *from fbi_test where flag = 'Y';

1000000 rows selected.

Elapsed: 00:00:16.56

Execution Plan
----------------------------------------------------------

-------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 500K| 2441K| 371 (9)|
|* 1 | TABLE ACCESS FULL| FBI_TEST | 500K| 2441K| 371 (9)|
-------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
68372 consistent gets
0 physical reads
0 redo size
22180292 bytes sent via SQL*Net to client
733814 bytes received via SQL*Net from client
66668 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1000000 rows processed


FYI...
SQL> show parameter opt

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
filesystemio_options string asynch
object_cache_optimal_size integer 102400
optimizer_dynamic_sampling integer 2
optimizer_features_enable string 10.2.0.3
optimizer_index_caching integer 0
optimizer_index_cost_adj integer 100
optimizer_mode string ALL_ROWS
optimizer_secure_view_merging boolean TRUE
plsql_optimize_level integer 2

My questions are,
1.why oracle optimizer is going for FBI with high cardinality of 100000 rows?
2.Why cost of FTS is high (371) as compare to FBI (342) eventhough FTS is having fewer IO (68372) + Less Elapsed Time?
3.Why Optimizer is considering ELAPSED TIME during plan generation?

Any inpute would be highly appreciated.
  • 1. Re: Function Based Index And Selectivity
    SatishKandi Guru
    Currently Being Moderated
    Could you repeat your test case after gathering statistics with DBMS_STATS instead of ANALYZE TABLE?
  • 2. Re: Function Based Index And Selectivity
    Anurag Tibrewal Guru
    Currently Being Moderated
    Hi,

    Answers are

    1) why oracle optimizer is going for FBI with high cardinality of 100000 rows?

    Ans: Cardinality of 100000 is actual cardinality that you got after execution of the query not the computed cardinality which oracle does while preparation of execution plan. Here selectivity seems to be 0.5, so computed cardinality is somewhat near 50000 so might be oracle would find it easy to go for index scan. Going for FTS or index scan would depends on the optimizer mode and some of the parameters. Somewhat actual cardinality can only be estimated if you have histogram generated too apart from statistics.

    2).Why cost of FTS is high (371) as compare to FBI (342) eventhough FTS is having fewer IO (68372) + Less Elapsed Time?

    Ans: Since oracle calculated that the cost of FBI is less thats why it went for FBI. Note: There could be a lot of difference between the actual cost (actual io) and estimated cost(estimated io). Cost was calculated based on computed cardinality and not the actual cardinality.

    3)Why Optimizer is considering ELAPSED TIME during plan generation?

    Ans: I do not understand your question.

    Regards
    Anurag Tibrewal
  • 3. Re: Function Based Index And Selectivity
    638933 Newbie
    Currently Being Moderated
    Hi Anurag,

    Thanks for the reply.Yes, after creating histrograms, now cbo is going for FTS.I got your point on computed and actual cardinality computation and how histogram can help to provide most accurate cardinality to CBO.

    My thrid question was actually: Is there any statistic that may help CBO provide elapsed time information at the time of execution plan generation.
    In first test (without Histogram), we noticed that eventhough elapsed time of FTS is lower then index access, Optimizer was going for INDEX ACCESS and it has degraded the performance.
    Like wise, any way to help cbo consider elaped time during execution plan generation?I.e.If elapsed time is going to be high with INDEX ACCESS, go for FTS.

    If the tuning goal is to improve query execution time rather throughput, such thing is must required.
  • 4. Re: Function Based Index And Selectivity
    Anurag Tibrewal Guru
    Currently Being Moderated
    Hi,

    Elapsed time is never counted for performance because it may increase/decrease for the same query same plan depending on the system resources.

    I do not think so there is any way in which optimizer would take in consideration about the elapased time because elapsed time is calculated only after the actual execution of the query.

    Regards
    Anurag Tibrewal
  • 5. Re: Function Based Index And Selectivity
    Chris Antognini Explorer
    Currently Being Moderated
    Hi
    1.why oracle optimizer is going for FBI with high cardinality of 100000 rows?
    With that way of gathering object statistics you are hiding information to the query optimizer. Hence, it produces a sub-optimal plan. Remember: garbage in; garbage out!
    2.Why cost of FTS is high (371) as compare to FBI (342) eventhough FTS is having fewer IO (68372) + Less Elapsed Time?
    While a FTS performs multiblock I/O, an index range scan performs singleblock I/O. Since the cost of performing one multiblock I/O is different from the cost of performing one singleblock I/O, you cannot simply compare the number of LIO.
    3.Why Optimizer is considering ELAPSED TIME during plan generation?
    I guess there is a typo in that question. In fact, as you probably know, the query optimizer doesn't use the elapsed time...


    HTH
    Chris Antognini

    Troubleshooting Oracle Performance, Apress 2008
    http://top.antognini.ch
  • 6. Re: Function Based Index And Selectivity
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    user635930 wrote:
    Hi All,

    I have some doubts w.r.t FBI.I am on 10gR2 (10.2.0.4) with Solaris 5.9
    Execution Plan
    ----------------------------------------------------------

    ---------------------------------------------------------------------------------
    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
    ---------------------------------------------------------------------------------
    | 0 | SELECT STATEMENT | | 10000 | 50000 | 342 (1)|
    | 1 | TABLE ACCESS BY INDEX ROWID| FBI_TEST | 10000 | 50000 | 342 (1)|
    |* 2 | INDEX RANGE SCAN | FBI_TEST_FBI | 4000 | | 1958 (1)|
    ---------------------------------------------------------------------------------
    You're seeing three different effects here.

    First - for the table access by index, Oracle has "lost" the cost of the index range scan - notice that the total cost of the query is 342, but the cost of the index access is 1958. The total cost of the query should be 2,300 and Oracle should have chosen the full tablescan automatically.

    Second, the stats on the index show just one distinct value for "distinct_keys", and the optimizer has decided (for no reason I can think of - it may be a bug) to assume a 0.4% selectivity on the index.

    Third, the estimated cardinality of the table and index lines differs. Whatever Oracle has done in the index line has been forgotten, and the cardinality of the table line has been based on the predicate given by your case statement and, as a "complex function", that predicate has been given a selectivity of 1% - hence the 10,000 rows estimate.

    The combination of unsuitable statistics, an extreme case, and a couple of quirks in the optimizer mean that the chosen path is clearly unsuitable.

    [Addendum]: It just occurred to me that part of the problem is that you collected stats on the table before you created the index. Given you're running 10g, the 'create index' would automatically generate index stats at the same time - but since it's a function-based index, there's a "virtual column" created for the table as well, and that column won't have any statistics on it - which is why you get the "fixed percentage" selectivities.


    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk

    "Science is more than a body of knowledge; it is a way of thinking" Carl Sagan

    Edited by: Jonathan Lewis on Nov 29, 2008 1:15 PM
  • 7. Re: Function Based Index And Selectivity
    638933 Newbie
    Currently Being Moderated
    Great explanation Anurag.

    Now finally,what approach you would suggest to tune queries based on flag columns having only two (Y|N) distinct values?

    A.Histogram creation
    B.Function based index

    If A, what should be the bucket size?

    Thanks for your prompt replies.
  • 8. Re: Function Based Index And Selectivity
    311441 Employee ACE
    Currently Being Moderated
    This is a very common mistake when dealing with function-based indexes.

    What you have forgotten to consider is exactly how does Oracle accurately determine the cardinality of a query that is based on a virtual column derived from a function-based index?

    If you look at dba_tab_cols, you'll notice Oracle has created a virtual column based on your index:
    SQL> select column_name, num_distinct, density from dba_tab_cols where table_name = 'FBI_TEST';
    
    COLUMN_NAME                    NUM_DISTINCT    DENSITY
    ------------------------------ ------------ ----------
    SYS_NC00003$
    FLAG                                      2         .5
    ID                                  1000000    .000001
    You'll also notice there are no statistics on this column as you only collect table stats before* you created the function-based index.

    If you collect table stats after* you created the index, you see that Oracle now has accurate statistics on the virtual columns and hence can accurately determine the cardinality when using the function-based index:
    SQL> select column_name, num_distinct, density from dba_tab_cols where table_name = 'FBI_TEST';
    
    COLUMN_NAME                    NUM_DISTINCT    DENSITY
    ------------------------------ ------------ ----------
    SYS_NC00003$                              1          1
    FLAG                                      2         .5
    ID                                  1000000    .000001
    The execution plan will then correctly perform the full table scan as it can now accurately determine the correct cardinality (1000000 and not 10000).

    No stats, bad and generally incorrect assumptions. Good stats and usually the CBO does the right thing.

    Note: it's possible to collect just stats on the "hidden" columns if you wish for example using the method_opt=> 'FOR ALL HIDDEN COLUMNS' with dbms_stats.

    Cheers

    Richard Foote
    http://richardfoote.wordpress.com/
  • 9. Re: Function Based Index And Selectivity
    311441 Employee ACE
    Currently Being Moderated
    You don't need histograms, you just need the basic stats on the hidden column as the data in the hidden column is perfectly distributed as there's only the one value.

    Stats on the hidden column is your answer, not histograms.

    Cheers

    Richard Foote
    http://richardfoote.wordpress.com/
  • 10. Re: Function Based Index And Selectivity
    638933 Newbie
    Currently Being Moderated
    Hi Jonathan,Chris,Rich :

    Feeling lucky today to have all of you on this thread.I wounder THANKS would not be enough:-)

    Jonathan,

    What could be the reasons that Oracle has "lost" the cost of the index range scan?
    As you said, total cost of query should be 2,300, why the plan is not reflecting that?

    Is that because, table is analyzed first and FBI is created later?
    If i analyze table after creating FBI,(As Rich has pointed out),CBO calculates the cost perfectely and goes for FTS.

    "Second, the stats on the index show just one distinct value for "distinct_keys", and the optimizer has decided (for no reason I can think of - it may be a bug)
    to assume a 0.4% selectivity on the index."
    Can i ask you how did you get only one distinct value and 0.4% selectivity for index?

    Rich,
    Hiden columns are only w.r.t FBI? Which columns you would consider as hidden and hence must be analyzed to help CBO generate optimal plan?
    Do you advice me to go for FBI in such cases?Or accurate stats will be enough to let CBO choose better plan (FTS or INDEX SCAN) depending on data distribution dynamically?

    To be more precise,
    Can i request you to list column_names which got to have accurate stats so that CBO can pick up the best plan?
  • 11. Re: Function Based Index And Selectivity
    311441 Employee ACE
    Currently Being Moderated
    Hidden columns are those columns that don't appear when you describe the table :)

    You can determine the hidden columns via dba_tab_cols:

    SQL> select column_name, num_distinct, density, hidden_column, virtual_column from dba_tab_cols where table_name = 'FBI_TEST';
    COLUMN_NAME                    NUM_DISTINCT    DENSITY HID VIR
    ------------------------------ ------------ ---------- --- ---
    SYS_NC00003$                              1          1 YES YES
    FLAG                                      2         .5 NO  NO
    ID                                  1000000    .000001 NO  NO
    You don't recognise the SYS_NC00003$ column, right.

    11g has extended virtual columns so that they can be visible and manually added to table definitions.

    It's important to collect stats on virtual columns if you write SQL that references the columns in a predicate or join condition, etc. so that the CBO can use the stats to accurately determine the cardinality of the step in an execution plan. This is generally the case if it's based on a function-based index as the intent is generally to use the index for a predicate condition somewhere.

    Cheers

    Richard Foote
    http://richardfoote.wordpress.com/
  • 12. Re: Function Based Index And Selectivity
    638933 Newbie
    Currently Being Moderated
    Hi Rich,

    Thanks for the reply.However, can i ask you whether i should go for FBI or not when CBO is not using INDEX and going for FTS even if resultset size is low?
    i have seen couple of cases like this.we are creating histograms and stats are very much recent.
    BTW...How did you know that fbi_test table is analyzed before the fbi is created?

    Regards,
    Bhavik Desai
    INDIA
  • 13. Re: Function Based Index And Selectivity
    311441 Employee ACE
    Currently Being Moderated
    Hi Bhavik

    The CBO will go for the FBI "rightly" if it has less "cost" than other possible alternatives. For the CBO to calculate correct costings for the FBI and make reasonable comparisons, it needs accurate (enough) statistics on the associated virtual column (before 11g).

    I knew the table was analyzed before you created the index because:

    1) The execution plan had the incorrect cardinality estimates (10000 rather than 1000000), and

    2) Because you told me so with your initial demo ;)

    Cheers

    Richard Foote
    http://richardfoote.wordpress.com/
  • 14. Re: Function Based Index And Selectivity
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    user635930 wrote:
    Jonathan,

    What could be the reasons that Oracle has "lost" the cost of the index range scan?
    As you said, total cost of query should be 2,300, why the plan is not reflecting that?
    The results are not self-consistent, so we have to suspect a bug; and if you're prepared to assume that it's a bug there is no sensible answer to "why".

    >
    Is that because, table is analyzed first and FBI is created later?
    It looks like that the bug is one of those where a certain step in the code has been skipped, and this could be because the absence of stats on the hidden column has made the optimizer jump to a "missing stats" routine that doesn't continue correctly (and the fact that the column is a hidden column may be related to the fact that the error appears).
    If i analyze table after creating FBI,(As Rich has pointed out),CBO calculates the cost perfectely and goes for FTS.

    "Second, the stats on the index show just one distinct value for "distinct_keys", and the optimizer has decided (for no reason I can think of - it may be a bug)
    to assume a 0.4% selectivity on the index."
    Can i ask you how did you get only one distinct value and 0.4% selectivity for index?
    One distrinct value because that's what your code generated, and I don't think Oracle can ever overestimate the number of distinct values. (Though I haven't checked that on 11g).

    The fact that your cardinlaities were suspiciously rounded numbers was a hint to take your test case and run it a couple of times with different numbers of input rows - and the estimates came out at 40% every time. (The 10,000 I could already explain from Oracle's need to assign a selectivity to the CASE predicate).

    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk

    "The temptation to form premature theories upon insufficient data is the bane of our profession."
    Sherlock Holmes (Sir Arthur Conan Doyle) in "The Valley of Fear".

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points