14 Replies Latest reply on Dec 1, 2008 3:40 PM by Jonathan Lewis

    Function Based Index And Selectivity

    638933
      Hi All,

      I have some doubts w.r.t FBI.I am on 10gR2 (10.2.0.4) with Solaris 5.9
      I was under impression that FBI does not provide guaranteed index access and CBO choose access pattern purely on basis of available stats and query selectivity.
      However, many a times i found that CBO is going for FBI even in case when FTS provides better query elapsed time.

      I created following test case:

      create table fbi_test (id number,flag varchar2(1));

      begin
      for i in 1..1000000
      loop
      insert into fbi_test values(i,'Y');
      end loop;
      end;
      /
      commit;
      begin
      for i in 1..10
      loop
      insert into fbi_test values(i,'N');
      end loop;
      end;
      /
      commit;
      ANALYZE TABLE FBI_TEST COMPUTE STATISTICS;
      CREATE INDEX fbi_test_FBI
      ON fbi_test (CASE WHEN flag = 'Y' THEN 1 ELSE NULL END);

      Autotrace for FBI ACCESS

      SQL> select *from fbi_test where (CASE WHEN flag = 'Y' THEN 1 ELSE NULL END)=1;

      1000000 rows selected.

      Elapsed: 00:00:18.43

      Execution Plan
      ----------------------------------------------------------

      ---------------------------------------------------------------------------------
      | Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
      ---------------------------------------------------------------------------------
      | 0 | SELECT STATEMENT | | 10000 | 50000 | 342 (1)|
      | 1 | TABLE ACCESS BY INDEX ROWID| FBI_TEST | 10000 | 50000 | 342 (1)|
      |* 2 | INDEX RANGE SCAN | FBI_TEST_FBI | 4000 | | 1958 (1)|
      ---------------------------------------------------------------------------------
      Statistics
      ----------------------------------------------------------
      0 recursive calls
      0 db block gets
      136812 consistent gets
      0 physical reads
      0 redo size
      22180292 bytes sent via SQL*Net to client
      733814 bytes received via SQL*Net from client
      66668 SQL*Net roundtrips to/from client
      0 sorts (memory)
      0 sorts (disk)
      1000000 rows processed

      Autotrace for FTS

      SQL> select *from fbi_test where flag = 'Y';

      1000000 rows selected.

      Elapsed: 00:00:16.56

      Execution Plan
      ----------------------------------------------------------

      -------------------------------------------------------------------
      | Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
      -------------------------------------------------------------------
      | 0 | SELECT STATEMENT | | 500K| 2441K| 371 (9)|
      |* 1 | TABLE ACCESS FULL| FBI_TEST | 500K| 2441K| 371 (9)|
      -------------------------------------------------------------------
      Statistics
      ----------------------------------------------------------
      0 recursive calls
      0 db block gets
      68372 consistent gets
      0 physical reads
      0 redo size
      22180292 bytes sent via SQL*Net to client
      733814 bytes received via SQL*Net from client
      66668 SQL*Net roundtrips to/from client
      0 sorts (memory)
      0 sorts (disk)
      1000000 rows processed


      FYI...
      SQL> show parameter opt

      NAME TYPE VALUE
      ------------------------------------ ----------- ------------------------------
      filesystemio_options string asynch
      object_cache_optimal_size integer 102400
      optimizer_dynamic_sampling integer 2
      optimizer_features_enable string 10.2.0.3
      optimizer_index_caching integer 0
      optimizer_index_cost_adj integer 100
      optimizer_mode string ALL_ROWS
      optimizer_secure_view_merging boolean TRUE
      plsql_optimize_level integer 2

      My questions are,
      1.why oracle optimizer is going for FBI with high cardinality of 100000 rows?
      2.Why cost of FTS is high (371) as compare to FBI (342) eventhough FTS is having fewer IO (68372) + Less Elapsed Time?
      3.Why Optimizer is considering ELAPSED TIME during plan generation?

      Any inpute would be highly appreciated.
        • 1. Re: Function Based Index And Selectivity
          Satish Kandi
          Could you repeat your test case after gathering statistics with DBMS_STATS instead of ANALYZE TABLE?
          • 2. Re: Function Based Index And Selectivity
            Anurag Tibrewal
            Hi,

            Answers are

            1) why oracle optimizer is going for FBI with high cardinality of 100000 rows?

            Ans: Cardinality of 100000 is actual cardinality that you got after execution of the query not the computed cardinality which oracle does while preparation of execution plan. Here selectivity seems to be 0.5, so computed cardinality is somewhat near 50000 so might be oracle would find it easy to go for index scan. Going for FTS or index scan would depends on the optimizer mode and some of the parameters. Somewhat actual cardinality can only be estimated if you have histogram generated too apart from statistics.

            2).Why cost of FTS is high (371) as compare to FBI (342) eventhough FTS is having fewer IO (68372) + Less Elapsed Time?

            Ans: Since oracle calculated that the cost of FBI is less thats why it went for FBI. Note: There could be a lot of difference between the actual cost (actual io) and estimated cost(estimated io). Cost was calculated based on computed cardinality and not the actual cardinality.

            3)Why Optimizer is considering ELAPSED TIME during plan generation?

            Ans: I do not understand your question.

            Regards
            Anurag Tibrewal
            • 3. Re: Function Based Index And Selectivity
              638933
              Hi Anurag,

              Thanks for the reply.Yes, after creating histrograms, now cbo is going for FTS.I got your point on computed and actual cardinality computation and how histogram can help to provide most accurate cardinality to CBO.

              My thrid question was actually: Is there any statistic that may help CBO provide elapsed time information at the time of execution plan generation.
              In first test (without Histogram), we noticed that eventhough elapsed time of FTS is lower then index access, Optimizer was going for INDEX ACCESS and it has degraded the performance.
              Like wise, any way to help cbo consider elaped time during execution plan generation?I.e.If elapsed time is going to be high with INDEX ACCESS, go for FTS.

              If the tuning goal is to improve query execution time rather throughput, such thing is must required.
              • 4. Re: Function Based Index And Selectivity
                Anurag Tibrewal
                Hi,

                Elapsed time is never counted for performance because it may increase/decrease for the same query same plan depending on the system resources.

                I do not think so there is any way in which optimizer would take in consideration about the elapased time because elapsed time is calculated only after the actual execution of the query.

                Regards
                Anurag Tibrewal
                • 5. Re: Function Based Index And Selectivity
                  Chris Antognini
                  Hi
                  1.why oracle optimizer is going for FBI with high cardinality of 100000 rows?
                  With that way of gathering object statistics you are hiding information to the query optimizer. Hence, it produces a sub-optimal plan. Remember: garbage in; garbage out!
                  2.Why cost of FTS is high (371) as compare to FBI (342) eventhough FTS is having fewer IO (68372) + Less Elapsed Time?
                  While a FTS performs multiblock I/O, an index range scan performs singleblock I/O. Since the cost of performing one multiblock I/O is different from the cost of performing one singleblock I/O, you cannot simply compare the number of LIO.
                  3.Why Optimizer is considering ELAPSED TIME during plan generation?
                  I guess there is a typo in that question. In fact, as you probably know, the query optimizer doesn't use the elapsed time...


                  HTH
                  Chris Antognini

                  Troubleshooting Oracle Performance, Apress 2008
                  http://top.antognini.ch
                  • 6. Re: Function Based Index And Selectivity
                    Jonathan Lewis
                    user635930 wrote:
                    Hi All,

                    I have some doubts w.r.t FBI.I am on 10gR2 (10.2.0.4) with Solaris 5.9
                    Execution Plan
                    ----------------------------------------------------------

                    ---------------------------------------------------------------------------------
                    | Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
                    ---------------------------------------------------------------------------------
                    | 0 | SELECT STATEMENT | | 10000 | 50000 | 342 (1)|
                    | 1 | TABLE ACCESS BY INDEX ROWID| FBI_TEST | 10000 | 50000 | 342 (1)|
                    |* 2 | INDEX RANGE SCAN | FBI_TEST_FBI | 4000 | | 1958 (1)|
                    ---------------------------------------------------------------------------------
                    You're seeing three different effects here.

                    First - for the table access by index, Oracle has "lost" the cost of the index range scan - notice that the total cost of the query is 342, but the cost of the index access is 1958. The total cost of the query should be 2,300 and Oracle should have chosen the full tablescan automatically.

                    Second, the stats on the index show just one distinct value for "distinct_keys", and the optimizer has decided (for no reason I can think of - it may be a bug) to assume a 0.4% selectivity on the index.

                    Third, the estimated cardinality of the table and index lines differs. Whatever Oracle has done in the index line has been forgotten, and the cardinality of the table line has been based on the predicate given by your case statement and, as a "complex function", that predicate has been given a selectivity of 1% - hence the 10,000 rows estimate.

                    The combination of unsuitable statistics, an extreme case, and a couple of quirks in the optimizer mean that the chosen path is clearly unsuitable.

                    [Addendum]: It just occurred to me that part of the problem is that you collected stats on the table before you created the index. Given you're running 10g, the 'create index' would automatically generate index stats at the same time - but since it's a function-based index, there's a "virtual column" created for the table as well, and that column won't have any statistics on it - which is why you get the "fixed percentage" selectivities.


                    Regards
                    Jonathan Lewis
                    http://jonathanlewis.wordpress.com
                    http://www.jlcomp.demon.co.uk

                    "Science is more than a body of knowledge; it is a way of thinking" Carl Sagan

                    Edited by: Jonathan Lewis on Nov 29, 2008 1:15 PM
                    • 7. Re: Function Based Index And Selectivity
                      638933
                      Great explanation Anurag.

                      Now finally,what approach you would suggest to tune queries based on flag columns having only two (Y|N) distinct values?

                      A.Histogram creation
                      B.Function based index

                      If A, what should be the bucket size?

                      Thanks for your prompt replies.
                      • 8. Re: Function Based Index And Selectivity
                        Richard Foote
                        This is a very common mistake when dealing with function-based indexes.

                        What you have forgotten to consider is exactly how does Oracle accurately determine the cardinality of a query that is based on a virtual column derived from a function-based index?

                        If you look at dba_tab_cols, you'll notice Oracle has created a virtual column based on your index:
                        SQL> select column_name, num_distinct, density from dba_tab_cols where table_name = 'FBI_TEST';
                        
                        COLUMN_NAME                    NUM_DISTINCT    DENSITY
                        ------------------------------ ------------ ----------
                        SYS_NC00003$
                        FLAG                                      2         .5
                        ID                                  1000000    .000001
                        You'll also notice there are no statistics on this column as you only collect table stats before* you created the function-based index.

                        If you collect table stats after* you created the index, you see that Oracle now has accurate statistics on the virtual columns and hence can accurately determine the cardinality when using the function-based index:
                        SQL> select column_name, num_distinct, density from dba_tab_cols where table_name = 'FBI_TEST';
                        
                        COLUMN_NAME                    NUM_DISTINCT    DENSITY
                        ------------------------------ ------------ ----------
                        SYS_NC00003$                              1          1
                        FLAG                                      2         .5
                        ID                                  1000000    .000001
                        The execution plan will then correctly perform the full table scan as it can now accurately determine the correct cardinality (1000000 and not 10000).

                        No stats, bad and generally incorrect assumptions. Good stats and usually the CBO does the right thing.

                        Note: it's possible to collect just stats on the "hidden" columns if you wish for example using the method_opt=> 'FOR ALL HIDDEN COLUMNS' with dbms_stats.

                        Cheers

                        Richard Foote
                        http://richardfoote.wordpress.com/
                        • 9. Re: Function Based Index And Selectivity
                          Richard Foote
                          You don't need histograms, you just need the basic stats on the hidden column as the data in the hidden column is perfectly distributed as there's only the one value.

                          Stats on the hidden column is your answer, not histograms.

                          Cheers

                          Richard Foote
                          http://richardfoote.wordpress.com/
                          • 10. Re: Function Based Index And Selectivity
                            638933
                            Hi Jonathan,Chris,Rich :

                            Feeling lucky today to have all of you on this thread.I wounder THANKS would not be enough:-)

                            Jonathan,

                            What could be the reasons that Oracle has "lost" the cost of the index range scan?
                            As you said, total cost of query should be 2,300, why the plan is not reflecting that?

                            Is that because, table is analyzed first and FBI is created later?
                            If i analyze table after creating FBI,(As Rich has pointed out),CBO calculates the cost perfectely and goes for FTS.

                            "Second, the stats on the index show just one distinct value for "distinct_keys", and the optimizer has decided (for no reason I can think of - it may be a bug)
                            to assume a 0.4% selectivity on the index."
                            Can i ask you how did you get only one distinct value and 0.4% selectivity for index?

                            Rich,
                            Hiden columns are only w.r.t FBI? Which columns you would consider as hidden and hence must be analyzed to help CBO generate optimal plan?
                            Do you advice me to go for FBI in such cases?Or accurate stats will be enough to let CBO choose better plan (FTS or INDEX SCAN) depending on data distribution dynamically?

                            To be more precise,
                            Can i request you to list column_names which got to have accurate stats so that CBO can pick up the best plan?
                            • 11. Re: Function Based Index And Selectivity
                              Richard Foote
                              Hidden columns are those columns that don't appear when you describe the table :)

                              You can determine the hidden columns via dba_tab_cols:

                              SQL> select column_name, num_distinct, density, hidden_column, virtual_column from dba_tab_cols where table_name = 'FBI_TEST';
                              COLUMN_NAME                    NUM_DISTINCT    DENSITY HID VIR
                              ------------------------------ ------------ ---------- --- ---
                              SYS_NC00003$                              1          1 YES YES
                              FLAG                                      2         .5 NO  NO
                              ID                                  1000000    .000001 NO  NO
                              You don't recognise the SYS_NC00003$ column, right.

                              11g has extended virtual columns so that they can be visible and manually added to table definitions.

                              It's important to collect stats on virtual columns if you write SQL that references the columns in a predicate or join condition, etc. so that the CBO can use the stats to accurately determine the cardinality of the step in an execution plan. This is generally the case if it's based on a function-based index as the intent is generally to use the index for a predicate condition somewhere.

                              Cheers

                              Richard Foote
                              http://richardfoote.wordpress.com/
                              • 12. Re: Function Based Index And Selectivity
                                638933
                                Hi Rich,

                                Thanks for the reply.However, can i ask you whether i should go for FBI or not when CBO is not using INDEX and going for FTS even if resultset size is low?
                                i have seen couple of cases like this.we are creating histograms and stats are very much recent.
                                BTW...How did you know that fbi_test table is analyzed before the fbi is created?

                                Regards,
                                Bhavik Desai
                                INDIA
                                • 13. Re: Function Based Index And Selectivity
                                  Richard Foote
                                  Hi Bhavik

                                  The CBO will go for the FBI "rightly" if it has less "cost" than other possible alternatives. For the CBO to calculate correct costings for the FBI and make reasonable comparisons, it needs accurate (enough) statistics on the associated virtual column (before 11g).

                                  I knew the table was analyzed before you created the index because:

                                  1) The execution plan had the incorrect cardinality estimates (10000 rather than 1000000), and

                                  2) Because you told me so with your initial demo ;)

                                  Cheers

                                  Richard Foote
                                  http://richardfoote.wordpress.com/
                                  • 14. Re: Function Based Index And Selectivity
                                    Jonathan Lewis
                                    user635930 wrote:
                                    Jonathan,

                                    What could be the reasons that Oracle has "lost" the cost of the index range scan?
                                    As you said, total cost of query should be 2,300, why the plan is not reflecting that?
                                    The results are not self-consistent, so we have to suspect a bug; and if you're prepared to assume that it's a bug there is no sensible answer to "why".

                                    >
                                    Is that because, table is analyzed first and FBI is created later?
                                    It looks like that the bug is one of those where a certain step in the code has been skipped, and this could be because the absence of stats on the hidden column has made the optimizer jump to a "missing stats" routine that doesn't continue correctly (and the fact that the column is a hidden column may be related to the fact that the error appears).
                                    If i analyze table after creating FBI,(As Rich has pointed out),CBO calculates the cost perfectely and goes for FTS.

                                    "Second, the stats on the index show just one distinct value for "distinct_keys", and the optimizer has decided (for no reason I can think of - it may be a bug)
                                    to assume a 0.4% selectivity on the index."
                                    Can i ask you how did you get only one distinct value and 0.4% selectivity for index?
                                    One distrinct value because that's what your code generated, and I don't think Oracle can ever overestimate the number of distinct values. (Though I haven't checked that on 11g).

                                    The fact that your cardinlaities were suspiciously rounded numbers was a hint to take your test case and run it a couple of times with different numbers of input rows - and the estimates came out at 40% every time. (The 10,000 I could already explain from Oracle's need to assign a selectivity to the CASE predicate).

                                    Regards
                                    Jonathan Lewis
                                    http://jonathanlewis.wordpress.com
                                    http://www.jlcomp.demon.co.uk

                                    "The temptation to form premature theories upon insufficient data is the bane of our profession."
                                    Sherlock Holmes (Sir Arthur Conan Doyle) in "The Valley of Fear".