
Wrong Cardinality Estimation

User_OCZ1T Member Posts: 1,923 Gold Badge
edited May 18, 2014 5:52AM in General Database Discussions

Hi, I am using Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production. The parameters below have their default values.

optimizer_index_cost_adj   100
optimizer_index_caching     0

I have a simple query, similar to the one below:

select c1, c2, c3 from tab1
where c1 = 'abc';

It is using the plan below:

---------------------------------------------------------------------------------------
| Id  | Operation         | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |      1 |        |      0 |00:00:00.27 |   26816 |
|*  1 |  TABLE ACCESS FULL| TAB1    |      1 |   1171K|      0 |00:00:00.27 |   26816 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("TAB1"."c1"='abc')

Now my concern is: even though the query returns 0 rows, how was the expected cardinality calculated as 1171K? This is probably tilting the optimizer towards a full table scan, even though an index ID1 exists on TAB1(C1) and the predicate sometimes returns very few rows. Below are the stats for the table and for column C1:

select num_rows from dba_tables where table_name='TAB1';

NUM_ROWS
--------
 1171095

select density, num_distinct from dba_tab_col_statistics where table_name='TAB1' and column_name='C1';

DENSITY                 NUM_DISTINCT
---------------------   ------------
4.26950845149198E-7          1171095

Since there is a frequency histogram on this column, the expected cardinality should be about density * num_rows = 0.5 (num_rows and num_distinct are equal here, so density * num_distinct gives the same figure).

select endpoint_number, endpoint_value from dba_tab_histograms where table_name='TAB1' and column_name='C1';

ENDPOINT_NUMBER    ENDPOINT_VALUE
---------------    -------------------
         234219    3.80421485912222E35

When I force the index, the plan becomes:
Elapsed: 00:00:01.50

Execution Plan
----------------------------------------------------------
Plan hash value: 1013434088

---------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               |  1171K|    93M|   870K  (1)| 01:13:50 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TAB1          |  1171K|    93M|   870K  (1)| 01:13:50 |
|*  2 |   INDEX RANGE SCAN          | ID1           |  1171K|       | 21124   (1)| 00:01:48 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("TAB1"."C1"='abc')
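For readers who want to reproduce the symptom, here is a minimal sketch (the table T_PREFIX and its generated values are hypothetical; the diagnosis comes in the answers below): every value shares a common prefix longer than 32 characters, so the frequency histogram collapses to a single endpoint.

create table t_prefix as
select rpad('ID:414', 32, 'x') || lpad(rownum, 10, '0') as c1
from   dual
connect by level <= 100000;

begin
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'T_PREFIX',
    method_opt       => 'FOR COLUMNS C1 SIZE 254',
    estimate_percent => 20);
end;
/

-- Every c1 is distinct, but the first 32 characters are identical,
-- so the histogram sees only one value and equality estimates go wrong:
select histogram, num_distinct, density
from   user_tab_col_statistics
where  table_name = 'T_PREFIX' and column_name = 'C1';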


Answers

  • 1. What is the size of the table?

    2. What is the value of db_file_multiblock_read_count?

    3. Did you or Oracle gather system stats (Oracle will do this automatically from 10g and higher)?

    If you didn't, are you aware that the defaults of these two parameters favor full table scans?

    Apparently a full table scan will consume the least amount of I/O (which you don't show).

    Please be aware that the assertion 'a full table scan is always evil' has to be labeled a myth; quite often full table scans are much more efficient.

    BTW: there is nothing wrong with the cardinality estimate; what you see there is the number of rows that satisfy your access path, a full table scan.

    ---------

    Sybrand Bakker

    Senior Oracle DBA
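
    For reference, a quick way to check the inputs asked about above (a sketch; the 8K block size is an assumption):

    -- Workload/noworkload system statistics:
    select sname, pname, pval1 from sys.aux_stats$;

    -- Multiblock read count (SQL*Plus):
    show parameter db_file_multiblock_read_count

    -- Approximate table size, assuming an 8K block size:
    select blocks * 8 / 1024 as approx_mb
    from   dba_tables
    where  table_name = 'TAB1';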

  • User_OCZ1T Member Posts: 1,923 Gold Badge
    edited May 17, 2014 7:23AM

    Thanks.

    Table size - 211 MB

    db_file_multiblock_read_count - 128

    I am not chasing the FTS here or assuming it is the culprit; I am just trying to understand the math the optimizer works by.
    I was reading 'Cost-Based Oracle Fundamentals' by Jonathan Lewis, checked how the optimizer calculates cardinality/cost for a constant value in a filter and how it picks the execution path, and was trying to match that with my scenario.

    As I already mentioned, the estimated value is quite different from the actual (estimated 1171K, actual 0). In a complex query this can significantly affect the overall execution path!
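
    For comparison, applying the standard formula from that book to the figures in this thread (assuming no histogram anomaly), the equality estimate would be:

        cardinality = num_rows * density
                    = 1171095 * 4.26950845149198E-7
                    ≈ 0.5, rounded up to 1 row

    which is what E-Rows should have shown, rather than 1171K.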

  • Jonathan Lewis Member Posts: 10,061 Blue Diamond
    Answer ✓

    You've got a frequency histogram which has captured one value - there's a known bug for that case (though I can't quote numbers - check Randolf Geist's blog).  I'm a little surprised that it's still around in 11.2.0.3, but there were a few edge cases, perhaps not all fixed.  I note that the histogram value you captured starts 'ID:414' - given that you used a 20% sample, I wonder if the column has a million distinct values that all start with the same 15 characters (select min(), max() from the table to satisfy my curiosity) - Oracle gets a little lost with the statistics for character strings after (roughly) the first 6 characters.

    Regards

    Jonathan Lewis
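
    The check Jonathan suggests, spelled out as a sketch (table/column names taken from the thread):

    select min(c1) as min_c1,
           max(c1) as max_c1,
           count(distinct substr(c1, 1, 15)) as distinct_15char_prefixes
    from   tab1;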

  • User_OCZ1T Member Posts: 1,923 Gold Badge
    edited May 18, 2014 2:42AM

    Thank you very much.

    But when I check dba_tab_col_statistics, the LOW_VALUE and HIGH_VALUE are the same, as below. So it means almost any value I put in the filter condition will be out of range.

    HIGH_VALUE                                                        LOW_VALUE
    ---------------------------------------------------------------   ---------------------------------------------------------------
    49443A3431346435313230346534313522222222222222432303230323032    49443A3431346435313230346534313522222222222222432303230323032
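
    For reference, those raw values can be decoded directly (a sketch, assuming C1 is a VARCHAR2 column):

    select column_name,
           utl_raw.cast_to_varchar2(low_value)  as low_val,
           utl_raw.cast_to_varchar2(high_value) as high_val
    from   dba_tab_col_statistics
    where  table_name = 'TAB1' and column_name = 'C1';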


    A few questions:

    1) Regarding a fix for this issue: since this column has mostly distinct values, would dropping the histogram from this column,
    and otherwise leaving things as they are, be a good solution?

    2) Is there a patch available for this bug (if you can mention the bug number, that would be really great), so I can ask the techops guys
    to work with Oracle Support and get it applied?

  • Jonathan Lewis Member Posts: 10,061 Blue Diamond
    edited May 17, 2014 2:28PM

    Your values are all identical over the first 35 characters - the optimiser doesn't look past 32 when collecting histograms (until 12c; see 'Frequency Histogram 5' on the Oracle Scratchpad blog), so even the ENDPOINT_ACTUAL_VALUE would be the same for all of them.  It's possible that the column doesn't even have a histogram - check user_tab_columns to see whether you have HISTOGRAM = FREQUENCY or HISTOGRAM = NONE. If it does have a histogram, get rid of it.
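
    A sketch of that check, plus one common way to drop just the histogram (re-gathering the column's stats with SIZE 1; names taken from the thread):

    select histogram
    from   user_tab_columns
    where  table_name = 'TAB1' and column_name = 'C1';

    -- If this reports FREQUENCY, re-gathering with SIZE 1 removes the histogram:
    begin
      dbms_stats.gather_table_stats(
        ownname    => user,
        tabname    => 'TAB1',
        method_opt => 'FOR COLUMNS C1 SIZE 1');
    end;
    /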

    I said I didn't have the bug number, but if you get your techops to raise an SR, Oracle Support should be able to find any relevant bugs and patches very easily.

    Regards

    Jonathan Lewis

  • User_XFPHV Member Posts: 77 Employee

    In 12c histograms use the first 64 bytes, though this change is not backportable.  Though not a perfect fix, you may want to look into applying the patch for the following bug, which adds a cardinality sanity check so that the cardinality cannot be greater than num_rows - (ndv - 1) when an equality operator is used and histograms are in play:
    Bug 18377553 : POOR CARDINALITY ESTIMATE WITH HISTOGRAMS AND VALUES > 32 BYTES
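
    With the figures from this thread (num_rows = ndv = 1171095), that cap works out to num_rows - (ndv - 1) = 1171095 - 1171094 = 1 row, so the equality estimate could no longer reach the 1171K seen in the plans above.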

  • Jonathan Lewis Member Posts: 10,061 Blue Diamond

    Excellent news - and an elegant workaround.

    Thanks for posting.

    Regards

    Jonathan Lewis

This discussion has been closed.