
Query plan randomly changes from day to day resulting in poor performance/cardinality

3041124 Member Posts: 1
edited Oct 5, 2015 6:01PM in SQL & PL/SQL

We're experiencing an issue where the query plan changes from day to day for a particular procedure that runs once a night. I believe I've pinpointed the query that's causing the problem below. It results in a performance variance of a 10-second completion time vs. 20 minutes (nothing in between). It started occurring about 2 months ago, and it's becoming more prevalent: the bad query plan is coming up more often. We didn't introduce any code changes, and nothing looks fishy with the data.

I took two database saveoffs and noticed that the query plans vary for a simple query. We do run statistics gathering every night: DBMS_STATS.GATHER_SCHEMA_STATS(ownname => sys_context('userenv', 'current_schema'), estimate_percent => 1);

In my testing, I also gathered statistics every single time before generating the query plan, and the plans remain consistent within each database (but they differ between the two databases).

Any idea what the problem is?

"Bad Database"

1a) Record count from query

select count(*) from cs_bucket_member_v2 where bucket_type='P' and sec_id > 0 and order_id=0

-- query returns 1405695

1b) Query plan in the DB that runs too long. Notice the row count is smaller than expected (1,148 rows in the plan, whereas the query itself returns 1.4 million records).


PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 2027833486

----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                     |     1 |    12 |   155   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE              |                     |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| CS_BUCKET_MEMBER_V2 |  1148 | 13776 |   155   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | CS_BUCKET_MEMBER_N1 |  1272 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("BUCKET_TYPE"='P' AND "SEC_ID">0)
   3 - access("ORDER_ID"=0)

"Good database"

2a) Record count from query

select count(*) from cs_bucket_member_v2 where bucket_type='P' and sec_id > 0 and order_id=0

-- query returns 1614191

2b) Query plan in the DB that runs efficiently and quickly. Good plan. Notice the row count is about correct (1.5 million in the plan vs. 1.6 million from the query result).


SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------------
Plan hash value: 3596429027

------------------------------------------------------------------------------------------
| Id  | Operation          | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                     |     1 |    12 | 11215   (2)| 00:01:41 |
|   1 |  SORT AGGREGATE    |                     |     1 |    12 |            |          |
|*  2 |   TABLE ACCESS FULL| CS_BUCKET_MEMBER_V2 |  1522K|    17M| 11215   (2)| 00:01:41 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ORDER_ID"=0 AND "SEC_ID">0 AND "BUCKET_TYPE"='P')

14 rows selected.
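
(For reference, a way to see estimated vs. actual row counts side by side - not part of the original post, but it reuses the query above and nothing else - is the gather_plan_statistics hint with DBMS_XPLAN.DISPLAY_CURSOR:)

-- Run the query with rowsource statistics enabled, then display the plan
-- with estimated (E-Rows) and actual (A-Rows) counts for each step.
select /*+ gather_plan_statistics */ count(*)
from   cs_bucket_member_v2
where  bucket_type = 'P' and sec_id > 0 and order_id = 0;

select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));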


Answers

  • AndrewSayer Member Posts: 12,998 Gold Crown
    edited Sep 30, 2015 3:58AM

    What is the reason for gathering stats on every table every night? Using an estimate_percent of 1 could easily explain the differing cardinalities and the different plans.

    If you have decided that the full table scan is a good plan, then either hint for it or add a SQL plan baseline (preferred).
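
    A minimal sketch of the baseline route, assuming the good plan is still in the cursor cache (the sql_id below is a placeholder - look up the real one in v$sql; the plan hash value is the good plan's from the post above):

    declare
      l_loaded pls_integer;
    begin
      l_loaded := dbms_spm.load_plans_from_cursor_cache(
                    sql_id          => 'abcd1234xyz',  -- placeholder: find the real sql_id in v$sql
                    plan_hash_value => 3596429027      -- the "good" full tablescan plan
                  );
      dbms_output.put_line(l_loaded || ' plan(s) loaded');
    end;
    /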

  • John_K Member Posts: 2,498 Gold Trophy
    edited Sep 30, 2015 4:09AM

    As above, 1% is quite a low estimate percent.

    It's also possible there is a data skew, or even a dependency between the columns bucket_type, sec_id, and order_id. How many rows do you have with a bucket_type of 'P' as opposed to other values? Is there a histogram on the bucket_type column?
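
    Both checks are quick to run - a sketch, assuming only the table and column names given in the thread:

    -- Value distribution of bucket_type: does 'P' dominate?
    select bucket_type, count(*)
    from   cs_bucket_member_v2
    group by bucket_type
    order by count(*) desc;

    -- Do histograms exist on the three columns?
    select column_name, num_distinct, histogram
    from   user_tab_col_statistics
    where  table_name = 'CS_BUCKET_MEMBER_V2'
    and    column_name in ('BUCKET_TYPE', 'SEC_ID', 'ORDER_ID');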

  • Martin Preiss Member Posts: 2,381 Gold Trophy
    edited Sep 30, 2015 6:19AM

    How many different values are there for order_id in CS_BUCKET_MEMBER_V2, and how many rows does the table contain? The optimizer expects the index access on order_id to return just 1272 rows. If there are no histograms on the column, the optimizer assumes an even distribution of values, and if the value 0 occurs much more frequently than other values (=> data skew) then this can lead to a massive error in the calculation. If there are histograms, the calculation uses a different approach - but the result could still be misleading if the histogram is not representative. And as others already said: with a 1% sample it is possible that you miss special patterns in the data (and I also agree that the daily statistics gathering with a small percentage is probably the reason for the unstable plans and the different performance).
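
    The counts in question can be checked directly - a sketch, assuming only the table and column names from the thread:

    select count(*)                                      as total_rows,
           count(distinct order_id)                      as distinct_order_ids,
           sum(case when order_id = 0 then 1 else 0 end) as rows_with_order_id_0
    from   cs_bucket_member_v2;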

  • Jonathan Lewis Member Posts: 9,679 Gold Crown
    edited Oct 5, 2015 5:51PM

    Which version of Oracle?

    How many rows in the table?

    As per Martin Preiss's comments, the problem looks like it's related to histograms.

    I'd guess that it's a very large table and 0 is a special value with far more rows than the typical value; but possibly there's a degree of clustering of 0, so that on a small sample Oracle doesn't see the skew.

    On good days the 1% sample picks up the skew, Oracle creates a histogram and you get the tablescan; on bad days your sample doesn't pick up the skew and 0 is treated like any other value.

    If you're on 11g then you should probably be using auto_sample_size anyway, but making the change takes a bit of effort because you first need to determine which columns need histograms and the best moment at which to gather (or programmatically create) them.
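
    The nightly call might then change along these lines (a sketch only - the method_opt shown is a generic placeholder until the histogram columns have been identified):

    begin
      dbms_stats.gather_schema_stats(
        ownname          => sys_context('userenv', 'current_schema'),
        estimate_percent => dbms_stats.auto_sample_size,
        method_opt       => 'for all columns size auto'  -- placeholder; refine per column
      );
    end;
    /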

    Quick and dirty SQL to list the histograms for the table:

    set pagesize 60
    set linesize 180
    set trimspool on

    select
            column_name,
            endpoint_number, endpoint_value
    from
            user_tab_histograms
    where
            table_name = upper('&m_table')
    order by
            column_name, endpoint_number
    ;

    Regards

    Jonathan Lewis

    PS The significance of the "becoming more prevalent" is that if order_id = 0 relates to (e.g.) new data that then gets a change of order_id, then we might guess that (a) the volume for order_id = 0 is fairly constant as the size of the table grows - i.e. it's a smaller percentage of the total - and (b) the volume is largely clustered (around the end of the table) rather than being spread evenly throughout the table. As an example, I've got a table with 1M rows where the last 50,000 hold the value 1000 and the rest have 10 rows per value - at a 1% sample (10,000 rows) Oracle doesn't spot the skew. Auto_sample_size spots the skew and then builds the histogram - strangely, on a sample of only 5,500.
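
    A sketch of that test case (the table name t1 and the layout are illustrative, not the original test script):

    -- 1M rows: ~10 rows per order_id value, except the last 50,000 rows,
    -- which all hold the single special value 1000.
    create table t1 as
    select case
             when rownum > 950000 then 1000
             else trunc((rownum - 1) / 10)
           end            as order_id,
           rpad('x', 100) as padding
    from   dual
    connect by level <= 1000000;

    -- Gather once with a fixed 1% sample and once with auto_sample_size,
    -- then compare the HISTOGRAM column in user_tab_col_statistics:
    exec dbms_stats.gather_table_stats(user, 'T1', estimate_percent => 1, method_opt => 'for all columns size 254')
    exec dbms_stats.gather_table_stats(user, 'T1', estimate_percent => dbms_stats.auto_sample_size, method_opt => 'for all columns size 254')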

  • Jonathan Lewis Member Posts: 9,679 Gold Crown
    edited Sep 30, 2015 1:28PM

    I've modelled one reason why this might happen on my blog: https://jonathanlewis.wordpress.com/2015/09/30/estimate_percent/

    Regards

    Jonathan Lewis

  • Jonathan Lewis Member Posts: 9,679 Gold Crown
    edited Oct 5, 2015 6:01PM

    A member of Oracle support has taken the example from my blog and raised bug 21947352.

    It turns out that you don't even need a peculiar pattern like mine, with all the rows holding the special value at the end of the table - he modified my example to show the same effect even when the special value was scattered evenly across the entire table.

    Regards

    Jonathan Lewis

This discussion has been closed.