6 Replies Latest reply on Oct 5, 2015 10:01 PM by Jonathan Lewis

    Query plan randomly changes from day to day resulting in poor performance/cardinality

    3041124

      We're experiencing an issue where it seems that the query plan changes from day to day for a particular procedure that runs once a night. I believe I've pinpointed the query that's causing the problem below. It's resulting in a performance variance of 10 second completion time vs 20 minutes (nothing in between). It started occurring about 2 months ago and now it's becoming more prevalent where the bad query plan is coming up more often.We didn't introduce any code changes and nothing looks fishy with the data.

       

      I took two database saveoffs and noticed that the query plans vary for a simple query. We do run gather statistics every night. (DBMS_STATS.GATHER_SCHEMA_STATS (ownname=>sys_context( 'userenv', 'current_schema' ),  estimate_percent => 1);)

       

      In my testing, I also ran statistics every single time before doing the query plan and the query plans remain consistent for each database (they differ from each other).

       

      Any idea what the problem is for me?

       

      "Bad Database"

      1a) Record count from query

      select count(*) from cs_bucket_member_v2 where bucket_type='P' and sec_id > 0 and order_id=0

      -- query returns 1405695

       

      1b) Query plan in DB that runs too long. Notice row count is smaller than expected.(1148 rows in the plan whereas the query itself returns 1.4 million records)

      badplan.png

      PLAN_TABLE_OUTPUT

      ----------------------------------------------------------------------------------------------------

      Plan hash value: 2027833486

       

       

      ----------------------------------------------------------------------------------------------------

      | Id  | Operation                    | Name                | Rows  | Bytes | Cost (%CPU)| Time     |

      ----------------------------------------------------------------------------------------------------

      |   0 | SELECT STATEMENT             |                     |     1 |    12 |   155   (0)| 00:00:02 |

      |   1 |  SORT AGGREGATE              |                     |     1 |    12 |            |          |

      |*  2 |   TABLE ACCESS BY INDEX ROWID| CS_BUCKET_MEMBER_V2 |  1148 | 13776 |   155   (0)| 00:00:02 |

      |*  3 |    INDEX RANGE SCAN          | CS_BUCKET_MEMBER_N1 |  1272 |       |     3   (0)| 00:00:01 |

      ----------------------------------------------------------------------------------------------------

       

       

       

       

      PLAN_TABLE_OUTPUT

      ----------------------------------------------------------------------------------------------------

      Predicate Information (identified by operation id):

      ---------------------------------------------------

       

       

         2 - filter("BUCKET_TYPE"='P' AND "SEC_ID">0)

         3 - access("ORDER_ID"=0)

       

      "Good database"

      2a)  Record count from query

      select count(*) from cs_bucket_member_v2 where bucket_type='P' and sec_id > 0 and order_id=0

      -- query returns 1614191

       

       

      2b) Query plan in DB that runs efficiently and quickly. Good plan. Notice row count is about correct (1.5 million in plan vs 1.6 million from query result)

      goodplan.png

      SQL> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

       

       

      PLAN_TABLE_OUTPUT

      -------------------------------------------------------------------------------------------

      Plan hash value: 3596429027

       

       

      ------------------------------------------------------------------------------------------

      | Id  | Operation          | Name                | Rows  | Bytes | Cost (%CPU)| Time     |

      ------------------------------------------------------------------------------------------

      |   0 | SELECT STATEMENT   |                     |     1 |    12 | 11215   (2)| 00:01:41 |

      |   1 |  SORT AGGREGATE    |                     |     1 |    12 |            |          |

      |*  2 |   TABLE ACCESS FULL| CS_BUCKET_MEMBER_V2 |  1522K|    17M| 11215   (2)| 00:01:41 |

      ------------------------------------------------------------------------------------------

       

       

      Predicate Information (identified by operation id):

       

       

      PLAN_TABLE_OUTPUT

      -------------------------------------------------------------------------------------------

      ---------------------------------------------------

       

       

         2 - filter("ORDER_ID"=0 AND "SEC_ID">0 AND "BUCKET_TYPE"='P')

       

       

      14 rows selected.

        • 1. Re: Query plan randomly changes from day to day resulting in poor performance/cardinality
          Andrew Sayer

          What is the reason for gathering stats on every table every night? Using estimate_percent of 1 could easily explain differing cardinalities and different plans.

          If you have decided that the full table scan is a good plan then either hint for it or add a sql plan baseline (preferred)

          • 2. Re: Query plan randomly changes from day to day resulting in poor performance/cardinality
            ORA-01033

            As above, 1% is quite a low estimate percent.

             

            It's also possible there is a data skew or even a dependency between the cols bucket_type sec_id and order_id. How many rows do you have with bucket type of 'P' as opposed to other values? Is there a histogram on the bucket_type column?

            • 3. Re: Query plan randomly changes from day to day resulting in poor performance/cardinality
              Martin Preiss

              how many different values are there for the order_id in CS_BUCKET_MEMBER_V2 and how many rows does the table contain? THe optimizer exects the index access on order_id to return just 1272 rows. If there are no histograms on the column the optimizer expects an even distribution of values and if the value 0 occurs much more frequently than other values (=> data skew) then this could lead to a massive error in the calculation. If there are histograms the calculation uses a different approach - but the result could still be misleading if the histogram is not representative - and as others already said: with a 1% sample it is possible that you miss special patterns in the data (and I also agree that the daily statistics gathering with a small percentage is probably the reason for the unstable plans and the different performance).

              • 4. Re: Query plan randomly changes from day to day resulting in poor performance/cardinality
                Jonathan Lewis

                Which version of Oracle ?

                How many rows in the table ?

                 

                As per Martin Preiss comments, the problem looks like it's related to histograms.

                 

                I'd guess that it's a very large table and 0 is a special value with far more rows than the typical value; but possibly there's a degree of clustering of 0 so that on a small sample Oracle doesn't see that the skew.

                On good days the 1% sample picks up the skew, Oracle creates a histogram and you get the tablescan; on bad days your sample doesn't pick up the skew and 0 is treated like any other value.

                 

                If you're on 11g then you should probably be using auto_sample_size anyway, but making the change takes a bit of effort because you really need to determine first of all which columns need histograms and the best moment at which to gather (or programmatically create) them.

                 

                quick and dirty sql to list the histograms for the table:#

                 

                set pagesize 60

                set linesize 180

                set trimspool on

                 

                select

                        column_name,

                        endpoint_Number, endpoint_value

                from

                        user_tab_histograms

                where

                        table_name = upper('&m_table')

                order by

                        column_name, endpoint_number

                ;

                 

                 

                 

                Regards

                Jonathan Lewis

                 

                 

                PS The significance of the "becoming more prevalent" is that if order_id = 0 relates to (e.g.) new data that then gets a change of order_id then we might guess that (a) the volume for order_id = 0 is fairly constant as the size of the table grows - i.e. it's a smaller percentage of the total, and (b) the volume is largely clustered (around the end of the table) rather than being spread evenly throughout the table.  As an example I've got a table with 1M rows where the last 50000 hold the value 1000 and the rest have 10 rows per value - at a 1% sample (10,000 rows) Oracle doesn't spot the skew.  Auto_sample_size spots the skew then builds the histogram - strangely on a sample of only 5,500.

                • 6. Re: Query plan randomly changes from day to day resulting in poor performance/cardinality
                  Jonathan Lewis

                  A member of Oracle support has taken the example from my blog and raised bug 21947352.

                   

                  It turns out that you don't even have to have a peculiar pattern like I had of putting all the rows with the special value at the end of the table - he modified my example to show the same effect even when the special value was scattered evenly across the entire table.

                   

                  Regards

                  Jonathan Lewis