This discussion is archived
16 Replies · Latest reply: Jan 4, 2013 2:17 PM by 905989

execution plan and stale statistics

905989 Newbie
Hi Gurus,

Oracle version 10.2.0.4.

Because of issues with skewed statistics, the DBAs had locked the stats on a table. In short, the stats were left stale for a couple of years.

The stats were as follows:
TABLE_NAME                       NUM_ROWS     BLOCKS LAST_ANALYZED       Locked
------------------------------ ---------- ---------- ------------------- -----
TABLE                       60010840    4759487 07/11/2010 06:10:53 ALL


Actual rows
----------
 101881527
So we had stale stats.

Following a stats update on the copy of the schema loaded into test, the figures are much healthier:
TABLE_NAME                       NUM_ROWS     BLOCKS LAST_ANALYZED       Locked

------------------------------ ---------- ---------- ------------------- -----
TABLE                        101608528    8457437 04/01/2013 08:30:05


Actual rows
----------
 101608528
Note that the NUM_ROWS and ACTUAL_ROWS now tally

I ran a query before and after updating the stats, and noted the following plans.

Before stats done
18093 rows selected.

Elapsed: 00:06:41.71

Execution Plan
----------------------------------------------------------
Plan hash value: 2046255496

-----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | E-Rows  | E-Bytes | Cost (%CPU)| E-Time   |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |      10 |  1800   |  64   (2)  | 00:00:01 |
|   1 |  MERGE JOIN CARTESIAN         |                 |      10 |  1800   |  64   (2)  | 00:00:01 |
|*  2 |   TABLE ACCESS FULL           | TABLE2           |       1 |    37   |  11   (0)  | 00:00:01 |
|   3 |   BUFFER SORT                 |                 |      10 |  1430   |  53   (2)  | 00:00:01 |
|*  4 |    TABLE ACCESS BY INDEX ROWID| TABLE        |      10 |  1430   |  53   (2)  | 00:00:01 |
|*  5 |     INDEX RANGE SCAN          | TABLE_IDX1 |      35 |         |  23   (0)  | 00:00:01 |
-----------------------------------------------------------------------------------------------------
After stats done
18093 rows selected.

Elapsed: 00:05:00.70

Execution Plan
----------------------------------------------------------
Plan hash value: 2046255496

-----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | E-Rows  | E-Bytes | Cost (%CPU)| E-Time   |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |  3894   |   699K  | 20058   (2)| 00:04:01 |
|   1 |  MERGE JOIN CARTESIAN         |                 |  3894   |   699K  | 20058   (2)| 00:04:01 |
|*  2 |   TABLE ACCESS FULL           | TABLE2           |     1   |    37   |    11   (0)| 00:00:01 |
|   3 |   BUFFER SORT                 |                 |  3850   |   552K  | 20047   (2)| 00:04:01 |
|*  4 |    TABLE ACCESS BY INDEX ROWID| TABLE        |  3850   |   552K  | 20047   (2)| 00:04:01 |
|*  5 |     INDEX RANGE SCAN          | TABLE_IDX1 | 14811   |         |  8722   (3)| 00:01:45 |
-----------------------------------------------------------------------------------------------------
Now my question: although the execution plans are the same before and after updating the stats, the estimated figures are much more realistic following the update. Additionally, is my understanding correct that Oracle expected 10 rows with the old stats, whereas after the stats update it expects 3894 rows to be returned by the query? In reality the query returned 18093 rows. The timing improved from 6 min 41 sec to 5 min, a gain of about 25%?

Thanks
  • 1. Re: execution plan and stale statistics
    Rob_J Journeyer
    Hi,

    The plan is the same, so if Oracle is actually executing it the same way then it should take around the same amount of time. However, you have to take other factors into consideration, such as contention on the system (I/O, CPU, other users, etc.) at the two times the query ran. Also, how much of the data is cached, as this has a drastic impact on response times.

    If on DEV, try this: clear the cache with ALTER SYSTEM FLUSH BUFFER_CACHE, then run the query again with the new stats and see what the timing comes out at.
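    A minimal SQL*Plus sketch of that test (the table name here is a placeholder for the query under test):

    ```sql
    -- DEV only: empty the buffer cache so both runs start cold
    ALTER SYSTEM FLUSH BUFFER_CACHE;

    SET TIMING ON
    -- re-run the query under test with the new stats in place
    SELECT /* stats-test */ COUNT(*) FROM some_table;
    ```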

    When gathering the stats, what options did you use? I would guess either the default or an estimate for the sample size. This means that Oracle probably only sampled a certain % of the data in the table and based the stats on that. If you use 100% sampling it should be pretty much spot on.

    Here is some information on DBMS_STATS: http://www.ora00600.com/scripts/statistics/gathering_statistics.html

    Then I would recommend having a look at the DBMS_STATS package in the Oracle docs for GATHER_TABLE_STATS and ensuring that the parameters you are using are applicable for your needs and environment.

    HTH,
    Rob
  • 2. Re: execution plan and stale statistics
    905989 Newbie
    Thanks.

    This table is not partitioned and has many indexes.

    So if I did
                DBMS_STATS.GATHER_TABLE_STATS
                (
                   user,
                   tabname => 'TABLE',
                   cascade => true,
                   degree  => DBMS_STATS.AUTO_DEGREE
                );
    I believe that will gather full statistics as opposed to using sampling?

    Thanks
  • 3. Re: execution plan and stale statistics
    Rob_J Journeyer
    The docs say:

    estimate_percent:

    Percentage of rows to estimate (NULL means compute). The valid range is [0.000001,100]. Use the constant DBMS_STATS.AUTO_SAMPLE_SIZE to have Oracle determine the appropriate sample size for good statistics. This is the default. The default value can be changed using the SET_DATABASE_PREFS Procedure, SET_GLOBAL_PREFS Procedure, SET_SCHEMA_PREFS Procedure and SET_TABLE_PREFS Procedure.

    You haven't specified anything for estimate_percent, so it could be said that it's NULL. However, the default is to use an auto sample size. There is probably a way to find out what it used, but I don't know off the top of my head.

    I would do the following:

    1. Back up the stats that you have on that table which you are happy with
    2. Run the gather stats again with 100% sampling explicitly specified
    3. Compare those stats with the previous ones and see if they are more accurate

    I always say that, to avoid any doubt, you should explicitly specify any parameters you want to set. Another reason to do this is that if you ran the same command on another database it could have a different outcome based on the default values.
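    Step 1 can be done with the DBMS_STATS export routines; a sketch (the stat-table name is illustrative):

    ```sql
    -- create a holding table for the current (good) statistics
    EXEC DBMS_STATS.CREATE_STAT_TABLE(ownname => user, stattab => 'MY_STAT_BACKUP');

    -- copy the table's current stats into it before re-gathering
    EXEC DBMS_STATS.EXPORT_TABLE_STATS(ownname => user, tabname => 'TABLE', stattab => 'MY_STAT_BACKUP');

    -- step 2: re-gather with an explicit 100% sample, cascading to indexes
    EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => user, tabname => 'TABLE', estimate_percent => 100, cascade => TRUE);

    -- if the new stats turn out worse, restore the backup:
    -- EXEC DBMS_STATS.IMPORT_TABLE_STATS(ownname => user, tabname => 'TABLE', stattab => 'MY_STAT_BACKUP');
    ```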
  • 4. Re: execution plan and stale statistics
    JohnWatson Guru
    To see the proportion of a table that was analyzed,
    select table_name,last_analyzed,sample_size,num_rows from dba_tables where owner='SCOTT';
  • 5. Re: execution plan and stale statistics
    905989 Newbie
    Thanks again guys.

    This is what I am getting for this table:
    TABLE_NAME                     LAST_ANALYZED       SAMPLE_SIZE   NUM_ROWS
    ------------------------------ ------------------- ----------- ----------
    TABLE                       04/01/2013 08:30:05   101608528  101608528
    Which indicates that a full sample size was used?

    It then begs the question why the estimates I am getting from autotrace are not very precise. Bottom line: an execution plan is based on many factors and is only ever an estimate? Is that correct?
    18093 rows selected.
    
    Elapsed: 00:05:05.49
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 2046255496
    
    -------------------------------------------------------------------------------------------------
    | Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT              |                 |  3894 |   699K| 20058   (2)| 00:04:01 |
    |   1 |  MERGE JOIN CARTESIAN         |                 |  3894 |   699K| 20058   (2)| 00:04:01 |
    |*  2 |   TABLE ACCESS FULL           | TABLE2           |     1 |    37 |    11   (0)| 00:00:01 |
    |   3 |   BUFFER SORT                 |                 |  3850 |   552K| 20047   (2)| 00:04:01 |
    |*  4 |    TABLE ACCESS BY INDEX ROWID| TABLE       |  3850 |   552K| 20047   (2)| 00:04:01 |
    |*  5 |     INDEX RANGE SCAN          | TABLE_IDX1 | 14811 |       |  8722   (3)| 00:01:45 |
    -------------------------------------------------------------------------------------------------
  • 6. Re: execution plan and stale statistics
    Rob_J Journeyer
    How about the index being used, is that a 100% sample?
  • 7. Re: execution plan and stale statistics
    905989 Newbie
    very good point Rob.

    That index used shows the following:
    select table_name,index_name,last_analyzed,sample_size,num_rows from user_indexes where table_name = 'TABLE' and index_name ='TABLE_IDX1'
    wst_user@DLNGFX07.WORLD> /
    
    TABLE_NAME                     INDEX_NAME                     LAST_ANALYZED       SAMPLE_SIZE   NUM_ROWS
    ------------------------------ ------------------------------ ------------------- ----------- ----------
    TABLE                       TABLE_IDX1                         04/01/2013 08:55:22      211502  108595391
    Which indicates sampling was used. What needs to be done to ensure that the indexes use full (100%) sampling as well as the table? Is there a single dbms_stats.gather_table_stats command to do it all?

    Something like:
    exec dbms_stats.gather_table_stats (user, 'TABLE', degree=>DBMS_STATS.AUTO_DEGREE, estimate_percent => 100, method_opt => 'for all indexed columns size 100', cascade=>true);
    Thanks
  • 8. Re: execution plan and stale statistics
    Iordan Iotzov Expert
    In general, the stats for tables and indexes should be as fresh as possible. The default nightly stats-gathering job can accomplish that in most cases.
    In some cases, such as dealing with volatile tables, custom stats gathering procedures, including locking table stats, are needed.

    Is there a reason for you to lock the stats of the table?

    If yes, you should be aware that the benefits of locked stats decrease over time. You can find more info about locking stats and dealing with volatile tables in general in my Hotsos 2012 presentation (http://iiotzov.files.wordpress.com/2012/03/volatile_tables_presentation.pdf) and white paper (http://iiotzov.files.wordpress.com/2012/03/volatile_tables_white_paper.docx)

    Iordan Iotzov
    http://iiotzov.files.wordpress.com/
  • 9. Re: execution plan and stale statistics
    Rob_J Journeyer
    Hi,

    No, I wouldn't think so. The *"method_opt => 'for all indexed columns size 100'"* part of that will attempt to create histograms with 100 buckets. I haven't much experience with histograms, so I don't know exactly what it will do.

    The way to gather index stats is by specifying cascade=>true, which I believe you did originally(??), but I guess it picks up the database default value for the sample size. When you create an index, depending on the version you are on, Oracle gathers 100% stats while building it (I think from 10g onwards, but don't quote me on that).

    You could gather stats separately for that index with GATHER_INDEX_STATS and specify a 100% sample size. Or, if you always want a 100% sample size by default, change the default value; I've written how to do that in the link I added earlier. Personally, I want all the stats-gathering jobs to run with certain parameters every time, so I change the default values in the database. Then anyone who runs a stats-gathering job will use the same parameters, unless they override them explicitly.
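    On 10.2 the two options might look like this (DBMS_STATS.SET_PARAM is the 10g way to change the database-wide default; 11g superseded it with SET_GLOBAL_PREFS):

    ```sql
    -- option 1: 100% sample for just this index
    EXEC DBMS_STATS.GATHER_INDEX_STATS(ownname => user, indname => 'TABLE_IDX1', estimate_percent => 100);

    -- option 2: change the database-wide default sample size (10g syntax)
    EXEC DBMS_STATS.SET_PARAM('ESTIMATE_PERCENT', '100');
    ```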

    Rob
  • 10. Re: execution plan and stale statistics
    Rob_J Journeyer
    I think it also depends on what kind of database you have. For example, on an OLTP system where you (generally) want every SQL statement to run in the same way each time, changing stats every night could lead to plan changes which are not optimal.

    Isn't that why Oracle introduced SQL Plan Management? It has to be. They realised that plan changes were not always for the best and that the auto stats gathering jobs do cause issues for people. So, they went with a compromise - still gather stats but if the plan changes as a result of it, don't actually implement that change until someone signs it off. That's the highest level overview of what SQL Plan Management does from my experience. I could be wrong?
  • 11. Re: execution plan and stale statistics
    Mohamed Houri Pro
    -----------------------------------------------------------------------------------------------------
    | Id  | Operation                     | Name            | E-Rows  | E-Bytes | Cost (%CPU)| E-Time   |
    -----------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT              |                 |      10 |  1800   |  64   (2)  | 00:00:01 |
    |   1 |  MERGE JOIN CARTESIAN         |                 |      10 |  1800   |  64   (2)  | 00:00:01 |
    |*  2 |   TABLE ACCESS FULL           | TABLE2           |       1 |    37   |  11   (0)  | 00:00:01 |
    |   3 |   BUFFER SORT                 |                 |      10 |  1430   |  53   (2)  | 00:00:01 |
    |*  4 |    TABLE ACCESS BY INDEX ROWID| TABLE        |      10 |  1430   |  53   (2)  | 00:00:01 |
    |*  5 |     INDEX RANGE SCAN          | TABLE_IDX1 |      35 |         |  23   (0)  | 00:00:01 |
    -----------------------------------------------------------------------------------------------------
    First, an execution plan should always be accompanied by its predicate section, as it may contain valuable information.

    Second, how are you generating those execution plans? If you want to see whether the CBO (the Oracle optimizer) is making good estimates after you've gathered statistics, you should take the execution plan from memory using the dbms_xplan package, as shown in this blog article:

    http://hourim.wordpress.com/2011/12/29/getting-explain-plan/
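    For example, to pull the actual plan of the last statement run in the session from the cursor cache (available from 10.2 onwards):

    ```sql
    -- run the statement first (the hint collects row-source statistics)
    SELECT /*+ gather_plan_statistics */ COUNT(*) FROM some_table;

    -- then fetch the real plan, with estimated vs actual rows side by side
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
    ```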

    Best Regards

    Mohamed Houri
    www.hourim.wordpress.com
  • 12. Re: execution plan and stale statistics
    Jonathan Lewis Oracle ACE Director
    902986 wrote:
    TABLE_NAME                       NUM_ROWS     BLOCKS LAST_ANALYZED       Locked
    ------------------------------ ---------- ---------- ------------------- -----
    TABLE                       60010840    4759487 07/11/2010 06:10:53 ALL
    TABLE                        101608528    8457437 04/01/2013 08:30:05
    Now my question: although the execution plans are the same before and after updating the stats, the estimated figures are much more realistic following the update. Additionally, is my understanding correct that Oracle expected 10 rows with the old stats, whereas after the stats update it expects 3894 rows to be returned by the query? In reality the query returned 18093 rows. The timing improved from 6 min 41 sec to 5 min, a gain of about 25%?
    Given that your stats are some 18 months out of date, it seems likely that the problem is the standard "out of range" problem.
    If your where clause includes a predicate that's based on a column that holds values that increase steadily with time (e.g. sequence number or timestamp) then your queries will eventually be for data which (according to the old statistics) doesn't exist in the database.

    If the predicates are equality, the optimizer uses a "linear decay" algorithm to estimate the data volumes of out-of-range data; if the predicates are range-based, Oracle uses 1/num_distinct as the estimate of data.
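    As a rough illustration only (the exact optimizer arithmetic is internal and version-dependent, and this is not Oracle's actual code), the linear-decay idea for an equality predicate beyond the known high value can be sketched like this:

    ```python
    def decayed_estimate(base_card, low, high, value):
        """Hypothetical sketch of linear decay for an out-of-range equality
        predicate: the estimate falls off linearly as the requested value
        moves past the known high (or low) value, reaching ~0 one full
        (high - low) range beyond it."""
        if low <= value <= high:
            return base_card                      # in range: normal estimate
        span = high - low
        distance = (value - high) if value > high else (low - value)
        factor = max(0.0, 1.0 - distance / span)  # linear fall-off
        return base_card * factor

    # stats think values run 1..100; asking for 150 is half a range out,
    # so the estimate is roughly halved
    print(decayed_estimate(1000, 1, 100, 150))
    ```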

    You need to check the index definition, predicates, and low and high values stored for the columns in the predicates.
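    A sketch of that check against the dictionary (the column names here are the ones from the predicates; the RAW low/high values need decoding, e.g. via DBMS_STATS.CONVERT_RAW_VALUE, to be human-readable):

    ```sql
    SELECT column_name, num_distinct, low_value, high_value, last_analyzed
    FROM   user_tab_col_statistics
    WHERE  table_name  = 'TABLE'
    AND    column_name IN ('AREA', 'VALUE_DATE', 'TRADE_DATE');
    ```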


    I'll guess that the link that Iordan Iotzov supplied describes this behaviour in detail.

    Regards
    Jonathan Lewis
  • 13. Re: execution plan and stale statistics
    905989 Newbie
    This is becoming more interesting

    I gathered stats on that index as follows:
     exec dbms_stats.gather_index_stats(user,'TABLE_IDX1',estimate_percent =>100);
    and now I have
    TABLE_NAME                     INDEX_NAME                     LAST_ANALYZED       SAMPLE_SIZE   NUM_ROWS
    ------------------------------ ------------------------------ ------------------- ----------- ----------
    TABLE                          TABLE_IDX1                  04/01/2013 15:57:12   101608528  101608528
    Now that index shown as TABLE_IDX1 above is composed of 5 columns (AREA, PORTFOLIO, STATUS_FLAG, VALUE_DATE, CUST_MNEMONIC) and if I run the query again I get
    set timing on
    set autotrace on
    set echo on
    var begintime varchar2(20);
    exec :begintime := '28/06/2011 00:00:00';
    var endtime varchar2(20);
    exec :endtime := '30/06/2011 00:00:00';
    SELECT fd.book_id, fd.WSS_TID, fd.ORIG_TID, fd.DEAL_NUMBER, fd.TRADE_DATE, fd.NDF_FIXDT,fd.VALUE_DATE, fd.CUST_NAME_FULL,
    FD.CCY_BOUGHT,fd.AMT_BOUGHT,FD.CCY_SOLD, fd.AMT_SOLD, fd.TRADER_NAME, fd.entry_date_time, fd.STATUS_flag ,fd.rate,
    fd.PRODUCT, fd.area,a.AREA_DESCRIP,fd.portfolio, a.legal_entity_name
    FROM TABLE fd,
    TABLE2 a
    WHERE fd.TRADE_DATE <= TO_DATE(:endtime,'DD/MM/YYYY  hh24:mi:ss')
    and fd.VALUE_date >= TO_DATE(:begintime,'DD/MM/YYYY  hh24:mi:ss') AND fd.AREA ='SGFXMM' AND fd.area = a.area_code
    The output is as follows:
    18093 rows selected.
    
    Elapsed: 00:04:53.47
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 2046255496
    
    -------------------------------------------------------------------------------------------------
    | Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT              |                 |  3894 |   699K| 18840   (2)| 00:03:47 |
    |   1 |  MERGE JOIN CARTESIAN         |                 |  3894 |   699K| 18840   (2)| 00:03:47 |
    |*  2 |   TABLE ACCESS FULL           | TABLE2           |     1 |    37 |    11   (0)| 00:00:01 |
    |   3 |   BUFFER SORT                 |                 |  3850 |   552K| 18829   (2)| 00:03:46 |
    |*  4 |    TABLE ACCESS BY INDEX ROWID| TABLE        |  3850 |   552K| 18829   (2)| 00:03:46 |
    |*  5 |     INDEX RANGE SCAN          | TABLE_IDX1 | 13858 |       |  8154   (3)| 00:01:38 |
    -------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - filter("A"."AREA_CODE"='SGFXMM')
       4 - filter("FD"."TRADE_DATE"<=TO_DATE(:ENDTIME,'DD/MM/YYYY  hh24:mi:ss'))
       5 - access("FD"."AREA"='SGFXMM' AND "FD"."VALUE_DATE">=TO_DATE(:BEGINTIME,'DD/MM/YYYY
                   hh24:mi:ss'))
           filter("FD"."VALUE_DATE">=TO_DATE(:BEGINTIME,'DD/MM/YYYY  hh24:mi:ss'))
    
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
              0  db block gets
         376020  consistent gets
         208569  physical reads
            144  redo size
        1678931  bytes sent via SQL*Net to client
           8782  bytes received via SQL*Net from client
           1208  SQL*Net roundtrips to/from client
              1  sorts (memory)
              0  sorts (disk)
          18093  rows processed
    Now, Jonathan kindly mentioned out-of-range values. We have deleted the data that skewed the results. My point is: is there something fundamentally incorrect in the way the stats are gathered? Certainly we do not have skewed data any more. What else do I need to do?

    Thanks
  • 14. Re: execution plan and stale statistics
    JohnWatson Guru
    Perhaps you need to get your execution plan in a different way. Autotrace may tell lies when you are using bind variables, as you are. I am open to correction, but I believe that autotrace runs EXPLAIN PLAN, and EXPLAIN PLAN always does a hard parse but does not peek at binds. Therefore the cardinalities may not reflect reality, and the plan you are seeing may not be the plan that is actually used.
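    One way to confirm whether bind peeking produced a different real plan is to look up the cursor that was actually executed and display its plan from memory (substitute the SQL_ID and child number returned by the first query):

    ```sql
    -- find the cursor for the statement just run
    SELECT sql_id, child_number
    FROM   v$sql
    WHERE  sql_text LIKE 'SELECT fd.book_id%';

    -- display the plan Oracle really used for that cursor
    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', &child_no));
    ```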
