5 Replies Latest reply on Jun 7, 2020 8:43 AM by Jonathan Lewis

    Cost Calculation Issue

    933257

      Hi, We are using version 12.1.0.2.0 version of Oracle Exadata. When i was verifying performance of a "DELETE query", the execution path on a partitioned table was varying between index range scan and FTS and i was checking the exact statistics which must be influencing, but then i encountered something strange. That table is List partitioned(on LIST_ID) and subpartitiond by range on a date column(PT_DATE). I do see we have the system statistics set as OLTP or like a NON Exadata environment and thats why the FTS is being over costed. But still then, when i tried executing a simple query on an subpartition to see the cost of FULL scan i see it seems considering statistics(BLOCKS) of the partition but not subpartitions to calculate the cost for the FTS. Want to understand from experts , Why is it so? Ideally as i have hard coded the partition and subpartition key both in the filter , so it should use the subpartition statistics for calculating cost of a full scan. Why its not doing that? Or i am interpreting it in wrong way? And in the predicate section i don't see the partition key filter i.e. LIST_ID IN (1 , 3 ) .

       

      select table_name, num_rows, blocks,avg_row_len,global_stats from dba_tables where table_name='TAB1';

       

           

      TABLE_NAMENUM_ROWSBLOCKSAVG_ROW_LENGLOBAL_STATS
      TAB11043540400143450152155YES

       

      select table_name,partition_name,num_rows,blocks,avg_row_len  from dba_tab_partitions where table_name='TAB1' and partition_name='TAB1_ODD_PART' ;

           

      TABLE_NAMEPARTITION_NAMENUM_ROWSBLOCKSAVG_ROW_LEN
      TAB1TAB1_ODD_PART54818200089548898156

       

      select table_name,partition_name,subpartition_name,num_rows,blocks,avg_row_len from dba_tab_subpartitions where table_name='TAB1'

      and partition_name='TAB1_ODD_PART' and subpartition_name='TAB1_ODD_21MAYPART' ;

            

      TABLE_NAMEPARTITION_NAMESUBPARTITION_NAMENUM_ROWSBLOCKSAVG_ROW_LEN
      SE_FINANCIAL_EVENTTAB1_ODD_PARTTAB1_ODD_21MAYPART66782001329804158

       

      select * from sys.aux_stats$

       

          

      SNAMEPNAMEPVAL1PVAL2
      SYSSTATS_INFOSTATUS COMPLETED
      SYSSTATS_INFODSTART 2/17/2012 19:15
      SYSSTATS_INFODSTOP 2/17/2012 19:15
      SYSSTATS_INFOFLAGS1
      SYSSTATS_MAINCPUSPEEDNW2268
      SYSSTATS_MAINIOSEEKTIM4
      SYSSTATS_MAINIOTFRSPEED4096
      SYSSTATS_MAINSREADTIM
      SYSSTATS_MAINMREADTIM
      SYSSTATS_MAINCPUSPEED
      SYSSTATS_MAINMBRC
      SYSSTATS_MAINMAXTHR
      SYSSTATS_MAINSLAVETHR

       

      FTS Cost =ceil(blocks / mbrc * ( ioseektim + mbrc * db_block_size / iotfrspeed ) / ( ioseektim+ db_block_size / iotfrspeed ) ) + 1

      = ceil(89548898 / 8 * ( 4 + 8 * 8192 / 4096 ) / ( 4+ 8192 / 4096 ) ) + 1 = ~37million

       

      It should have been

      =ceil(1329804 / 8 * ( 4 + 8 * 8192 / 4096 ) / ( 4+ 8192 / 4096 ) ) + 1= 554086.

       

      Query:-

       

      select * from TAB1

      WHERE LIST_ID IN (1,3 ) AND

      PT_DATE BETWEEN to_date('05/21/2020 00:00:00','MM/DD/YYYY HH24:MI:SS') AND to_date('05/21/2020 23:59:59','MM/DD/YYYY HH24:MI:SS');

       

      Execution Plan
      ----------------------------------------------------------
      Plan hash value: 2715138252
      
      ------------------------------------------------------------------------------------------------------------------
      | Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
      ------------------------------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT            |                    | 21127 |  3197K|    37M  (1)| 00:24:21 |    |  |
      |   1 |  PARTITION LIST INLIST      |                    | 21127 |  3197K|    37M  (1)| 00:24:21 |KEY(I) |KEY(I) |
      |   2 |   PARTITION RANGE ITERATOR  |                    | 21127 |  3197K|    37M  (1)| 00:24:21 |   KEY |   KEY |
      |*  3 |    TABLE ACCESS STORAGE FULL| TAB1               | 21127 |  3197K|    37M  (1)| 00:24:21 |KEY(I) |KEY(I) |
      ------------------------------------------------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
      
         3 - storage("PT_DATE"<=TO_DATE(' 2020-05-21 23:59:59', 'syyyy-mm-dd hh24:mi:ss') AND
                    "PT_DATE">=TO_DATE(' 2020-05-21 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
             filter("PT_DATE"<=TO_DATE(' 2020-05-21 23:59:59', 'syyyy-mm-dd hh24:mi:ss') AND
                    "PT_DATE">=TO_DATE(' 2020-05-21 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
      
        • 1. Re: Cost Calculation Issue
          Jonathan Lewis

          To answer the non-cost part of the question:

           

          WHERE LIST_ID IN (1,3 ) AND

          PT_DATE BETWEEN to_date('05/21/2020 00:00:00','MM/DD/YYYY HH24:MI:SS') AND to_date('05/21/2020 23:59:59','MM/DD/YYYY HH24:MI:SS');

           

          12c is smarter than 11g at working out partition elimination on list partitions (See https://jonathanlewis.wordpress.com/2010/11/03/list-partitions/  ). It looks as if you either have one partition that id defined to hold nothing but 1 and 3, or two partitions one holding just 1 and one holding just 3 - in either case the optimizer would be able to decide that it didn't need to apply that predicate.

           

          As a general guideline, be careful about querying date-based (sub)partitions with BETWEEN.  A date-based partition will contain data that is greater than or equal to the upper bound of the previous partition and strictly less than its own upper bound. If you certain that Oracle ought to be able to recognise that the subpartition you're interested in is holding and can only hold) exactly the data for one day you should find that your date predicate will "disappear" if you express it as:

           

               PT_DATE >=  to_date('05/21/2020 00:00:00','MM/DD/YYYY HH24:MI:SS')

          AND  PT_DATE < to_date('05/21/2020 23:59:59','MM/DD/YYYY HH24:MI:SS');

          AND  PT_DATE < to_date('05/22/2020 00:00:00','MM/DD/YYYY HH24:MI:SS');

           

           

           

          To avoid answering the question about cost - there are many bizarre details in the way the optimizer struggles with subpartitions. If it looks as if the optimizer is using the partition-level stats instead of summing a couple of sub-partitions than that's probably what it's doing. Historically the optimizer didn't bother with using subpartition stats at all, and there are probably still a number of anomalies and "best guess" ideas built into the code - perhaps the INLIST costing simply assumes the worst cast that it will fail to eliminate any subpartitions.

           

          Regards

          Jonathan Lewis

           

          UPDATE - corrected .after Mark Powell's comment below:

          • 2. Re: Cost Calculation Issue
            Mark D Powell

            All, >> PT_DATE >=  to_date('05/21/2020 00:00:00','MM/DD/YYYY HH24:MI:SS')

            AND  PT_DATE < to_date('05/21/2020 23:59:59','MM/DD/YYYY HH24:MI:SS')  <<

            - -

            I may not understand the problem correctly because of the partitioning but if each partition is a day wouldn't you want the starting time as midnight of the target day and the ending time as less than midnight of the next?  As written the code seems to skip rows whose time is the last second of the day and which I would expect this time to be in the day's partition.

            - -

            HTH -- Mark D Powell --

            • 3. Re: Cost Calculation Issue
              Jonathan Lewis

              Mark D Powell wrote:

               

              All, >> PT_DATE >= to_date('05/21/2020 00:00:00','MM/DD/YYYY HH24:MI:SS')

              AND PT_DATE < to_date('05/21/2020 23:59:59','MM/DD/YYYY HH24:MI:SS') <<

              - -

              I may not understand the problem correctly because of the partitioning but if each partition is a day wouldn't you want the starting time as midnight of the target day and the ending time as less than midnight of the next? As written the code seems to skip rows whose time is the last second of the day and which I would expect this time to be in the day's partition.

              - -

              HTH -- Mark D Powell --

               

              Absolutely correct, thanks.

               

              I did a cut-n-paste and edited in the BETWEEN, but forgot to correct the end date.

               

              Regards

              Jonathan Lewis

              • 4. Re: Cost Calculation Issue
                933257

                As you said i checked, and in my case both LIST_ID (1,3) are belong to one partition only i.e. only 1,3 and nothing else. So that's why optimizer hide/eliminated that from the predicate section.

                 

                Got one of your blog post , pointing to Randolfs blog explaining similar issue wrt subpartition statistics not being considered during costing. Though it was pointing issues with version 10g, but it appears its still an issue in latest versions(like mine in 12.1).

                 

                https://jonathanlewis.wordpress.com/2010/03/17/partition-stats/

                 

                https://oracle-randolf.blogspot.com/2007/03/subpartitions-and-optimizer-statistics.html       

                 

                Just for information, I tried replacing between with the >= and < , and seems its still showing those in predicate section. Also i see the subpartition DDL for two of the subpartitions and the are defined as below, which looks okay to me.

                 

                SUBPARTITION TAB1_ODD_20MAYPART VALUES LESS THAN (TO_DATE(' 2020-05-21 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),

                SUBPARTITION TAB1_ODD_21MAYPART VALUES LESS THAN (TO_DATE(' 2020-05-22 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')),

                 

                SQL>  select * from TAB1
                  2  WHERE LIST_ID IN (1,3) AND
                  3  PT_DATE >=  to_date('05/21/2020 00:00:00','MM/DD/YYYY HH24:MI:SS') and PT_DATE < to_date('05/22/2020 00:00:00','MM/DD/YYYY HH24:MI:SS');
                Elapsed: 00:00:00.04
                
                Execution Plan
                ----------------------------------------------------------
                Plan hash value: 2715138252
                
                ------------------------------------------------------------------------------------------------------------------
                | Id  | Operation                   | Name               | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
                ------------------------------------------------------------------------------------------------------------------
                |   0 | SELECT STATEMENT            |                    | 21081 |  3211K|    37M  (1)| 00:24:39 |    |  |
                |   1 |  PARTITION LIST INLIST      |                    | 21081 |  3211K|    37M  (1)| 00:24:39 |KEY(I) |KEY(I) |
                |   2 |   PARTITION RANGE ITERATOR  |                    | 21081 |  3211K|    37M  (1)| 00:24:39 |   KEY |   KEY |
                |*  3 |    TABLE ACCESS STORAGE FULL| TAB1               | 21081 |  3211K|    37M  (1)| 00:24:39 |KEY(I) |KEY(I) |
                ------------------------------------------------------------------------------------------------------------------
                
                Predicate Information (identified by operation id):
                ---------------------------------------------------
                
                   3 - storage("PT_DATE"<TO_DATE(' 2020-05-22 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
                              "PT_DATE">=TO_DATE(' 2020-05-21 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
                       filter("PT_DATE"<TO_DATE(' 2020-05-22 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
                              "PT_DATE">=TO_DATE(' 2020-05-21 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
                
                • 5. Re: Cost Calculation Issue
                  Jonathan Lewis

                  I can't spot anything that should stop the optimizer from being very precise in its query. It looks like you've specified exactly one partition and one subpartition. You might try expressing the date predicate using EXACTLY the same formatting as shown in the high value listing.  I've run up a quick test on a list/range partitioned table which behaves correctly in 12.1.0.2 -  my predicate identifies exactly two partitions by list value (each partition is defined to hold exactly one value), and the ranges are all declared as full month ranges (i.e. they were all decalared as "values less than 1st of month")

                   

                   

                  SQL> l

                    1  select *

                    2  from t1

                    3  where number_list_col in (2,4)

                    4  and date_range_col >= to_date('01-Feb-2019 00:00:00','dd-Mon-yyyy hh24:mi:ss')

                    5  and date_range_col <  to_date('01-Mar_2019 00:00:00','dd-Mon-yyyy hh24:mi:ss')

                    6  -- and date_range_col >= to_date('01-Feb-2019','dd-Mon-yyyy')

                    7  -- and date_range_col <  to_date('01-Mar_2019','dd-Mon-yyyy')

                    8  -- and date_range_col >= to_date('02012019','mmddyyyy')

                    9* -- and date_range_col <  to_date('03012019','mmddyyyy')

                  SQL> /

                   

                  Execution Plan

                  ----------------------------------------------------------

                  Plan hash value: 1418099900

                   

                  ------------------------------------------------------------------------------------------------

                  | Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time    | Pstart|  Pstop |

                  ------------------------------------------------------------------------------------------------

                  |  0  | SELECT STATEMENT        |      |     1 |   103 |     2  (0)| 00:00:01 |       |        |

                  |  1  |  PARTITION LIST INLIST  |      |     1 |   103 |     2  (0)| 00:00:01 |KEY(I) | KEY(I) |

                  |  2  |   PARTITION RANGE SINGLE|      |     1 |   103 |     2  (0)| 00:00:01 |     2 |      2 |

                  |  3  |    TABLE ACCESS FULL    | T1   |     1 |   103 |     2  (0)| 00:00:01 |KEY(I) | KEY(I) |

                  ------------------------------------------------------------------------------------------------

                   

                  Even with three difference dat formatting options in my predicate the plan turned out the same - minimum data segments and elimination of predicates.

                  Apart from the formatting the only other thing I can think of is the the ranges for the sunpartitions of your two partitions are not an exact match.

                   

                  Regards

                  Jonathan Lewis