9 Replies Latest reply on Jan 16, 2020 6:34 PM by Jonathan Lewis

    index recommendation

    kaericn

      Dear community,

       

      I have the below table def and query.

       

      The volume is 200 million plus rows.

       

      What index would it suited to speed up the query ?

       

      This table is mixed work load both transactional (insert and update ) and read.

       

      Index idea:

       

      CREATE index tbl_1 on tbl(cnt,r_creation_dt,c_id,a_action,last_modified_by);
      
      
      CREATE index tbl_2 on tbl(cnt,c_status,r_creation_dt);
      
      
      CREATE bitmap index tbl_3 on tbl(c_status);
      

       

       

      tbl defintion below:

       

      R_ID          NOT NULL VARCHAR2(30) 
      C_ID                     NUMBER       
      N_ID                     VARCHAR2(40) 
      CREATED_BY         NOT NULL VARCHAR2(30) 
      LAST_MODIFIED_BY            VARCHAR2(30) 
      C_STATUS           CHAR(1)      
      A_ACTION            CHAR(1)      
      R_CREATION_DT          TIMESTAMP(6) 
      CNT                     NUMBER(38)  
      

       

       

      SQL:

       

      SELECT
          c_id,
          a_action,
          cnt,
          last_modified_by
      FROM
          tbl
      WHERE
          c_status IN(
              'N',
              'F'
          )
          AND cnt<=5
      GROUP BY
          cnt,
          r_creation_dt,
          case_id,
          anonymize_action,
          last_modified_by
          
      ORDER BY
          r_creation_dt
      FETCH FIRST 1000 ROWS ONLY;
      
        • 1. Re: index recommendation
          AndrewSayer

          You almost definitely dont want to go anywhere near a bitmap index if your table needs to be updatable by multiple sessions.

           

          Tell is about the selectivity of these filters, how many rows have the c_status values you care about? how many rows have a cnt value <=5 ? How many rows fit both those filters?

           

          How often will the query run? Does it need to be completely accurate or can it give slightly old data?

          • 2. Re: index recommendation
            Solomon Yakobson

            It is hard to tell without knowing your data and processes. Index can help your query but slow down insert/update/delete if there is heavy OLTP. Anyway, I'd try:

             

            SQL> create index tbl_idx1

              2    on tbl(

              3           cnt,

              4           c_status,

              5           r_creation_dt,

              6           c_id,

              7           a_action,

              8           last_modified_by

              9          )

            10  /

             

            Index created.

             

            SQL> explain plan for

              2  SELECT

              3      c_id,

              4      a_action,

              5      cnt,

              6      last_modified_by

              7  FROM

              8      tbl

              9  WHERE

            10      c_status IN(

            11          'N',

            12          'F'

            13      )

            14      AND cnt<=5

            15  GROUP BY

            16      cnt,

            17      r_creation_dt,

            18      c_id,

            19      a_action,

            20      last_modified_by

            21  ORDER BY

            22      r_creation_dt

            23  FETCH FIRST 1000 ROWS ONLY

            24  /

             

            Explained.

             

            SQL> select * from table(dbms_xplan.display)

              2  /

             

            PLAN_TABLE_OUTPUT

            --------------------------------------------------------------------------------------------------------

            Plan hash value: 2572020106

             

            -----------------------------------------------------------------------------------

            | Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |

            -----------------------------------------------------------------------------------

            |   0 | SELECT STATEMENT       |          |  1000 | 72000 |     1 (100)| 00:00:01 |

            |*  1 |  VIEW                  |          |  1000 | 72000 |     1 (100)| 00:00:01 |

            |*  2 |   WINDOW NOSORT STOPKEY|          |     1 |    62 |     1 (100)| 00:00:01 |

            |   3 |    SORT GROUP BY       |          |     1 |    62 |     1 (100)| 00:00:01 |

            |*  4 |     INDEX RANGE SCAN   | TBL_IDX1 |     1 |    62 |     0   (0)| 00:00:01 |

            -----------------------------------------------------------------------------------

             

            PLAN_TABLE_OUTPUT

            --------------------------------------------------------------------------------------------------------

             

            Predicate Information (identified by operation id):

            ---------------------------------------------------

             

               1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1000)

               2 - filter(ROW_NUMBER() OVER ( ORDER BY "R_CREATION_DT")<=1000)

               4 - access("CNT"<=5)

                   filter("C_STATUS"='F' OR "C_STATUS"='N')

             

            Note

            -----

             

            PLAN_TABLE_OUTPUT

            --------------------------------------------------------------------------------------------------------

               - dynamic statistics used: dynamic sampling (level=2)

             

            23 rows selected.

             

            SQL>

             

            Or:

             

            SQL> create index tbl_idx1

              2    on tbl(

              3           c_status,

              4           cnt,

              5           r_creation_dt,

              6           c_id,

              7           a_action,

              8           last_modified_by

              9          )

            10  /

             

            Index created.

             

            SQL> explain plan for

              2  SELECT

              3      c_id,

              4      a_action,

              5      cnt,

              6      last_modified_by

              7  FROM

              8      tbl

              9  WHERE

            10      c_status IN(

            11          'N',

            12          'F'

            13      )

            14      AND cnt<=5

            15  GROUP BY

            16      cnt,

            17      r_creation_dt,

            18      c_id,

            19      a_action,

            20      last_modified_by

            21  ORDER BY

            22      r_creation_dt

            23  FETCH FIRST 1000 ROWS ONLY

            24  /

             

            Explained.

             

            SQL> select * from table(dbms_xplan.display)

              2  /

             

            PLAN_TABLE_OUTPUT

            ------------------------------------------------------------------------------------

            Plan hash value: 2189129534

             

            -----------------------------------------------------------------------------------

            | Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |

            -----------------------------------------------------------------------------------

            |   0 | SELECT STATEMENT       |          |  1000 | 72000 |     2  (50)| 00:00:01 |

            |*  1 |  VIEW                  |          |  1000 | 72000 |     2  (50)| 00:00:01 |

            |*  2 |   WINDOW NOSORT STOPKEY|          |     1 |    62 |     2  (50)| 00:00:01 |

            |   3 |    SORT GROUP BY       |          |     1 |    62 |     2  (50)| 00:00:01 |

            |   4 |     INLIST ITERATOR    |          |       |       |            |          |

            |*  5 |      INDEX RANGE SCAN  | TBL_IDX1 |     1 |    62 |     1   (0)| 00:00:01 |

             

            PLAN_TABLE_OUTPUT

            ------------------------------------------------------------------------------------

             

            Predicate Information (identified by operation id):

            ---------------------------------------------------

             

               1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1000)

               2 - filter(ROW_NUMBER() OVER ( ORDER BY "R_CREATION_DT")<=1000)

               5 - access(("C_STATUS"='F' OR "C_STATUS"='N') AND "CNT"<=5)

             

            Note

            -----

             

            PLAN_TABLE_OUTPUT

            ------------------------------------------------------------------------------------

               - dynamic statistics used: dynamic sampling (level=2)

             

            23 rows selected.

             

            SQL>

             

            SY.

            • 3. Re: index recommendation
              John_K

              Why not just create the indexes and look at the execution plan (and real-time performance) that the different options produce?

              • 4. Re: index recommendation
                BEDE

                I think r_creation_dt is to be the first in the index, considering that you will need the first 1000 rows or, possibly, the rows in a given date interval.

                Is there any possibility that one or more of those columns be null?

                If in that table data is inserted on a daily basis and r_creation_dt is not null and is meant to be the timestamp when the row was inserted, maybe it would be good to partition that table by range r_creation_dt with interval.

                • 5. Re: index recommendation
                  AndrewSayer

                  BEDE wrote:

                   

                  I think r_creation_dt is to be the first in the index, considering that you will need the first 1000 rows or, possibly, the rows in a given date interval.

                  Is there any possibility that one or more of those columns be null?

                  If in that table data is inserted on a daily basis and r_creation_dt is not null and is meant to be the timestamp when the row was inserted, maybe it would be good to partition that table by range r_creation_dt with interval.

                  All rows within a group need to be identified before the rows can by returned, there's not a legal execution plan that will use an index on r_creation_dt to avoid that sort. Same applies for partitioning.

                  • 6. Re: index recommendation
                    Solomon Yakobson

                    BEDE wrote:

                     

                    I think r_creation_dt is to be the first in the index, considering that you will need the first 1000 rows or, possibly, the rows in a given date interval.

                     

                    There is no filter/access condition on r_creation_dt so best you'd get is index full scan vs index range scan when index first column is CNT or C_STATUS.

                     

                    SY.

                    • 7. Re: index recommendation
                      Joerg.Sobottka

                      The goal of every execution plan is to do as less block gets (from memory and disk) as possible. If this is the only select at your table, you can go forward with the following 2 solutions, imho both should work (but the optimizer isn't anytime predictable).:

                       

                      One is to put all columns into the index, the column with the highest selectivity on first place (cnt or c_status), see Solomons post => you get an index scan without table access

                      The second one is to put only cnt, c_status and r_id into the index, also here the column with the highest selectivity first. The optimizer than should/will choose an index scan with an additional table access afterwards.

                      What's faster or better? That depends, if you need the output of the query to select e.g. details in a next step it could be helpful that the table data is already loaded into the buffer cache.

                      Best is, you just try it.

                       

                      By the way - I have never seen remarkable downgrade of performance on DML due to 1 or 3 indexes created on a table (except with function-based indexes based on own application functions)

                      • 8. Re: index recommendation
                        Jonathan Lewis

                        Andrew's first response is the most important one.  You (we) need to know more about the numbers than you've told us. (And you almost certainly need to get rid of the bitmap index to minimise time lost to locking problems.

                         

                        I'm going to guess that you've read something that says "a column with only a few distinct values should have a bitmap index", leading you to create a bitmap index on c_status. The directive is very badly stated - if every value in the table appears in a very large number of rows a bitmap index is not going to make a magical difference to the performance as most of the work will be about visiting a large number of rows in the table. The real benefit of bitmap indexes comes from the optimizer's ability to combine many of them using bit-oriented operations.

                         

                        So - forget the bitmap index, but ask how many rows there are for each distinct value. Is this an example of a common pattern where one (or maybe two) values cover almost all the data and all the other values correspond to a tiny fraction of data ?  In particular, do 'N' and 'F' cover almost all the 200M rows, or do they account for only a few thousand.

                         

                        What about cnt ?  It is always positive, is it always integer, how many distinct values appear, is the number of rows where cnt <=5 tiny or huge ?

                         

                        Then you need to think of the combinations - maybe there are lots of N/F rows, but maybe almost all N/F rows have cnt>5. The only predicates you've got to minimise the visits to the table are from c_status and cnt, so you need to know whether or not you can get the performance you want by getting EXACTLY the right set of rows without geting any other rows from the table.  If so then a really appropriate index on the two columns would be a starting point.  If the query would be too slow even with a high precision index you would then need to think about creating the most efficient index that would include all the information you needed.

                         

                        For example - assume that the number of rows matching your filter predicates is small (and you don't have lots of variations in these predicates, then you could do the following:

                         

                        alter table tbl add(

                                nf_r_creation_dt generated always as (

                                        case

                                                when c_status in ('N','F') and cnt <= 5

                                                then r_creation_dt

                                        end

                                ) virtual

                        );

                         

                        create index tbl_f1 on tbl(nf_r_creation_dt);

                         

                        This will give you an index which is as small as it can be and will identify exactly the rows you want to select from the table and no others. Then (after gathering stats on the virtual column) if you add the predicate  "and nf_r_creation_dt is not null" to your query the optimizer will be able to do a full scan of this index and access the data by rowid from the table.

                         

                        Regards

                        Jonathan Lewis

                        • 9. Re: index recommendation
                          Jonathan Lewis

                          AndrewSayer wrote:


                          All rows within a group need to be identified before the rows can by returned, there's not a legal execution plan that will use an index on r_creation_dt to avoid that sort. Same applies for partitioning.

                          Technically the following index would (starting with the relevant column, and covering all the required data) - see plan below - it does "sort" but it's a "nosort sort" which stops after the minimum amount of data:

                           

                          create index tbl_i1 on tbl(

                                  r_creation_dt,

                                  cnt,

                                  c_id,

                                  a_action,

                                  last_modified_by,

                                  c_status

                          );

                           

                          ----------------------------------------------------------------------------------------------------

                          | Id  | Operation              | Name   | Starts | E-Rows | A-Rows |    A-Time  | Buffers | Reads  |

                          ----------------------------------------------------------------------------------------------------

                          |   0 | SELECT STATEMENT       |        |      1 |        |   1000 |00:00:00.01 |     109 |     27 |

                          |*  1 |  VIEW                  |        |      1 |   1000 |   1000 |00:00:00.01 |     109 |     27 |

                          |*  2 |  WINDOW NOSORT STOPKEY |        |      1 |   1000 |   1000 |00:00:00.01 |     109 |     27 |

                          |   3 |    SORT GROUP BY NOSORT|        |      1 |   1000 |   1000 |00:00:00.01 |     109 |     27 |

                          |*  4 |    INDEX FULL SCAN     | TBL_I1 |      1 |   1000 |   1002 |00:00:00.01 |     109 |     27 |

                          ----------------------------------------------------------------------------------------------------

                           

                          Predicate Information (identified by operation id):

                          ---------------------------------------------------

                            1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=1000)

                            2 - filter(ROW_NUMBER() OVER ( ORDER BY "R_CREATION_DT")<=1000)

                            4 - access("CNT"<=5)

                                filter(("CNT"<=5 AND INTERNAL_FUNCTION("C_STATUS") AND "R_CREATION_DT" IS NOT NULL))

                           

                          The flaw with this is, as SY said, you start doing an index full scan, so if you have to go a long way through the index before you find any relevant data it's very expensive.  Of course it would be possible to combine this "full index scan" strategy with the function-based / virtual column index, creating 6 virtual columns based on the supplied predicate in a way that produced index entries ONLY for the data that is needed. Smallest index, with guaranteed minimum work to get the 1st 1,000 aggregates.

                           

                          Regards

                          Jonathan Lewis