11 Replies Latest reply: Dec 30, 2012 12:30 PM by rp0428 RSS

    GROUP BY doesn't use index without predicate - why?

    982092
      Hi - I think this is pretty basic question, but I'm stumped. It seems that unless I use a WHERE clause the index is not used. But by using a where clause - even though it doesn't exclude any records - the index is used. Perhaps this is just how the engine works, but it makes no sense to me. Does anyone have insight?

      We are running 11.2.0.1

      Situation:
      I have a table with 6million rows with recent stats, etc. This table has activity events. One column in this table is DAY which contains a number between 1 and 31 representing day of the month. This column is indexed.
      Currently the table only includes data from a single month.

      This query does not use the index even with index hint and has a cost of 296649 in Explain plan with FULL as the access option for the table:

      select
      day, count(*)
      from archive.log_usagelogs
      group by day;

      This query uses the index with a FAST FULL SCAN and has explain plan cost of 4710.

      select
      day, count(*)
      from archive.log_usagelogs
      where day between 1 and 31
      group by day;

      Both return the same results. The 2nd runs much faster than the first.

      I ran both with /*+gather_plan_statistics*/ and here are the results:

      SQL_ID 5hukqw3822vq8, child number 0
      -------------------------------------
      select /*+gather_plan_statistics*/  day, count(*) from
      archive.log_usagelogs group by day

      Plan hash value: 3127609364

      -------------------------------------------------------------------------------------------------------------------------
      | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
      -------------------------------------------------------------------------------------------------------------------------
      | 0 | SELECT STATEMENT | | 1 | | 14 |00:00:09.31 | 1093K| | | |
      | 1 | HASH GROUP BY | | 1 | 13 | 14 |00:00:09.31 | 1093K| 1135K| 1135K| 1232K (0)|
      | 2 | TABLE ACCESS FULL| LOG_USAGELOGS | 1 | 6526K| 6526K|00:00:22.19 | 1093K| | | |
      -------------------------------------------------------------------------------------------------------------------------

      ___________

      SQL_ID 53s7zqhr5zymv, child number 0
      -------------------------------------
      select /*+gather_plan_statistics*/  day, count(*) from
      archive.log_usagelogs where day between 1 and 30 group by day

      Plan hash value: 1799032578

      --------------------------------------------------------------------------------------------------------------------------------
      | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
      --------------------------------------------------------------------------------------------------------------------------------
      | 0 | SELECT STATEMENT | | 1 | | 13 |00:00:04.06 | 17335 | | | |
      | 1 | HASH GROUP BY | | 1 | 13 | 13 |00:00:04.06 | 17335 | 1135K| 1135K| 5126K (0)|
      |* 2 | INDEX FAST FULL SCAN| IDX_USAGELOGS_DAY | 1 | 6526K| 6526K|00:00:11.37 | 17335 | | | |
      --------------------------------------------------------------------------------------------------------------------------------

      Predicate Information (identified by operation id):
      ---------------------------------------------------

      2 - filter(("DAY">=1 AND "DAY"<=30))
        • 1. Re: GROUP BY doesn't use index without predicate - why?
          L-MachineGun
          That is the standard.
          How would Oracle know if you want (or don't want) to use an index except you include a condition?
          :p
          • 2. Re: GROUP BY doesn't use index without predicate - why?
            Solomon Yakobson
            L-MachineGun wrote:
            How would Oracle know if you want (or don't want) to use an index except you include a condition?
            :p
            ??? What it has to do with OP wanting/not wanting to use index? Most likely column day is NULLable. When you add
            WHERE day between 1 and 31
            oracle knows you are looking for not NULL day values, therefore it can use index. Without where clause and column being NULLable it can't use index since NULLs are not indexed and are not present in index (unless it is a composite column index).

            SY.
            • 3. Re: GROUP BY doesn't use index without predicate - why?
              user10857924
              Agree with Solomon, unless you not define column as NOT NULL Optimizer does not know whether there are any rows with NULL values or not as Oracle does not make index entry if all columns are nullable and under such case Index access does not make sense and thats the reasoin it will opt to go for Table FULL scans.

              HTH

              Edited by: user10857924 on Dec 28, 2012 3:13 PM
              • 4. Re: GROUP BY doesn't use index without predicate - why?
                rp0428
                >
                But by using a where clause - even though it doesn't exclude any records - the index is used.
                >
                But your WHERE clause DOES exclude records; it excludes records that do NOT have a 'day' value in the range specified.
                select 
                day, count(*)
                from archive.log_usagelogs
                where day between 1 and 31
                group by day;
                Depending on the column datatype the table MIGHT have data for millions of values that are not between 1 and 31.

                Without a WHERE clause Oracle has no choice but to do a full table scan.

                With that WHERE clause Oracle knows it can use the index since the index MUST HAVE values between 1 and 31.

                Suggest you don't name columns 'day' or similar.
                • 5. Re: GROUP BY doesn't use index without predicate - why?
                  Solomon Yakobson
                  rp0428 wrote:

                  Without a WHERE clause Oracle has no choice but to do a full table scan.
                  Not true. As long as index contains everything we need, optimizer, most likely, will use it:
                  SQL> create table emp1 as select * from emp
                    2  /
                  
                  Table created.
                  
                  SQL> alter table emp1
                    2  modify deptno not null
                    3  /
                  
                  Table altered.
                  
                  SQL> create index emp1_idx1
                    2  on emp1(deptno)
                    3  /
                  
                  Index created.
                  
                  SQL> explain plan for
                    2  select  deptno,
                    3          count(*)
                    4    from  emp1
                    5    group by
                    6  deptno
                    7  /
                  
                  Explained.
                  
                  SQL> @?\rdbms\admin\utlxpls
                  
                  PLAN_TABLE_OUTPUT
                  --------------------------------------------------------------------------------------------------------------------------------
                  
                  Plan hash value: 554072232
                  
                  ----------------------------------------------------------------------------------
                  | Id  | Operation            | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
                  ----------------------------------------------------------------------------------
                  |   0 | SELECT STATEMENT     |           |    14 |   182 |     1   (0)| 00:00:01 |
                  |   1 |  SORT GROUP BY NOSORT|           |    14 |   182 |     1   (0)| 00:00:01 |
                  |   2 |   INDEX FULL SCAN    | EMP1_IDX1 |    14 |   182 |     1   (0)| 00:00:01 |
                  ----------------------------------------------------------------------------------
                  
                  Note
                  
                  PLAN_TABLE_OUTPUT
                  --------------------------------------------------------------------------------------------------------------------------------
                  
                  -----
                     - dynamic sampling used for this statement (level=2)
                  
                  13 rows selected.
                  
                  SQL>
                  SY.
                  • 6. Re: GROUP BY doesn't use index without predicate - why?
                    rp0428
                    I think you missed that these two statements go together. ;)
                    >
                    Depending on the column datatype the table MIGHT have data for millions of values that are not between 1 and 31.

                    Without a WHERE clause Oracle has no choice but to do a full table scan.
                    >
                    For a table with millions of values that are not between 1 and 31 and few that are Oracle has no choice but to do a full table scan.
                    • 7. Re: GROUP BY doesn't use index without predicate - why?
                      Solomon Yakobson
                      rp0428 wrote:
                      I think you missed that these two statements go together. ;)
                      Question was why index is not used when there is no where clause and is used when there is one. Since comparing column day with number used index we can safely state column day datatype is numeric. And based on column name and condition (between 1 and 31) we can safely assume the only possible column day value is null. But that's besides the point. I really have no clue what links:
                      Depending on the column datatype the table MIGHT have data for millions of values that are not between 1 and 31.
                      and
                      Without a WHERE clause Oracle has no choice but to do a full table scan.
                      If there is no WHERE clause column datatype is irrelevant. What is relevant, in terms of using/not using index, is column day NULL/NOT NULL attribute if column is indexed and is a single index column. If there is a multi-column index where column day is index first column optimizer might use index even if column day is NULLable, as long as at least one other indexed column is NOT NULL.

                      SY.
                      • 8. Re: GROUP BY doesn't use index without predicate - why?
                        rp0428
                        You are still missing the point and picking nits instead of trying to understand the point I was making.

                        The datatype is relevant to the two comments I was making.
                        >
                        Depending on the column datatype the table MIGHT have data for millions of values that are not between 1 and 31.
                        >
                        I said that because OP did NOT tell us what the datatype of the column is. And if the datatype is NUMBER(2) then it CANNOT 'have data for millions of values that are not between 1 and 31'.

                        Whereas if the datatype is NUMBER then it CAN have millions of values outside that range.

                        And if there are millions of values outside the range and only a few in it then, like I said
                        >
                        Without a WHERE clause Oracle has no choice but to do a full table scan.
                        >
                        If you didn't understand what I meant or was trying to say you should have said so rather than take parts of it out of context.
                        • 9. Re: GROUP BY doesn't use index without predicate - why?
                          Dom Brooks
                          This query does not use the index even with index hint ...
                          
                          select
                          day, count(*)
                          from archive.log_usagelogs
                          group by day;
                          When a query does not obey a hint, it's usually because the hint is invalid.

                          As Solomon has pointed out, the most likely reason is that the DAY column is nullable in your table definition.

                          Because the definition of the table allows NULLs in this column, the optimizer cannot use the index as a skinny version of the table.
                          NULLs are not indexed in single column indexes.

                          It may be that DAY is never NULL but unless you declare it as such then the optimizer cannot use this index in this way.

                          As soon as you add a predicate like :
                          where day between 1 and 31
                          this implicitly excludes all nulls.

                          So even though you say this does not exclude any records, it gives the optimizer the option to use the index because all non-null entries are guaranteed to be in the index.

                          This is why the correct definition of columns is so important. And giving the optimizer the correct information about nulls, primary keys, foreign keys, etc can be so crucial for performance.
                          • 10. Re: GROUP BY doesn't use index without predicate - why?
                            Solomon Yakobson
                            rp0428 wrote:
                            If you didn't understand what I meant or was trying to say you should have said so rather than take parts of it out of context.
                            And if there are millions of values outside the range and only a few in it then, like I said

                            Without a WHERE clause Oracle has no choice but to do a full table scan.
                            No, I still have no clue. You said: "Depending on the column datatype the table MIGHT have data for millions of values that are not between 1 and 31". OK, so what? Are you saying if column is number and not number(2) optimizer will choose a different plan for:
                            select
                            day, count(*)
                            from archive.log_usagelogs
                            group by day;
                            And, btw, NUMBER and NUMBER(20) are same datatype, just different precision and scale.

                            SY.

                            Edited by: Solomon Yakobson on Dec 30, 2012 6:35 AM
                            • 11. Re: GROUP BY doesn't use index without predicate - why?
                              rp0428
                              >
                              No, I still have no clue. You said: "Depending on the column datatype the table MIGHT have data for millions of values that are not between 1 and 31". OK, so what? Are you saying if column is number and not number(2) optimizer will choose a different plan for:

                              select
                              day, count(*)
                              from archive.log_usagelogs
                              group by day;
                              >
                              No - that isn't what I am saying. Your example has no WHERE clause.

                              And you again used that ONE statement when I tried to clarify for you that BOTH statements go together:
                              >
                              Depending on the column datatype the table MIGHT have data for millions of values that are not between 1 and 31.
                              . . .
                              Without a WHERE clause Oracle has no choice but to do a full table scan.
                              >
                              And, as I tried to clarify I made the first of those statements
                              >
                              Depending on the column datatype the table MIGHT have data for millions of values that are not between 1 and 31.
                              >
                              Because if I had just said this
                              >
                              The table MIGHT have data for millions of values that are not between 1 and 31.
                              >
                              A nit-picker, instead of trying to understand the meaning of what was being said, would say something like
                              >
                              There couldn't be MILLIONS of values not between 1 and 31 because the column might be defined as NUMBER(2).
                              >
                              So if an (apparently unsuccessful) attempt to ward off nit-picking I added the 'depending on the column datatype' part.

                              And now, instead of trying to understand the MEANING of what I was saying, you've chosen to try to nit-pick that
                              >
                              And, btw, NUMBER and NUMBER(20) are same datatype, just different precision and scale.
                              >
                              No - they are in the same datatype CLASS but represent DIFFERENT physical datatypes. In one sense they have the same 'datatype short name' but a different 'datatype long name'.

                              The precision and scale are an inherent part of the NUMBER datatype so different precision and/or scale means different datatype.

                              The class of VARCHAR2 datatypes is, perhaps, the best example.

                              Do you believe that the VARCHAR2 datatype exists? Using the logic you used for the numeric example above the VARCHAR2 and VARCHAR2(1) would be the same datatype. Except that VARCHAR2 doesn't exist.

                              It just doesn't seem possible to create a table using the VARCHAR2 datatype. These tries both give exceptions.
                              create table foo (name VARCHAR2);
                              
                              create table foo (name VARCHAR2());
                              And yet this works
                              create table foo (name VARCHAR2(1));
                              But if NUMBER and NUMBER(20) are same datatype shouldn't VARCHAR2, VARCHAR2() and VARCHAR2(1) be the same datatype?

                              NO - because the datatype attributes are an inherent part of the datatype. It just so happens that Oracle defines default attributes for the numeric datatype but not for VARCHAR2. They could have chosen to define a default but didn't.

                              The term VARCHAR2 or NUMBER might refer to the 'class' of datatypes or to a specific member of the class, which is the way I was using it.

                              From the SQL Language doc
                              http://docs.oracle.com/cd/B28359_01/server.111/b28286/sql_elements001.htm
                              >
                              Datatypes

                              Each value manipulated by Oracle Database has a datatype. The datatype of a value associates a fixed set of properties with the value.
                              . . .
                              When you create a table or cluster, you must specify a datatype for each of its columns. When you create a procedure or stored function, you must specify a datatype for each of its arguments. These datatypes define the domain of values that each column can contain or each argument can have.
                              >
                              The KEY statement is the last sentence:
                              >
                              These datatypes define the domain of values that each column can contain or each argument can have.
                              >
                              That domain is an integral part of the datatype and it MUST be defined either implicitly or explicitly.

                              In the future when you don't understand something just say so. There is no shame in that. Just ask the poster to clarify what they meant. Or provide an example and ask if that is what they were trying to say.

                              There are many terms used in the docs and the industry that can have multiple meanings and connotations. Try to accept that your use of a term may not be the only one that is relevant.