7 Replies Latest reply on Aug 21, 2010 6:18 AM by William Robertson

    Question about improving group by performance

    698295
      Hello,

      I am trying to work around an issue with an application I'm using. I can't change a query it is using and it takes forever to return.

      The vendor has narrowed down the solution to removing an expensive group by expression. We won't be able to get a fix for the application though for several months. We can however change the data and the database.

      My question for you gurus out there is if you had a query (Keep in mind the actual query is tons more complex but I don't think there is a clear way to present it) ...

      select var1, var2
      from mytab
      group by var1,var2;

      What could I do with the data or the database that would make it run faster?

      Would having everything in var2 be the same make any difference in the time the expression would run? Would partitioning help - this would be hard with the actual data condition? There are already indexes.

      Thanks in advance for helping this novice.
        • 1. Re: Question about improving group by performance
          BelMan
          What Oracle Version Do you have ?
          How many rows do you have in this table?
          do you have index on this table or not ?
          since you have group by so the sort will be on PGA ,what is your PGA size?
          • 2. Re: Question about improving group by performance
            BelMan
            you can post the exection plan

            explain plan for
            select var1, var2
            from mytab
            group by var1,var2;

            then select * from table(dbms_xplan.disply)
            1 person found this helpful
            • 3. Re: Question about improving group by performance
              William Robertson
              Assuming the execution plan is OK, can you throw resources at it? Making sure it has enough memory to process it in one pass might be a first step, also parallel processing. A materialized view might mean you only have to do it once.

              Edited by: William Robertson on Aug 20, 2010 12:20 AM
              1 person found this helpful
              • 4. Re: Question about improving group by performance
                user503699
                amcgator wrote:
                Hello,

                I am trying to work around an issue with an application I'm using. I can't change a query it is using and it takes forever to return.

                The vendor has narrowed down the solution to removing an expensive group by expression. We won't be able to get a fix for the application though for several months. We can however change the data and the database.
                If you are on 10g, you may want to look at the possibility of using DBMS_ADVANCE_REWRITE package.
                The documentation link is: http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_advrwr.htm#CFAGBJDG
                and a small but complete test case is: http://dioncho.wordpress.com/2009/03/06/optimizing-unoptimizeable-sql-dbms_advanced_rewrite/
                1 person found this helpful
                • 5. Re: Question about improving group by performance
                  698295
                  Hi,

                  Thanks all for your responses.

                  We are using Oracle 10g. DBMS_ADVANCE_REWRITE looks very interesting and I will read about it. A previous post asked about PGA... what is that?

                  Also I can't change the query so would a materilzed view buy me anything? ( I have never used one. )

                  I will post the query since the explain plan probably won't make sense with out it.
                  explain plan for
                  select ID, STOCK_COUNT_ID, LOCATION_ID, STORE_ID, COUNTER_ID, RECOUNTER_ID, APPROVER_ID, STATUS, LOCATION_DESCRIPTION, STORE_AREA,
                  ITEM_COUNT, ITEM_DISCREPANT_COUNT
                  from 
                  STOCK_COUNT_LOCATION_V locv
                  where 
                  locv.stock_count_id = 5223
                  and (locv.status = 4 ) 
                  and not exists 
                      (select distinct location_id from stock_count_line line1 
                      where line1.stock_count_id = 5223
                     and line1.location_id = locv.location_id
                      and line1.counted_quantity is null) 
                  and not exists 
                      (select /*+ NO_PUSH_SUBQ */ line2.item_id from stock_count_line line2 
                         where line2.stock_count_id = 5223
                          and line2.counted_quantity is null 
                  and (line2.item_id, locv.location_id) in 
                       (select line3.item_id, line3.location_id from stock_count_line line3  
                       where line3.stock_count_id = 5223
                       and line3.location_id = locv.location_id
                  
                          group by line3.item_id, line3.location_id
                  ));
                  I really don't quite understand the output but here is the explain plan for the above.
                  PLAN_TABLE_OUTPUT
                  
                  Plan hash value: 3795157980
                   
                  ---------------------------------------------------------------------------------------------
                  | Id  | Operation                      | Name                       | Rows  | Bytes | Cost  |
                  ---------------------------------------------------------------------------------------------
                  |   0 | SELECT STATEMENT               |                            |     1 |   276 |    82 |
                  |   1 |  SORT GROUP BY                 |                            |     1 |    45 |       |
                  |*  2 |   INDEX RANGE SCAN             | UK_STOCK_COUNT_LINE     |     1 |    45 |     3 |
                  |   3 |  SORT GROUP BY                 |                            |     1 |    47 |       |
                  |*  4 |   TABLE ACCESS BY INDEX ROWID  | STOCK_COUNT_LINE        |     1 |    47 |     4 |
                  |*  5 |    INDEX RANGE SCAN            | STOCK_COUNT_LINE_I3     |     4 |       |     3 |
                  |*  6 |  FILTER                        |                            |       |       |       |
                  |   7 |   NESTED LOOPS ANTI            |                            |     1 |   276 |    27 |
                  |   8 |    NESTED LOOPS                |                            |     1 |   251 |     2 |
                  |*  9 |     TABLE ACCESS BY INDEX ROWID| STOCK_COUNT_LOCATION    |     1 |   207 |     1 |
                  |* 10 |      INDEX RANGE SCAN          | PK_STOCK_COUNT_LOCATION |     1 |       |       |
                  |  11 |     TABLE ACCESS BY INDEX ROWID| LO_LCN                     |     1 |    44 |     1 |
                  |* 12 |      INDEX UNIQUE SCAN         | PK_LO_LCN                  |     1 |       |       |
                  |* 13 |    TABLE ACCESS BY INDEX ROWID | STOCK_COUNT_LINE        |   178 |  4450 |    25 |
                  |* 14 |     INDEX RANGE SCAN           | UK_STOCK_COUNT_LINE     |    10 |       |    24 |
                  |* 15 |   TABLE ACCESS BY INDEX ROWID  | STOCK_COUNT_LINE        |     9 |   198 |    24 |
                  |* 16 |    INDEX RANGE SCAN            | STOCK_COUNT_LINE_I1     |    72 |       |    19 |
                  |* 17 |     FILTER                     |                            |       |       |       |
                  |  18 |      SORT GROUP BY             |                            |     1 |    36 |    31 |
                  |* 19 |       INDEX RANGE SCAN         | UK_STOCK_COUNT_LINE     |     3 |   108 |    25 |
                  ---------------------------------------------------------------------------------------------
                   
                  Predicate Information (identified by operation id):
                  ---------------------------------------------------
                   
                     2 - access("SLINE"."STOCK_COUNT_ID"=:B1 AND "SLINE"."STORE_ID"=:B2 AND 
                                "SLINE"."LOCATION_ID"=:B3)
                         filter("SLINE"."LOCATION_ID"=:B1)
                     4 - filter("SLINE"."DISCREPANT"='Y' AND "SLINE"."LOCATION_ID"=:B1)
                     5 - access("SLINE"."STOCK_COUNT_ID"=:B1 AND "SLINE"."STORE_ID"=:B2)
                     6 - filter( NOT EXISTS (SELECT /*+ NO_PUSH_SUBQ */ 0 FROM "STOCK_COUNT_LINE" 
                                "SYS_ALIAS_1" WHERE "LINE2"."STOCK_COUNT_ID"=5223 AND "LINE2"."COUNTED_QUANTITY" IS 
                                NULL AND  EXISTS (SELECT 0 FROM "STOCK_COUNT_LINE" "LINE3" WHERE 
                                "LINE3"."STOCK_COUNT_ID"=5223 AND "LINE3"."LOCATION_ID"=:B1 GROUP BY 
                                "LINE3"."ITEM_ID","LINE3"."LOCATION_ID" HAVING "LINE3"."ITEM_ID"=:B2 AND 
                                "LINE3"."LOCATION_ID"=:B3)))
                     9 - filter("SLOC"."STATUS"=4)
                    10 - access("SLOC"."STOCK_COUNT_ID"=5223)
                    12 - access("LOC"."ID_LCN"="SLOC"."LOCATION_ID" AND 
                                "LOC"."ID_STR_RT"="SLOC"."STORE_ID")
                    13 - filter("LINE1"."COUNTED_QUANTITY" IS NULL)
                    14 - access("LINE1"."STOCK_COUNT_ID"=5223 AND 
                                "LINE1"."LOCATION_ID"="SLOC"."LOCATION_ID")
                         filter("LINE1"."LOCATION_ID"="SLOC"."LOCATION_ID")
                    15 - filter("LINE2"."COUNTED_QUANTITY" IS NULL)
                    16 - access("LINE2"."STOCK_COUNT_ID"=5223)
                         filter( EXISTS (SELECT 0 FROM "STOCK_COUNT_LINE" "LINE3" WHERE 
                                "LINE3"."STOCK_COUNT_ID"=5223 AND "LINE3"."LOCATION_ID"=:B1 GROUP BY 
                                "LINE3"."ITEM_ID","LINE3"."LOCATION_ID" HAVING "LINE3"."ITEM_ID"=:B2 AND 
                                "LINE3"."LOCATION_ID"=:B3))
                    17 - filter("LINE3"."ITEM_ID"=:B1 AND "LINE3"."LOCATION_ID"=:B2)
                    19 - access("LINE3"."STOCK_COUNT_ID"=5223 AND "LINE3"."LOCATION_ID"=:B1)
                         filter("LINE3"."LOCATION_ID"=:B1)
                   
                  Note
                  -----
                     - cpu costing is off (consider enabling it)
                  Thanks again for your thoughts.

                  Edited by: amcgator on Aug 20, 2010 11:16 AM
                  • 7. Re: Question about improving group by performance
                    William Robertson
                    Even though you can't change the code, it's nice to be able to see what you're dealing with (Tuning by SQL Formatting ;) )
                    SELECT id
                         , stock_count_id
                         , location_id
                         , store_id
                         , counter_id
                         , recounter_id
                         , approver_id
                         , status
                         , location_description
                         , store_area
                         , item_count
                         , item_discrepant_count
                    FROM   stock_count_location_v locv
                    WHERE  locv.stock_count_id = 5223
                    AND    locv.status = 4
                    AND    NOT EXISTS 
                           ( SELECT DISTINCT location_id
                             FROM   stock_count_line line1 
                             WHERE  line1.stock_count_id = 5223
                             AND    line1.location_id = locv.location_id
                             AND    line1.counted_quantity IS NULL ) 
                    AND    NOT EXISTS 
                           ( SELECT /*+ NO_PUSH_SUBQ */ line2.item_id
                             FROM   stock_count_line line2 
                             WHERE  line2.stock_count_id = 5223
                             AND    line2.counted_quantity IS NULL 
                             AND    (line2.item_id, locv.location_id) IN 
                                    ( SELECT line3.item_id, line3.location_id
                                      FROM   stock_count_line line3  
                                      WHERE  line3.stock_count_id = 5223
                                      AND    line3.location_id = locv.location_id
                                      GROUP BY line3.item_id, line3.location_id ) );
                    You can see straight away that the DISTINCT and GROUP BY are unnecessary. The optimizer seems to have ignored the DISTINCT but not the GROUP BY. Your vendor might try removing the hint as well.

                    Have you checked the stats? http://structureddata.org/2007/11/21/troubleshooting-bad-execution-plans/
                    1 person found this helpful