12 Replies Latest reply: Jun 1, 2013 6:12 AM by 922306 RSS

    Fastest way to fetch DISTINCT values from partitioned table

    922306
      What is the fastest way to fetch DISTINCT values from partitioned table?
      1) DISTINCT/UNIQUE
      2) GROUP BY
      3) PARTITION BY OVER()
      4) MAX(ROWID)
      Table Definition
      CREATE TABLE STG_SOS_SALES_FACT_STUDY
      (
        CNTRY_KEY     NUMBER,
        STUDY_ID      NUMBER,
        PRD_KEY_YEAR  NUMBER,
        PRD_KEY_WEEK  NUMBER,
        DATE_FROM     DATE,
        DATE_TO       DATE,
        STR_KEY       NUMBER,
        STR_SRVC      NUMBER,
        CTG_ID        NUMBER,
        PRDC_KEY      NUMBER,
        UNITS         NUMBER,
        DOL           NUMBER(17,2),
        FL_ID         NUMBER
      )
      
      -> PARTITION BY RANGE (PRD_KEY_YEAR, PRD_KEY_WEEK)
      -> SUBPARTITION BY LIST (CNTRY_KEY)
      
      ** Local Partition Indexes
      1) CN_SD_CTG_PRD_PRDC_IDX = STG_SOS_SALES_FACT_STUDY (PRD_KEY_YEAR, PRD_KEY_WEEK, CNTRY_KEY, STUDY_ID, CTG_ID, PRDC_KEY)
      2) CN_SD_PRD_STR_CTG_IDX = STG_SOS_SALES_FACT_STUDY (PRD_KEY_YEAR, PRD_KEY_WEEK, CNTRY_KEY, STUDY_ID, STR_KEY)
      #Query:
      SELECT DISTINCT PRD_KEY_WEEK, PRD_KEY_YEAR
      FROM stg_sos_sales_fact_study a
      WHERE CNTRY_KEY = 484
      AND ( ( (A.PRD_KEY_WEEK BETWEEN 14 AND 52 AND A.PRD_KEY_YEAR = 2012)
      OR (A.PRD_KEY_WEEK BETWEEN 1 AND 14 AND A.PRD_KEY_YEAR = 2013)))
      AND STUDY_ID IN (22573064, 35328585, 22573064);
      ** Explain Plan:
      Plan
      SELECT STATEMENT  ALL_ROWSCost: 6,235  Bytes: 629  Cardinality: 37                           
            8 HASH UNIQUE  Cost: 6,235  Bytes: 629  Cardinality: 37                      
               7 CONCATENATION                 
                    3 PARTITION RANGE ITERATOR  Cost: 1,985  Bytes: 1,031,900  Cardinality: 60,700  Partition #: 3  Partitions accessed #194 - #207          
                        2 PARTITION LIST SINGLE  Cost: 1,985  Bytes: 1,031,900  Cardinality: 60,700  Partition #: 4  Partitions determined by Key Values     
             1 INDEX SKIP SCAN INDEX SOS_ODS.CN_SD_PRD_STR_CTG_IDX Cost: 1,985  Bytes: 1,031,900  Cardinality: 60,700  Partition #: 4  Partitions determined by Key Values
                 6 PARTITION RANGE ITERATOR  Cost: 4,243  Bytes: 2,719,371  Cardinality: 159,963  Partition #: 6  Partitions accessed #155 - #193          
                   5 PARTITION LIST SINGLE  Cost: 4,243  Bytes: 2,719,371  Cardinality: 159,963  Partition #: 7  Partitions determined by Key Values     
                    4 INDEX SKIP SCAN INDEX SOS_ODS.CN_SD_PRD_STR_CTG_IDX Cost: 4,243  Bytes: 2,719,371  Cardinality: 159,963  Partition #: 7  Partitions determined by Key Values
      The above query is taking around 6-7 minutes to fetch the data.

      Edited by: meet_sanc on 12 May, 2013 11:34 AM
        • 1. Re: Fastest way to fetch DISTINCT values from partitioned table
          sb92075
          use UNIQUE INDEX then every row is guaranteed to be distinct
          • 2. Re: Fastest way to fetch DISTINCT values from partitioned table
            922306
            The existing indexes are not unique indexes.

            Is there any way, we can push unique index hint?
            • 3. Re: Fastest way to fetch DISTINCT values from partitioned table
              Etbin
              Maybe (guessing)
              select distinct
                     prd_key_week,
                     prd_key_year
                from stg_sos_sales_fact_study
               where cntry_key = 484
                 and 100 * prd_key_year + prd_key_week between 201214 and 201314
                 and study_id in (22573064,35328585,22573064)
              Regards

              Etbin
              • 4. Re: Fastest way to fetch DISTINCT values from partitioned table
                Martin Preiss
                your execution plan shows an index skip scan: so the engine skips the columns PRD_KEY_YEAR and PRD_KEY_WEEK and starts the scan with CNTRY_KEY. But since there is only one value for the combination (PRD_KEY_YEAR, PRD_KEY_WEEK, CNTRY_KEY) in each subpartition this should not be a problem. There is no TABLE ACCESS in the plan: all the relevant information is found in the index. It would be interesting to get a sql trace of the execution (or rowsource statistics or sql monitoring if you have the fitting release and the necessary licences) to determine in which step the time is spend. But if my interpretation is correct and the skip scan is not an issue, and if the 6-7 min are not the best your server can provide - then you could be facing a problem with Hash Aggregation (HASH UNIQUE) that Randolf Geist wrote about in his blog: http://oracle-randolf.blogspot.de/2011/01/hash-aggregation.html. He mentions there some possible workarounds: using manual workarea size policy, or deactivating the hash aggregation with the hint NO_USE_HASH_AGGREGATION - and I would check if these options have a beneficial effect on your query.

                Regards

                Martin
                • 5. Re: Fastest way to fetch DISTINCT values from partitioned table
                  922306
                  Thanks Martin.

                  I tried using NO_USE_HASH_AGGREGATION hint, but there is no improvement.
                  The parameter valuees are as below:

                  pga_aggregate_target=2G
                  workarea_size_policy=AUTO
                  sort_area_size=65536
                  SELECT /*+ NO_USE_HASH_AGGREGATION */
                           DISTINCT PRD_KEY_WEEK, PRD_KEY_YEAR
                    FROM stg_sos_sales_fact_study x
                   WHERE     CNTRY_KEY = 484
                         AND ( (   (PRD_KEY_WEEK BETWEEN 14 AND 52 AND PRD_KEY_YEAR = 2012)
                                OR (PRD_KEY_WEEK BETWEEN 1 AND 14 AND PRD_KEY_YEAR = 2013)))
                         AND STUDY_ID IN (22573064, 35328585, 22573064)
                  
                  Plan
                  SELECT STATEMENT  ALL_ROWSCost: 7,235  Bytes: 629  Cardinality: 37                           
                    8 SORT UNIQUE  Cost: 7,235  Bytes: 629  Cardinality: 37                      
                       7 CONCATENATION                 
                          3 PARTITION RANGE ITERATOR  Cost: 2,304  Bytes: 1,031,900  Cardinality: 60,700  Partition #: 3  Partitions accessed #194 - #207          
                             2 PARTITION LIST SINGLE  Cost: 2,304  Bytes: 1,031,900  Cardinality: 60,700  Partition #: 4  Partitions determined by Key Values     
                                1 INDEX SKIP SCAN INDEX SOS_ODS.CN_SD_PRD_STR_CTG_IDX Cost: 2,304  Bytes: 1,031,900  Cardinality: 60,700  Partition #: 4  Partitions determined by Key Values
                                   6 PARTITION RANGE ITERATOR  Cost: 4,923  Bytes: 2,719,371  Cardinality: 159,963  Partition #: 6  Partitions accessed #155 - #193          
                                      5 PARTITION LIST SINGLE  Cost: 4,923  Bytes: 2,719,371  Cardinality: 159,963  Partition #: 7  Partitions determined by Key Values     
                                         4 INDEX SKIP SCAN INDEX SOS_ODS.CN_SD_PRD_STR_CTG_IDX Cost: 4,923  Bytes: 2,719,371  Cardinality: 159,963  Partition #: 7  Partitions determined by Key Values
                  Also I am not able to push parallel hint for this table.

                  Any other option to improve the performance?
                  • 6. Re: Fastest way to fetch DISTINCT values from partitioned table
                    Martin Preiss
                    did you try to use workarea_size_policy = manual (and a fitting value for SORT_AREA_SIZE) for the session? By the way: what is your release version?
                    • 7. Re: Fastest way to fetch DISTINCT values from partitioned table
                      922306
                      I have not tried with using workarea_size_policy = manual
                      The query will be executed through applicaiton & we are not allowed to change to the oracle parameters.

                      The Oracle release version - 11.2.0.1.0 (2 node RAC)
                      • 8. Re: Fastest way to fetch DISTINCT values from partitioned table
                        Martin Preiss
                        perhaps you could try to set the workarea_size_policy for a test session to determine if the HASH UNIQUE operation is a problem at all. If you have a problem with the hash aggregation then I think there are ways to bring the execution engine to a more sensible resource usage.
                        • 9. Re: Fastest way to fetch DISTINCT values from partitioned table
                          922306
                          The PARALLEL hint is improving the performance:
                          SELECT /*+  PARALLEL 4 */
                                  DISTINCT PRD_KEY_WEEK, PRD_KEY_YEAR
                              FROM STG_SOS_SALES_FACT_STUDY A
                             WHERE     A.CNTRY_KEY = 484
                                   AND ( (   (A.PRD_KEY_WEEK BETWEEN 14 AND 52 AND A.PRD_KEY_YEAR = 2012)
                                          OR (A.PRD_KEY_WEEK BETWEEN 1 AND 14 AND A.PRD_KEY_YEAR = 2013)))
                                   AND STUDY_ID IN (22573064, 35328585, 22573064);
                          Execution Time: 1 min 29 secs
                          • 10. Re: Fastest way to fetch DISTINCT values from partitioned table
                            Martin Preiss
                            a higher degree of parallelism improves the performance of the operation - but at the cost of a higher resource usage: the operation is executed by n parallel prozesses. If that's ok - and your server is big enough to provide the resources for concurrent operations then parallelization is an option.
                            • 11. Re: Fastest way to fetch DISTINCT values from partitioned table
                              Randolf Geist
                              meet_sanc wrote:
                              Any other option to improve the performance?
                              Try to use a systematic approach - which means you first need to understand where your statement spends its time. If you have a Diagnostics + Tuning Pack license, pull an active Real Time SQL Monitoring report for your statement execution, otherwise enable Rowsource Statistics (e.g. set STATISTICS_LEVEL =ALL in your session) and use DBMS_XPLAN.DISPLAY_CURSOR to see instrumentation at execution plan line level.

                              You can follow for example my rather old blogpost here: http://oracle-randolf.blogspot.com/2009/02/basic-sql-statement-performance.html

                              Randolf
                              • 12. Re: Fastest way to fetch DISTINCT values from partitioned table
                                922306
                                The following approach is giving results in 5-8 seconds.
                                We have used list of year & week for IN clause to fetch data from respective partitions.
                                SELECT PRD_KEY_WEEK, PRD_KEY_YEAR   from stg_sos_sales_fact_study A
                                WHERE A.CNTRY_KEY = 484
                                   AND (A.PRD_KEY_YEAR,A.PRD_KEY_WEEK) in (
                                (2012,14),(2012,15),(2012,16),(2012,17),(2012,18),(2012,19),(2012,20),(2012,21), (2012,22),(2012,23),(2012,24),(2012,25),(2012,26),(2012,27),
                                (2012,28),(2012,29),(2012,30),(2012,31),(2012,32),(2012,33),(2012,34),(2012,35),(2012,36),(2012,37),(2012,38),(2012,39),(2012,40),(2012,41),
                                (2012,42),(2012,43),(2012,44),(2012,45),(2012,46),(2012,47),(2012,48),(2012,49),(2012,50),(2012,51),(2012,52),(2013,1),(2013,2),(2013,3),
                                (2013,4),(2013,5),(2013,6),(2013,7),(2013,8),(2013,9),(2013,10),(2013,11),(2013,12),(2013,13),(2013,14)
                                )
                                   AND STUDY_ID IN (22573064, 35328585, 22573064)
                                GROUP BY PRD_KEY_WEEK, PRD_KEY_YEAR