This discussion is archived
12 Replies Latest reply: Jun 1, 2013 4:12 AM by 922306 RSS

Fastest way to fetch DISTINCT values from partitioned table

922306 Newbie
Currently Being Moderated
What is the fastest way to fetch DISTINCT values from partitioned table?
1) DISTINCT/UNIQUE
2) GROUP BY
3) PARTITION BY OVER()
4) MAX(ROWID)
Table Definition
CREATE TABLE STG_SOS_SALES_FACT_STUDY
(
  CNTRY_KEY     NUMBER,
  STUDY_ID      NUMBER,
  PRD_KEY_YEAR  NUMBER,
  PRD_KEY_WEEK  NUMBER,
  DATE_FROM     DATE,
  DATE_TO       DATE,
  STR_KEY       NUMBER,
  STR_SRVC      NUMBER,
  CTG_ID        NUMBER,
  PRDC_KEY      NUMBER,
  UNITS         NUMBER,
  DOL           NUMBER(17,2),
  FL_ID         NUMBER
)

-> PARTITION BY RANGE (PRD_KEY_YEAR, PRD_KEY_WEEK)
-> SUBPARTITION BY LIST (CNTRY_KEY)

** Local Partition Indexes
1) CN_SD_CTG_PRD_PRDC_IDX = STG_SOS_SALES_FACT_STUDY (PRD_KEY_YEAR, PRD_KEY_WEEK, CNTRY_KEY, STUDY_ID, CTG_ID, PRDC_KEY)
2) CN_SD_PRD_STR_CTG_IDX = STG_SOS_SALES_FACT_STUDY (PRD_KEY_YEAR, PRD_KEY_WEEK, CNTRY_KEY, STUDY_ID, STR_KEY)
#Query:
SELECT DISTINCT PRD_KEY_WEEK, PRD_KEY_YEAR
FROM stg_sos_sales_fact_study a
WHERE CNTRY_KEY = 484
AND ( ( (A.PRD_KEY_WEEK BETWEEN 14 AND 52 AND A.PRD_KEY_YEAR = 2012)
OR (A.PRD_KEY_WEEK BETWEEN 1 AND 14 AND A.PRD_KEY_YEAR = 2013)))
AND STUDY_ID IN (22573064, 35328585, 22573064);
** Explain Plan:
Plan
SELECT STATEMENT  ALL_ROWSCost: 6,235  Bytes: 629  Cardinality: 37                           
      8 HASH UNIQUE  Cost: 6,235  Bytes: 629  Cardinality: 37                      
         7 CONCATENATION                 
              3 PARTITION RANGE ITERATOR  Cost: 1,985  Bytes: 1,031,900  Cardinality: 60,700  Partition #: 3  Partitions accessed #194 - #207          
                  2 PARTITION LIST SINGLE  Cost: 1,985  Bytes: 1,031,900  Cardinality: 60,700  Partition #: 4  Partitions determined by Key Values     
       1 INDEX SKIP SCAN INDEX SOS_ODS.CN_SD_PRD_STR_CTG_IDX Cost: 1,985  Bytes: 1,031,900  Cardinality: 60,700  Partition #: 4  Partitions determined by Key Values
           6 PARTITION RANGE ITERATOR  Cost: 4,243  Bytes: 2,719,371  Cardinality: 159,963  Partition #: 6  Partitions accessed #155 - #193          
             5 PARTITION LIST SINGLE  Cost: 4,243  Bytes: 2,719,371  Cardinality: 159,963  Partition #: 7  Partitions determined by Key Values     
              4 INDEX SKIP SCAN INDEX SOS_ODS.CN_SD_PRD_STR_CTG_IDX Cost: 4,243  Bytes: 2,719,371  Cardinality: 159,963  Partition #: 7  Partitions determined by Key Values
The above query is taking around 6-7 minutes to fetch the data.

Edited by: meet_sanc on 12 May, 2013 11:34 AM
  • 1. Re: Fastest way to fetch DISTINCT values from partitioned table
    sb92075 Guru
    Currently Being Moderated
    use UNIQUE INDEX then every row is guaranteed to be distinct
  • 2. Re: Fastest way to fetch DISTINCT values from partitioned table
    922306 Newbie
    Currently Being Moderated
    The existing indexes are not unique indexes.

    Is there any way, we can push unique index hint?
  • 3. Re: Fastest way to fetch DISTINCT values from partitioned table
    Etbin Guru
    Currently Being Moderated
    Maybe (guessing)
    select distinct
           prd_key_week,
           prd_key_year
      from stg_sos_sales_fact_study
     where cntry_key = 484
       and 100 * prd_key_year + prd_key_week between 201214 and 201314
       and study_id in (22573064,35328585,22573064)
    Regards

    Etbin
  • 4. Re: Fastest way to fetch DISTINCT values from partitioned table
    Martin Preiss Expert
    Currently Being Moderated
    your execution plan shows an index skip scan: so the engine skips the columns PRD_KEY_YEAR and PRD_KEY_WEEK and starts the scan with CNTRY_KEY. But since there is only one value for the combination (PRD_KEY_YEAR, PRD_KEY_WEEK, CNTRY_KEY) in each subpartition this should not be a problem. There is no TABLE ACCESS in the plan: all the relevant information is found in the index. It would be interesting to get a sql trace of the execution (or rowsource statistics or sql monitoring if you have the fitting release and the necessary licences) to determine in which step the time is spend. But if my interpretation is correct and the skip scan is not an issue, and if the 6-7 min are not the best your server can provide - then you could be facing a problem with Hash Aggregation (HASH UNIQUE) that Randolf Geist wrote about in his blog: http://oracle-randolf.blogspot.de/2011/01/hash-aggregation.html. He mentions there some possible workarounds: using manual workarea size policy, or deactivating the hash aggregation with the hint NO_USE_HASH_AGGREGATION - and I would check if these options have a beneficial effect on your query.

    Regards

    Martin
  • 5. Re: Fastest way to fetch DISTINCT values from partitioned table
    922306 Newbie
    Currently Being Moderated
    Thanks Martin.

    I tried using NO_USE_HASH_AGGREGATION hint, but there is no improvement.
    The parameter valuees are as below:

    pga_aggregate_target=2G
    workarea_size_policy=AUTO
    sort_area_size=65536
    SELECT /*+ NO_USE_HASH_AGGREGATION */
             DISTINCT PRD_KEY_WEEK, PRD_KEY_YEAR
      FROM stg_sos_sales_fact_study x
     WHERE     CNTRY_KEY = 484
           AND ( (   (PRD_KEY_WEEK BETWEEN 14 AND 52 AND PRD_KEY_YEAR = 2012)
                  OR (PRD_KEY_WEEK BETWEEN 1 AND 14 AND PRD_KEY_YEAR = 2013)))
           AND STUDY_ID IN (22573064, 35328585, 22573064)
    
    Plan
    SELECT STATEMENT  ALL_ROWSCost: 7,235  Bytes: 629  Cardinality: 37                           
      8 SORT UNIQUE  Cost: 7,235  Bytes: 629  Cardinality: 37                      
         7 CONCATENATION                 
            3 PARTITION RANGE ITERATOR  Cost: 2,304  Bytes: 1,031,900  Cardinality: 60,700  Partition #: 3  Partitions accessed #194 - #207          
               2 PARTITION LIST SINGLE  Cost: 2,304  Bytes: 1,031,900  Cardinality: 60,700  Partition #: 4  Partitions determined by Key Values     
                  1 INDEX SKIP SCAN INDEX SOS_ODS.CN_SD_PRD_STR_CTG_IDX Cost: 2,304  Bytes: 1,031,900  Cardinality: 60,700  Partition #: 4  Partitions determined by Key Values
                     6 PARTITION RANGE ITERATOR  Cost: 4,923  Bytes: 2,719,371  Cardinality: 159,963  Partition #: 6  Partitions accessed #155 - #193          
                        5 PARTITION LIST SINGLE  Cost: 4,923  Bytes: 2,719,371  Cardinality: 159,963  Partition #: 7  Partitions determined by Key Values     
                           4 INDEX SKIP SCAN INDEX SOS_ODS.CN_SD_PRD_STR_CTG_IDX Cost: 4,923  Bytes: 2,719,371  Cardinality: 159,963  Partition #: 7  Partitions determined by Key Values
    Also I am not able to push parallel hint for this table.

    Any other option to improve the performance?
  • 6. Re: Fastest way to fetch DISTINCT values from partitioned table
    Martin Preiss Expert
    Currently Being Moderated
    did you try to use workarea_size_policy = manual (and a fitting value for SORT_AREA_SIZE) for the session? By the way: what is your release version?
  • 7. Re: Fastest way to fetch DISTINCT values from partitioned table
    922306 Newbie
    Currently Being Moderated
    I have not tried with using workarea_size_policy = manual
    The query will be executed through applicaiton & we are not allowed to change to the oracle parameters.

    The Oracle release version - 11.2.0.1.0 (2 node RAC)
  • 8. Re: Fastest way to fetch DISTINCT values from partitioned table
    Martin Preiss Expert
    Currently Being Moderated
    perhaps you could try to set the workarea_size_policy for a test session to determine if the HASH UNIQUE operation is a problem at all. If you have a problem with the hash aggregation then I think there are ways to bring the execution engine to a more sensible resource usage.
  • 9. Re: Fastest way to fetch DISTINCT values from partitioned table
    922306 Newbie
    Currently Being Moderated
    The PARALLEL hint is improving the performance:
    SELECT /*+  PARALLEL 4 */
            DISTINCT PRD_KEY_WEEK, PRD_KEY_YEAR
        FROM STG_SOS_SALES_FACT_STUDY A
       WHERE     A.CNTRY_KEY = 484
             AND ( (   (A.PRD_KEY_WEEK BETWEEN 14 AND 52 AND A.PRD_KEY_YEAR = 2012)
                    OR (A.PRD_KEY_WEEK BETWEEN 1 AND 14 AND A.PRD_KEY_YEAR = 2013)))
             AND STUDY_ID IN (22573064, 35328585, 22573064);
    Execution Time: 1 min 29 secs
  • 10. Re: Fastest way to fetch DISTINCT values from partitioned table
    Martin Preiss Expert
    Currently Being Moderated
    a higher degree of parallelism improves the performance of the operation - but at the cost of a higher resource usage: the operation is executed by n parallel prozesses. If that's ok - and your server is big enough to provide the resources for concurrent operations then parallelization is an option.
  • 11. Re: Fastest way to fetch DISTINCT values from partitioned table
    Randolf Geist Oracle ACE Director
    Currently Being Moderated
    meet_sanc wrote:
    Any other option to improve the performance?
    Try to use a systematic approach - which means you first need to understand where your statement spends its time. If you have a Diagnostics + Tuning Pack license, pull an active Real Time SQL Monitoring report for your statement execution, otherwise enable Rowsource Statistics (e.g. set STATISTICS_LEVEL =ALL in your session) and use DBMS_XPLAN.DISPLAY_CURSOR to see instrumentation at execution plan line level.

    You can follow for example my rather old blogpost here: http://oracle-randolf.blogspot.com/2009/02/basic-sql-statement-performance.html

    Randolf
  • 12. Re: Fastest way to fetch DISTINCT values from partitioned table
    922306 Newbie
    Currently Being Moderated
    The following approach is giving results in 5-8 seconds.
    We have used list of year & week for IN clause to fetch data from respective partitions.
    SELECT PRD_KEY_WEEK, PRD_KEY_YEAR   from stg_sos_sales_fact_study A
    WHERE A.CNTRY_KEY = 484
       AND (A.PRD_KEY_YEAR,A.PRD_KEY_WEEK) in (
    (2012,14),(2012,15),(2012,16),(2012,17),(2012,18),(2012,19),(2012,20),(2012,21), (2012,22),(2012,23),(2012,24),(2012,25),(2012,26),(2012,27),
    (2012,28),(2012,29),(2012,30),(2012,31),(2012,32),(2012,33),(2012,34),(2012,35),(2012,36),(2012,37),(2012,38),(2012,39),(2012,40),(2012,41),
    (2012,42),(2012,43),(2012,44),(2012,45),(2012,46),(2012,47),(2012,48),(2012,49),(2012,50),(2012,51),(2012,52),(2013,1),(2013,2),(2013,3),
    (2013,4),(2013,5),(2013,6),(2013,7),(2013,8),(2013,9),(2013,10),(2013,11),(2013,12),(2013,13),(2013,14)
    )
       AND STUDY_ID IN (22573064, 35328585, 22573064)
    GROUP BY PRD_KEY_WEEK, PRD_KEY_YEAR

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points