Forum Stats

  • 3,757,562 Users
  • 2,251,245 Discussions
  • 7,869,866 Comments

Discussions

Grouping Sets

Sven W.
Sven W. Member Posts: 10,533 Gold Crown
edited Jun 30, 2010 9:24AM in SQL & PL/SQL
Database Version 10.2.0.4

prelimary information

I'm preparing some statistical data. Here is my simplified example:
[email protected]> with testdata as (select 'Europe' site, 1 product, 10 amount from dual union all
  2                    select 'Europe' site, 2 product, 20 amount from dual union all
  3                    select 'USA' site, 1 product, 30 amount from dual union all
  4                    select 'USA' site, 3 product, 40 amount from dual union all
  5                    select 'Japan' site, 1 product, 50 amount from dual union all
  6                    select 'Japan' site, 2 product, 60 amount from dual )
  7  select * from testdata;

SITE      PRODUCT     AMOUNT
------ ---------- ----------
Europe          1         10
Europe          2         20
USA             1         30
USA             3         40
Japan           1         50
Japan           2         60

6 rows selected.

[email protected]> 
I now want to group on the regions (sites) and count the number of different products there.
Also I have a total distinct count which is important.

This is the current select:
-- number of different products on each site
with testdata as (select 'Europe' site, 1 product, 10 amount from dual union all
                  select 'Europe' site, 2 product, 20 amount from dual union all
                  select 'Europe' site, 2 product, 25 amount from dual union all
                  select 'USA' site, 1 product, 30 amount from dual union all
                  select 'USA' site, 3 product, 40 amount from dual union all
                  select 'Japan' site, 1 product, 50 amount from dual union all
                  select 'Japan' site, 2 product, 60 amount from dual )
select decode(grouping_id(site), 0, site, 'all sites') as site, 
       count(distinct product) "# unique products", sum(amount) sum_amo 
from testdata
group by grouping sets ((site), ());

SITE      # unique products    SUM_AMO
--------- ----------------- ----------
Europe                    2         55
Japan                     2        110
USA                       2         70
all sites                 3        235
problem

I now want to add a special line "all sites without Japan".

Somehow I can't find a decent way to add this row.

Wanted Output:
SITE      # unique products    SUM_AMO
--------- ----------------- ----------
Europe                    2         55
Japan                     2        110
USA                       2         70
all sites w/o Japan       3        125
all sites                 3        235
My real select runs over aproximately 6 million records and needs a few minutes to sort this (count distinct is slow). So I'm prefering a solution where I do not access the same table twice (else this would be too easy).

Any ideas?

Answers

  • MScallion
    MScallion Member Posts: 1,267
    edited Jun 29, 2010 11:27AM
    One way;
    with subq as
     (select site, sum(amount) sum_amount
      from testdata
      group by site)
    select site, sum_amount
    from (select 1 grp, site, sum_amount
          from subq
          union all
          select 2 grp,
                 'all sites w/o Japan',
                 sum(case
                        when site != 'Japan' then
                         sum_amount
                        else
                         0
                      end) sum_amount
          from subq
          union all
          select 3 grp, 'all sites', sum(sum_amount)
          from subq)
    order by grp, site;
     
    SITE                SUM_AMOUNT
    ------------------- ----------
    Europe                      55
    Japan                      110
    USA                         70
    all sites w/o Japan        125
    all sites                  235
    Edited by: MScallion on Jun 29, 2010 8:26 AM
    Changed not equal to !=
    MScallion
  • MScallion
    MScallion Member Posts: 1,267
    edited Jun 29, 2010 11:28AM
    Duplicate post due to web site issues

    Edited by: MScallion on Jun 29, 2010 8:28 AM
  • Centinul
    Centinul Member Posts: 6,871 Bronze Crown
    edited Jun 29, 2010 11:31AM
    Note sure if it's the most efficient but here is a shot:
    SQL> with testdata as (select 'Europe' site, 1 product, 10 amount from dual union all
      2                    select 'Europe' site, 2 product, 20 amount from dual union all
      3                    select 'Europe' site, 2 product, 25 amount from dual union all
      4                    select 'USA' site, 1 product, 30 amount from dual union all
      5                    select 'USA' site, 3 product, 40 amount from dual union all
      6                    select 'Japan' site, 1 product, 50 amount from dual union all
      7                    select 'Japan' site, 2 product, 60 amount from dual )
      8  SELECT DECODE
      9         ( GROUPING_ID(site_indicator,site)
     10         , 1 , 'all sites w/o Japan'
     11         , 3 , 'all sites'
     12         , site
     13         )                        AS site
     14       , COUNT(DISTINCT product)  AS "# unique products"
     15       , SUM(amount)              AS sum_amo
     16  FROM
     17  (
     18          SELECT site
     19               , product
     20               , amount
     21               , CASE
     22                     WHEN site != 'Japan'
     23                     THEN 0
     24                     ELSE 1
     25                 END                      AS site_indicator
     26          FROM   testdata
     27  )
     28  GROUP BY ROLLUP(site_indicator,site)
     29  HAVING ( GROUPING_ID(site_indicator,site)
     30         , NVL(site_indicator,0)
     31         ) NOT IN ( (1,1) )
     32  /
    
    SITE                   # unique products              SUM_AMO
    ------------------- -------------------- --------------------
    Europe                                 2                   55
    USA                                    2                   70
    Japan                                  2                  110
    all sites w/o Japan                    3                  125
    all sites                              3                  235
    Centinul
  • MScallion
    MScallion Member Posts: 1,267
    edited Jun 29, 2010 11:30AM
    Triplicate post due to web site issues

    Edited by: MScallion on Jun 29, 2010 8:29 AM
    MScallion
  • Sven W.
    Sven W. Member Posts: 10,533 Gold Crown
    edited Jun 30, 2010 3:14AM
    I couldn't respond yesterday because of forum issues.

    I like how Centinul used the different values of the grouping_id. I can't properly use ROLLUP clause, since there are more grouping sets in the original query. But thats not a big difference.

    I found a slightly different version yesterday myself:
    with testdata as (select 'Europe' site, 1 product, 10 amount from dual union all
                      select 'Europe' site, 2 product, 20 amount from dual union all
                      select 'Europe' site, 2 product, 25 amount from dual union all
                      select 'USA' site, 1 product, 30 amount from dual union all
                      select 'USA' site, 3 product, 40 amount from dual union all
                      select 'Japan' site, 1 product, 50 amount from dual union all
                      select 'Japan' site, 2 product, 60 amount from dual )
    select decode(grouping_id(site), 0, site, 'all sites')
             ||(case when site!='Japan' then ' w/o Japan' end) as site,
           count(distinct product) "# unique products", sum(amount) sum_amo 
           --,grouping_id (site) 
           --,grouping_id (case when site!='Japan' then ' w/o Japan' end)
    from testdata
    group by grouping sets((site),
                         (case when site!='Japan' then ' w/o Japan' end),
                         ())
    having not (grouping_id (case when site!='Japan' then ' w/o Japan' end) = 0
                and case when site!='Japan' then ' w/o Japan' end is null
                );
    
    SITE                   # unique products              SUM_AMO
    ------------------- -------------------- --------------------
    Europe                                 2                   55
    USA                                    2                   70
    Japan                                  2                  110
    all sites w/o Japan                    3                  125
    all sites                              3                  235
    Different is that no change in the from clause is required. Instead only a new grouping set is added. On the other hand this "site indicator" can be handled much better in future changes. I will see which version is best suited for the real case.

    Edited by: Sven W. on Jun 30, 2010 9:10 AM

    Edited by: Sven W. on Jun 30, 2010 9:12 AM - gave additional points to McScallion for beeing the first to respond
  • Aketi Jyuuzou
    Aketi Jyuuzou Member Posts: 1,072 Bronze Badge
    I like model clause B-)
    someHow We have to use upsert all first at rules clause
    with testdata(site,product,amount) as (
    select 'Europe',1,10 from dual union all
    select 'Europe',2,20 from dual union all
    select 'Europe',2,25 from dual union all
    select 'USA'   ,1,30 from dual union all
    select 'USA'   ,3,40 from dual union all
    select 'Japan' ,1,50 from dual union all
    select 'Japan' ,2,60 from dual)
    select *
      from testdata
     model return updated rows
    dimension by(cast(site as varchar2(30)) as site,RowNum as rn)
    measures(product,amount,0 as uniP,0 as SUM_AMO)
    rules(
    upsert all uniP[any,0]= count(distinct product)[cv(),any],
    upsert all SUM_AMO[any,0] = sum(amount)[cv(),any],
    uniP['all sites',0] = count(distinct product)[any,rn > 0],
    uniP['all sites w/o Japan',0] = count(distinct product)[site!='Japan',rn > 0],
    SUM_AMO['all sites',0] = sum(amount)[any,rn > 0],
    SUM_AMO['all sites w/o Japan',0] = sum(amount)[site!='Japan',rn > 0])
    order by SITE;
    
    SITE                 RN    PRODUCT  AMOUNT  UNIP  SUM_AMO
    -------------------  --  ---------  ------  ----  -------
    Europe                0  null       null       2       55
    Japan                 0  null       null       2      110
    USA                   0  null       null       2       70
    all sites             0  null       null       3      235
    all sites w/o Japan   0  null       null       3      125
    Aketi Jyuuzou
  • Aketi Jyuuzou
    Aketi Jyuuzou Member Posts: 1,072 Bronze Badge
    There is grouping sets solution ;-)
    select case grouping_ID((decode(SITE,'Japan',1)),(site))
           when 2+1 then 'all sites'
           when 0+1 then 'all sites w/o Japan'
           else SITE end as site,
    count(distinct PRODUCT) as UNIP,sum(AMOUNT) as SUM_AMO
      from testdata
    group by grouping sets ((),(decode(SITE,'Japan',1)),(site))
    having decode(SITE,'Japan',1) is null
    order by site;
    
    SITE                 UNIP    SUM_AMO
    -------------------  ----  ---------
    Europe                  2         55
    Japan                   2        110
    USA                     2         70
    all sites               3        235
    all sites w/o Japan     3        125
    Aketi Jyuuzou
This discussion has been closed.