Forum Stats

  • 3,733,254 Users
  • 2,246,738 Discussions
  • 7,856,636 Comments

Discussions

Groupping looking into previous group in one query... analythics help ?

457632
457632 Member Posts: 9
edited July 2008 in SQL & PL/SQL
Hi all,

I am trying to do a single query for the following situation, please let me know if this is possible, maybe using analythics.

My table:

PK_ID SAMPLE_ID HABITAT_TYPE EFFORT_NO SPECIES
----------------------------------------------------------------------------------------------------------
1 1 Riffle 1 A
2 1 Riffle 1 B
3 1 Riffle 1 C
4 1 Riffle 2 B
5 1 Riffle 2 D
6 1 Pool 3 C
7 1 Pool 3 D
8 1 Pool 3 E
9 1 Riffle 4 F
10 2 Pool 1 A

I want a query that groups the data by SAMPLE_ID, HABITAT_TYPE and EFFORT_NO, ordered by PK_ID. In the result, I need to count the DISTINCT SPECIES for the current EFFORT_NO and ALL the previous EFFORT_NOs. The result should look like this:


SAMPLE_ID HABITAT_TYPE EFFORT_NO COUNT
--------------------------------------------------------------------------------------------------
1 Riffle 1 3
1 Riffle 2 4
1 Pool 3 5
1 Riffle 4 6
2 Pool 1 1

So, for the first result row, A, B, and C should be counted as 3 from PK_IDs (1,2,3).

For the second result row, A, B, C and D should be counted as 4 from PK_IDs (1,2,3,4,5)

For the third result row, A, B, C, D and E should be counted as 5 from PK_IDs (1,2,3,4,5,6,7,8)

Can this be done with one sigle query ?

Thanks,
Danny

Comments

  • 457632
    457632 Member Posts: 9
    Clarification:

    I don't want some query that uses subqueries to requery the table after doing the groups to get the counts, I want an optimal query (if possible) that does the counts and the groups in one pass through the table.

    Thanks,
    Danny
  • 646947
    646947 Member Posts: 116
    How exactly can you ORDER BY PK_ID if you do not want to GROUP BY it?
  • 457632
    457632 Member Posts: 9
    That's the question... Can analythics help ? As far as I know, analythics are applied after the groupping... I'm looking for an idea, just as an academic exercise, how can this be done in the most optimal way ?

    Thanks,
    Danny
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,492 Red Diamond
    Hi,
    I want a query that groups the data by SAMPLE_ID, HABITAT_TYPE and
    EFFORT_NO, ordered by PK_ID.
    If you're not grouping by pk_id, then which pk_id do you want? The average?

    Anyway, I don't think you can do this without a sub-query. That does not mean you will query the table twice: you will query the table once (say, to do a GROUP BY query), and then query that result set (say, to do the analytic SUM). If the table has a million rows, and the data falls into 100 groups, the sub-query will do a single pass through the million rows, and the main query will do a pass through the 100 aggregate rows.

    You can do this with all analytic functions (no aggregates), but it will still involve a sub-query, since you can't nest analytic functions.
    You can also do it with scalar aub-queries, but that would be the least efficient of all.
  • 457632
    457632 Member Posts: 9
    Frank,

    If I'm groupping first, there's no way to get the DISTINCT counts for the next groups - I need to get the distinct counts for group 1 (as first group); for group 1 AND 2 (as second group); for group 1 AND 2 AND 3 (as third group) etc.

    Thanks,
    Danny
  • 457632
    457632 Member Posts: 9
    Basically, this is the query that gives me the desired result:

    SELECT gq.sample_id, gq.habitat_type, gq.effort_no,
    (SELECT COUNT (DISTINCT species)
    FROM roll_test b
    WHERE gq.sample_id = b.sample_id
    AND gq.effort_no >= b.effort_no) AS COUNT
    FROM (SELECT sample_id, habitat_type, effort_no
    FROM roll_test a
    GROUP BY sample_id, habitat_type, effort_no) gq
    ORDER BY 1, 3;

    The question is, can it be written in a different way (optimal) to do everything in one pass (grouppings and counting) ?

    Thanks,
    Danny
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,492 Red Diamond
    Hi,

    Sorry, I didn't understand the problem before.

    It seems that you want to order by effort_no, with every sample_id forming a different group: pk_id plays no role in this problem.

    You would like to do:
    ,	COUNT (DISTINCT species)
    OVER ( PARTITION BY sample_id
    ORDER BY effort_no
    , habitat_type -- ?
    ) AS cnt
    but you can't. You'll get the error ORA-30487 if you try to use an analytic function with both DISTINCT and ORDER BY.

    Instead of a scalar sub-query, like you posted, I would be inclined to do a self-join:
    SELECT 	a.sample_id
    , a.habitat_type
    , a.effort_no
    , COUNT (DISTINCT b.species) AS cnt
    FROM roll_test a
    JOIN roll_test b
    ON a.sample_id = b.sample_id
    AND a.effort_no >= b.effort_no
    GROUP BY a.sample_id, a.effort_no, a.habitat_type
    ORDER BY a.sample_id, a.effort_no, a.habitat_type
    ;
    This assumes that every row has a distinct combination of the four columns referenced above: sample_id, effort_no, hapitat_type and species. If not, it might be faster to GROUP by those columns first, in a sub-query, to make them unique.

    A composite index on sample_id and effort_no would probably make this faster.

    I'd be interested in hearing if the self-join above is significantly faster or slower than your solution.
  • Aketi Jyuuzou
    Aketi Jyuuzou Member Posts: 1,072 Bronze Badge
    create table aaaa(PK_ID,SAMPLE_ID,HABITAT_TYPE,EFFORT_NO,SPECIES) as
    select 1,1, 'Riffle', 1, 'A' from dual union
    select 2,1, 'Riffle', 1, 'B' from dual union
    select 3,1, 'Riffle', 1, 'C' from dual union
    select 4,1, 'Riffle', 2, 'B' from dual union
    select 5,1, 'Riffle', 2, 'D' from dual union
    select 6,1, 'Pool' , 3, 'C' from dual union
    select 7,1, 'Pool' , 3, 'D' from dual union
    select 8,1, 'Pool' , 3, 'E' from dual union
    select 9,1, 'Riffle', 4, 'F' from dual union
    select 10,2, 'Pool' , 1, 'A' from dual;
    select SAMPLE_ID,HABITAT_TYPE,EFFORT_NO,
    sum(sum(willSum)) over(partition by SAMPLE_ID order by min(PK_ID)) as cnt
    from (select PK_ID,SAMPLE_ID,HABITAT_TYPE,EFFORT_NO,SPECIES,
    case Row_Number() over(partition by SAMPLE_ID,SPECIES order by PK_ID)
    when 1 then 1 else 0 end as willSum
    from aaaa)
    group by SAMPLE_ID,HABITAT_TYPE,EFFORT_NO
    order by SAMPLE_ID,min(PK_ID);
    SAMPLE_ID  HABITA  EFFORT_NO  CNT
    --------- ------ --------- ---
    1 Riffle 1 3
    1 Riffle 2 4
    1 Pool 3 5
    1 Riffle 4 6
    2 Pool 1 1
    similar threads
    593241
    569685
  • Randolf Geist
    Randolf Geist Member Posts: 2,214 Silver Trophy
    A bit late, but in the context of the similar threads mentioned and to complete the picture, here is how this could be done using the MODEL clause:
    with roll_test as (
    select 1 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 1 as effort_no, 'A' as species from dual union all
    select 2 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 1 as effort_no, 'B' as species from dual union all
    select 3 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 1 as effort_no, 'C' as species from dual union all
    select 4 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 2 as effort_no, 'B' as species from dual union all
    select 5 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 2 as effort_no, 'D' as species from dual union all
    select 6 as pk_id, 1 as sample_id, 'Pool' as habitat_type, 3 as effort_no, 'C' as species from dual union all
    select 7 as pk_id, 1 as sample_id, 'Pool' as habitat_type, 3 as effort_no, 'D' as species from dual union all
    select 8 as pk_id, 1 as sample_id, 'Pool' as habitat_type, 3 as effort_no, 'E' as species from dual union all
    select 9 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 4 as effort_no, 'F' as species from dual union all
    select 10 as pk_id, 2 as sample_id, 'Pool' as habitat_type, 1 as effort_no, 'A' as species from dual
    )
    select SAMPLE_ID,HABITAT_TYPE,EFFORT_NO, cnt
    from roll_test
    model
    return updated rows
    dimension by (
    rownum as rn,
    max(pk_id) over (partition by sample_id, habitat_type, effort_no) as pk_id,
    sample_id,
    habitat_type,
    effort_no)
    measures (species, 0 cnt)
    rules upsert all
    (cnt[0, any, any, any, any] = count(distinct species)[any, pk_id<=CV(), CV(), any, any]);
    Regards,
    Randolf

    Oracle related stuff:
    http://oracle-randolf.blogspot.com/

    SQLTools++ for Oracle:
    http://www.sqltools-plusplus.org:7676/
    http://sourceforge.net/projects/sqlt-pp/
  • 457632
    457632 Member Posts: 9
    Thanks guys for all your answers !

    While playing around with the queries, I found a weird behaviour if I try to use grouping sets in the query. Is this the normal behaviour or I stumbled across an Oracle bug ?

    The following 2 queries work as expected:

    Query 1 - works as expected:
    SELECT   sample_id,
    SUM (SUM (willsum)) OVER (PARTITION BY sample_id ORDER BY MIN
    (pk_id) )
    AS cnt
    FROM (SELECT pk_id, sample_id, habitat_type, effort_no, species,
    CASE ROW_NUMBER () OVER (PARTITION BY sample_id, species ORDER BY pk_id)
    WHEN 1
    THEN 1
    ELSE 0
    END AS willsum
    FROM roll_test)
    GROUP BY sample_id
    ORDER BY sample_id, MIN (pk_id);
    Query 2 - works as expected:
    SELECT   sample_id, habitat_type, effort_no,
    SUM (SUM (willsum)) OVER (PARTITION BY sample_id ORDER BY MIN
    (pk_id) )
    AS cnt
    FROM (SELECT pk_id, sample_id, habitat_type, effort_no, species,
    CASE ROW_NUMBER () OVER (PARTITION BY sample_id, species ORDER BY pk_id)
    WHEN 1
    THEN 1
    ELSE 0
    END AS willsum
    FROM roll_test)
    GROUP BY sample_id, habitat_type, effort_no
    ORDER BY sample_id, MIN (pk_id);
    Query 3 - applpying grouping sets to combine the above 2 queries returns wrong results:
    SELECT   grouping( sample_id ),
    grouping( habitat_type ),
    sample_id,
    habitat_type,
    effort_no,
    SUM (SUM (willsum)) OVER (PARTITION BY sample_id ORDER BY MIN
    (pk_id) )
    AS cnt
    FROM (SELECT pk_id, sample_id, habitat_type, effort_no, species,
    CASE ROW_NUMBER () OVER (PARTITION BY sample_id, species ORDER BY pk_id)
    WHEN 1
    THEN 1
    ELSE 0
    END AS willsum
    FROM roll_test)
    GROUP BY grouping sets ( sample_id, ( sample_id, habitat_type, effort_no ) )
    ORDER BY sample_id, MIN (pk_id);
    Anybody can explain this behaviour of grouping sets ?

    Thanks again !
  • 457632
    457632 Member Posts: 9
    Sorry, my bad, I found what was wrong with Query 3. Here is the correct query with grouping sets:
    SELECT   grouping( sample_id ),
    grouping( habitat_type ),
    sample_id,
    habitat_type,
    effort_no,
    SUM (SUM (willsum)) OVER (PARTITION BY grouping( sample_id ), grouping( habitat_type ), sample_id ORDER BY MIN
    (pk_id) )
    AS cnt
    FROM (SELECT pk_id, sample_id, habitat_type, effort_no, species,
    CASE ROW_NUMBER () OVER (PARTITION BY sample_id, species ORDER BY pk_id)
    WHEN 1
    THEN 1
    ELSE 0
    END AS willsum
    FROM roll_test)
    GROUP BY grouping sets ( sample_id, ( sample_id, habitat_type, effort_no ) )
    ORDER BY sample_id, MIN (pk_id);
    Thanks !
This discussion has been closed.