11 Replies Latest reply on Jul 30, 2008 1:53 PM by 457632

    Groupping looking into previous group in one query... analythics help ?

    457632
      Hi all,

      I am trying to do a single query for the following situation, please let me know if this is possible, maybe using analythics.

      My table:

      PK_ID SAMPLE_ID HABITAT_TYPE EFFORT_NO SPECIES
      ----------------------------------------------------------------------------------------------------------
      1 1 Riffle 1 A
      2 1 Riffle 1 B
      3 1 Riffle 1 C
      4 1 Riffle 2 B
      5 1 Riffle 2 D
      6 1 Pool 3 C
      7 1 Pool 3 D
      8 1 Pool 3 E
      9 1 Riffle 4 F
      10 2 Pool 1 A

      I want a query that groups the data by SAMPLE_ID, HABITAT_TYPE and EFFORT_NO, ordered by PK_ID. In the result, I need to count the DISTINCT SPECIES for the current EFFORT_NO and ALL the previous EFFORT_NOs. The result should look like this:


      SAMPLE_ID HABITAT_TYPE EFFORT_NO COUNT
      --------------------------------------------------------------------------------------------------
      1 Riffle 1 3
      1 Riffle 2 4
      1 Pool 3 5
      1 Riffle 4 6
      2 Pool 1 1

      So, for the first result row, A, B, and C should be counted as 3 from PK_IDs (1,2,3).

      For the second result row, A, B, C and D should be counted as 4 from PK_IDs (1,2,3,4,5)

      For the third result row, A, B, C, D and E should be counted as 5 from PK_IDs (1,2,3,4,5,6,7,8)

      Can this be done with one sigle query ?

      Thanks,
      Danny
        • 1. Re: Groupping looking into previous group in one query... analythics help ?
          457632
          Clarification:

          I don't want some query that uses subqueries to requery the table after doing the groups to get the counts, I want an optimal query (if possible) that does the counts and the groups in one pass through the table.

          Thanks,
          Danny
          • 2. Re: Groupping looking into previous group in one query... analythics help ?
            646947
            How exactly can you ORDER BY PK_ID if you do not want to GROUP BY it?
            • 3. Re: Groupping looking into previous group in one query... analythics help ?
              457632
              That's the question... Can analythics help ? As far as I know, analythics are applied after the groupping... I'm looking for an idea, just as an academic exercise, how can this be done in the most optimal way ?

              Thanks,
              Danny
              • 4. Re: Groupping looking into previous group in one query... analythics help ?
                Frank Kulash
                Hi,
                I want a query that groups the data by SAMPLE_ID, HABITAT_TYPE and
                EFFORT_NO, ordered by PK_ID.
                If you're not grouping by pk_id, then which pk_id do you want? The average?

                Anyway, I don't think you can do this without a sub-query. That does not mean you will query the table twice: you will query the table once (say, to do a GROUP BY query), and then query that result set (say, to do the analytic SUM). If the table has a million rows, and the data falls into 100 groups, the sub-query will do a single pass through the million rows, and the main query will do a pass through the 100 aggregate rows.

                You can do this with all analytic functions (no aggregates), but it will still involve a sub-query, since you can't nest analytic functions.
                You can also do it with scalar aub-queries, but that would be the least efficient of all.
                • 5. Re: Groupping looking into previous group in one query... analythics help ?
                  457632
                  Frank,

                  If I'm groupping first, there's no way to get the DISTINCT counts for the next groups - I need to get the distinct counts for group 1 (as first group); for group 1 AND 2 (as second group); for group 1 AND 2 AND 3 (as third group) etc.

                  Thanks,
                  Danny
                  • 6. Re: Groupping looking into previous group in one query... analythics help ?
                    457632
                    Basically, this is the query that gives me the desired result:

                    SELECT gq.sample_id, gq.habitat_type, gq.effort_no,
                    (SELECT COUNT (DISTINCT species)
                    FROM roll_test b
                    WHERE gq.sample_id = b.sample_id
                    AND gq.effort_no >= b.effort_no) AS COUNT
                    FROM (SELECT sample_id, habitat_type, effort_no
                    FROM roll_test a
                    GROUP BY sample_id, habitat_type, effort_no) gq
                    ORDER BY 1, 3;

                    The question is, can it be written in a different way (optimal) to do everything in one pass (grouppings and counting) ?

                    Thanks,
                    Danny
                    • 7. Self-Join
                      Frank Kulash
                      Hi,

                      Sorry, I didn't understand the problem before.

                      It seems that you want to order by effort_no, with every sample_id forming a different group: pk_id plays no role in this problem.

                      You would like to do:
                      ,     COUNT (DISTINCT species)
                                OVER     ( PARTITION BY     sample_id
                                       ORDER BY     effort_no
                                       ,          habitat_type -- ?
                                     )     AS cnt
                      but you can't. You'll get the error ORA-30487 if you try to use an analytic function with both DISTINCT and ORDER BY.

                      Instead of a scalar sub-query, like you posted, I would be inclined to do a self-join:
                      SELECT      a.sample_id
                      ,     a.habitat_type
                      ,     a.effort_no
                      ,     COUNT (DISTINCT b.species)     AS cnt
                      FROM     roll_test     a
                      JOIN     roll_test     b
                      ON     a.sample_id     = b.sample_id
                      AND     a.effort_no     >= b.effort_no
                      GROUP BY     a.sample_id,     a.effort_no,     a.habitat_type
                      ORDER BY     a.sample_id,     a.effort_no,     a.habitat_type
                      ;          
                      This assumes that every row has a distinct combination of the four columns referenced above: sample_id, effort_no, hapitat_type and species. If not, it might be faster to GROUP by those columns first, in a sub-query, to make them unique.

                      A composite index on sample_id and effort_no would probably make this faster.

                      I'd be interested in hearing if the self-join above is significantly faster or slower than your solution.
                      • 8. Re: Groupping looking into previous group in one query... analythics help ?
                        Aketi Jyuuzou
                        create table aaaa(PK_ID,SAMPLE_ID,HABITAT_TYPE,EFFORT_NO,SPECIES) as
                        select 1,1, 'Riffle', 1, 'A' from dual union
                        select  2,1, 'Riffle', 1, 'B' from dual union
                        select  3,1, 'Riffle', 1, 'C' from dual union
                        select  4,1, 'Riffle', 2, 'B' from dual union
                        select  5,1, 'Riffle', 2, 'D' from dual union
                        select  6,1, 'Pool'  , 3, 'C' from dual union
                        select  7,1, 'Pool'  , 3, 'D' from dual union
                        select  8,1, 'Pool'  , 3, 'E' from dual union
                        select  9,1, 'Riffle', 4, 'F' from dual union
                        select 10,2, 'Pool'  , 1, 'A' from dual;
                        select SAMPLE_ID,HABITAT_TYPE,EFFORT_NO,
                        sum(sum(willSum)) over(partition by SAMPLE_ID order by min(PK_ID)) as cnt
                        from (select PK_ID,SAMPLE_ID,HABITAT_TYPE,EFFORT_NO,SPECIES,
                              case Row_Number() over(partition by SAMPLE_ID,SPECIES order by PK_ID)
                              when 1 then 1 else 0 end as willSum
                                from aaaa)
                        group by SAMPLE_ID,HABITAT_TYPE,EFFORT_NO
                        order by SAMPLE_ID,min(PK_ID);
                        SAMPLE_ID  HABITA  EFFORT_NO  CNT
                        ---------  ------  ---------  ---
                                1  Riffle          1    3
                                1  Riffle          2    4
                                1  Pool            3    5
                                1  Riffle          4    6
                                2  Pool            1    1
                        similar threads
                        [SQL] how can i get this result....??(accumulation distinct count)
                        Rolling unique person count by month over a time period
                        • 9. Re: Groupping looking into previous group in one query... analythics help ?
                          Randolf Geist
                          A bit late, but in the context of the similar threads mentioned and to complete the picture, here is how this could be done using the MODEL clause:
                          with roll_test as (
                          select 1 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 1 as effort_no, 'A' as species from dual union all
                          select 2 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 1 as effort_no, 'B' as species from dual union all
                          select 3 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 1 as effort_no, 'C' as species from dual union all
                          select 4 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 2 as effort_no, 'B' as species from dual union all
                          select 5 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 2 as effort_no, 'D' as species from dual union all
                          select 6 as pk_id, 1 as sample_id, 'Pool' as habitat_type, 3 as effort_no, 'C' as species from dual union all
                          select 7 as pk_id, 1 as sample_id, 'Pool' as habitat_type, 3 as effort_no, 'D' as species from dual union all
                          select 8 as pk_id, 1 as sample_id, 'Pool' as habitat_type, 3 as effort_no, 'E' as species from dual union all
                          select 9 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 4 as effort_no, 'F' as species from dual union all
                          select 10 as pk_id, 2 as sample_id, 'Pool' as habitat_type, 1 as effort_no, 'A' as species from dual
                          )
                          select SAMPLE_ID,HABITAT_TYPE,EFFORT_NO, cnt
                          from roll_test
                          model
                          return updated rows
                          dimension by (
                          rownum as rn,
                          max(pk_id) over (partition by sample_id, habitat_type, effort_no) as pk_id,
                          sample_id,
                          habitat_type,
                          effort_no)
                          measures (species, 0 cnt)
                          rules upsert all
                          (cnt[0, any, any, any, any] = count(distinct species)[any, pk_id<=CV(), CV(), any, any]);
                          Regards,
                          Randolf

                          Oracle related stuff:
                          http://oracle-randolf.blogspot.com/

                          SQLTools++ for Oracle:
                          http://www.sqltools-plusplus.org:7676/
                          http://sourceforge.net/projects/sqlt-pp/
                          • 10. Re: Groupping looking into previous group in one query... analythics help ?
                            457632
                            Thanks guys for all your answers !

                            While playing around with the queries, I found a weird behaviour if I try to use grouping sets in the query. Is this the normal behaviour or I stumbled across an Oracle bug ?

                            The following 2 queries work as expected:

                            Query 1 - works as expected:
                            SELECT   sample_id,
                                     SUM (SUM (willsum)) OVER (PARTITION BY sample_id ORDER BY MIN
                                                                                            (pk_id) )
                                                                                                   AS cnt
                                FROM (SELECT pk_id, sample_id, habitat_type, effort_no, species,
                                             CASE ROW_NUMBER () OVER (PARTITION BY sample_id, species ORDER BY pk_id)
                                                WHEN 1
                                                   THEN 1
                                                ELSE 0
                                             END AS willsum
                                        FROM roll_test)
                            GROUP BY sample_id
                            ORDER BY sample_id, MIN (pk_id);
                            Query 2 - works as expected:
                            SELECT   sample_id, habitat_type, effort_no,
                                     SUM (SUM (willsum)) OVER (PARTITION BY sample_id ORDER BY MIN
                                                                                            (pk_id) )
                                                                                                   AS cnt
                                FROM (SELECT pk_id, sample_id, habitat_type, effort_no, species,
                                             CASE ROW_NUMBER () OVER (PARTITION BY sample_id, species ORDER BY pk_id)
                                                WHEN 1
                                                   THEN 1
                                                ELSE 0
                                             END AS willsum
                                        FROM roll_test)
                            GROUP BY sample_id, habitat_type, effort_no
                            ORDER BY sample_id, MIN (pk_id);
                            Query 3 - applpying grouping sets to combine the above 2 queries returns wrong results:
                            SELECT   grouping( sample_id ),
                                     grouping( habitat_type ),  
                                     sample_id,
                                     habitat_type,
                                     effort_no,
                                     SUM (SUM (willsum)) OVER (PARTITION BY sample_id ORDER BY MIN
                                                                                            (pk_id) )
                                                                                                   AS cnt
                                FROM (SELECT pk_id, sample_id, habitat_type, effort_no, species,
                                             CASE ROW_NUMBER () OVER (PARTITION BY sample_id, species ORDER BY pk_id)
                                                WHEN 1
                                                   THEN 1
                                                ELSE 0
                                             END AS willsum
                                        FROM roll_test)
                            GROUP BY grouping sets ( sample_id, ( sample_id, habitat_type, effort_no ) )
                            ORDER BY sample_id, MIN (pk_id);
                            Anybody can explain this behaviour of grouping sets ?

                            Thanks again !
                            • 11. Re: Groupping looking into previous group in one query... analythics help ?
                              457632
                              Sorry, my bad, I found what was wrong with Query 3. Here is the correct query with grouping sets:
                              SELECT   grouping( sample_id ),
                                       grouping( habitat_type ),  
                                       sample_id,
                                       habitat_type,
                                       effort_no,
                                       SUM (SUM (willsum)) OVER (PARTITION BY grouping( sample_id ), grouping( habitat_type ), sample_id ORDER BY MIN
                                                                                              (pk_id) )
                                                                                                     AS cnt
                                  FROM (SELECT pk_id, sample_id, habitat_type, effort_no, species,
                                               CASE ROW_NUMBER () OVER (PARTITION BY sample_id, species ORDER BY pk_id)
                                                  WHEN 1
                                                     THEN 1
                                                  ELSE 0
                                               END AS willsum
                                          FROM roll_test)
                              GROUP BY grouping sets ( sample_id, ( sample_id, habitat_type, effort_no ) )
                              ORDER BY sample_id, MIN (pk_id);
                              Thanks !