SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Groupping looking into previous group in one query... analythics help ?

457632Jul 24 2008 — edited Jul 30 2008

Hi all,

I am trying to do a single query for the following situation, please let me know if this is possible, maybe using analythics.

My table:

PK_ID SAMPLE_ID HABITAT_TYPE EFFORT_NO SPECIES
----------------------------------------------------------------------------------------------------------
1 1 Riffle 1 A
2 1 Riffle 1 B
3 1 Riffle 1 C
4 1 Riffle 2 B
5 1 Riffle 2 D
6 1 Pool 3 C
7 1 Pool 3 D
8 1 Pool 3 E
9 1 Riffle 4 F
10 2 Pool 1 A

I want a query that groups the data by SAMPLE_ID, HABITAT_TYPE and EFFORT_NO, ordered by PK_ID. In the result, I need to count the DISTINCT SPECIES for the current EFFORT_NO and ALL the previous EFFORT_NOs. The result should look like this:

SAMPLE_ID HABITAT_TYPE EFFORT_NO COUNT
--------------------------------------------------------------------------------------------------
1 Riffle 1 3
1 Riffle 2 4
1 Pool 3 5
1 Riffle 4 6
2 Pool 1 1

So, for the first result row, A, B, and C should be counted as 3 from PK_IDs (1,2,3).

For the second result row, A, B, C and D should be counted as 4 from PK_IDs (1,2,3,4,5)

For the third result row, A, B, C, D and E should be counted as 5 from PK_IDs (1,2,3,4,5,6,7,8)

Can this be done with one sigle query ?

Thanks,
Danny

457632

Clarification:

I don't want some query that uses subqueries to requery the table after doing the groups to get the counts, I want an optimal query (if possible) that does the counts and the groups in one pass through the table.

Thanks,
Danny

646947

How exactly can you ORDER BY PK_ID if you do not want to GROUP BY it?

457632

That's the question... Can analythics help ? As far as I know, analythics are applied after the groupping... I'm looking for an idea, just as an academic exercise, how can this be done in the most optimal way ?

Thanks,
Danny

Frank Kulash

Hi,

I want a query that groups the data by SAMPLE_ID, HABITAT_TYPE and
EFFORT_NO, ordered by PK_ID.

If you're not grouping by pk_id, then which pk_id do you want? The average?

Anyway, I don't think you can do this without a sub-query. That does not mean you will query the table twice: you will query the table once (say, to do a GROUP BY query), and then query that result set (say, to do the analytic SUM). If the table has a million rows, and the data falls into 100 groups, the sub-query will do a single pass through the million rows, and the main query will do a pass through the 100 aggregate rows.

You can do this with all analytic functions (no aggregates), but it will still involve a sub-query, since you can't nest analytic functions.
You can also do it with scalar aub-queries, but that would be the least efficient of all.

457632

Frank,

If I'm groupping first, there's no way to get the DISTINCT counts for the next groups - I need to get the distinct counts for group 1 (as first group); for group 1 AND 2 (as second group); for group 1 AND 2 AND 3 (as third group) etc.

Thanks,
Danny

457632

Basically, this is the query that gives me the desired result:

SELECT gq.sample_id, gq.habitat_type, gq.effort_no,
(SELECT COUNT (DISTINCT species)
FROM roll_test b
WHERE gq.sample_id = b.sample_id
AND gq.effort_no >= b.effort_no) AS COUNT
FROM (SELECT sample_id, habitat_type, effort_no
FROM roll_test a
GROUP BY sample_id, habitat_type, effort_no) gq
ORDER BY 1, 3;

The question is, can it be written in a different way (optimal) to do everything in one pass (grouppings and counting) ?

Thanks,
Danny

Frank Kulash

Hi,

Sorry, I didn't understand the problem before.

It seems that you want to order by effort_no, with every sample_id forming a different group: pk_id plays no role in this problem.

You would like to do:

,	COUNT (DISTINCT species)
		OVER	( PARTITION BY	sample_id
			  ORDER BY	effort_no
			  ,		habitat_type -- ?
			)	AS cnt

but you can't. You'll get the error ORA-30487 if you try to use an analytic function with both DISTINCT and ORDER BY.

Instead of a scalar sub-query, like you posted, I would be inclined to do a self-join:

SELECT 	a.sample_id
,	a.habitat_type
,	a.effort_no
,	COUNT (DISTINCT b.species)	AS cnt
FROM	roll_test	a
JOIN	roll_test	b
ON	a.sample_id	= b.sample_id
AND	a.effort_no	>= b.effort_no
GROUP BY	a.sample_id,	a.effort_no,	a.habitat_type
ORDER BY	a.sample_id,	a.effort_no,	a.habitat_type
;

This assumes that every row has a distinct combination of the four columns referenced above: sample_id, effort_no, hapitat_type and species. If not, it might be faster to GROUP by those columns first, in a sub-query, to make them unique.

A composite index on sample_id and effort_no would probably make this faster.

I'd be interested in hearing if the self-join above is significantly faster or slower than your solution.

Aketi Jyuuzou

create table aaaa(PK_ID,SAMPLE_ID,HABITAT_TYPE,EFFORT_NO,SPECIES) as
select 1,1, 'Riffle', 1, 'A' from dual union
select  2,1, 'Riffle', 1, 'B' from dual union
select  3,1, 'Riffle', 1, 'C' from dual union
select  4,1, 'Riffle', 2, 'B' from dual union
select  5,1, 'Riffle', 2, 'D' from dual union
select  6,1, 'Pool'  , 3, 'C' from dual union
select  7,1, 'Pool'  , 3, 'D' from dual union
select  8,1, 'Pool'  , 3, 'E' from dual union
select  9,1, 'Riffle', 4, 'F' from dual union
select 10,2, 'Pool'  , 1, 'A' from dual;

select SAMPLE_ID,HABITAT_TYPE,EFFORT_NO,
sum(sum(willSum)) over(partition by SAMPLE_ID order by min(PK_ID)) as cnt
from (select PK_ID,SAMPLE_ID,HABITAT_TYPE,EFFORT_NO,SPECIES,
      case Row_Number() over(partition by SAMPLE_ID,SPECIES order by PK_ID) 
      when 1 then 1 else 0 end as willSum
        from aaaa)
group by SAMPLE_ID,HABITAT_TYPE,EFFORT_NO
order by SAMPLE_ID,min(PK_ID);

SAMPLE_ID  HABITA  EFFORT_NO  CNT
---------  ------  ---------  ---
        1  Riffle          1    3
        1  Riffle          2    4
        1  Pool            3    5
        1  Riffle          4    6
        2  Pool            1    1

similar threads
593241
569685

Randolf Geist

A bit late, but in the context of the similar threads mentioned and to complete the picture, here is how this could be done using the MODEL clause:

with roll_test as (
select 1 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 1 as effort_no, 'A' as species from dual union all
select 2 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 1 as effort_no, 'B' as species from dual union all
select 3 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 1 as effort_no, 'C' as species from dual union all
select 4 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 2 as effort_no, 'B' as species from dual union all
select 5 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 2 as effort_no, 'D' as species from dual union all
select 6 as pk_id, 1 as sample_id, 'Pool' as habitat_type, 3 as effort_no, 'C' as species from dual union all
select 7 as pk_id, 1 as sample_id, 'Pool' as habitat_type, 3 as effort_no, 'D' as species from dual union all
select 8 as pk_id, 1 as sample_id, 'Pool' as habitat_type, 3 as effort_no, 'E' as species from dual union all
select 9 as pk_id, 1 as sample_id, 'Riffle' as habitat_type, 4 as effort_no, 'F' as species from dual union all
select 10 as pk_id, 2 as sample_id, 'Pool' as habitat_type, 1 as effort_no, 'A' as species from dual
)
select SAMPLE_ID,HABITAT_TYPE,EFFORT_NO, cnt
from roll_test
model
return updated rows
dimension by (
rownum as rn,
max(pk_id) over (partition by sample_id, habitat_type, effort_no) as pk_id,
sample_id,
habitat_type,
effort_no)
measures (species, 0 cnt)
rules upsert all
(cnt[0, any, any, any, any] = count(distinct species)[any, pk_id<=CV(), CV(), any, any]);

Regards,
Randolf

Oracle related stuff:
http://oracle-randolf.blogspot.com/

SQLTools++ for Oracle:
http://www.sqltools-plusplus.org:7676/
http://sourceforge.net/projects/sqlt-pp/

457632

Thanks guys for all your answers !

While playing around with the queries, I found a weird behaviour if I try to use grouping sets in the query. Is this the normal behaviour or I stumbled across an Oracle bug ?

The following 2 queries work as expected:

Query 1 - works as expected:

SELECT   sample_id,
         SUM (SUM (willsum)) OVER (PARTITION BY sample_id ORDER BY MIN
                                                                (pk_id) )
                                                                       AS cnt
    FROM (SELECT pk_id, sample_id, habitat_type, effort_no, species,
                 CASE ROW_NUMBER () OVER (PARTITION BY sample_id, species ORDER BY pk_id)
                    WHEN 1
                       THEN 1
                    ELSE 0
                 END AS willsum
            FROM roll_test)
GROUP BY sample_id
ORDER BY sample_id, MIN (pk_id);

Query 2 - works as expected:

SELECT   sample_id, habitat_type, effort_no,
         SUM (SUM (willsum)) OVER (PARTITION BY sample_id ORDER BY MIN
                                                                (pk_id) )
                                                                       AS cnt
    FROM (SELECT pk_id, sample_id, habitat_type, effort_no, species,
                 CASE ROW_NUMBER () OVER (PARTITION BY sample_id, species ORDER BY pk_id)
                    WHEN 1
                       THEN 1
                    ELSE 0
                 END AS willsum
            FROM roll_test)
GROUP BY sample_id, habitat_type, effort_no
ORDER BY sample_id, MIN (pk_id);

Query 3 - applpying grouping sets to combine the above 2 queries returns wrong results:

SELECT   grouping( sample_id ),
         grouping( habitat_type ),   
         sample_id, 
         habitat_type, 
         effort_no,
         SUM (SUM (willsum)) OVER (PARTITION BY sample_id ORDER BY MIN
                                                                (pk_id) )
                                                                       AS cnt
    FROM (SELECT pk_id, sample_id, habitat_type, effort_no, species,
                 CASE ROW_NUMBER () OVER (PARTITION BY sample_id, species ORDER BY pk_id)
                    WHEN 1
                       THEN 1
                    ELSE 0
                 END AS willsum
            FROM roll_test)
GROUP BY grouping sets ( sample_id, ( sample_id, habitat_type, effort_no ) )
ORDER BY sample_id, MIN (pk_id);

Anybody can explain this behaviour of grouping sets ?

Thanks again !

457632

Sorry, my bad, I found what was wrong with Query 3. Here is the correct query with grouping sets:

SELECT   grouping( sample_id ),
         grouping( habitat_type ),   
         sample_id, 
         habitat_type, 
         effort_no,
         SUM (SUM (willsum)) OVER (PARTITION BY grouping( sample_id ), grouping( habitat_type ), sample_id ORDER BY MIN
                                                                (pk_id) )
                                                                       AS cnt
    FROM (SELECT pk_id, sample_id, habitat_type, effort_no, species,
                 CASE ROW_NUMBER () OVER (PARTITION BY sample_id, species ORDER BY pk_id)
                    WHEN 1
                       THEN 1
                    ELSE 0
                 END AS willsum
            FROM roll_test)
GROUP BY grouping sets ( sample_id, ( sample_id, habitat_type, effort_no ) )
ORDER BY sample_id, MIN (pk_id);

Thanks !

1 - 11

Locked Post

New comments cannot be posted to this locked post.

Locked on Aug 27 2008

Added on Jul 24 2008

11 comments

2,109 views

SQL & PL/SQL

Groupping looking into previous group in one query... analythics help ?

Comments

Post Details