SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Identifying and grouping consecutive rows in sql

faanwarMay 10 2012 — edited May 12 2012

I have following data set:

CREATE TABLE APPS.T1
(
  ROW_NUM               NUMBER,
  EFFECTIVE_START_DATE  DATE                    NOT NULL,
  EFFECTIVE_END_DATE    DATE                    NOT NULL,
  STATUS                VARCHAR2(30 BYTE)
);


SET DEFINE OFF;
Insert into APPS.T1
   (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
 Values
   (1, TO_DATE('07/01/2009 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('09/06/2009 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
Insert into APPS.T1
   (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
 Values
   (2, TO_DATE('03/20/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/31/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
Insert into APPS.T1
   (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
 Values
   (3, TO_DATE('08/06/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('08/22/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
Insert into APPS.T1
   (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
 Values
   (4, TO_DATE('08/23/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('08/26/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
Insert into APPS.T1
   (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
 Values
   (5, TO_DATE('08/27/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('08/27/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
COMMIT;


SELECT * FROM APPS.T1

   ROW_NUM EFFECTIVE EFFECTIVE STATUS                        
---------- --------- --------- ------------------------------
         1 01-JUL-09 06-SEP-09 VAC                           
         2 20-MAR-11 31-MAR-11 VAC                           
         3 06-AUG-11 22-AUG-11 VAC                           
         4 23-AUG-11 26-AUG-11 VAC                           
         5 27-AUG-11 27-AUG-11 VAC                           

5 rows selected.

My requirement was that row number 3, 4 and 5 be grouped and treated as a single vacation record such that

effective_start_date = 06-AUG-2011 and
effective_end_date = 27-AUG-2011

For this I wrote a query:

SELECT effective_start_date,
       effective_end_date,
       CASE
          WHEN LAG (effective_end_date, 1) OVER (ORDER BY row_num) + 1 = effective_start_date
             THEN 0
          WHEN LEAD (effective_start_date, 1) OVER (ORDER BY row_num) - 1 = effective_end_date
             THEN 0
          ELSE 1
       END row_num
  FROM (SELECT * FROM T1)

Now the data returned looks like:

EFFECTIVE EFFECTIVE    ROW_NUM
--------- --------- ----------
01-JUL-09 06-SEP-09          1
20-MAR-11 31-MAR-11          1
06-AUG-11 22-AUG-11          0
23-AUG-11 26-AUG-11          0
27-AUG-11 27-AUG-11          0

5 rows selected.

Now I can easily use MIN(effective_start_date) and MAX(effective_start_date) group by ROW_NUM to achieve the desired results

SELECT   MIN (effective_start_date) start_dt,
         MAX (effective_start_date) end_dt,
         row_num
    FROM (SELECT effective_start_date,
                 effective_end_date,
                 CASE
                    WHEN LAG (effective_end_date, 1) OVER (ORDER BY row_num) + 1 = effective_start_date
                       THEN 0
                    WHEN LEAD (effective_start_date, 1) OVER (ORDER BY row_num) - 1 = effective_end_date
                       THEN 0
                    ELSE 1
                 END row_num
            FROM (SELECT *
                    FROM t1))
GROUP BY row_num
  HAVING row_num = 0
UNION
SELECT effective_start_date start_dt,
       effective_start_date end_dt,
       row_num
  FROM (SELECT effective_start_date,
               effective_end_date,
               CASE
                  WHEN LAG (effective_end_date, 1) OVER (ORDER BY row_num) + 1 = effective_start_date
                     THEN 0
                  WHEN LEAD (effective_start_date, 1) OVER (ORDER BY row_num) - 1 = effective_end_date
                     THEN 0
                  ELSE 1
               END row_num
          FROM (SELECT *
                  FROM t1))
 WHERE row_num = 1

START_DT  END_DT       ROW_NUM
--------- --------- ----------
01-JUL-09 01-JUL-09          1
20-MAR-11 20-MAR-11          1
06-AUG-11 27-AUG-11          0

3 rows selected.

All done BUT the problem is that there may be several groups of consecutive rows like this. In that case each group should be identified distinctly for GROUP BY clause to work as expected.

I want to assign a unique number to each occurence of such group.

How can I achieve this? Any ideas?

Regards,

Faraz

Edited by: faanwar on May 10, 2012 3:36 PM

Sudhakar_B

Faraz,
Take a look at PARTITION BY clause. I believe that will help meet the requirement.

Search for date gap in this forum.
May be.. 4205170

vr

Nicosa-Oracle

Hi,

You need to use Tabibitosan Method from the FAQ : {message:id=9535978}
Direct link >> {message:id=3989177}

Edited by: Nicosa on May 10, 2012 3:04 PM (Added direct link)

Nicosa-Oracle

Well, actually, you'll need to tweak it a bit. something such as in :

Scott@my11g SQL>l
  1  with t (id, dstart, dend, status)
  2  as (
  3  	select 1,to_date('01-JUL-09','dd-MON-YY'),to_date('06-SEP-09','dd-MON-YY'),'VAC' from dual
  4  	union all select 2,to_date('20-MAR-11','dd-MON-YY'),to_date('31-MAR-11','dd-MON-YY'),'VAC' from dual
  5  	union all select 3,to_date('06-AUG-11','dd-MON-YY'),to_date('22-AUG-11','dd-MON-YY'),'VAC' from dual
  6  	union all select 4,to_date('23-AUG-11','dd-MON-YY'),to_date('26-AUG-11','dd-MON-YY'),'VAC' from dual
  7  	union all select 5,to_date('27-AUG-11','dd-MON-YY'),to_date('27-AUG-11','dd-MON-YY'),'VAC' from dual
  8  )
  9  ------ end of sample data ------
 10  select min(dstart) dstart, max(dend) dend, status, count(*) aggrows
 11  from (
 12  	select
 13  	id
 14  		,dstart
 15  		,dend
 16  		,status
 17  		,dend
 18  			-sum(dend-dstart) over (partition by status order by dstart)
 19  			-row_number() over (partition by status order by dstart) grp
 20  	from t
 21  )
 22  group by grp, status
 23* order by grp, status
Scott@my11g SQL>/

DSTART              DEND                STA    AGGROWS
------------------- ------------------- --- ----------
01/07/2009 00:00:00 06/09/2009 00:00:00 VAC          1
20/03/2011 00:00:00 31/03/2011 00:00:00 VAC          1
06/08/2011 00:00:00 27/08/2011 00:00:00 VAC          3

Aketi Jyuuzou

WOW very Cool arranged Tabibitosan method "Nicosa" ;-)
I have arranged your solution :8}

alter session set NLS_DATE_FORMAT = 'yyyy-mm-dd';

with t(staD,endD) as(
select date '2009-07-01',date '2009-09-06' from dual union
select date '2011-03-20',date '2011-03-31' from dual union
select date '2011-08-06',date '2011-08-22' from dual union
select date '2011-08-23',date '2011-08-26' from dual union
select date '2011-08-27',date '2011-08-27' from dual)
select staD,endD,
endD-staD+1 as "endD-staD+1",
sum(endD-staD+1) over (order by staD) as SumDistance,
endD - sum(endD-staD+1) over (order by staD) as gap
  from t
order by staD;

staD        endD        endD-staD+1  SumDistance  gap
----------  ----------  -----------  -----------  ----------
2009-07-01  2009-09-06           68           68  2009-06-30
2011-03-20  2011-03-31           12           80  2011-01-10
2011-08-06  2011-08-22           17           97  2011-05-17
2011-08-23  2011-08-26            4          101  2011-05-17
2011-08-27  2011-08-27            1          102  2011-05-17

I imagined that there are 2 people called X and A.
X always walks more than 1m (endD)
A always walks more then 1m (sum(endD-staD+1) over (order by staD))

Aketi Jyuuzou

And there is famous way :D

alter session set NLS_DATE_FORMAT = 'yyyy-mm-dd';

with t(staD,endD) as(
select date '2009-07-01',date '2009-09-06' from dual union
select date '2011-03-20',date '2011-03-31' from dual union
select date '2011-08-06',date '2011-08-22' from dual union
select date '2011-08-23',date '2011-08-26' from dual union
select date '2011-08-27',date '2011-08-27' from dual),
tmp1 as(
select staD,endD,
case when Lag(endD) over(order by staD)+1
        = staD then 0 else 1 end as WillSum
  from t),
tmp2 as(
select staD,endD,
sum(WillSum) over(order by staD) as grp
  from tmp1)
select min(staD),max(endD),count(*)
  from tmp2
group by grp
order by min(staD);

MIN(staD)   MAX(endD)   COUNT(*)
----------  ----------  --------
2009-07-01  2009-09-06         1
2011-03-20  2011-03-31         1
2011-08-06  2011-08-27         3

Nicosa-Oracle

Aketi Jyuuzou wrote:
WOW very Cool arranged Tabibitosan method "Nicosa" ;-)

Thanks Aketi, it's a pleasure and an honour to have you saying that !
It made my day !
:-)

I learned the tabibitosan method from your post I gave a link to. Hence, I owe you a big Thank You for that.

jihuyao

Here is another approach. Pick up the right combination of double lag/lead process. Use partition by in lag/lead function for multiple groups of the records

with t as (
select rownum id, t.* from
 (select to_date('01-JUL-09','dd-MON-YY') dstart,
      to_date('06-SEP-09','dd-MON-YY') dend,'VAC' status from dual
 union all select to_date('20-MAR-11','dd-MON-YY'),
      to_date('31-MAR-11','dd-MON-YY'),'VAC' from dual
 union all select to_date('06-AUG-11','dd-MON-YY'),
      to_date('22-AUG-11','dd-MON-YY'),'VAC' from dual
 union all select to_date('23-AUG-11','dd-MON-YY'),
      to_date('26-AUG-11','dd-MON-YY'),'VAC' from dual
 union all select to_date('27-AUG-11','dd-MON-YY'),
      to_date('27-AUG-11','dd-MON-YY'),'VAC' from dual) t
)
select id, status,
decode(group_start, null, dstart, group_start) group_start,
dend group_end,
group_cnt
from
(
select v.*,
lag(id, 1, 0) over (order by id) lag_id,
id-(lag(id, 1, 0) over (order by id)) group_cnt,
lag(lead_start) over (order by id) group_start,
lead_start-dend as gap
from
(
select t.*,
lag(dstart) over (order by id) lag_start,
lag(dend) over (order by id) lag_end,
lead(dstart) over (order by id) lead_start,
lead(dend) over (order by id) lead_end
from t
) v
where lead_start-dend >1 or lead_start-dend is null
) v1
/


        ID STA GROUP_STA GROUP_END  GROUP_CNT
---------- --- --------- --------- ----------
         1 VAC 01-JUL-09 06-SEP-09          1
         2 VAC 20-MAR-11 31-MAR-11          1
         5 VAC 06-AUG-11 27-AUG-11          3

faanwar

Fabulous! Thank you guys specially Nicosa and Aketi for such quick and useful responses. That was a big help to me.

Didn't know we could play with SQL this way. Learnt something new today :)

Thanks again

Faraz

Edited by: faanwar on May 12, 2012 11:33 AM

1 - 8

Locked Post

New comments cannot be posted to this locked post.

Locked on Jun 9 2012

Added on May 10 2012

#oracle, #plsql, #query, #sql, #sql-developer

8 comments

20,945 views

SQL & PL/SQL

Identifying and grouping consecutive rows in sql

Comments

Post Details