Forum Stats

  • 3,734,031 Users
  • 2,246,861 Discussions
  • 7,857,002 Comments

Discussions

Identifying and grouping consecutive rows in sql

faanwar
faanwar Member Posts: 31
edited May 2012 in SQL & PL/SQL
I have following data set:
CREATE TABLE APPS.T1
(
  ROW_NUM               NUMBER,
  EFFECTIVE_START_DATE  DATE                    NOT NULL,
  EFFECTIVE_END_DATE    DATE                    NOT NULL,
  STATUS                VARCHAR2(30 BYTE)
);


SET DEFINE OFF;
Insert into APPS.T1
   (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
 Values
   (1, TO_DATE('07/01/2009 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('09/06/2009 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
Insert into APPS.T1
   (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
 Values
   (2, TO_DATE('03/20/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/31/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
Insert into APPS.T1
   (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
 Values
   (3, TO_DATE('08/06/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('08/22/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
Insert into APPS.T1
   (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
 Values
   (4, TO_DATE('08/23/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('08/26/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
Insert into APPS.T1
   (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
 Values
   (5, TO_DATE('08/27/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('08/27/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
COMMIT;


SELECT * FROM APPS.T1

   ROW_NUM EFFECTIVE EFFECTIVE STATUS                        
---------- --------- --------- ------------------------------
         1 01-JUL-09 06-SEP-09 VAC                           
         2 20-MAR-11 31-MAR-11 VAC                           
         3 06-AUG-11 22-AUG-11 VAC                           
         4 23-AUG-11 26-AUG-11 VAC                           
         5 27-AUG-11 27-AUG-11 VAC                           

5 rows selected.
My requirement was that row number 3, 4 and 5 be grouped and treated as a single vacation record such that

effective_start_date = 06-AUG-2011 and
effective_end_date = 27-AUG-2011


For this I wrote a query:
SELECT effective_start_date,
       effective_end_date,
       CASE
          WHEN LAG (effective_end_date, 1) OVER (ORDER BY row_num) + 1 = effective_start_date
             THEN 0
          WHEN LEAD (effective_start_date, 1) OVER (ORDER BY row_num) - 1 = effective_end_date
             THEN 0
          ELSE 1
       END row_num
  FROM (SELECT * FROM T1)
Now the data returned looks like:
EFFECTIVE EFFECTIVE    ROW_NUM
--------- --------- ----------
01-JUL-09 06-SEP-09          1
20-MAR-11 31-MAR-11          1
06-AUG-11 22-AUG-11          0
23-AUG-11 26-AUG-11          0
27-AUG-11 27-AUG-11          0

5 rows selected.
Now I can easily use MIN(effective_start_date) and MAX(effective_start_date) group by ROW_NUM to achieve the desired results
SELECT   MIN (effective_start_date) start_dt,
         MAX (effective_start_date) end_dt,
         row_num
    FROM (SELECT effective_start_date,
                 effective_end_date,
                 CASE
                    WHEN LAG (effective_end_date, 1) OVER (ORDER BY row_num) + 1 = effective_start_date
                       THEN 0
                    WHEN LEAD (effective_start_date, 1) OVER (ORDER BY row_num) - 1 = effective_end_date
                       THEN 0
                    ELSE 1
                 END row_num
            FROM (SELECT *
                    FROM t1))
GROUP BY row_num
  HAVING row_num = 0
UNION
SELECT effective_start_date start_dt,
       effective_start_date end_dt,
       row_num
  FROM (SELECT effective_start_date,
               effective_end_date,
               CASE
                  WHEN LAG (effective_end_date, 1) OVER (ORDER BY row_num) + 1 = effective_start_date
                     THEN 0
                  WHEN LEAD (effective_start_date, 1) OVER (ORDER BY row_num) - 1 = effective_end_date
                     THEN 0
                  ELSE 1
               END row_num
          FROM (SELECT *
                  FROM t1))
 WHERE row_num = 1

START_DT  END_DT       ROW_NUM
--------- --------- ----------
01-JUL-09 01-JUL-09          1
20-MAR-11 20-MAR-11          1
06-AUG-11 27-AUG-11          0

3 rows selected.
All done BUT the problem is that there may be several groups of consecutive rows like this. In that case each group should be identified distinctly for GROUP BY clause to work as expected.

I want to assign a unique number to each occurence of such group.

How can I achieve this? Any ideas?

Regards,

Faraz

Edited by: faanwar on May 10, 2012 3:36 PM

Answers

  • Sudhakar_B
    Sudhakar_B Member Posts: 501
    Faraz,
    Take a look at PARTITION BY clause. I believe that will help meet the requirement.

    Search for date gap in this forum.
    May be.. 4205170

    vr
  • Nicosa-Oracle
    Nicosa-Oracle Member Posts: 1,319 Employee
    edited May 2012
    Hi,

    You need to use Tabibitosan Method from the FAQ : {message:id=9535978}
    Direct link >> {message:id=3989177}

    Edited by: Nicosa on May 10, 2012 3:04 PM (Added direct link)
  • Nicosa-Oracle
    Nicosa-Oracle Member Posts: 1,319 Employee
    Well, actually, you'll need to tweak it a bit. something such as in :
    [email protected] SQL>l
      1  with t (id, dstart, dend, status)
      2  as (
      3  	select 1,to_date('01-JUL-09','dd-MON-YY'),to_date('06-SEP-09','dd-MON-YY'),'VAC' from dual
      4  	union all select 2,to_date('20-MAR-11','dd-MON-YY'),to_date('31-MAR-11','dd-MON-YY'),'VAC' from dual
      5  	union all select 3,to_date('06-AUG-11','dd-MON-YY'),to_date('22-AUG-11','dd-MON-YY'),'VAC' from dual
      6  	union all select 4,to_date('23-AUG-11','dd-MON-YY'),to_date('26-AUG-11','dd-MON-YY'),'VAC' from dual
      7  	union all select 5,to_date('27-AUG-11','dd-MON-YY'),to_date('27-AUG-11','dd-MON-YY'),'VAC' from dual
      8  )
      9  ------ end of sample data ------
     10  select min(dstart) dstart, max(dend) dend, status, count(*) aggrows
     11  from (
     12  	select
     13  	id
     14  		,dstart
     15  		,dend
     16  		,status
     17  		,dend
     18  			-sum(dend-dstart) over (partition by status order by dstart)
     19  			-row_number() over (partition by status order by dstart) grp
     20  	from t
     21  )
     22  group by grp, status
     23* order by grp, status
    [email protected] SQL>/
    
    DSTART              DEND                STA    AGGROWS
    ------------------- ------------------- --- ----------
    01/07/2009 00:00:00 06/09/2009 00:00:00 VAC          1
    20/03/2011 00:00:00 31/03/2011 00:00:00 VAC          1
    06/08/2011 00:00:00 27/08/2011 00:00:00 VAC          3
  • Aketi Jyuuzou
    Aketi Jyuuzou Member Posts: 1,072 Bronze Badge
    WOW very Cool arranged Tabibitosan method "Nicosa" ;-)
    I have arranged your solution :8}
    alter session set NLS_DATE_FORMAT = 'yyyy-mm-dd';
    
    with t(staD,endD) as(
    select date '2009-07-01',date '2009-09-06' from dual union
    select date '2011-03-20',date '2011-03-31' from dual union
    select date '2011-08-06',date '2011-08-22' from dual union
    select date '2011-08-23',date '2011-08-26' from dual union
    select date '2011-08-27',date '2011-08-27' from dual)
    select staD,endD,
    endD-staD+1 as "endD-staD+1",
    sum(endD-staD+1) over (order by staD) as SumDistance,
    endD - sum(endD-staD+1) over (order by staD) as gap
      from t
    order by staD;
    
    staD        endD        endD-staD+1  SumDistance  gap
    ----------  ----------  -----------  -----------  ----------
    2009-07-01  2009-09-06           68           68  2009-06-30
    2011-03-20  2011-03-31           12           80  2011-01-10
    2011-08-06  2011-08-22           17           97  2011-05-17
    2011-08-23  2011-08-26            4          101  2011-05-17
    2011-08-27  2011-08-27            1          102  2011-05-17
    I imagined that there are 2 people called X and A.
    X always walks more than 1m (endD)
    A always walks more then 1m (sum(endD-staD+1) over (order by staD))
  • Aketi Jyuuzou
    Aketi Jyuuzou Member Posts: 1,072 Bronze Badge
    And there is famous way :D
    alter session set NLS_DATE_FORMAT = 'yyyy-mm-dd';
    
    with t(staD,endD) as(
    select date '2009-07-01',date '2009-09-06' from dual union
    select date '2011-03-20',date '2011-03-31' from dual union
    select date '2011-08-06',date '2011-08-22' from dual union
    select date '2011-08-23',date '2011-08-26' from dual union
    select date '2011-08-27',date '2011-08-27' from dual),
    tmp1 as(
    select staD,endD,
    case when Lag(endD) over(order by staD)+1
            = staD then 0 else 1 end as WillSum
      from t),
    tmp2 as(
    select staD,endD,
    sum(WillSum) over(order by staD) as grp
      from tmp1)
    select min(staD),max(endD),count(*)
      from tmp2
    group by grp
    order by min(staD);
    
    MIN(staD)   MAX(endD)   COUNT(*)
    ----------  ----------  --------
    2009-07-01  2009-09-06         1
    2011-03-20  2011-03-31         1
    2011-08-06  2011-08-27         3
  • Nicosa-Oracle
    Nicosa-Oracle Member Posts: 1,319 Employee
    Aketi Jyuuzou wrote:
    WOW very Cool arranged Tabibitosan method "Nicosa" ;-)
    Thanks Aketi, it's a pleasure and an honour to have you saying that !
    It made my day !
    :-)

    I learned the tabibitosan method from your post I gave a link to. Hence, I owe you a big Thank You for that.
  • jihuyao
    jihuyao Member Posts: 462
    Here is another approach. Pick up the right combination of double lag/lead process. Use partition by in lag/lead function for multiple groups of the records

    with t as (
    select rownum id, t.* from
     (select to_date('01-JUL-09','dd-MON-YY') dstart,
          to_date('06-SEP-09','dd-MON-YY') dend,'VAC' status from dual
     union all select to_date('20-MAR-11','dd-MON-YY'),
          to_date('31-MAR-11','dd-MON-YY'),'VAC' from dual
     union all select to_date('06-AUG-11','dd-MON-YY'),
          to_date('22-AUG-11','dd-MON-YY'),'VAC' from dual
     union all select to_date('23-AUG-11','dd-MON-YY'),
          to_date('26-AUG-11','dd-MON-YY'),'VAC' from dual
     union all select to_date('27-AUG-11','dd-MON-YY'),
          to_date('27-AUG-11','dd-MON-YY'),'VAC' from dual) t
    )
    select id, status,
    decode(group_start, null, dstart, group_start) group_start,
    dend group_end,
    group_cnt
    from
    (
    select v.*,
    lag(id, 1, 0) over (order by id) lag_id,
    id-(lag(id, 1, 0) over (order by id)) group_cnt,
    lag(lead_start) over (order by id) group_start,
    lead_start-dend as gap
    from
    (
    select t.*,
    lag(dstart) over (order by id) lag_start,
    lag(dend) over (order by id) lag_end,
    lead(dstart) over (order by id) lead_start,
    lead(dend) over (order by id) lead_end
    from t
    ) v
    where lead_start-dend >1 or lead_start-dend is null
    ) v1
    /
    
    
            ID STA GROUP_STA GROUP_END  GROUP_CNT
    ---------- --- --------- --------- ----------
             1 VAC 01-JUL-09 06-SEP-09          1
             2 VAC 20-MAR-11 31-MAR-11          1
             5 VAC 06-AUG-11 27-AUG-11          3
  • faanwar
    faanwar Member Posts: 31
    edited May 2012
    Fabulous! Thank you guys specially Nicosa and Aketi for such quick and useful responses. That was a big help to me.

    Didn't know we could play with SQL this way. Learnt something new today :)

    Thanks again

    Faraz

    Edited by: faanwar on May 12, 2012 11:33 AM
This discussion has been closed.