8 Replies Latest reply: May 12, 2012 1:33 AM by faanwar RSS

    Identifying and grouping consecutive rows in sql

    faanwar
      I have following data set:
      CREATE TABLE APPS.T1
      (
        ROW_NUM               NUMBER,
        EFFECTIVE_START_DATE  DATE                    NOT NULL,
        EFFECTIVE_END_DATE    DATE                    NOT NULL,
        STATUS                VARCHAR2(30 BYTE)
      );
      
      
      SET DEFINE OFF;
      Insert into APPS.T1
         (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
       Values
         (1, TO_DATE('07/01/2009 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('09/06/2009 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
      Insert into APPS.T1
         (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
       Values
         (2, TO_DATE('03/20/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('03/31/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
      Insert into APPS.T1
         (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
       Values
         (3, TO_DATE('08/06/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('08/22/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
      Insert into APPS.T1
         (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
       Values
         (4, TO_DATE('08/23/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('08/26/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
      Insert into APPS.T1
         (ROW_NUM, EFFECTIVE_START_DATE, EFFECTIVE_END_DATE, STATUS)
       Values
         (5, TO_DATE('08/27/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), TO_DATE('08/27/2011 00:00:00', 'MM/DD/YYYY HH24:MI:SS'), 'VAC');
      COMMIT;
      
      
      SELECT * FROM APPS.T1
      
         ROW_NUM EFFECTIVE EFFECTIVE STATUS                        
      ---------- --------- --------- ------------------------------
               1 01-JUL-09 06-SEP-09 VAC                           
               2 20-MAR-11 31-MAR-11 VAC                           
               3 06-AUG-11 22-AUG-11 VAC                           
               4 23-AUG-11 26-AUG-11 VAC                           
               5 27-AUG-11 27-AUG-11 VAC                           
      
      5 rows selected.
      My requirement was that row number 3, 4 and 5 be grouped and treated as a single vacation record such that

      effective_start_date = 06-AUG-2011 and
      effective_end_date = 27-AUG-2011


      For this I wrote a query:
      SELECT effective_start_date,
             effective_end_date,
             CASE
                WHEN LAG (effective_end_date, 1) OVER (ORDER BY row_num) + 1 = effective_start_date
                   THEN 0
                WHEN LEAD (effective_start_date, 1) OVER (ORDER BY row_num) - 1 = effective_end_date
                   THEN 0
                ELSE 1
             END row_num
        FROM (SELECT * FROM T1)
      Now the data returned looks like:
      EFFECTIVE EFFECTIVE    ROW_NUM
      --------- --------- ----------
      01-JUL-09 06-SEP-09          1
      20-MAR-11 31-MAR-11          1
      06-AUG-11 22-AUG-11          0
      23-AUG-11 26-AUG-11          0
      27-AUG-11 27-AUG-11          0
      
      5 rows selected.
      Now I can easily use MIN(effective_start_date) and MAX(effective_start_date) group by ROW_NUM to achieve the desired results
      SELECT   MIN (effective_start_date) start_dt,
               MAX (effective_start_date) end_dt,
               row_num
          FROM (SELECT effective_start_date,
                       effective_end_date,
                       CASE
                          WHEN LAG (effective_end_date, 1) OVER (ORDER BY row_num) + 1 = effective_start_date
                             THEN 0
                          WHEN LEAD (effective_start_date, 1) OVER (ORDER BY row_num) - 1 = effective_end_date
                             THEN 0
                          ELSE 1
                       END row_num
                  FROM (SELECT *
                          FROM t1))
      GROUP BY row_num
        HAVING row_num = 0
      UNION
      SELECT effective_start_date start_dt,
             effective_start_date end_dt,
             row_num
        FROM (SELECT effective_start_date,
                     effective_end_date,
                     CASE
                        WHEN LAG (effective_end_date, 1) OVER (ORDER BY row_num) + 1 = effective_start_date
                           THEN 0
                        WHEN LEAD (effective_start_date, 1) OVER (ORDER BY row_num) - 1 = effective_end_date
                           THEN 0
                        ELSE 1
                     END row_num
                FROM (SELECT *
                        FROM t1))
       WHERE row_num = 1
      
      START_DT  END_DT       ROW_NUM
      --------- --------- ----------
      01-JUL-09 01-JUL-09          1
      20-MAR-11 20-MAR-11          1
      06-AUG-11 27-AUG-11          0
      
      3 rows selected.
      All done BUT the problem is that there may be several groups of consecutive rows like this. In that case each group should be identified distinctly for GROUP BY clause to work as expected.

      I want to assign a unique number to each occurence of such group.

      How can I achieve this? Any ideas?

      Regards,

      Faraz

      Edited by: faanwar on May 10, 2012 3:36 PM
        • 1. Re: Identifying and grouping consecutive rows in sql
          Sudhakar_B
          Faraz,
          Take a look at PARTITION BY clause. I believe that will help meet the requirement.

          Search for date gap in this forum.
          May be.. Re: Find Missing Date Gap

          vr
          • 2. Re: Identifying and grouping consecutive rows in sql
            Nicosa-Oracle
            Hi,

            You need to use Tabibitosan Method from the FAQ : {message:id=9535978}
            Direct link >> {message:id=3989177}

            Edited by: Nicosa on May 10, 2012 3:04 PM (Added direct link)
            • 3. Re: Identifying and grouping consecutive rows in sql
              Nicosa-Oracle
              Well, actually, you'll need to tweak it a bit. something such as in :
              Scott@my11g SQL>l
                1  with t (id, dstart, dend, status)
                2  as (
                3       select 1,to_date('01-JUL-09','dd-MON-YY'),to_date('06-SEP-09','dd-MON-YY'),'VAC' from dual
                4       union all select 2,to_date('20-MAR-11','dd-MON-YY'),to_date('31-MAR-11','dd-MON-YY'),'VAC' from dual
                5       union all select 3,to_date('06-AUG-11','dd-MON-YY'),to_date('22-AUG-11','dd-MON-YY'),'VAC' from dual
                6       union all select 4,to_date('23-AUG-11','dd-MON-YY'),to_date('26-AUG-11','dd-MON-YY'),'VAC' from dual
                7       union all select 5,to_date('27-AUG-11','dd-MON-YY'),to_date('27-AUG-11','dd-MON-YY'),'VAC' from dual
                8  )
                9  ------ end of sample data ------
               10  select min(dstart) dstart, max(dend) dend, status, count(*) aggrows
               11  from (
               12       select
               13       id
               14            ,dstart
               15            ,dend
               16            ,status
               17            ,dend
               18                 -sum(dend-dstart) over (partition by status order by dstart)
               19                 -row_number() over (partition by status order by dstart) grp
               20       from t
               21  )
               22  group by grp, status
               23* order by grp, status
              Scott@my11g SQL>/
              
              DSTART              DEND                STA    AGGROWS
              ------------------- ------------------- --- ----------
              01/07/2009 00:00:00 06/09/2009 00:00:00 VAC          1
              20/03/2011 00:00:00 31/03/2011 00:00:00 VAC          1
              06/08/2011 00:00:00 27/08/2011 00:00:00 VAC          3
              • 4. Re: Identifying and grouping consecutive rows in sql
                Aketi Jyuuzou
                WOW very Cool arranged Tabibitosan method "Nicosa" ;-)
                I have arranged your solution :8}
                alter session set NLS_DATE_FORMAT = 'yyyy-mm-dd';
                
                with t(staD,endD) as(
                select date '2009-07-01',date '2009-09-06' from dual union
                select date '2011-03-20',date '2011-03-31' from dual union
                select date '2011-08-06',date '2011-08-22' from dual union
                select date '2011-08-23',date '2011-08-26' from dual union
                select date '2011-08-27',date '2011-08-27' from dual)
                select staD,endD,
                endD-staD+1 as "endD-staD+1",
                sum(endD-staD+1) over (order by staD) as SumDistance,
                endD - sum(endD-staD+1) over (order by staD) as gap
                  from t
                order by staD;
                
                staD        endD        endD-staD+1  SumDistance  gap
                ----------  ----------  -----------  -----------  ----------
                2009-07-01  2009-09-06           68           68  2009-06-30
                2011-03-20  2011-03-31           12           80  2011-01-10
                2011-08-06  2011-08-22           17           97  2011-05-17
                2011-08-23  2011-08-26            4          101  2011-05-17
                2011-08-27  2011-08-27            1          102  2011-05-17
                I imagined that there are 2 people called X and A.
                X always walks more than 1m (endD)
                A always walks more then 1m (sum(endD-staD+1) over (order by staD))
                • 5. Re: Identifying and grouping consecutive rows in sql
                  Aketi Jyuuzou
                  And there is famous way :D
                  alter session set NLS_DATE_FORMAT = 'yyyy-mm-dd';
                  
                  with t(staD,endD) as(
                  select date '2009-07-01',date '2009-09-06' from dual union
                  select date '2011-03-20',date '2011-03-31' from dual union
                  select date '2011-08-06',date '2011-08-22' from dual union
                  select date '2011-08-23',date '2011-08-26' from dual union
                  select date '2011-08-27',date '2011-08-27' from dual),
                  tmp1 as(
                  select staD,endD,
                  case when Lag(endD) over(order by staD)+1
                          = staD then 0 else 1 end as WillSum
                    from t),
                  tmp2 as(
                  select staD,endD,
                  sum(WillSum) over(order by staD) as grp
                    from tmp1)
                  select min(staD),max(endD),count(*)
                    from tmp2
                  group by grp
                  order by min(staD);
                  
                  MIN(staD)   MAX(endD)   COUNT(*)
                  ----------  ----------  --------
                  2009-07-01  2009-09-06         1
                  2011-03-20  2011-03-31         1
                  2011-08-06  2011-08-27         3
                  • 6. Re: Identifying and grouping consecutive rows in sql
                    Nicosa-Oracle
                    Aketi Jyuuzou wrote:
                    WOW very Cool arranged Tabibitosan method "Nicosa" ;-)
                    Thanks Aketi, it's a pleasure and an honour to have you saying that !
                    It made my day !
                    :-)

                    I learned the tabibitosan method from your post I gave a link to. Hence, I owe you a big Thank You for that.
                    • 7. Re: Identifying and grouping consecutive rows in sql
                      jihuyao
                      Here is another approach. Pick up the right combination of double lag/lead process. Use partition by in lag/lead function for multiple groups of the records

                      with t as (
                      select rownum id, t.* from
                       (select to_date('01-JUL-09','dd-MON-YY') dstart,
                            to_date('06-SEP-09','dd-MON-YY') dend,'VAC' status from dual
                       union all select to_date('20-MAR-11','dd-MON-YY'),
                            to_date('31-MAR-11','dd-MON-YY'),'VAC' from dual
                       union all select to_date('06-AUG-11','dd-MON-YY'),
                            to_date('22-AUG-11','dd-MON-YY'),'VAC' from dual
                       union all select to_date('23-AUG-11','dd-MON-YY'),
                            to_date('26-AUG-11','dd-MON-YY'),'VAC' from dual
                       union all select to_date('27-AUG-11','dd-MON-YY'),
                            to_date('27-AUG-11','dd-MON-YY'),'VAC' from dual) t
                      )
                      select id, status,
                      decode(group_start, null, dstart, group_start) group_start,
                      dend group_end,
                      group_cnt
                      from
                      (
                      select v.*,
                      lag(id, 1, 0) over (order by id) lag_id,
                      id-(lag(id, 1, 0) over (order by id)) group_cnt,
                      lag(lead_start) over (order by id) group_start,
                      lead_start-dend as gap
                      from
                      (
                      select t.*,
                      lag(dstart) over (order by id) lag_start,
                      lag(dend) over (order by id) lag_end,
                      lead(dstart) over (order by id) lead_start,
                      lead(dend) over (order by id) lead_end
                      from t
                      ) v
                      where lead_start-dend >1 or lead_start-dend is null
                      ) v1
                      /
                      
                      
                              ID STA GROUP_STA GROUP_END  GROUP_CNT
                      ---------- --- --------- --------- ----------
                               1 VAC 01-JUL-09 06-SEP-09          1
                               2 VAC 20-MAR-11 31-MAR-11          1
                               5 VAC 06-AUG-11 27-AUG-11          3
                      • 8. Re: Identifying and grouping consecutive rows in sql
                        faanwar
                        Fabulous! Thank you guys specially Nicosa and Aketi for such quick and useful responses. That was a big help to me.

                        Didn't know we could play with SQL this way. Learnt something new today :)

                        Thanks again

                        Faraz

                        Edited by: faanwar on May 12, 2012 11:33 AM