Forum Stats

  • 3,734,036 Users
  • 2,246,863 Discussions
  • 7,857,005 Comments

Discussions

Help with Analytic Function

Jambalahot
Jambalahot Member Posts: 28
edited July 2012 in SQL & PL/SQL
Here is a sample data from table :
ID    NAME             Start                  
1	SARA	        01-JAN-2006	
2	SARA	        03-FEB-2006	
3	LAMBDA	        21-MAR-2006	
4	SARA	        13-APR-2006	
5	LAMBDA	        01-JAN-2007	
6	LAMBDA	        01-SEP-2007	
I would like to get this :
Name        Start               Stop
SARA        01-JAN-2006    20-MAR-2006
LAMBDA      21-MAR-2006     12-APR-2006
SARA        13-APR-2006     31-DEC-2006
LAMBDA      01-JAN-2007      <null>
I Have tried using partition and lead function but partition over name is combining all Sara rows and Lambda rows into one group/partition which is not I am trying to get.
Is there any analytic function or other way of doing this to combine the date ranges only when same person is appearing conescutively?
Thanks.

Best Answer

  • Boneist
    Boneist Member Posts: 4,983 Gold Trophy
    edited July 2012 Accepted Answer
    This can be easily achieved by using tabibitosan:

    First off, you have to identify the "groups" each name in the list belongs to
    with sample_data as (select 1 id, 'SARA' name, to_date('01/01/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 2 id, 'SARA' name, to_date('03/02/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 3 id, 'LAMBDA' name, to_date('21/03/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 4 id, 'SARA' name, to_date('13/04/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 5 id, 'LAMBDA' name, to_date('01/01/2007', 'dd/mm/yyyy') start_date from dual union all
                         select 6 id, 'LAMBDA' name, to_date('01/09/2007', 'dd/mm/yyyy') start_date from dual)
    select id,
           name,
           start_date,
           lead(start_date, 1, to_date('31/12/9999', 'dd/mm/yyyy')) over (order by start_date) next_start_date,
           row_number() over (order by start_date)
             - row_number() over (partition by name order by start_date) grp
    from   sample_data;
    
            ID NAME   START_DATE NEXT_START_DATE        GRP
    ---------- ------ ---------- --------------- ----------
             1 SARA   01/01/2006 03/02/2006               0
             2 SARA   03/02/2006 21/03/2006               0
             3 LAMBDA 21/03/2006 13/04/2006               2
             4 SARA   13/04/2006 01/01/2007               1
             5 LAMBDA 01/01/2007 01/09/2007               3
             6 LAMBDA 01/09/2007 31/12/9999               3
    You can see that the group number is generated by comparing the overall rownumber of the set of rows (in order) with the rownumber of the set of rows per name (in the same order) - when there's a gap because another name appears inbetween, the group number changes.

    Once you've identified the group number for each set of rows, it's then easy to find the min / max values in that group:
                         
    with sample_data as (select 1 id, 'SARA' name, to_date('01/01/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 2 id, 'SARA' name, to_date('03/02/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 3 id, 'LAMBDA' name, to_date('21/03/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 4 id, 'SARA' name, to_date('13/04/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 5 id, 'LAMBDA' name, to_date('01/01/2007', 'dd/mm/yyyy') start_date from dual union all
                         select 6 id, 'LAMBDA' name, to_date('01/09/2007', 'dd/mm/yyyy') start_date from dual),
         tabibitosan as (select id,
                                name,
                                start_date,
                                lead(start_date, 1, to_date('31/12/9999', 'dd/mm/yyyy')) over (order by start_date) next_start_date,
                                row_number() over (order by start_date)
                                  - row_number() over (partition by name order by start_date) grp
                         from   sample_data)
    select name,
           min(start_date) start_date,
           max(next_start_date) stop_date
    from   tabibitosan
    group by name, grp
    order by start_date;
    
    NAME   START_DATE STOP_DATE 
    ------ ---------- ----------
    SARA   01/01/2006 21/03/2006
    LAMBDA 21/03/2006 13/04/2006
    SARA   13/04/2006 01/01/2007
    LAMBDA 01/01/2007 31/12/9999
    If you want the max date to appear as null, you'll have to use a cast or decode to change it - I'll leave that as an exercise for you to do! I'll also leave it to you to work out how to get the previous day for the stop_date.

Answers

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,497 Red Diamond
    edited July 2012
    Hi,

    LEAD (start) OVER (PARTITION BY name ...) finds the start from the next row with the same name, but you want the next row with a different name. That would simply be the next row if you never had consecutive rows with the same name, so begin by removing the duplicate consecutive rows.

    Here's one way:
    WITH   got_prev_name 	    AS
    (
    	SELECT	id, name, start
    	,	LAG (name) OVER (ORDER BY  start)	AS prev_name
    	FROM	table_x
    )
    SELECT	name
    ,	start
    ,	LEAD (start) OVER (ORDER BY  start) - 1		AS stop
    FROM	got_prev_name
    WHERE	prev_name	!= name
    OR	prev_name	IS NULL
    ;
    If you'd care to post CREATE TABLE and INSERT statements for your sample data, then I could test this.

    This assumes that name is not NULL, and that start is unique. If those conditins are not true, it can be adjusted.
  • indra budiantho
    indra budiantho Member Posts: 1,387
    /* Formatted on 7/20/2012 6:05:13 PM (QP5 v5.139.911.3011) */
    WITH t AS (SELECT 1 ID, 'SARA' NAME, '01-JAN-2006' st FROM DUAL
               UNION ALL
               SELECT 2, 'SARA', '03-FEB-2006' FROM DUAL
               UNION ALL
               SELECT 3, 'LAMBDA', '21-MAR-2006' FROM DUAL
               UNION ALL
               SELECT 4, 'SARA', '13-APR-2006' FROM DUAL
               UNION ALL
               SELECT 5, 'LAMBDA', '01-JAN-2007' FROM DUAL
               UNION ALL
               SELECT 6, 'LAMBDA', '01-SEP-2007' FROM DUAL),
         q
            AS (SELECT t.*,
                       LEAD (name) OVER (ORDER BY TO_DATE (st, 'dd-mon-yyyy')) ld,
                       LAG (name) OVER (ORDER BY TO_DATE (st, 'dd-mon-yyyy')) lg
                  FROM t)
    SELECT s.id,
           s.name,
           s.st,
           (LEAD (TO_DATE (st, 'dd-mon-yyyy') - 1)
               OVER (ORDER BY TO_DATE (st, 'dd-mon-yyyy')))
              stoop
      FROM (SELECT q.*
              FROM q
             WHERE ld != name AND lg != name
            UNION
            SELECT q.*
              FROM q
             WHERE name = ld OR (ld IS NULL AND name != lg)) s;
    output:
    1 SARA 01-JAN-2006 3/20/2006
    3 LAMBDA 21-MAR-2006 4/12/2006
    4 SARA 13-APR-2006 12/31/2006
    5 LAMBDA 01-JAN-2007
    indra budiantho
  • Boneist
    Boneist Member Posts: 4,983 Gold Trophy
    edited July 2012 Accepted Answer
    This can be easily achieved by using tabibitosan:

    First off, you have to identify the "groups" each name in the list belongs to
    with sample_data as (select 1 id, 'SARA' name, to_date('01/01/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 2 id, 'SARA' name, to_date('03/02/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 3 id, 'LAMBDA' name, to_date('21/03/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 4 id, 'SARA' name, to_date('13/04/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 5 id, 'LAMBDA' name, to_date('01/01/2007', 'dd/mm/yyyy') start_date from dual union all
                         select 6 id, 'LAMBDA' name, to_date('01/09/2007', 'dd/mm/yyyy') start_date from dual)
    select id,
           name,
           start_date,
           lead(start_date, 1, to_date('31/12/9999', 'dd/mm/yyyy')) over (order by start_date) next_start_date,
           row_number() over (order by start_date)
             - row_number() over (partition by name order by start_date) grp
    from   sample_data;
    
            ID NAME   START_DATE NEXT_START_DATE        GRP
    ---------- ------ ---------- --------------- ----------
             1 SARA   01/01/2006 03/02/2006               0
             2 SARA   03/02/2006 21/03/2006               0
             3 LAMBDA 21/03/2006 13/04/2006               2
             4 SARA   13/04/2006 01/01/2007               1
             5 LAMBDA 01/01/2007 01/09/2007               3
             6 LAMBDA 01/09/2007 31/12/9999               3
    You can see that the group number is generated by comparing the overall rownumber of the set of rows (in order) with the rownumber of the set of rows per name (in the same order) - when there's a gap because another name appears inbetween, the group number changes.

    Once you've identified the group number for each set of rows, it's then easy to find the min / max values in that group:
                         
    with sample_data as (select 1 id, 'SARA' name, to_date('01/01/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 2 id, 'SARA' name, to_date('03/02/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 3 id, 'LAMBDA' name, to_date('21/03/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 4 id, 'SARA' name, to_date('13/04/2006', 'dd/mm/yyyy') start_date from dual union all
                         select 5 id, 'LAMBDA' name, to_date('01/01/2007', 'dd/mm/yyyy') start_date from dual union all
                         select 6 id, 'LAMBDA' name, to_date('01/09/2007', 'dd/mm/yyyy') start_date from dual),
         tabibitosan as (select id,
                                name,
                                start_date,
                                lead(start_date, 1, to_date('31/12/9999', 'dd/mm/yyyy')) over (order by start_date) next_start_date,
                                row_number() over (order by start_date)
                                  - row_number() over (partition by name order by start_date) grp
                         from   sample_data)
    select name,
           min(start_date) start_date,
           max(next_start_date) stop_date
    from   tabibitosan
    group by name, grp
    order by start_date;
    
    NAME   START_DATE STOP_DATE 
    ------ ---------- ----------
    SARA   01/01/2006 21/03/2006
    LAMBDA 21/03/2006 13/04/2006
    SARA   13/04/2006 01/01/2007
    LAMBDA 01/01/2007 31/12/9999
    If you want the max date to appear as null, you'll have to use a cast or decode to change it - I'll leave that as an exercise for you to do! I'll also leave it to you to work out how to get the previous day for the stop_date.
  • Nicosa-Oracle
    Nicosa-Oracle Member Posts: 1,319 Employee
    Hi,

    Also wanted to give it a try, but without having some hard-coded value in it :
    [email protected] SQL>!cat afiedt.buf
    with sample_data as (
    	select 1 id, 'SARA' name, to_date('01/01/2006', 'dd/mm/yyyy') start_date from dual union all
    	select 2 id, 'SARA' name, to_date('03/02/2006', 'dd/mm/yyyy') start_date from dual union all
    	select 3 id, 'LAMBDA' name, to_date('21/03/2006', 'dd/mm/yyyy') start_date from dual union all
    	select 4 id, 'SARA' name, to_date('13/04/2006', 'dd/mm/yyyy') start_date from dual union all
    	select 5 id, 'LAMBDA' name, to_date('01/01/2007', 'dd/mm/yyyy') start_date from dual union all
    	select 6 id, 'LAMBDA' name, to_date('01/09/2007', 'dd/mm/yyyy') start_date from dual
    )
    select * from (
    	select name, min(s) s,e
    	from (
    		select
    			name
    			,start_date s
    			,grp
    			,last_value(next_date respect nulls)
    				over (partition by grp,name order by start_date rows between unbounded preceding and unbounded following) e
    		from (
    			select
    				id
    				,name
    				,start_date
    				,prev_date
    				,next_date
    				,start_date
    					- ((row_number() over (partition by name order by start_date))-1)
    					- nvl(sum(start_date-prev_date) over (partition by name order by start_date),0) grp
    			from (
    				select
    					id
    					,name
    					,start_date
    					,(lag(start_date) over (order by start_date))+1 prev_date
    					,(lead(start_date) over (order by start_date))-1 next_date
    				from sample_data
    			)
    		)
    	)
    	group by grp,name,e
    )
    order by s
    /
    
    [email protected] SQL>/
    
    NAME   S          E
    ------ ---------- ----------
    SARA   2006/01/01 2006/03/20
    LAMBDA 2006/03/21 2006/04/12
    SARA   2006/04/13 2006/12/31
    LAMBDA 2007/01/01
    I'm pretty sure that could be simplified...
    Nicosa-Oracle
  • Jambalahot
    Jambalahot Member Posts: 28
    Thanks Everyone for the feedback.
    All the solutions provided worked perfect for me.
    Learnt few things from your input. Appreciate it.
  • Aketi Jyuuzou
    Aketi Jyuuzou Member Posts: 1,072 Bronze Badge
    There is One solution using Tabibitosan.
    But I think solution of "Frank Kulash" is very simple :-)
    ALTER SESSION SET NLS_DATE_FORMAT = 'yyyy-mm-dd';
    
    with t(ID,NAME,StaD) as(
    select 1,'AAAA',date '2006-01-01' from dual union
    select 2,'AAAA',date '2006-02-03' from dual union
    select 3,'BBBB',date '2006-03-21' from dual union
    select 4,'AAAA',date '2006-04-13' from dual union
    select 5,'BBBB',date '2007-01-01' from dual union
    select 6,'BBBB',date '2007-09-01' from dual),
    tmp as(
    select ID,Name,StaD,
     Row_Number() over(                  order by StaD)
    -Row_Number() over(partition by Name order by StaD) as Dis
      from t)
    select min(ID) as ID,Name,min(StaD) as StaD,
    Lead(min(StaD)) over(order by min(StaD)) -1 as Stop
      from tmp
    group by Name,Dis
    order by ID;
    
    ID  NAME  STAD        STOP
    --  ----  ----------  ----------
     1  AAAA  2006-01-01  2006-03-20
     3  BBBB  2006-03-21  2006-04-12
     4  AAAA  2006-04-13  2006-12-31
     5  BBBB  2007-01-01  null
This discussion has been closed.