This discussion is archived
11 Replies Latest reply: Feb 21, 2013 2:26 PM by Frank Kulash RSS

Date overlap - sequence split

user12997203 Newbie
Currently Being Moderated
my source will be like this

ID               START DT     END DT             A     B     C     D
AAAA     01/01/1968     01/01/1997     Y     N     N     N
AAAA     01/01/1968     01/01/1999     N     Y     N     N
AAAA     01/01/1988     01/01/1997     N     N     Y     N

and i need break the sequence as below

ID              START DT                END DT             A     B     C     D
AAAA     01/01/1968     01/01/1988     Y     Y     N     N
AAAA     01/01/1988     01/01/1997     Y     Y     Y     N
AAAA     01/01/1997     01/01/1999     N     Y     N     N


i am with analytical function lag but it is giving one line .

can any one help me how to code this logic
  • 1. Re: Date overlap - sequence split
    SomeoneElse Guru
    Currently Being Moderated
    You need to say more stuff.

    What are your columns ABCD and how/why should they be updated? (is that really their names?)
  • 2. Re: Date overlap - sequence split
    Frank Kulash Guru
    Currently Being Moderated
    Hi,

    Whenever you have a problem, please post a little sample data (CREATE TABLE and INSERT statements, relevant columns only) from all tables involved, so that the people who want to help you can re-create the problem and test their ideas.
    Also post the results you want from that data, and an explanation of how you get those results from that data, with specific examples.
    Describe what role each of the columns plays in the problem. For example, it's hard to guess how the output you posted depends on id, since all the rows have the same value for id.
    Always say which version of Oracle you're using (for example, 11.2.0.2.0).
    See the forum FAQ {message:id=9360002}
    user12997203 wrote:
    my source will be like this

    ID               START DT     END DT             A     B     C     D
    AAAA     01/01/1968     01/01/1997     Y     N     N     N
    AAAA     01/01/1968     01/01/1999     N     Y     N     N
    AAAA     01/01/1988     01/01/1997     N     N     Y     N
    The forum FAQ also explains how to use \
     tags to post formatted text. 
    
    and i need break the sequence as below

    ID              START DT                END DT             A     B     C     D
    AAAA     01/01/1968     01/01/1988     Y     Y     N     N
    AAAA     01/01/1988     01/01/1997     Y     Y     Y     N
    AAAA     01/01/1997     01/01/1999     N     Y     N     N
    How do you get those results? In the output columns A, B, C and D, do you want 'Y' when some of the corresponding table columns have 'Y', but others have 'N'? >
    i am with analytical function lap but it is giving one line .
    Sorry, it's unclear what you mean.  It would help if you posted your code, even if it's not perfect. When you say "lap", do you mean "la<b>G</b>"?
    can any one help me how to code this logic
  • 3. Re: Date overlap - sequence split
    user12997203 Newbie
    Currently Being Moderated
    Hi

    apologize for not giving complete information in previous thread.

    here is the source table script:
    CREATE TABLE TEST_OVLP
    (
    M_ID VARCHAR2(11 BYTE) ,
    STRT_DT DATE,
    END_DT DATE,
    A VARCHAR2(1 BYTE),
    B VARCHAR2(1 BYTE),
    C VARCHAR2(1 BYTE),
    D VARCHAR2(1 BYTE)
    );


    insert into TEST_OVLP(M_ID,strt_dt,end_dt,a,b,c,d)values('AAAA',to_date('01/01/1968','mm/dd/YYYY'),to_date('01/01/1997','mm/dd/yyyy'),'Y','N','N','N');
    insert into TEST_OVLP(M_ID,strt_dt,end_dt,a,b,c,d)values('AAAA',to_date('01/01/1968','mm/dd/yyyy'),to_date('01/01/1999','mm/dd/yyyy'),'N','Y','N','N');
    insert into TEST_OVLP(M_ID,strt_dt,end_dt,a,b,c,d)values('AAAA',to_date('01/01/1988','mm/dd/yyyy'),to_date('01/01/1997','mm/dd/yyyy'),'N','N','Y','N');
    COMMIT;

    A,B,C, D are indicators for specific coverage types.So for a specific id i want to break down start dates and end dates and make indicator = 'Y' for coverage types they already have between the new range of start date and end date.

    i tried to use lag function but it is making as one line.

    Edited by: user12997203 on Feb 21, 2013 10:29 AM

    Edited by: user12997203 on Feb 21, 2013 11:17 AM
  • 4. Re: Date overlap - sequence split
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    user12997203 wrote:
    Hi

    apologize for not giving complete information in previous thread.
    I still don't understand what role m_id plays in this problem, and I can't guess what results you would want if m_id had different values.

    Aside from that, this does what you requested:
    WITH     coverage_change          AS
    (
         SELECT     strt_dt                         AS dt
         ,     CASE WHEN a = 'Y' THEN  1 ELSE 0 END     AS a_num
         ,     CASE WHEN b = 'Y' THEN  1 ELSE 0 END     AS b_num
         ,     CASE WHEN c = 'Y' THEN  1 ELSE 0 END     AS c_num
         ,     CASE WHEN d = 'Y' THEN  1 ELSE 0 END     AS d_num
         FROM     test_ovlp
        UNION ALL
         SELECT     end_dt                         AS dt
         ,     CASE WHEN a = 'Y' THEN -1 ELSE 0 END     AS a_num
         ,     CASE WHEN b = 'Y' THEN -1 ELSE 0 END     AS b_num
         ,     CASE WHEN c = 'Y' THEN -1 ELSE 0 END     AS c_num
         ,     CASE WHEN d = 'Y' THEN -1 ELSE 0 END     AS d_num
         FROM     test_ovlp
    )
    ,     net_coverage     AS
    (
         SELECT DISTINCT
                 dt
         ,     SUM (a_num) OVER (ORDER BY dt)          AS a_net
         ,     SUM (b_num) OVER (ORDER BY dt)          AS b_net
         ,     SUM (c_num) OVER (ORDER BY dt)          AS c_net
         ,     SUM (d_num) OVER (ORDER BY dt)          AS d_net
         ,     DENSE_RANK () OVER (ORDER BY dt DESC)     AS from_end
         ,     MAX (dt) OVER ()                       AS max_dt
         FROM     coverage_change
    )
    SELECT       dt                              AS strt_dt
    ,       LEAD (dt, 1, max_dt) OVER (ORDER BY dt)     AS end_dt
    ,       CASE WHEN a_net > 0 THEN 'Y' ELSE 'N' END     AS a
    ,       CASE WHEN b_net > 0 THEN 'Y' ELSE 'N' END     AS b
    ,       CASE WHEN c_net > 0 THEN 'Y' ELSE 'N' END     AS c
    ,       CASE WHEN d_net > 0 THEN 'Y' ELSE 'N' END     AS d
    FROM       net_coverage
    WHERE       from_end     > 1
    ORDER BY  dt
    ;
    Output:
    STRT_DT    END_DT     A B C D
    ---------- ---------- - - - -
    01/01/1968 01/01/1988 Y Y N N
    01/01/1988 01/01/1997 Y Y Y N
    01/01/1997 01/01/1999 N Y N N
    Depending on what m_id is doing in this problem, you may just need to add m_id to all the SELECT clauses, and add "PARTITION BY m_id" to all the analytic clauses.
  • 5. Re: Date overlap - sequence split
    user12997203 Newbie
    Currently Being Moderated
    thanks a lot Frank.

    i may have some thousands of distinct id and different ranges. i will add the m_id condition based on the information you have given and let you know by testing some sample cases.
  • 6. Re: Date overlap - sequence split
    user12997203 Newbie
    Currently Being Moderated
    i have added m_id condition to query given by frank and it is working fine. but i would like the end date to be next record start date - 1 one day fro that particular m_id

    i have modifies that and it is working but last record end is also getting subtracted by one day
    WITH     coverage_change  AS
    (
         SELECT     m_id as m_id,strt_dt     AS dt
         ,     CASE WHEN a = 'Y' THEN  1 ELSE 0 END     AS a_num
         ,     CASE WHEN b = 'Y' THEN  1 ELSE 0 END     AS b_num
         ,     CASE WHEN c = 'Y' THEN  1 ELSE 0 END     AS c_num
         ,     CASE WHEN d = 'Y' THEN  1 ELSE 0 END     AS d_num
         FROM     test_ovlp
        UNION ALL
         SELECT     m_id as m_id,end_dt     AS dt
         ,     CASE WHEN a = 'Y' THEN -1 ELSE 0 END     AS a_num
         ,     CASE WHEN b = 'Y' THEN -1 ELSE 0 END     AS b_num
         ,     CASE WHEN c = 'Y' THEN -1 ELSE 0 END     AS c_num
         ,     CASE WHEN D = 'Y' THEN -1 ELSE 0 END     AS D_NUM
         FROM     test_ovlp
    )
    ,     net_coverage     AS
    (
         SELECT DISTINCT
                 M_ID AS M_ID,DT
         ,     SUM (A_NUM) OVER (PARTITION BY M_ID ORDER BY DT)          AS A_NET
         ,     SUM (B_NUM) OVER (PARTITION BY M_ID ORDER BY DT)          AS B_NET
         ,     SUM (C_NUM) OVER (PARTITION BY M_ID ORDER BY DT)          AS C_NET
         ,     SUM (D_NUM) OVER (PARTITION BY M_ID ORDER BY DT)          AS D_NET
         ,     DENSE_RANK () OVER (PARTITION BY M_ID ORDER BY dt DESC)     AS from_end
         ,     MAX (dt) OVER ()                       AS max_dt
         FROM     coverage_change
    )
    SELECT       M_ID AS M_ID,DT                              AS STRT_DT
    ,       (LEAD (dt, 1, max_dt) OVER (PARTITION BY M_ID ORDER BY dt)-1)     AS end_dt
    ,       CASE WHEN a_net > 0 THEN 'Y' ELSE 'N' END     AS a
    ,       CASE WHEN b_net > 0 THEN 'Y' ELSE 'N' END     AS b
    ,       CASE WHEN c_net > 0 THEN 'Y' ELSE 'N' END     AS c
    ,       CASE WHEN d_net > 0 THEN 'Y' ELSE 'N' END     AS d
    FROM       net_coverage
    WHERE       from_end     > 1
    ORDER BY  dt;
    STRT_DT    END_DT     A B C D
    ---------- ---------- - - - -
    01/01/1968 12/31/1987 Y Y N N
    01/01/1988 12/31/1996 Y Y Y N
    01/01/1997 12/31/1998 N Y N N
    expected result :
    STRT_DT    END_DT     A B C D
    ---------- ---------- - - - -
    01/01/1968 12/31/1987 Y Y N N
    01/01/1988 12/31/1996 Y Y Y N
    01/01/1997 01/01/1999 N Y N N
    Edited by: user12997203 on Feb 21, 2013 12:22 PM
  • 7. Re: Date overlap - sequence split
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    user12997203 wrote:
    i have added m_id condition to query given by frank and it is working fine. but i would like the end date to be next record start date - 1 one day fro that particular m_id

    i have modifies that and it is working but last record end is also getting subtracted by one day
    So you're subtracting 1 day from all the end_dates, but you don't want to subtract it from the very last end_dt, the one that comes from max_dt.

    The simplest solution is to change max_dt, so that it's 1 day later than the date in your table. Then you can subtract 1 day from all the end_dts, just like you're doing now. So near the end of net_coverage, change the line that currently says
         ,     MAX (dt) OVER ()                          AS max_dt
    to
         ,     MAX (dt) OVER () + 1                      AS max_dt
     

    Incidentally,
    WITH     coverage_change  AS
    (
         SELECT     m_id as m_id, ...
    There's no point in using an alias that's the same as the column name. You can simply say:
    ...     SELECT  m_id, ...
  • 8. Re: Date overlap - sequence split
    user12997203 Newbie
    Currently Being Moderated
    Hi Frank,

    That is throwing me an error because one of my record has the max date (i.e it has value of 12-31-9999) ,so i cannot increment it by 1.

    this is the error i am seeing:
    ORA-01841: (full) year must be between -4713 and +9999, and not be 0
    01841. 00000 -  "(full) year must be between -4713 and +9999, and not be 0"
    *Cause:    Illegal year entered
    *Action:   Input year in the specified range
    Is there any other way to keep the end_date of the last record same?
  • 9. Re: Date overlap - sequence split
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    user12997203 wrote:
    Hi Frank,

    That is throwing me an error because one of my record has the max date (i.e it has value of 12-31-9999) ,so i cannot increment it by 1.
    ...
    Is there any other way to keep the end_date of the last record same?
    Yes, and I should have seen it earlier. Sorry about that.

    Forget about the change I suggested in my last message.
    In the main query, subtract 1 directly from dt in the argument to LEAD, and not from the value returned by LEAD.
    That is, the end of net_coverage and the beginning of the main query whould look like this:
    ...
         ,     MAX (dt) OVER ()                     AS max_dt
         FROM     coverage_change
    )
    SELECT       m_id
    ,       dt                              AS strt_dt
    ,       LEAD ( dt - 1          -- subtract 1 here ...
                , 1
                , max_dt
                ) OVER ( PARTITION BY  m_id 
                                ORDER BY      dt
                    )          -- ... and not here
                                       AS end_dt
    ,       CASE WHEN a_net > 0 THEN 'Y' ELSE 'N' END     AS a
    ...
  • 10. Re: Date overlap - sequence split
    user12997203 Newbie
    Currently Being Moderated
    Hi Frank,

    That is wonderful.But i want to test it more so i added three more rows with diff dates and these records have a gap between the dates .
    Insert into TEST_OVLP (M_ID,STRT_DT,END_DT,A,B,C,D) values ('BBBB',to_date('01-SEP-95 00:00:00','DD-MON-RR hh24:mi:ss'),to_date('31-DEC-99 00:00:00','DD-MON-RR hh24:mi:ss'),'Y','N','N','N');
    Insert into TEST_OVLP (M_ID,STRT_DT,END_DT,A,B,C,D) values ('BBBB',to_date('01-SEP-96 00:00:00','DD-MON-RR hh24:mi:ss'),to_date('31-DEC-99 00:00:00','DD-MON-RR hh24:mi:ss'),'N','Y','N','N');
    Insert into TEST_OVLP (M_ID,STRT_DT,END_DT,A,B,C,D) values ('BBBB',to_date('01-JUN-00 00:00:00','DD-MON-RR hh24:mi:ss'),to_date('30-SEP-00 00:00:00','DD-MON-RR hh24:mi:ss'),'N','N','Y','N');
    commit;
    and then the result is
    WITH     coverage_change  AS
    (
         SELECT     m_id as m_id,strt_dt     AS dt
         ,     CASE WHEN a = 'Y' THEN  1 ELSE 0 END     AS a_num
         ,     CASE WHEN b = 'Y' THEN  1 ELSE 0 END     AS b_num
         ,     CASE WHEN c = 'Y' THEN  1 ELSE 0 END     AS c_num
         ,     CASE WHEN d = 'Y' THEN  1 ELSE 0 END     AS d_num
         FROM     test_ovlp
        UNION ALL
         SELECT     m_id as m_id,end_dt     AS dt
         ,     CASE WHEN a = 'Y' THEN -1 ELSE 0 END     AS a_num
         ,     CASE WHEN b = 'Y' THEN -1 ELSE 0 END     AS b_num
         ,     CASE WHEN c = 'Y' THEN -1 ELSE 0 END     AS c_num
         ,     CASE WHEN D = 'Y' THEN -1 ELSE 0 END     AS D_NUM
         FROM     test_ovlp
    )
    ,     net_coverage     AS
    (
         SELECT DISTINCT
                 M_ID AS M_ID,DT
         ,     SUM (A_NUM) OVER (PARTITION BY M_ID ORDER BY DT)          AS A_NET
         ,     SUM (B_NUM) OVER (PARTITION BY M_ID ORDER BY DT)          AS B_NET
         ,     SUM (C_NUM) OVER (PARTITION BY M_ID ORDER BY DT)          AS C_NET
         ,     SUM (D_NUM) OVER (PARTITION BY M_ID ORDER BY DT)          AS D_NET
         ,     dense_rank () over (partition by m_id order by dt desc)     as from_end
         ,     MAX (dt) OVER ()                        AS max_dt
         FROM     coverage_change
    )
    select       m_id as m_id,dt                              as strt_dt
    ,       (LEAD (dt-1, 1, max_dt) OVER (PARTITION BY M_ID ORDER BY dt))     AS end_dt
    ,       CASE WHEN a_net > 0 THEN 'Y' ELSE 'N' END     AS a
    ,       CASE WHEN b_net > 0 THEN 'Y' ELSE 'N' END     AS b
    ,       CASE WHEN c_net > 0 THEN 'Y' ELSE 'N' END     AS c
    ,       CASE WHEN d_net > 0 THEN 'Y' ELSE 'N' END     AS d
    FROM       net_coverage
    where       from_end     > 1
    ORDER BY  m_id,dt;
    Result:
    M_ID                     STRT_DT                   END_DT      A     B     C     D
    BBBB        01-SEP-95 00:00:00     31-AUG-96 00:00:00     Y     N     N     N
    BBBB        01-SEP-96 00:00:00     30-DEC-99 00:00:00     Y     Y     N     N
    *BBBB     31-DEC-99 00:00:00     31-MAY-00 00:00:00     N     N     N     N*
    BBBB       01-JUN-00 00:00:00     30-SEP-00 00:00:00     N     N     Y     N
    AAAA          01-JAN-68 00:00:00  31-DEC-87 00:00:00     Y     Y     N     N
    AAAA         01-JAN-88 00:00:00         31-DEC-96 00:00:00     Y     Y     Y     N
    AAAA         01-JAN-97 00:00:00         30-SEP-00 00:00:00  N     Y     N     N
    in the result set i really dont need a record if it is all N's and the start date of the next record if it has a gap in the dates should be same.

    expected output:
    M_ID                     STRT_DT                   END_DT     A     B     C     D
    BBBB    01-SEP-95 00:00:00     31-AUG-96 00:00:00     Y     N     N     N
    BBBB    01-SEP-96 00:00:00     31-DEC-99 00:00:00     Y     Y     N     N
    BBBB    01-JUN-00 00:00:00     30-SEP-00 00:00:00     N     N     Y     N
    AAAA          01-JAN-68 00:00:00       31-DEC-87 00:00:00     Y     Y     N     N
    AAAA         01-JAN-88 00:00:00         31-DEC-96 00:00:00     Y     Y     Y     N
    AAAA         01-JAN-97 00:00:00         30-SEP-00 00:00:00  N     Y     N     N
  • 11. Re: Date overlap - sequence split
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    user12997203 wrote:
    ... in the result set i really dont need a record if it is all N's
    Okay, I understand this much
    and the start date of the next record if it has a gap in the dates should be same.
    You lost me. If we just get rid of the output rows with all 'N's, then we get these results:
    expected output:
    M_ID                     STRT_DT                   END_DT     A     B     C     D
    BBBB    01-SEP-95 00:00:00     31-AUG-96 00:00:00     Y     N     N     N
    BBBB    01-SEP-96 00:00:00     31-DEC-99 00:00:00     Y     Y     N     N
    BBBB    01-JUN-00 00:00:00     30-SEP-00 00:00:00     N     N     Y     N
    AAAA          01-JAN-68 00:00:00       31-DEC-87 00:00:00     Y     Y     N     N
    AAAA         01-JAN-88 00:00:00         31-DEC-96 00:00:00     Y     Y     Y     N
    AAAA         01-JAN-97 00:00:00         30-SEP-00 00:00:00  N     Y     N     N
    The only complication is that we have to remove the rows with all 'N's after end_dt is computed, because end_dt depends on the dt from the row with all 'N's.

    Here's one way to do that:
    WITH     coverage_change  AS
    (
         SELECT     m_id
         ,     strt_dt                         AS dt
         ,     CASE WHEN a = 'Y' THEN  1 ELSE 0 END     AS a_num
         ,     CASE WHEN b = 'Y' THEN  1 ELSE 0 END     AS b_num
         ,     CASE WHEN c = 'Y' THEN  1 ELSE 0 END     AS c_num
         ,     CASE WHEN d = 'Y' THEN  1 ELSE 0 END     AS d_num
         FROM     test_ovlp
        UNION ALL
         SELECT     m_id
         ,     end_dt                         AS dt
         ,     CASE WHEN a = 'Y' THEN -1 ELSE 0 END     AS a_num
         ,     CASE WHEN b = 'Y' THEN -1 ELSE 0 END     AS b_num
         ,     CASE WHEN c = 'Y' THEN -1 ELSE 0 END     AS c_num
         ,     CASE WHEN D = 'Y' THEN -1 ELSE 0 END     AS D_NUM
         FROM     test_ovlp
    )
    ,     net_coverage     AS
    (
         SELECT DISTINCT
                 m_id
         ,     dt
         ,     SUM (A_NUM) OVER (PARTITION BY M_ID ORDER BY DT)     AS a_net
         ,     SUM (B_NUM) OVER (PARTITION BY M_ID ORDER BY DT)     AS b_net
         ,     SUM (C_NUM) OVER (PARTITION BY M_ID ORDER BY DT)     AS c_net
         ,     SUM (D_NUM) OVER (PARTITION BY M_ID ORDER BY DT)     AS d_net
         ,     DENSE_RANK () OVER (PARTITION BY M_ID ORDER BY dt DESC)     AS from_end
         ,     MAX (dt) OVER ()                                        AS max_dt
         FROM     coverage_change
    )
    ,     got_end_dt     AS
    (
         SELECT       m_id
         ,       dt                              AS strt_dt
         ,       LEAD ( dt - 1
                     , 1
                            , max_dt
                            ) OVER ( PARTITION BY  m_id 
                                           ORDER BY      dt
                               )     
                                            AS end_dt
            ,       CASE WHEN a_net > 0 THEN 'Y' ELSE 'N' END     AS a
         ,       CASE WHEN b_net > 0 THEN 'Y' ELSE 'N' END     AS b
         ,       CASE WHEN c_net > 0 THEN 'Y' ELSE 'N' END     AS c
         ,       CASE WHEN d_net > 0 THEN 'Y' ELSE 'N' END     AS d
         FROM       net_coverage
         WHERE       from_end     > 1
    )
    SELECT       m_id, strt_dt, end_dt
    ,       a, b, c, d
    FROM       got_end_dt
    WHERE       'Y'          IN (a, b, c, d)
    ORDER BY  m_id          DESC
    ,            strt_dt;
    The first 2 sub-queries, coverage_change and net_coverage, are exactly the same as in my last message.
    What used to be the main query (except for the ORDER BY clause) is now the sub-query called got_end_dt.
    The new main query just excludes the all-'N' rows, and sorts.

    If this is really what you need to do, then the query can probably be simplified a little. I put some tricky code into net_coverage to avoid another sub-query, but now, since we need another sub-query anyway (to remove the all-'N' rows) there might be a better way.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points