Forum Stats

  • 3,759,024 Users
  • 2,251,494 Discussions
  • 7,870,473 Comments

Discussions

Need help with Oracle SQL to merge records based on Effective and Term dates

user13040446
user13040446 Member Posts: 4
edited Mar 18, 2014 10:31AM in SQL & PL/SQL

Hello all,

I need some help figuring out this little challenge.

I have tables of groups and flags and effective and term dates relative to these flags as per the following example :

Group_nameFlag_AFlag_BEff_dateTerm_date
Group_AYY2011010199991231
Group_ANN2010010120101231
Group_ANN2009010120091231
Group_ANN2006010120081231
Group_ANY2004010120051231
Group_AYY2003010120031231
Group_BNY2004010199991231
Group_BNY2003010120031231

As you can see, group_A had the same flag combination (N,N) for three successive time periods. I would like to merge all time periods having the same flags into one. In which the effective date will be that of the earliest time period (underlined) and the term date will be the latest (underlined)

So the end result would look like this :

Group_nameFlag_AFlag_BEff_dateTerm_date
Group_AYY2011010199991231
Group_ANN2006010120101231
Group_ANY2004010120051231
Group_AYY2003010120031231
Group_BNY2003010199991231

Thanks for your help

Here's the DDL script

drop table TMP_group_test;

create table TMP_group_test (group_name varchar2 (8)

,flag_a varchar2(1)

,flag_b varchar2(1)

,eff_date varchar2(8)

,term_date varchar2(8)

                            );

insert into TMP_group_test values ('Group_A', 'Y', 'Y', '20110101',          '99991231');

insert into TMP_group_test values ('Group_A', 'N', 'N', '20100101',          '20101231');

insert into TMP_group_test values ('Group_A', 'N', 'N', '20090101',          '20091231');

insert into TMP_group_test values ('Group_A', 'N', 'N', '20060101',          '20081231');

insert into TMP_group_test values ('Group_A', 'N', 'Y', '20040101',          '20051231');

insert into TMP_group_test values ('Group_A', 'Y', 'Y', '20030101',          '20031231');

insert into TMP_group_test values ('Group_B', 'N', 'Y', '20040101',          '99991231');

insert into TMP_group_test values ('Group_B', 'N', 'Y', '20030101',          '20031231');

commit;

Message was edited by: user13040446

This is the closest I've been to the solution


I create two ranks ;

Rnk1 : partition by group name, order by eff_date desc : this rank will sort the records from the most recent and reset for each group\

Rnk2: (dense) partition by group name , order by flag_A, flagb : this rank gives each combination of group\flag a number so that they are categorised as “families”

Then I use the analytical min function

Min(eff_date) over (partition by group_name, rnk2) : the idea is that for each member of the same family, the new effective date is the min of the family (and the max for the term date),  at the end I just need a distinct so that the duplicates are gone

Now the problem. As you can see from the query below, record 1 and record 6 (as identified by rownum) are identified as the same family, because they have the same flag combination, but they are not successive so each should keep his own effective date.

If only I can make it distinguish between those two that would solve my problem


Query:


select rownum,group_name, flag_a, flag_b, eff_date, term_date, rnk1, rnk2

,min(eff_date) over (partition by group_name, rnk2) min_eff

from

(

select rownum,

group_name, flag_a, flag_b,eff_date, term_date

,rank() over (partition by group_name order by eff_date desc) rnk1

,dense_rank() over (partition by group_name order by flag_A,flag_B ) rnk2

from dsreports.tmp_group_test

)order by rownum

Tagged:
user13040446

Best Answer

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,060 Red Diamond
    edited Mar 17, 2014 1:16PM Accepted Answer

    Hi,

    user13040446 wrote:
    
    Hi KSI, 
    Thanks for your input, you were able to distinguish between those highlighted rows but lost rows 2,3,4 which are supposed to have the same min effective date = 20060101.
    Please refer to the desired chart to see the end result I would like to reach
    
    Thanks Again
    
    

    That first reply is basically correct, but in the main query, you want to use the aggregate MIN function, not the analytic function, and GROUP BY the columns with the common values, like this:

    WITH    got_output_group    AS
    (
        SELECT  group_name, flag_a, flag_b, eff_date, term_date
        ,       ROW_NUMBER () OVER ( PARTITION BY  group_name
                                     ORDER BY      eff_date
                                   )
              - ROW_NUMBER () OVER ( PARTITION BY  group_name, flag_a, flag_b
                                     ORDER BY      eff_date
                                   )   AS output_group
        FROM    tmp_group_test
    )
    SELECT    group_name, flag_a, flag_b
    ,         MIN (eff_date)           AS eff_date
    ,         MAX (term_date)          AS term_date
    FROM      got_output_group
    GROUP BY  group_name, flag_a, flag_b
    ,         output_group
    ORDER BY  group_name
    ,         eff_date      DESC
    ;
    

    The output I get is

    GROUP_NA F F EFF_DATE TERM_DAT
    -------- - - -------- --------
    Group_A  Y Y 20110101 99991231
    Group_A  N N 20060101 20101231
    Group_A  N Y 20040101 20051231
    Group_A  Y Y 20030101 20031231
    Group_B  N Y 20030101 99991231
    

    which is what you requested.

    user13040446
«1

Answers

  • Sergei Krasnoslobodtsev
    Sergei Krasnoslobodtsev Member Posts: 495 Silver Badge
    edited Mar 13, 2014 5:13AM

    Hi.

    I create two ranks ;
    Rnk1 : partition by group name, order by eff_date desc : this rank will sort the records from the most recent and reset for each group\
    Rnk2: (dense) partition by group name , order by flag_A, flagb : this rank gives each combination of group\flag a number so that they are categorised as “families”
    
    
    

    Try this....

    select rownum,group_name, flag_a, flag_b, eff_date, term_date, rnk1, rnk2
    ,min(eff_date) over (partition by group_name, rnk2) min_eff
    from
    (
    select rownum,
    group_name, flag_a, flag_b,eff_date, term_date
    ,rank() over (partition by group_name order by eff_date desc) rnk1
    ,dense_rank () over (partition by group_name , flag_A,flag_B order by ,eff_date desc) rnk2
    from tmp_group_test
    )order by rownum;
    
    
    ROWNUM

    GROUP_NAME

    FLAG_AFLAG_BEFF_DATERNK1RNK2MIN_EFF
    1Group_AYY201101011220090101
    2Group_ANN201001012320100101
    3Group_ANN200901013220090101
    4Group_ANN200601014120030101
    5Group_ANY200401015120030101
    6Group_AYY200301016120030101
    7Group_BNY200401011220040101
    8Group_BNY200301012120030101
  • user10412263
    user10412263 Member Posts: 98 Blue Ribbon

    Hi,

    Here is your query :

    select group_name,flag_a,flag_b,eff_date,term_date

    from tmp_group_test

    where group_name='Group_A'

    and flag_a = 'Y'

    and flag_b = 'Y'

    union

    select group_name,flag_a,flag_b,min(eff_date) eff_date,max(term_date) term_date

    from tmp_group_test

    where (flag_a,flag_b) not in (select flag_a,flag_b

                        from tmp_group_test

                        where flag_a='Y' and flag_b='Y')

    group by group_name,flag_a,flag_b

    order by group_name,eff_date desc;

    Result for this query as per your requirement:

    GROUP_NAMEFLAG_AFLAG_BEFF_DATETERM_DATE
    Group_AYY2011010199991231
    Group_ANN2006010120101231
    Group_ANY2004010120051231
    Group_AYY2003010120031231
    Group_BNY2003010199991231
  • Hi KSI,

    Thanks for your input, you were able to distinguish between those highlighted rows but lost rows 2,3,4 which are supposed to have the same min effective date = 20060101.

    Please refer to the desired chart to see the end result I would like to reach

    Thanks Again

  • Hi user10412263 ,

    Thanks for your input. However, you have hardcoded the group names and flags so your query will only work on this example.

    I would like a solution that would be compatible with any table having this structure.


    Thanks again for the contribution

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,060 Red Diamond
    edited Mar 17, 2014 1:16PM Accepted Answer

    Hi,

    user13040446 wrote:
    
    Hi KSI, 
    Thanks for your input, you were able to distinguish between those highlighted rows but lost rows 2,3,4 which are supposed to have the same min effective date = 20060101.
    Please refer to the desired chart to see the end result I would like to reach
    
    Thanks Again
    
    

    That first reply is basically correct, but in the main query, you want to use the aggregate MIN function, not the analytic function, and GROUP BY the columns with the common values, like this:

    WITH    got_output_group    AS
    (
        SELECT  group_name, flag_a, flag_b, eff_date, term_date
        ,       ROW_NUMBER () OVER ( PARTITION BY  group_name
                                     ORDER BY      eff_date
                                   )
              - ROW_NUMBER () OVER ( PARTITION BY  group_name, flag_a, flag_b
                                     ORDER BY      eff_date
                                   )   AS output_group
        FROM    tmp_group_test
    )
    SELECT    group_name, flag_a, flag_b
    ,         MIN (eff_date)           AS eff_date
    ,         MAX (term_date)          AS term_date
    FROM      got_output_group
    GROUP BY  group_name, flag_a, flag_b
    ,         output_group
    ORDER BY  group_name
    ,         eff_date      DESC
    ;
    

    The output I get is

    GROUP_NA F F EFF_DATE TERM_DAT
    -------- - - -------- --------
    Group_A  Y Y 20110101 99991231
    Group_A  N N 20060101 20101231
    Group_A  N Y 20040101 20051231
    Group_A  Y Y 20030101 20031231
    Group_B  N Y 20030101 99991231
    

    which is what you requested.

    user13040446
  • Stew Ashton
    Stew Ashton Member Posts: 2,861 Gold Trophy
    edited Mar 17, 2014 1:18PM

    This is a job for the "start of group" method:

    with grp_starts as (
      select a.*,
      case
        when eff_date = lag(term_date) over(partition by group_name, flag_a, flag_b order by eff_date) + 1
        then 0 else 1
      end grp_start
      from TMP_group_test a
    )
    , grps as (
      select group_name, flag_a, flag_b, eff_date, term_date,
      sum(grp_start) over(partition by group_name, flag_a, flag_b order by eff_date) grp
      from grp_starts
    )
    select group_name, flag_a, flag_b, min(eff_date) eff_date, max(term_date) term_date
    from grps
    group by group_name, flag_a, flag_b, grp
    order by 1,2,3,4;
    
    

    GROUP_NAMEFLAG_AFLAG_BEFF_DATETERM_DATE
    Group_ANN2006010120101231
    Group_ANY2004010120051231
    Group_AYY2003010120031231
    Group_AYY2011010199991231
    Group_BNY2003010199991231

    I explain this method in detail here > http://stewashton.wordpress.com/2014/03/16/merging-contiguous-date-ranges/

    Best regards, Stew Ashton

    Stew Ashton
  • Stew Ashton
    Stew Ashton Member Posts: 2,861 Gold Trophy

    When you have version 12c, this will be much easier:

    select * from TMP_group_test
    match_recognize(
      partition by group_name, flag_a, flag_b order by eff_date
      measures first(eff_date) eff_date, max(term_date) term_date
      pattern(A B*)
      define B as eff_date = prev(term_date) + 1
    );
    

    For an explanation of the MATCH_RECOGNIZE clause, see http://stewashton.wordpress.com/category/match_recognize/

    Stew Ashton
  • Stew Ashton
    Stew Ashton Member Posts: 2,861 Gold Trophy

    Frank, what if there is a gap between one term_date and the next eff_date? If I execute

    update tmp_group_test set term_date = term_date - 1
    where term_date = '20081231';
    

    and I run your SELECT, I get the same output.

    With my solutions, I get an extra line:

    GROUP_NAMEFLAG_AFLAG_BEFF_DATETERM_DATE
    Group_ANN2006010120081230
    Group_ANN2009010120101231
    Group_ANY2004010120051231
    Group_AYY2003010120031231
    Group_AYY2011010199991231
    Group_BNY2003010199991231

    Best regards, Stew

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,060 Red Diamond
    edited Mar 17, 2014 8:17PM

    Hi,

    Stew Ashton wrote:
    
    Frank, what if there is a gap between one term_date and the next eff_date? If I execute
    
    
    
    1. <span><span><span><span><span><span> 
    2. update tmp_group_test set term_date = term_date - 1 
    3. where term_date = '20081231';</span></span></span></span></span></span> 
     update tmp_group_test set term_date = term_date - 1 where term_date = '20081231';
    and I run your SELECT, I get the same output.
    
    
    
    

    My guess is that's what OP wants, but it's just a guess.

    With my solutions, I get an extra line: ...
    
    

    The Fixed Difference method that I used can be used to get the same results you get (if that's what's wanted), just like the Start O Group method can be used to get the same results I got.

  • Stew Ashton
    Stew Ashton Member Posts: 2,861 Gold Trophy
    Frank Kulash wrote:
    
    The Fixed Difference method that I used can be used to get the same results you get (if that's what's wanted), just like the Start O Group method can be used to get the same results I got.
    

    Could you please show how to get the same results I got with your method? I would like to adopt your method if possible because it seems more efficient.

    Thanks in advance,

    Stew

This discussion has been closed.