Forum Stats

  • 3,767,857 Users
  • 2,252,726 Discussions
  • 7,874,366 Comments

Discussions

Having an indicator for few conditions

User_QDHXF
User_QDHXF Member Posts: 20 Green Ribbon

Hello,

Create Table T1_Temp(P_ID number,PC_ID number ,PC_ST_DT Date,PC_END_DT Date);

insert into T1_TEMP values (100011,738968,'07/01/1982','06/30/1983');

insert into T1_TEMP values (100011,738969,'07/01/1983','06/30/1984');

insert into T1_TEMP values (100011,738970,'07/01/1984','06/30/1985');

insert into T1_TEMP values (100011,738971,'07/01/1985','06/30/1986');

insert into T1_TEMP values (100011,738972,'07/01/1986','06/30/1987');

insert into T1_TEMP values (100011,738973,'07/01/1987','11/19/1987');

insert into T1_TEMP values (100012,756889,'10/01/2001','09/30/2002');

insert into T1_TEMP values (100012,756890,'10/01/2002','09/30/2003');

insert into T1_TEMP values (100001,1348946,'07/01/2012','06/30/2013');

insert into T1_TEMP values (100001,1419492,'07/01/2013','06/30/2014');

insert into T1_TEMP values (100001,1490920,'07/01/2014','06/30/2015');

insert into T1_TEMP values (100001,1565455,'07/01/2015','06/30/2016');

insert into T1_TEMP values (100001,1636771,'07/01/2016','06/30/2017');

insert into T1_TEMP values (100001,1786664,'07/01/2017','06/30/2018');

insert into T1_TEMP values (100001,1915177,'07/01/2018','06/30/2019');

insert into T1_TEMP values (100001,2008752,'07/01/2019','06/30/2020');

insert into T1_TEMP values (100001,2126338,'07/01/2020','06/30/2021');

insert into T1_TEMP values (100001,2213731,'07/01/2021','06/30/2022');

insert into T1_TEMP values (100001,2294051,'07/01/2022','06/30/2023');


I am looking to have an indicator = Y for the following conditions:

  1. The last 5 records of each P_ID, with the most recent PC_END_DT <= SYSDATE
  2. When the most recent PC_END_DT has ended beyond 5 years from Sysdate, then the last 5 records of P_ID must be flagged as Y, irrespective of the relation with Sysdate, as the most recent PC_END_DT is back more than 5 years from Sysdate.
  3. When there are less than 5 records, all the records must be flagged as Y, irrespective of the relation with Sysdate.

Currently I have the query as

select p_id,pc_id,pc_st_dt,pc_end_dt,

case when extract(year from sysdate) - extract(year from pc_end_dt)<=4 

and

extract(year from sysdate) - extract(year from pc_end_dt)>=0 THEN 'Y' ELSE 'N' END Ind_Flag 

from t1_temp;


This gives me right result for P_ID = 100001. It gives me Y for all the 5 records from PC_END_DT = 2017 - 2021.

I need help in getting the indicator Y for P_ID = 100011, where it's most recent PC_END_DT has ended beyond the 5 years from Sysdate and also for P_ID = 100012, where there are just 2 records and it's PC_END_DT has also ended beyond the 5 years from Sysdate.


Thank you

Tagged:

Best Answers

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,202 Red Diamond
    edited Sep 28, 2021 5:29PM Accepted Answer

    Hi, @User_QDHXF

    Hope this clarifies the questions.

    No, it doesn't answer the questions about what "last" means, or how to deal with ties.

    Assuming that "last" means "in order by pc_end_dt" and, in case of ties, all rows that have a claim to being in the last 5 are counted as being in the last 5, then you can do something like this:

    WITH  got_analytics  AS
    (
      SELECT  p_id, pc_id, pc_st_dt, pc_end_dt
      ,	  RANK () OVER ( PARTITION BY p_id
                            ,             CASE
    				          WHEN pc_end_dt <= SYSDATE
    			  	          THEN 'PAST/PRESENT'
    				          ELSE 'FUTURE'
    			              END
    		         ORDER BY     pc_end_dt DESC
    		       )                          AS rn
      ,	  COUNT (*)   OVER ( PARTITION BY p_id )  AS cnt
      FROM  t1_temp
    )
    SELECT   p_id, pc_id, pc_st_dt, pc_end_dt
    ,	 CASE
    	   WHEN pc_end_dt <= SYSDATE
    	   AND  rn <= 5   THEN 'Y'
    	   WHEN cnt <= 5  THEN 'Y'
    	   	  	  ELSE 'N'
    	 END  AS flag
    FROM	 got_analytics
    ORDER BY p_id, pc_end_dt  -- or whatever you want
    ;
    

    The sub-query isn't strictly necessary, but I find it makes debugging and maintenance easier.

  • mathguy
    mathguy Member Posts: 10,144 Blue Diamond
    Accepted Answer

    You can do this in one pass, using the row_number() analytic function with proper ordering of the rows:

    select t.*,
           case when pc_end_dt <= sysdate
                and  row_number() over (partition by p_id
                          order by case when pc_end_dt <= sysdate then pc_end_dt end
                                desc nulls last) <= 5
                then 'Y' else 'N' end as flag
    from   t1_temp t
    order  by p_id, pc_end_dt   --  if needed
    ;
    

Answers

  • mathguy
    mathguy Member Posts: 10,144 Blue Diamond

    Missing from your question:

    1. Your database version (such as 12.2.0.1). Don't guess - run select banner from v$version;
    2. The output you desire from the data you provided.


  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,202 Red Diamond

    hI, @User_QDHXF

    Also, don't try to insert string values, such as '07/01/1982' into DATE columns, such as pc_st_dt. Use the TO_DATE function, or DATE literals, instead.

    Remember that there is no built-in order to rows in a table, so when you talk about the "last 5 rows", say what "last" means (e.g., "in order by pc_end_dt") and explain how you want to handle ties. (If ties are impossible in your application, just say so.)

  • User_QDHXF
    User_QDHXF Member Posts: 20 Green Ribbon
    edited Sep 28, 2021 4:55PM

    @mathguy , @Frank Kulash

    DB Version - Oracle Database 19c Enterprise Edition Release 19.0.0.0.0

    Expected Result:

    For P_ID = 100011, IND_FLAG should be Y for the most recent 5 PC_END_DT records, irrespective of the Sysdate, as it ended way beyond the 5 years from Sysdate.

    For P_ID = 100012, IND_FLAG should be Y for all the records, as it has less than 5 records and it's most recent PC_END_DT ended way beyond 5 years from Sysdate.

    For P_ID = 100001, IND_FLAG should be Y from PC_END_DT = 2016 to 2021, as it qualifies for the condition to flag most recent 5 records where PC_END_DT <= SYSDATE. Any records that have PC_END_DT greater than Sysdate remains N.

    Hope this clarifies the questions.


    Thanks.

  • mathguy
    mathguy Member Posts: 10,144 Blue Diamond

    I don't quite understand the distinction between the first two conditions (in your original post). What is different when the oldest row is more than five years old, vs. when it's more recent? Don't you always need to return the five most recent rows that are <= SYSDATE?

    I assume your last statement, that "any records that are in the future must be N", applies in all cases - or are there exceptions?

    For example, what should be returned in the following case: a new P_ID, call it 222222, has ten rows all before year 2010; two more rows in 2019 and 2020; and five more rows in 2022 or later. Total 17 rows. Of these, which rows should be returned, and why? By the application of which rules, and when two rules contradict each other, which takes precedence - meaning, which rule has higher priority?

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,202 Red Diamond
    edited Sep 28, 2021 5:29PM Accepted Answer

    Hi, @User_QDHXF

    Hope this clarifies the questions.

    No, it doesn't answer the questions about what "last" means, or how to deal with ties.

    Assuming that "last" means "in order by pc_end_dt" and, in case of ties, all rows that have a claim to being in the last 5 are counted as being in the last 5, then you can do something like this:

    WITH  got_analytics  AS
    (
      SELECT  p_id, pc_id, pc_st_dt, pc_end_dt
      ,	  RANK () OVER ( PARTITION BY p_id
                            ,             CASE
    				          WHEN pc_end_dt <= SYSDATE
    			  	          THEN 'PAST/PRESENT'
    				          ELSE 'FUTURE'
    			              END
    		         ORDER BY     pc_end_dt DESC
    		       )                          AS rn
      ,	  COUNT (*)   OVER ( PARTITION BY p_id )  AS cnt
      FROM  t1_temp
    )
    SELECT   p_id, pc_id, pc_st_dt, pc_end_dt
    ,	 CASE
    	   WHEN pc_end_dt <= SYSDATE
    	   AND  rn <= 5   THEN 'Y'
    	   WHEN cnt <= 5  THEN 'Y'
    	   	  	  ELSE 'N'
    	 END  AS flag
    FROM	 got_analytics
    ORDER BY p_id, pc_end_dt  -- or whatever you want
    ;
    

    The sub-query isn't strictly necessary, but I find it makes debugging and maintenance easier.

  • User_QDHXF
    User_QDHXF Member Posts: 20 Green Ribbon

    @mathguy You are right. I complicated the question.

    I must be able to show Y for five most recent rows that are <= SYSDATE for each P_ID. This statement stands correct for all scenarios.

    For your question - For example, what should be returned in the following case: a new P_ID, call it 222222, has ten rows all before year 2010; two more rows in 2019 and 2020; and five more rows in 2022 or later. Total 17 rows. Of these, which rows should be returned, and why? 

    For P_ID = 222222, records with P_END_DT = 2020,2019,2010,2009 and 2008 must be Y, while rest must be N.

    Reason: Clearly after 2020, the next row is in 2022, which greater than Sysdate. So all the 5 records from 2022 and later will be marked N along with the records that are earlier than 2008.

  • mathguy
    mathguy Member Posts: 10,144 Blue Diamond
    Accepted Answer

    You can do this in one pass, using the row_number() analytic function with proper ordering of the rows:

    select t.*,
           case when pc_end_dt <= sysdate
                and  row_number() over (partition by p_id
                          order by case when pc_end_dt <= sysdate then pc_end_dt end
                                desc nulls last) <= 5
                then 'Y' else 'N' end as flag
    from   t1_temp t
    order  by p_id, pc_end_dt   --  if needed
    ;