Forum Stats

  • 3,824,762 Users
  • 2,260,415 Discussions
  • 7,896,309 Comments

Discussions

adding another count to this solution given

user13328581
user13328581 Member Posts: 1,343 Silver Badge

Hello experts,

I was given the two wonderful solutions to the problem posted



I would like to add another attribute called "Non_monthend_cnt".
The additional attribute should be based on the following requirement, 
if the month falls within the date range then show it

hence, based on the following sample data below gotten from the link above


with
  sample_data (id, start_dt, end_dt) as (
    select 1001, date '2020-03-10', date '2020-12-31' from dual union all
    select 2005, date '2021-12-31', null              from dual union all
    select 3008, date '2021-04-10', date '2021-04-30' from dual union all
    select 4001, date '2021-05-12', date '2021-05-20' from dual
  )


Results expected

  ID   each_monthend_cnt    Non_monthend_cnt
----   ------------         ------------------
1001   Mar 2020                    Mar 2020    
1001   Apr 2020                    Apr 2020
1001   May 2020                    May 2020
1001   Jun 2020                    Jun 2020
1001   Jul 2020                    Jul 2020
1001   Aug 2020                    Aug 2020
1001   Sep 2020                    Sep 2020
1001   Oct 2020                    Oct 2020
1001   Nov 2020                    Nov 2020
2005   Dec 2021                    Dec 2021
2005   Jan 2022                    Jan 2022
2005   Feb 2022                    Feb 2022
2005   Mar 2022                    Mar 2022
3008                               Apr 2021
4001                               May 2021



I am still struggling to come up with a solution but I thought I should post it to get help as well.


Thanks in advance

Best Answer

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,018 Red Diamond
    Answer ✓

    Hi, @user13328581

    The query I posted three days ago does basically what (I think) you want. Instead of column names non_monthend_count and monthend_count you now want to name those columns monthed_included and monthend_overlapped, so just change the names:

    SELECT   s.id
    --,	 s.start_dt, s.end_dt	-- for debugging only
    ,	 CASE
    	   WHEN COALESCE ( s.end_dt
    	   	  	   , SYSDATE
    			   ) >= c.month_overlapped
    	   THEN month_overlapped
    	 END AS monthend_included
    ,	 c.month_overlapped
    FROM	 sample_data s
    CROSS APPLY (
       	        SELECT ADD_MONTHS ( LAST_DAY (s.start_dt) -- or TRUNC (LAST_DAY (s.start_dt))
    				  , LEVEL - 1
    				  ) AS month_overlapped
    		FROM  dual
    		CONNECT BY LEVEL <= 1 + MONTHS_BETWEEN ( COALESCE (s.end_dt, SYSDATE)
    			  	    	 		, TRUNC (s.start_dt, 'MONTH')
    							)
       	  )	    c
    ORDER BY s.id
    ,   	 c.month_overlapped
    ;
    


    The output from the query above with the new sample data is:

          MONTHEND_ MONTH_
      ID  INCLUDED  OVERLAPPED
    ----- --------- ----------
     1001 Mar 2020 Mar 2020
     1001 Apr 2020 Apr 2020
     1001 May 2020 May 2020
     1001 Jun 2020 Jun 2020
     1001 Jul 2020 Jul 2020
     1001 Aug 2020 Aug 2020
     1001 Sep 2020 Sep 2020
     1001 Oct 2020 Oct 2020
     1001 Nov 2020 Nov 2020
     1001 Dec 2020 Dec 2020
     2005 Dec 2021 Dec 2021
     2005 Jan 2022 Jan 2022
     2005 Feb 2022 Feb 2022
     2005 Mar 2022 Mar 2022
     2005 Apr 2022 Apr 2022
     2005          May 2022
     3008 Apr 2021 Apr 2021
     4001          May 2021
     5100 May 2021 May 2021
     5200 May 2021 May 2021
     5200          Jun 2021
     5300 May 2021 May 2021
     5300          Jun 2021
    

    This is what you requested except for the last rows for ids 1001 and 3008. Why do you want monthend_included to be NULL for those rows? Whatever the reason is, why doesn't the same reason apply to id-5200?

«1

Answers

  • mathguy
    mathguy Member Posts: 10,537 Blue Diamond

    You say "if the month FALLS WITHIN the date range" [then show it].

    A month is not a single day. It may "fall within" a range, but that would mean (to most English speakers) that the entire month falls within the date range. That's not the case for the examples you gave.

    Rather, it seems that you want something else. If the month OVERLAPS WITH the date range, then show it. Is that correct?

    If so - just to make sure we have a clear problem statement - if the range is 20 January 2022 to 14 February 2022, you want to show January 2022 (already by the old requirement, since the month-end falls within the date range), but ALSO show February 2022, since it overlaps with the date range?

    Or is there a different, much simpler change that you are requesting now - if the date range doesn't include any month end (which, then, means that the date range is ENTIRELY within a single month), then show that month? Alas both examples in your post show a range that falls entirely in a single month.

    To summarize, what is the desired result for the date range 10 January to 14 February? (Note that whatever your answer is, it SHOULD apply to the second row equally, assuming "null" is to be interpreted as "today".)

  • user13328581
    user13328581 Member Posts: 1,343 Silver Badge

    If so - just to make sure we have a clear problem statement - if the range is 20 January 2022 to 14 February 2022, you want to show January 2022 (already by the old requirement, since the month-end falls within the date range), but ALSO show February 2022, since it overlaps with the date range?

    So yes for the new column(Non_monthend_cnt) February 2022 will be included 
    and only January 2022 for the old column(monthend_cnt)
    

    And if it is 20-January 2022 to null(take today's date)

    So the result will be

    monthend_cnt Non_monthEnd_cnt

    Jan 2022 Jan 2022

    Feb 2022 Feb 2022

    Mar 2022 Mar 2022

    April 2022 April 2022

    May 2022


    to summarize things, the result for 10 January 2022 to 14 February 2022 will be

    Monthend_cnt Non_monthend_cnt

    Jan 2022 Jan 2022

    Feb 2022


    I hope it is clear. thanks

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,018 Red Diamond

    Hi, @user13328581

    with
      sample_data (id, start_dt, end_dt) as (
        select 1001, date '2020-03-10', date '2020-12-31' from dual union all
        select 2005, date '2021-12-31', null              from dual union all
        select 3008, date '2021-04-10', date '2021-04-30' from dual union all
        select 4001, date '2021-05-12', date '2021-05-20' from dual
      )
    


    Results expected

      ID   each_monthend_cnt    Non_monthend_cnt
    ----   ------------         ------------------
    1001   Mar 2020                    Mar 2020    
    1001   Apr 2020                    Apr 2020
    1001   May 2020                    May 2020
    1001   Jun 2020                    Jun 2020
    1001   Jul 2020                    Jul 2020
    1001   Aug 2020                    Aug 2020
    1001   Sep 2020                    Sep 2020
    1001   Oct 2020                    Oct 2020
    1001   Nov 2020                    Nov 2020
    2005   Dec 2021                    Dec 2021
    2005   Jan 2022                    Jan 2022
    2005   Feb 2022                    Feb 2022
    2005   Mar 2022                    Mar 2022
    3008                               Apr 2021
    4001                               May 2021
    
    

    For ID=1001, why don't you want any output for December, 2020?

    For ID=2005, why don't you want any output for April or May, 2022?


    So the result will be

    monthend_cnt Non_monthEnd_cnt

    Jan 2022 Jan 2022

    Feb 2022 Feb 2022

    Mar 2022 Mar 2022

    April 2022 April 2022

    May 2022


    to summarize things, the result for 10 January 2022 to 14 February 2022 will be

    Monthend_cnt Non_monthend_cnt

    Jan 2022 Jan 2022

    Feb 2022

    In your first message, each_monthend_count was sometimes NULL, but non_monthend_cnt was never NULL. Do you really want that reversed in these two cases?

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,018 Red Diamond

    Hi, @user13328581

    And if it is 20-January 2022 to null(take today's date)

    So the result will be

    monthend_cnt Non_monthEnd_cnt

    Jan 2022 Jan 2022

    Feb 2022 Feb 2022

    Mar 2022 Mar 2022

    April 2022 April 2022

    May 2022


    to summarize things, the result for 10 January 2022 to 14 February 2022 will be

    Monthend_cnt Non_monthend_cnt

    Jan 2022 Jan 2022

    Feb 2022


    Here's one way to do that:

    SELECT   s.id
    --,	 s.start_dt, s.end_dt	-- for debugging only
    ,	 c.monthend_count
    ,	 CASE
    	     WHEN COALESCE ( s.end_dt
    	   	  	   , SYSDATE
    			   ) >= c.monthend_count
    	     THEN monthend_count
    	 END AS non_monthend_count
    FROM	 sample_data s
    CROSS APPLY (
       	        SELECT ADD_MONTHS ( LAST_DAY (s.start_dt) -- or TRUNC (LAST_DAY (s.start_dt))
    				  , LEVEL - 1
    				  ) AS monthend_count
    		FROM  dual
    		CONNECT BY LEVEL  <= 1 + MONTHS_BETWEEN ( COALESCE (s.end_dt, SYSDATE)
    			  	    	 		, TRUNC (s.start_dt, 'MONTH')
    							)
       	    )	    c
    ORDER BY s.id
    ,   	 c.monthend_count
    ;
    

    This assumes that the time of start_dt is always 00:00:00. If not, add TRUNC, as shown in the comment.

    The DATEs in the result set will actually be the last day of each month, but since you're only displaying the month and year, that won't matter.

  • mathguy
    mathguy Member Posts: 10,537 Blue Diamond

    Here is how I understand the problem.

    As before, I don't need to make any assumptions about time-of-day being midnight in all cases; that is just a special case. However, if in your problem it is important that times be treated as if they were all midnight, then you need some care with SYSDATE, which in almost all cases will have non-midnight time of day component.

    If the time range ends, for example, on May 31 at midnight, that means at 00:00:00 on May 31. This range does NOT include the end of the month. The END_DT in this case is at the beginning of May 31; the entire day (except midnight at the beginning) is MISSING from the date range. So in this case, May overlaps with the date range, but the month end is NOT included. Note specifically that if in your test data (or even your real-life data) all dates are "pure dates", with time-of-day of midnight, then END_DT of May 31 means the month end is EXCLUDED. Is that what your end users require?

    If the time range ends on June 1 at 00:00:00, then the end of May is included; and there is NO OVERLAP with the month of June, so it will NOT appear in the output.

    If the time range ends on June 2, then the end of May is included, and the period overlaps with June but the end of June is not included.

    Look at the three rows I added for testing, with ID = 5100, 5200 and 5300 respectively, and what I get in the output for those three rows. Is that what you need? If not, the query can be modified easily, but you must state very clearly what we should see for each input. And, clarify if your real-life data may have non-midnight time-of-day, how that should be handled, and how SYSDATE should be handled. (Do we need to use TRUNC(SYSDATE) everywhere?)

    Anyway, based on my understanding of your problem, here is how I would change my earlier solution.

    alter session set nls_date_format = 'Mon yyyy'; -- so I don't need to do it in the query
    
    with
      sample_data (id, start_dt, end_dt) as (
        select 1001, date '2020-03-10', date '2020-12-31' from dual union all
        select 2005, date '2021-12-31', null              from dual union all
        select 3008, date '2021-04-10', date '2021-04-30' from dual union all
        select 4001, date '2021-05-12', date '2021-05-20' from dual union all
        select 5100, date '2021-05-03', date '2021-05-31' from dual union all
        select 5200, date '2021-05-03', date '2021-06-01' from dual union all
        select 5300, date '2021-05-03', date '2021-06-02' from dual
      )
    select t.id,
           case when add_months(l.month_overlapped, 1) <= nvl(t.end_dt, sysdate)
                then l.month_overlapped end as monthend_included,
           l.month_overlapped     
    from   sample_data t
           cross apply
           (
             select     add_months(trunc(start_dt, 'mm'), level - 1) as month_overlapped
             from       dual
             connect by add_months(trunc(start_dt, 'mm'), level - 1) < nvl(end_dt, sysdate)
           ) l
    ;
    
        ID  MONTHEND_INCLUDED  MONTH_OVERLAPPED  
    ------  -----------------  ------------------
      1001  Mar 2020           Mar 2020          
      1001  Apr 2020           Apr 2020          
      1001  May 2020           May 2020          
      1001  Jun 2020           Jun 2020          
      1001  Jul 2020           Jul 2020          
      1001  Aug 2020           Aug 2020          
      1001  Sep 2020           Sep 2020          
      1001  Oct 2020           Oct 2020          
      1001  Nov 2020           Nov 2020          
      1001                     Dec 2020          
      2005  Dec 2021           Dec 2021          
      2005  Jan 2022           Jan 2022          
      2005  Feb 2022           Feb 2022          
      2005  Mar 2022           Mar 2022          
      2005  Apr 2022           Apr 2022          
      2005                     May 2022          
      3008                     Apr 2021          
      4001                     May 2021          
      5100                     May 2021          
      5200  May 2021           May 2021          
      5300  May 2021           May 2021          
      5300                     Jun 2021         
    
    
  • user13328581
    user13328581 Member Posts: 1,343 Silver Badge

    Thanks @Frank Kulash for the posting. Please find below some clarification in regards to the questions below

    In your first message, each_monthend_count was sometimes NULL, but non_monthend_cnt was never NULL. Do you really want that reversed in these two cases?

    It is not a reverse, The logic is based on the fact that non_monthend_cnt falls as an overlap in the two dates range. Hence, non_monthend_cnt would never be null, because there would always be a start_dt and the end_dt if null will utilizes today's date

    For ID=1001, why don't you want any output for December, 2020?

    For ID=2005, why don't you want any output for April or May, 2022?

    After further verifying the requirement specification, it should be included. Sorry about that. Hence, it should have been

      ID   each_monthend_cnt    Non_monthend_cnt
    ----   ------------         ------------------
    1001   Mar 2020                    Mar 2020    
    1001   Apr 2020                    Apr 2020
    1001   May 2020                    May 2020
    1001   Jun 2020                    Jun 2020
    1001   Jul 2020                    Jul 2020
    1001   Aug 2020                    Aug 2020
    1001   Sep 2020                    Sep 2020
    1001   Oct 2020                    Oct 2020
    1001   Nov 2020                    Nov 2020
    1001   Dec 2020                    Dec 2020
    2005   Dec 2021                    Dec 2021
    2005   Jan 2022                    Jan 2022
    2005   Feb 2022                    Feb 2022
    2005   Mar 2022                    Mar 2022
    2005   Apr 2022                    Apr 2022
                                       May 2022
    

    This assumes that the time of start_dt is always 00:00:00. If not, add TRUNC, as shown in the comment.

    The time matters. In otherwords the last_day of a month including the time should be at MM/DD/YYYY 12:00:00 am

    Thanks @mathguy for the feedback

    Note specifically that if in your test data (or even your real-life data) all dates are "pure dates", with time-of-day of midnight, then END_DT of May 31 means the month end is EXCLUDED. Is that what your end users require?

    I just clarified and May 31 should be included and the time matters

    Look at the three rows I added for testing, with ID = 5100, 5200 and 5300 respectively, and what I get in the output for those three rows. Is that what you need? If not, the query can be modified easily, but you must state very clearly what we should see for each input. And, clarify if your real-life data may have non-midnight time-of-day, how that should be handled, and how SYSDATE should be handled. (Do we need to use TRUNC(SYSDATE) everywhere?)

    Please see the updated output below

    with
      sample_data (id, start_dt, end_dt) as (
        select 1001, date '2020-03-10', date '2020-12-31' from dual union all
        select 2005, date '2021-12-31', null              from dual union all
        select 3008, date '2021-04-10', date '2021-04-30' from dual union all
        select 4001, date '2021-05-12', date '2021-05-20' from dual union all
        select 5100, date '2021-05-03', date '2021-05-31' from dual union all
        select 5200, date '2021-05-03', date '2021-06-01' from dual union all
        select 5300, date '2021-05-03', date '2021-06-02' from dual
      )
    

    Desired output

        ID  MONTHEND_INCLUDED  MONTH_OVERLAPPED  
    ------  -----------------  ------------------
      1001  Mar 2020           Mar 2020          
      1001  Apr 2020           Apr 2020          
      1001  May 2020           May 2020          
      1001  Jun 2020           Jun 2020          
      1001  Jul 2020           Jul 2020          
      1001  Aug 2020           Aug 2020          
      1001  Sep 2020           Sep 2020          
      1001  Oct 2020           Oct 2020          
      1001  Nov 2020           Nov 2020          
      1001                     Dec 2020          
      2005  Dec 2021           Dec 2021          
      2005  Jan 2022           Jan 2022          
      2005  Feb 2022           Feb 2022          
      2005  Mar 2022           Mar 2022          
      2005  Apr 2022           Apr 2022          
      2005                     May 2022          
      3008                     Apr 2021          
      4001                     May 2021          
      5100  May 2021           May 2021          
      5200  May 2021           May 2021   
      5200                     Jun 2021    
      5300  May 2021           May 2021          
      5300                     Jun 2021         
    

    . And, clarify if your real-life data may have non-midnight time-of-day, how that should be handled, and how SYSDATE should be handled.

    The real-life data would have non-midnight time-of-day. It should be handled based on the fact, the last_day of the month check will be 12:00:00AM. Hence based on the following example below, the result will be

    with sample_data(id, start_dt, end_dt) as
    (
     select 6200,  to_date('2021-01-07 12:00:00 AM', 'YYYY/DD/MM HH:MI:SS AM'), to_date('2021-01-07 11:59:59 PM', 'YYYY/DD/MM HH:MI:SS AM') from dual union all
        select 6300, to_date('2021-03-08 12:00:01 AM', 'YYYY/DD/MM HH:MI:SS AM'),  to_date('2021-03-09 4:53:59 PM', 'YYYY/DD/MM HH:MI:SS AM') from dual
      )
        
    
    desired output
     
          ID  MONTHEND_INCLUDED  MONTH_OVERLAPPED  
    ------  -----------------  ------------------
      6200     JUL 2021            JUL 2021          
      6300     AUG 2021            AUG 2021
      6300                         SEP 2021
    

    SYSDATE should include the timestamp (HH:MI:SS AM/PM) as well,


    Thanks again and I hope i was able to clarify things.

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,018 Red Diamond
    Answer ✓

    Hi, @user13328581

    The query I posted three days ago does basically what (I think) you want. Instead of column names non_monthend_count and monthend_count you now want to name those columns monthed_included and monthend_overlapped, so just change the names:

    SELECT   s.id
    --,	 s.start_dt, s.end_dt	-- for debugging only
    ,	 CASE
    	   WHEN COALESCE ( s.end_dt
    	   	  	   , SYSDATE
    			   ) >= c.month_overlapped
    	   THEN month_overlapped
    	 END AS monthend_included
    ,	 c.month_overlapped
    FROM	 sample_data s
    CROSS APPLY (
       	        SELECT ADD_MONTHS ( LAST_DAY (s.start_dt) -- or TRUNC (LAST_DAY (s.start_dt))
    				  , LEVEL - 1
    				  ) AS month_overlapped
    		FROM  dual
    		CONNECT BY LEVEL <= 1 + MONTHS_BETWEEN ( COALESCE (s.end_dt, SYSDATE)
    			  	    	 		, TRUNC (s.start_dt, 'MONTH')
    							)
       	  )	    c
    ORDER BY s.id
    ,   	 c.month_overlapped
    ;
    


    The output from the query above with the new sample data is:

          MONTHEND_ MONTH_
      ID  INCLUDED  OVERLAPPED
    ----- --------- ----------
     1001 Mar 2020 Mar 2020
     1001 Apr 2020 Apr 2020
     1001 May 2020 May 2020
     1001 Jun 2020 Jun 2020
     1001 Jul 2020 Jul 2020
     1001 Aug 2020 Aug 2020
     1001 Sep 2020 Sep 2020
     1001 Oct 2020 Oct 2020
     1001 Nov 2020 Nov 2020
     1001 Dec 2020 Dec 2020
     2005 Dec 2021 Dec 2021
     2005 Jan 2022 Jan 2022
     2005 Feb 2022 Feb 2022
     2005 Mar 2022 Mar 2022
     2005 Apr 2022 Apr 2022
     2005          May 2022
     3008 Apr 2021 Apr 2021
     4001          May 2021
     5100 May 2021 May 2021
     5200 May 2021 May 2021
     5200          Jun 2021
     5300 May 2021 May 2021
     5300          Jun 2021
    

    This is what you requested except for the last rows for ids 1001 and 3008. Why do you want monthend_included to be NULL for those rows? Whatever the reason is, why doesn't the same reason apply to id-5200?

  • user13328581
    user13328581 Member Posts: 1,343 Silver Badge
    edited May 24, 2022 6:22PM

    @Frank Kulash

    This is what you requested except for the last rows for ids 1001 and 3008.

    Thank for the prompt response. For 1001 and 3008 the same reason should apply to id-5200 as well. For id-5200. the same reason should apply to 1001 and 3008 as well.

    i wanted null because the solution will be used as a subquery to count the total Monthend_included and count the total month_overlapped in a given year for each id. I decided to only focus on a particular sub-set of the problem for now

    please note, so far the posted result from your solution looks right based on the sample data given. I am still testing it out with a few more data set and test cases.

  • mathguy
    mathguy Member Posts: 10,537 Blue Diamond

    The real-life data would have non-midnight time-of-day. It should be handled based on the fact, the last_day of the month check will be 12:00:00AM.

    Sorry, but between this quote and the other thing you mentioned (that the dates may have non-zero time-of-day and that the time of day must be considered), this makes absolutely no sense to me.

    A period from May 3, 8:30 AM until May 31, 4:45 PM does not overlap the end of the month. The end of the month is at midnight at the END of May 31, and your period ends strictly before that (more than 7 hours before it).

    Perhaps in real life, your users told you that the period does overlap the end of the month if it overlaps 00:00:00 (12 AM) of the month-end date, but that doesn't mean it makes sense. I give up. The query can be modified very easily to implement all your clarified requirements (including this one), but I choose not to work on problems that don't make sense to me.

    Good luck with your project!

  • user13328581
    user13328581 Member Posts: 1,343 Silver Badge
    edited May 25, 2022 3:58AM

    @mathguy I was hoping I could still clarify anything left unchecked or any mistakes or anything I missed as well. However, I don't think I posted any sample data with the context May 3, 8:30 AM until May 31, 4:45 PM. Likewise, I don't know if there is any misinterpretation in relation to the word "overlap". My definition of overlap is "area or range in common" . Nevertheless thanks for all the help provided as well.