Forum Stats

  • 3,728,431 Users
  • 2,245,616 Discussions
  • 7,853,524 Comments

Discussions

How to generate rows depend on column data

ronald_2017
ronald_2017 Member Posts: 474 Blue Ribbon
edited April 13 in SQL & PL/SQL

Hi All,


I want to populate the gaps between min date and max date of c_date for each person. The following query can do it.However, it generates the whole data set. I want to generate rows between missing date gap rows only.

Version: 19C


with

t1 as

  (

    SELECT 'Chris' c_name, to_date('022020', 'MMYYYY') c_date FROM dual

    union all

    SELECT 'Chris', to_date('032020', 'MMYYYY') FROM dual

    union all

    SELECT 'Chris', to_date('062020', 'MMYYYY') FROM dual

    union all

    SELECT 'Chris', to_date('072020', 'MMYYYY') FROM dual

    union all

    SELECT 'Chris', to_date('082020', 'MMYYYY') FROM dual

    union all

    SELECT 'Lisa', to_date('052020', 'MMYYYY') FROM dual

    union all

    SELECT 'Lisa', to_date('062020', 'MMYYYY') FROM dual

    union all

    SELECT 'Lisa', to_date('102020', 'MMYYYY') FROM dual

    union all

    SELECT 'Lisa', to_date('112020', 'MMYYYY') FROM dual

    union all

    SELECT 'Lisa', to_date('122020', 'MMYYYY') FROM dual

  ),

t2 as (

  SELECT add_months(to_date('01/01/2020', 'dd/mm/yyyy'), level-1) DT FROM dual connect by level <= 12

)   

SELECT * FROM t2 left join t1 partition by (t1.c_name) on t2.dt = t1.c_date;





Intended output:


C_NAME   DT     C_DATE   

------ ---------- ----------     

Chris  01/02/2020   01/02/2020

Chris  01/03/2020   01/03/2020

Chris  01/04/2020      

Chris  01/05/2020      

Chris  01/06/2020   01/06/2020

Chris  01/07/2020   01/07/2020

Chris  01/08/2020   01/08/2020     

Lisa   01/05/2020   01/05/2020

Lisa   01/06/2020   01/06/2020

Lisa   01/07/2020      

Lisa   01/08/2020      

Lisa   01/09/2020      

Lisa   01/10/2020   01/10/2020

Lisa   01/11/2020   01/11/2020

Lisa   01/12/2020   01/12/2020


Thanks in advance.

Best Answer

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,319 Red Diamond
    edited April 13 Accepted Answer

    Hi,

    Thanks for posting the sample data, the desired results, and your code; that's very helpful!

    One way is to generate the results with the unwanted months, like you're already doing, and then discard the ones you don't want. Here's one way to do that:

    WITH  all_months  AS
    (
    	SELECT    t2.*
    	,	  t1.*
    	,	  MIN (t1.c_date) OVER (PARTITION BY t1.c_name)	AS min_c_date
    	,	  MAX (t1.c_date) OVER (PARTITION BY t1.c_name) AS max_c_date
    	FROM 	  t2
    	LEFT JOIN t1 PARTITION BY (t1.c_name) ON t2.dt = t1.c_date
    )
    SELECT  c_name, dt, c_date
    FROM	 all_months
    WHERE	 dt BETWEEN min_c_date
    	    AND     max_c_date
    ORDER BY c_name, dt  -- or whatever you want
    ;
    

    EDIT: The query above gets the desired results from the sample data, but may not get the right results in other cases, depending on the requirements. See the next reply for a different solution.

    ronald_2017

Answers

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,319 Red Diamond
    edited April 13 Accepted Answer

    Hi,

    Thanks for posting the sample data, the desired results, and your code; that's very helpful!

    One way is to generate the results with the unwanted months, like you're already doing, and then discard the ones you don't want. Here's one way to do that:

    WITH  all_months  AS
    (
    	SELECT    t2.*
    	,	  t1.*
    	,	  MIN (t1.c_date) OVER (PARTITION BY t1.c_name)	AS min_c_date
    	,	  MAX (t1.c_date) OVER (PARTITION BY t1.c_name) AS max_c_date
    	FROM 	  t2
    	LEFT JOIN t1 PARTITION BY (t1.c_name) ON t2.dt = t1.c_date
    )
    SELECT  c_name, dt, c_date
    FROM	 all_months
    WHERE	 dt BETWEEN min_c_date
    	    AND     max_c_date
    ORDER BY c_name, dt  -- or whatever you want
    ;
    

    EDIT: The query above gets the desired results from the sample data, but may not get the right results in other cases, depending on the requirements. See the next reply for a different solution.

    ronald_2017
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,319 Red Diamond

    Hi, @ronald_2017

    What results would you want for these people?

      SELECT 'Vicky', to_date('112019', 'MMYYYY') FROM dual
      union all
      SELECT 'Vicky', to_date('032020', 'MMYYYY') FROM dual
      union all
      SELECT 'Vicky', to_date('032020', 'MMYYYY') FROM dual
      union all
      SELECT 'Vicky', to_date('012021', 'MMYYYY') FROM dual
      union all
      SELECT 'Walter', to_date('122019', 'MMYYYY') FROM dual  	  
      union all
      SELECT 'Walter', to_date('012021', 'MMYYYY') FROM dual  	  
    

    Would you want one row of output for each month for both names, since both names have data in the table before the first month of interest and after the last month of interest? If so, then you need to find the min_c_date aand max_c_date before doing the outer join, not after it (which is what the previous solution did). Here's one way:

    WITH  got_c_date_range  AS
    (
    	SELECT  c_name, c_date
    	,	 MIN (c_date) OVER (PARTITION BY c_name) AS min_c_date
    	,	 MAX (c_date) OVER (PARTITION BY c_name) AS max_c_date
    	FROM	 t1
    )
    SELECT  r.c_name
    ,	 t2.dt
    ,	 r.c_date
    FROM	 t2
    LEFT JOIN got_c_date_range r PARTITION BY (r.c_name, r.min_c_date, r.max_c_date)
       	   		    ON t2.dt = r.c_date
    WHERE	 t2.dt BETWEEN r.min_c_date
    	   	 AND   r.max_c_date
    ORDER BY r.c_name, t2.dt  -- or whatever you want
    ;
    

    Note that c_min_date and c_max_date are included in the outer-join PARTITION BY clause, so those values can be used in the WHERE clause.

    ronald_2017
  • user13328581
    user13328581 Member Posts: 1,300 Bronze Badge
    "LEFT JOIN got_c_date_range r PARTITION BY (r.c_name, r.min_c_date, r.max_c_date)
    

    Is the purpose of the partition by, is to allow you use columns that are part of the tables in the join. thanks

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,319 Red Diamond
    edited April 13

    Hi,

    Is the purpose of the partition by, is to allow you use columns that are part of the tables in the join. 

    The purpose of including min_c_date and max_c_date in the PARTITION BY clause is to allow you to use their values in places OTHER THAN the join: for example, in the WHERE clause.

    If the combination (c_name, req_date) is unique (as it is in your sample data and, I assume, also in your real table) then

    FROM	  t2
    LEFT JOIN got_c_date_range r PARTITION BY (r.c_name, r.min_c_date, r.max_c_date)
       	   		     ON t2.dt = r.c_date
    

    joins exactly the same rows as

    FROM	  t2
    LEFT JOIN got_c_date_range r PARTITION BY (r.c_name)
       	   		     ON t2.dt = r.c_date
    

    The number of rows, and where they came from, is the same. The difference is that, when a row in t2 does not match anything in r, then the first way populates the columns c_name, min_c_date and max_c_date with the values found in r. The second way only populates c_name, and leaves min_c_date and max_c_date NULL.


    EDIT: Above, I said: "If the combination (c_name, req_date) is unique ...". Actually, it doesn't matter if that combination is unique or not. The way min_c_date and max_c_date are computed guarantees that all rows with the same c_name have the same min_c_date and max_c_date, so that

    SELECT c_name, c_date FROM (SELECT DISTINCT c_name, c_date FROM t1)

    produces exactly the same results as

    SELECT c_name, c_date FROM (SELECT DISTINCT c_name, c_date, min_c_date, max_c_date FROM t1)

    which is what's important for the outer-join.

  • user13328581
    user13328581 Member Posts: 1,300 Bronze Badge

    The number of rows, and where they came from, is the same. The difference is that, when a row in t2 does not match anything in r, then the first way populates the columns c_name, min_c_date and max_c_date with the values found in r. The second way only populates c_name, and leaves min_c_date and max_c_date NULL.


    thanks, if i understand correctly, instead of having nulls listed for those columns from that table, then the utilize of partition by in the join allows you to associate the nulls with the listed columns in the "partition by (join)"

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,319 Red Diamond

    Hi, @user13328581

    thanks, if i understand correctly, instead of having nulls listed for those columns from that table, then the utilize of partition by in the join allows you to associate the nulls with the listed columns in the "partition by (join)"

    Right (if I understand you correctly). When you say

    FROM      t2
    LEFT JOIN r ON ...
    

    (without PARTITION BY) then, if a row in t2 has no match, it appears in the result set with all columns that should have come from r as NULL. But if you say

    FROM      t2
    LEFT JOIN r PARTITION BY r.col1, r.col2) 
                ON ...
    

    (with a PARTITION BY clause) then, if a row in t2 has no match, then, in the result set, not all of the columns that should have come from r will be NULL. The columns in the PARTITION BY clause will be populated with the values it was looking for,

    user13328581
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,319 Red Diamond

    Hi, @user13328581

    In the first reply above, I said one way to solve this problem was to generate all the rows and then discard the ones you don't want. I was wondering if there was a solution (possibly faster) that only generate the exact rows needed. Here's one, but I don't know if it will be any faster. It's certainly more complicated; at least to me. But for what it's worth, here's a completely different approach that doesn't use an outer join:

    WITH  got_next_c_date  AS
    (
    	SELECT  c_name, c_date
    	,	LEAD (c_date, 1, c_date + 1)
    		   OVER ( PARTITION BY c_name
    			  ORDER BY	 c_date
    			) AS next_c_date
    	FROM	t1
    )
    SELECT   n.c_name
    ,	 c.dt
    ,	 CASE
    	   WHEN c.dt = n.c_date
    	   THEN n.c_date
    	 END AS c_date
    FROM	 got_next_c_date n
    CROSS APPLY (
    		SELECT t2.dt
    		FROM	t2
    		WHERE	t2.dt BETWEEN n.c_date
    			      AND     n.next_c_date - 1
       	  )		  c
    ORDER BY c_name, dt
    ;
    


    user13328581
  • ronald_2017
    ronald_2017 Member Posts: 474 Blue Ribbon


    It should come every month for each user between min date and max date. So, your first response get the result.

    Thanks

  • Rocky
    Rocky Member Posts: 186 Blue Ribbon
    edited April 20

    Hi User,

    I have also tried it with partition outer join that firstly generating 12 rows then discard those which do not need in resultset. Also, i just forgot about the partition outer joins so thanks for it. Below is the link which i read few years back for the same.


    http://oracle-developer.net/display.php?id=312


    with

    t1 as

     (

      SELECT 'Chris' c_name, to_date('022020', 'MMYYYY') c_date FROM dual

      union all

      SELECT 'Chris', to_date('032020', 'MMYYYY') FROM dual

      union all

      SELECT 'Chris', to_date('062020', 'MMYYYY') FROM dual

      union all

      SELECT 'Chris', to_date('072020', 'MMYYYY') FROM dual

      union all

      SELECT 'Chris', to_date('082020', 'MMYYYY') FROM dual

      union all

      SELECT 'Lisa', to_date('052020', 'MMYYYY') FROM dual

      union all

      SELECT 'Lisa', to_date('062020', 'MMYYYY') FROM dual

      union all

      SELECT 'Lisa', to_date('102020', 'MMYYYY') FROM dual

      union all

      SELECT 'Lisa', to_date('112020', 'MMYYYY') FROM dual

      union all

      SELECT 'Lisa', to_date('122020', 'MMYYYY') FROM dual

     ),

    t2 as (

     SELECT add_months(to_date('01/01/2020', 'dd/mm/yyyy'), level-1) DT FROM dual connect by level <= 12

    )  

    select dt, c_name from (

    select dt, c_name, 

          (case when min(c_date) over (partition by c_name ) <= dt and dt <= max(c_date) over (partition by c_name ) then 'T' else 'F' end) flg

      from (

    SELECT t2.dt, t1.*--,

    --    (case when min_dt <= t2.DT and t2.DT <= mx_dt then 'Y' else 'F' end) flg, t2.DT

     FROM t2 

    left join 

       (select c_name, c_date 

          from t1

         ) t1 partition by (t1.c_name) on t2.dt = t1.c_date 

          

     )      

     ) where flg = 'T' order by c_name, dt;

Sign In or Register to comment.