Forum Stats

  • 3,770,714 Users
  • 2,253,158 Discussions
  • 7,875,556 Comments

Discussions

Recursive CTE pulling values from another table

BeefStu
BeefStu Member Posts: 284 Blue Ribbon

I have a recursive CTE, which generates start and END dates for multiple dates, which is working as expected.


It will stop creating rows before the date crosses midnight or the count(*) of the locations table is exceeded.


With each row the CTE generates I want to associated a location_id. Note there should not be duplicate location_id before one of the stop gaps are reached.


There is no commonality between the start/end dates and the locations so a JOIN didn't seem possible to me. 


In the sample code below I tried a cross apply, which gave me the same location_id for each group generated by the recursive code so that isn't working either.


Can someone please suggest a way I can get a unique location_id from the locations table for each row.


Thanks in advance for your expertise and all who answer. My test CASE is below.


Current Result 

LOCATION_ID    BASE_DATE    START_DATE    END_DATE
1    08212021 00:00:00    08212021 23:16:00    08212021 23:21:00
1    08212021 00:00:00    08212021 23:26:00    08212021 23:31:00
1    08212021 00:00:00    08212021 23:36:00    08212021 23:41:00
…
…

Expected Result

LOCATION_ID    BASE_DATE    START_DATE    END_DATE
1    08212021 00:00:00    08212021 23:16:00    08212021 23:21:00
2    08212021 00:00:00    08212021 23:26:00    08212021 23:31:00 
3    08212021 00:00:00    08212021 23:36:00    08212021 23:41:00
…
...
ALTER SESSION SET NLS_DATE_FORMAT = 'MMDDYYYY HH24:MI:SS';

CREATE OR REPLACE TYPE nt_date IS TABLE OF DATE;
/

    CREATE OR REPLACE FUNCTION generate_dates_pipelined(
      p_from IN DATE,
      p_to   IN DATE
    )
    RETURN nt_date  PIPELINED   DETERMINISTIC
    IS
      v_start DATE := TRUNC(LEAST(p_from, p_to));
      v_end   DATE := TRUNC(GREATEST(p_from, p_to));
    BEGIN
     LOOP
        PIPE ROW (v_start);
        EXIT WHEN v_start >= v_end;
        v_start := v_start + INTERVAL '1' DAY;
      END LOOP;
      RETURN;
    END       generate_dates_pipelined;
/

CREATE TABLE locations AS
    SELECT level AS location_id,
       'Door ' || level AS location_name,

    CASE round(dbms_random.value(1,3)) 
            WHEN 1 THEN 'A' 
            WHEN 2 THEN 'T' 
            WHEN 3 THEN 'T' 
         END AS location_type

    FROM   dual
    CONNECT BY level <= 20;

WITH input  (base_date,start_time) AS (
SELECT 
COLUMN_VALUE,
COLUMN_VALUE+
NUMTODSINTERVAL(83760, 'SECOND')
  FROM   TABLE(generate_dates_pipelined(DATE '2021-08-21',DATE '2021-08-30')) 
)
SELECT
     location_id,
     base_date, start_time + (LEVEL-1) * INTERVAL '10' MINUTE
         AS start_date,
       start_time + (LEVEL-1) * INTERVAL '10' MINUTE + INTERVAL '5' MINUTE
         AS end_date 
FROM   input i
CROSS APPLY  (SELECT location_id from locations)
CONNECT BY (LEVEL-1) * INTERVAL '10' MINUTE < INTERVAL '1' DAY
AND    LEVEL <= (SELECT COUNT(*) FROM locations)
AND    start_time + (LEVEL-1) * INTERVAL '10' MINUTE < TRUNC(start_time) + INTERVAL '1' DAY;
Tagged:

Best Answer

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,234 Red Diamond
    Accepted Answer

    Hi, @BeefStu

    Let me make sure I understand the problem. You want to associate every row in the locations table with a start_date and an end_date. The first start_date is a given base_date (midnight on August 21, 2021 in this example) plus a given offset (83760 seconds in this case). Each subsequent start_date will be a given interval (10 minutes in this case) later. End_date is always a given interval (5 minutes in this case) after start_date. Is that right?

    If so, I don't think the pipelined function will help in this problem. The function is just giving you calendar days starting from base_date. Once you have that day, you need to use something (like LEVEL in your code, or ROWNUM below) to generate separate start_dates. It's simpler just to do both of those things together, and generate the start_date directly from base_bate, without using the pipelined function.

    Here's one way to do that:

    WITH  params  AS
    (
    	SELECT TO_DATE ( '2021-08-21 00:00:00'
    			, 'YYYY-MM-DD HH24:MI:SS'
    			)			 AS base_date
    	,	INTERVAL '83760' SECOND	 	 AS offset
    	,	INTERVAL '10' MINUTE  		 AS incr
    	,	INTERVAL '5' MINUTE		 AS duration
    	FROM	dual
    )
    SELECT   l.location_id
    ,	  p.base_date
    ,	  p.base_date + offset
    	  		+ (incr * (ROWNUM - 1)) AS start_date
    ,	  p.base_date + offset
    	  		+ (incr * (ROWNUM - 1))
    			+ p.duration		 AS end_date
    FROM	  locations l
    CROSS JOIN params   p
    ORDER BY  start_date
    ;
    

    If you really had some reason for using a pipelined function, why not modify the existing function to take an additional argument, the increment between one value returned and the next. Instead returning DATEs that are always 1 day apart, return DATEs where the difference between values is that new argument.

Answers

  • wtolentino
    wtolentino Member Posts: 18 Blue Ribbon

    is your locations table has a unique identifier? if not maybe you can try to add distinct.

    SELECT distinct location_id from locations
    


  • BeefStu
    BeefStu Member Posts: 284 Blue Ribbon

    @wtolentino all the VALUES in the locations table are unique. The primary KEY is location_id

  • User_H3J7U
    User_H3J7U Member Posts: 696 Silver Trophy

    SQL language reference describes the execution order of hierarchy query.

  • User_H3J7U
    User_H3J7U Member Posts: 696 Silver Trophy

    Why use a connect by if there is a generator function?

  • BeefStu
    BeefStu Member Posts: 284 Blue Ribbon

    @User_H3J7U what difference would a generator function make and how will it solve my issue? Please elaborate. Thanks

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,234 Red Diamond
    Accepted Answer

    Hi, @BeefStu

    Let me make sure I understand the problem. You want to associate every row in the locations table with a start_date and an end_date. The first start_date is a given base_date (midnight on August 21, 2021 in this example) plus a given offset (83760 seconds in this case). Each subsequent start_date will be a given interval (10 minutes in this case) later. End_date is always a given interval (5 minutes in this case) after start_date. Is that right?

    If so, I don't think the pipelined function will help in this problem. The function is just giving you calendar days starting from base_date. Once you have that day, you need to use something (like LEVEL in your code, or ROWNUM below) to generate separate start_dates. It's simpler just to do both of those things together, and generate the start_date directly from base_bate, without using the pipelined function.

    Here's one way to do that:

    WITH  params  AS
    (
    	SELECT TO_DATE ( '2021-08-21 00:00:00'
    			, 'YYYY-MM-DD HH24:MI:SS'
    			)			 AS base_date
    	,	INTERVAL '83760' SECOND	 	 AS offset
    	,	INTERVAL '10' MINUTE  		 AS incr
    	,	INTERVAL '5' MINUTE		 AS duration
    	FROM	dual
    )
    SELECT   l.location_id
    ,	  p.base_date
    ,	  p.base_date + offset
    	  		+ (incr * (ROWNUM - 1)) AS start_date
    ,	  p.base_date + offset
    	  		+ (incr * (ROWNUM - 1))
    			+ p.duration		 AS end_date
    FROM	  locations l
    CROSS JOIN params   p
    ORDER BY  start_date
    ;
    

    If you really had some reason for using a pipelined function, why not modify the existing function to take an additional argument, the increment between one value returned and the next. Instead returning DATEs that are always 1 day apart, return DATEs where the difference between values is that new argument.

  • BeefStu
    BeefStu Member Posts: 284 Blue Ribbon

    @Frank Kulash I'm not tied to the PIPELINED function. The only reason I was using it was to create a bunch of dates at a time. This way there doesn't have to be a schedule job every day to copy the rows.


    My intention with the base_date was to use it as part of a PRIMARY key ie schedule_id, location_id, base_date. Once that was in place I was going to REPLACE the INSERT with a MERGE to prevent a unique key violation if the schedule creation process was run more than once for the same day. 


    The seconds are used to establish the initial start time of the schedule and the end date is 5 minutes from the start date. There is a 5 minute gap and the next start date is created… until we dont cross midnight or we run out of distinct location_id. My intention is to allow the number of minutes between start and END DATE to be flexible along with the gap. For now I'm just creating templates for users and they can modify as they see fit. For ease I chose 5 minutes.


    Think of a guard walking around a building. They need to get to a location_id by a certain time, if they don't they are either early or late. That was the code you found my data bug in because I was creating those records manually.


    I'm anxious to test your solution but it will probably be over the weekend. As always thanks for responding and your expertise.

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,234 Red Diamond
    edited Aug 19, 2021 9:28PM

    Hi, @BeefStu

    My intention with the base_date was to use it as part of a PRIMARY key ie schedule_id, location_id, base_date. 

    I recommend using a surrogate key, such as the number returned by a sequence, as the primary key. You can still have a unique constraint on the combination (schedule_id, location_id, base_date). Updating primary keys causes problems, so you don't want to have the primary key contain any real information that you might ever need to update.

    until we dont cross midnight or we run out of distinct location_id.

    I don't understand this part. In the example you posted, you never cross midnight, but you keep going anyway. Perhaps I don't understand what you mean by "cross midnight". Post an example where this happens. To reduce the amount of stuff you need to post, you might want to change the increment between rows from 10 minutes to 60 or 90 minutes.

  • BeefStu
    BeefStu Member Posts: 284 Blue Ribbon

    @Frank Kulash You are correct that is a bug. I changed the gap INTERVAL to 60 minutes and it kept going. (See below for an example of my version).


    I was thinking ahead about how I could copy a schedule from one date to another. What would the source be max(start_date) or max(end_date)? If that crossed midnight that would only give me a partial schedule.


    If I used a seq_num number that would mean copying 1 row at a time? 


    Do I make a user use the interface every day to manually add records. I suspect that would not go over well. So I was thinking if rows didn't cross midnight I can do something like this. See below again.

     

    i'll spare you the details but I'm not a happy camper with all these little projects becoming an entire application.


    WITH input  (base_date,start_time) AS (
    SELECT 
    COLUMN_VALUE,
    COLUMN_VALUE+
    NUMTODSINTERVAL(83760, 'SECOND')
      FROM   TABLE(generate_dates_pipelined(DATE '2021-08-21',DATE '2021-08-30')) 
    )
    SELECT
         location_id,
         base_date, start_time + (LEVEL-1) * INTERVAL '10' MINUTE
             AS start_date,
           start_time + (LEVEL-1) * INTERVAL '10' MINUTE + INTERVAL '60' MINUTE
             AS end_date 
    FROM   input i
    CROSS APPLY  (SELECT location_id from locations)
    CONNECT BY (LEVEL-1) * INTERVAL '10' MINUTE < INTERVAL '1' DAY
    AND    LEVEL <= (SELECT COUNT(*) FROM locations)
    AND    start_time + (LEVEL-1) * INTERVAL '10' MINUTE < TRUNC(start_time) + INTERVAL '1' DAY;
    
    -- example of copyright rows to another day
    
    
    INSERT into schedule
     ( 
          schedule_id,
           location_id,
           start_date,
           end_date
    )
    with rws as (
      select s.*,
             max (start_date ) over () mx_dt
      from   schedule s
    )
    SELECT * from
    (
      select schedule_id,
                 location_id,
                 start_date + 1,
                 end_date +1
      from   rws
      where 
        start_date > trunc ( mx_dt )
    );
    
    
    


  • BeefStu
    BeefStu Member Posts: 284 Blue Ribbon

    @Frank Kulash ignore the part about copying the schedule I can do that by a schedule_id instead of entire batch at once instead of using the max() function. So crossing midnight shouldn't be a limitation now.

    I'll be testing your code suggestion this afternoon