Forum Stats

  • 3,752,275 Users
  • 2,250,483 Discussions
  • 7,867,774 Comments

Discussions

Oracle looking for conservative dates with lead and lag

BeefStu
BeefStu Member Posts: 278 Blue Ribbon

I'm trying to make use of the lead and lag functionality that checks for 'N>1'

consecutive absent_dates but seem to be struggling. I was hoping someone can please suggest a solution 


Note I know the employees table isn't included in the query yet to obtain first_name and last_name as I am trying to keep the test case as simple as possible for now. 


Below is my test CASE and the desired output should be as follows. Thanks in advance to all that answer.

EMPLOYEE_ID ABSENT_DATE

        1            14-JUL-21 Jane Doe

        1            15-JUL-21 Jane Doe

        1            30-JUL-21 Jane Doe

        1            31-JUL-21 Jane Doe

        4            22-JUL-21 Mike Jones 

        4            23-JUL-21 Mike Jones 


Create table employees(

     employee_id NUMBER(6), 

     first_name VARCHAR2(20),

     last_name VARCHAR2(20),

     card_num VARCHAR2(10),

work_days VARCHAR2(7)

    );



     ALTER TABLE employees

             ADD ( CONSTRAINT employees_pk

           PRIMARY KEY (employee_id));


    INSERT INTO employees                   

    (

    EMPLOYEE_ID,

    first_name, 

    last_name,

    card_num,

    work_days

    )

    WITH names AS ( 

    SELECT 1, 'Jane',     'Doe','F123456', 'NYYYYYN'FROM dual UNION ALL 

    SELECT 2, 'Madison', 'Smith','R33432','NYYYYYN'

   FROM dual UNION ALL 

     SELECT 3, 'Justin',   'Case','C765341','NYYYYYN'

    FROM dual UNION ALL 

    SELECT 4, 'Mike',     'Jones','D564311','NYYYYYN' FROM dual  ) 

    SELECT * FROM names;  



     create table absences(

     seq_num integer  GENERATED BY DEFAULT AS IDENTITY (START WITH 1) NOT NULL,

     employee_id NUMBER(6),

     absent_date DATE,

     constraint absence_chk check (absent_date=trunc(absent_date, 'dd')),

     constraint absence_pk primary key (employee_id, absent_date)

      );




    begin  

      insert into absences values (1,1, date'2021-07-21');  

      insert into absences values (2,4, date'2021-07-22');  

      insert into absences values (3,4, date'2021-07-23');  

      insert into absences values (4,4, date'2021-07-26');  

      insert into absences values (5,1, date'2021-07-30');  

      insert into absences values (6,1, date'2021-07-31');  

      insert into absences values (7,4, date'2021-07-13');  

      insert into absences values (8,1, date'2021-07-14');  

      insert into absences values (9,1, date'2021-07-15');  

      commit;  

    end;



       

-- problem with query 

select *

   (

  select *, lead(absent_date) over(partition by employee_id order by absent_date)  nxt, 

    lag(absent_date) over(partition by employee_id order by absent_date) prev

  from absences 

) t

where absent_date = prev + 1 or absent_date = nxt - 1;

Tagged:

Best Answers

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,925 Red Diamond
    Accepted Answer

    Hi, @BeefStu

    I think I understand the problem now: you want to show only the absences that are part of a streak of at least two abseneces for the same employee, each within one day of the next. Is that right?

    If so, you were on the right track using LAG (absent_date) to get prev, but since prev is the date before absent_date, the condition

    where absent_date = prev + 1 

    will never be true. If you're using LAG to get the previous date, then also use LEAD to get the following date, and then choose the rows where either the previous date or the next date is within one day of absent_date.

    However, I wouldn't use LAG and LEAD. I would use MATCH_RECOGNIZE, like this:

    SELECT   a.employee_id, a.absent_date
    ,	 e.first_name, e.last_name
    FROM	 absences
    MATCH_RECOGNIZE
         (
           PARTITION BY employee_id
           ORDER BY     absent_date
           ALL ROWS PER MATCH
           PATTERN      (frst nxt +)
           DEFINE       nxt AS  absent_date <= PREV (absent_date) + 1
    	 )         a
    JOIN     employees e  ON e.employee_id = a.employee_id
    ORDER BY employee_id, absent_date
    ;
    


    User_WI23P
  • BeefStu
    BeefStu Member Posts: 278 Blue Ribbon
    Accepted Answer

    @mathguy I read Frank's reply several times and was concentrating mainly on the subqery. Thank you for explicitly for pointing out the problem with code.

    Having said that the output from the query is awkward and doesn't match the desired output I documented above.

    EMPLOYEE_ID ABSENT_DATE

            1            14-JUL-21 Jane Doe

            1            15-JUL-21 Jane Doe

            1            30-JUL-21 Jane Doe

            1            31-JUL-21 Jane Doe

            4            22-JUL-21 Mike Jones 

            4            23-JUL-21 Mike Jones 

    I am still very interested in trying to achieve the goal with the lead/lag functionality to see where I went wrong.

    Any other helpful hints on how I can get the above mentioned output using lead/lag would be greatly appreciated

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,925 Red Diamond
    Accepted Answer

    Hi, @BeefStu

    Having said that the output from the query is awkward and doesn't match the desired output I documented above.

    Point out where it doesn't match the desired output. If necessary, post new sample data (and corresponding desired results).

    Once I add the missing FROM clause, the query you posted at 13:42 EDT works fine for me. You've got the correct way of using LAG and LEAD. Here's another way of writing the same approach:

    WITH  got_prev_and_next_date  AS -- WITH is easier to read than in-line view
    (
    	SELECT  a.*
    	,	LEAD (absent_date) OVER ( PARTITION BY employee_id
    		   		  	  ORDER BY   absent_date
    					)  AS next_date -- give descriptivr name
    	,       LAG  (absent_date) OVER ( PARTITION BY employee_id
    		   		  	  ORDER BY   absent_date
    					)  AS prev_date
     	FROM 	 absences a
    )
    SELECT   employee_id, absent_date -- only include columns wanted
    FROM	 got_prev_and_next_date
    ORDER BY employee_id, absent_date -- always use ORDER BY clause 
    ;
    

    Again, this is essentially what you posted, only with some stylistic changes. I find this version easier to read and understand, and therefore easier to debug and maintain. However, I won't be maintaining your code, so use what works best for you. If you think nxt is a better name than next_date, then use nxt.

    The results are exactly the same as the results of the query you posted (once the FROM clause is added).

    Having said that the output from the query is awkward

    What do you mean by "the output from the query is awkward"?

«1

Answers

  • User_H3J7U
    User_H3J7U Member Posts: 415 Bronze Trophy

    If you get a syntax error, it should contain the line and position (*):

       *
    ERROR at line 2:
    ORA-00923: FROM keyword not found where expected
    

    Read the SQL Reference carefully and check the syntax of your query.

  • BeefStu
    BeefStu Member Posts: 278 Blue Ribbon

    User_H3J7U  I've went through the manual and tried many times to fix the issue but was unsuccessful. That's why I took the time to post the question and create a complete test case.

    Thanks for responding

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,925 Red Diamond
    edited Jul 22, 2021 4:17PM

    Hi, @BeefStu

    Thanks for posting the sample data and desired results. Don't forget to post an explanation of what you're trying to do (including the meaning of the work_days column, and how you get the desired results from the given data) and your full Oracle version.

    -- problem with query 

    select *

       (

      select *, lead(absent_date) over(partition by employee_id order by absent_date) nxt, 

        lag(absent_date) over(partition by employee_id order by absent_date) prev

      from absences 

    ) t

    where absent_date = prev + 1 or absent_date = nxt - 1;

    In any query, there must be exactly one FROM keyword for every SELECT keyword. The first SELECT above doesn't have a matching FROM. Also, if you say SELECT * (without a table name or alias), then there can't be anything else in the SELECT clause. In the sub-query, you can qualify * with the table name, like this

    SELECT  absences.*, LEAD ...
    

    or use a table alias, like this

    SELECT  a.*, LEAD ...
    FROM    absences  a
    


    BeefStu
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,925 Red Diamond
    Accepted Answer

    Hi, @BeefStu

    I think I understand the problem now: you want to show only the absences that are part of a streak of at least two abseneces for the same employee, each within one day of the next. Is that right?

    If so, you were on the right track using LAG (absent_date) to get prev, but since prev is the date before absent_date, the condition

    where absent_date = prev + 1 

    will never be true. If you're using LAG to get the previous date, then also use LEAD to get the following date, and then choose the rows where either the previous date or the next date is within one day of absent_date.

    However, I wouldn't use LAG and LEAD. I would use MATCH_RECOGNIZE, like this:

    SELECT   a.employee_id, a.absent_date
    ,	 e.first_name, e.last_name
    FROM	 absences
    MATCH_RECOGNIZE
         (
           PARTITION BY employee_id
           ORDER BY     absent_date
           ALL ROWS PER MATCH
           PATTERN      (frst nxt +)
           DEFINE       nxt AS  absent_date <= PREV (absent_date) + 1
    	 )         a
    JOIN     employees e  ON e.employee_id = a.employee_id
    ORDER BY employee_id, absent_date
    ;
    


    User_WI23P
  • BeefStu
    BeefStu Member Posts: 278 Blue Ribbon

    Frank, Your solution us elegant and worked great. But wasn't familiar with the match command.

    I also tried modifying the subqery solution based on your suggestion and got the same error. For my personal education can you provide a working solution with the subqery and lag and lead. I really want to know where I went wrong.

    As for your earlier question about the work_days column it's a 7 byte string containing Y or N that corresponds to the day of week. First byte is Mon,... if set to Y it means an employee should be working on that day. Since the absences table is the last step in the process, it doesn't apply here.

    Once again, thanks for your help and expertise and am anxious to see the subqery solution if you can easily provide one

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,925 Red Diamond

    Hi, @BeefStu

    I also tried modifying the subqery solution based on your suggestion and got the same error. ... I really want to know where I went wrong.

    I can't show you where you went wrong if I don't know where you went. Post your code. Always post your code.

    Since the absences table is the last step in the process, it doesn't apply here.

    If a column doesn't play any role in the problem, then you're just confusing the issue by including it. Post only the relevant columns. (Primary keys are an exception. If we need to talk about individual rows, then it's useful to have a short, unique way to identify each row.)

  • BeefStu
    BeefStu Member Posts: 278 Blue Ribbon

    Frank, here is the code I modified with the alias. Note a.* for the subqery

    select * 

       (

      select a.*,

     lead(absent_date) over(partition by employee_id order by absent_date)  nxt, 

        lag(absent_date) over(partition by employee_id order by absent_date) prev

      from absences a

    ) t

    where absent_date = prev + 1 or absent_date = nxt - 1;


    Whenever I get thrown on a development project uggg (I am a DBA) I always insist on adding a way to uniquely identifying a row hence the seq_num column in the create table command. I didn't think to include it in the output as the test data only has 9 rows.

    In the future I will modify columns by storing comments in the table for ambiguous columns

  • mathguy
    mathguy Member Posts: 10,009 Gold Crown

    You are still missing the FROM keyword in the main query - select * FROM ( ... ) Mr Kulash already pointed out that mistake in his first reply, and it was also pointed out in the very first reply, from USER_whatever.

    Do you understand that and you just missed it, or do you have difficulty understanding why that throws an error?

    BeefStu
  • BeefStu
    BeefStu Member Posts: 278 Blue Ribbon
    Accepted Answer

    @mathguy I read Frank's reply several times and was concentrating mainly on the subqery. Thank you for explicitly for pointing out the problem with code.

    Having said that the output from the query is awkward and doesn't match the desired output I documented above.

    EMPLOYEE_ID ABSENT_DATE

            1            14-JUL-21 Jane Doe

            1            15-JUL-21 Jane Doe

            1            30-JUL-21 Jane Doe

            1            31-JUL-21 Jane Doe

            4            22-JUL-21 Mike Jones 

            4            23-JUL-21 Mike Jones 

    I am still very interested in trying to achieve the goal with the lead/lag functionality to see where I went wrong.

    Any other helpful hints on how I can get the above mentioned output using lead/lag would be greatly appreciated

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 40,925 Red Diamond
    Accepted Answer

    Hi, @BeefStu

    Having said that the output from the query is awkward and doesn't match the desired output I documented above.

    Point out where it doesn't match the desired output. If necessary, post new sample data (and corresponding desired results).

    Once I add the missing FROM clause, the query you posted at 13:42 EDT works fine for me. You've got the correct way of using LAG and LEAD. Here's another way of writing the same approach:

    WITH  got_prev_and_next_date  AS -- WITH is easier to read than in-line view
    (
    	SELECT  a.*
    	,	LEAD (absent_date) OVER ( PARTITION BY employee_id
    		   		  	  ORDER BY   absent_date
    					)  AS next_date -- give descriptivr name
    	,       LAG  (absent_date) OVER ( PARTITION BY employee_id
    		   		  	  ORDER BY   absent_date
    					)  AS prev_date
     	FROM 	 absences a
    )
    SELECT   employee_id, absent_date -- only include columns wanted
    FROM	 got_prev_and_next_date
    ORDER BY employee_id, absent_date -- always use ORDER BY clause 
    ;
    

    Again, this is essentially what you posted, only with some stylistic changes. I find this version easier to read and understand, and therefore easier to debug and maintain. However, I won't be maintaining your code, so use what works best for you. If you think nxt is a better name than next_date, then use nxt.

    The results are exactly the same as the results of the query you posted (once the FROM clause is added).

    Having said that the output from the query is awkward

    What do you mean by "the output from the query is awkward"?