Forum Stats

  • 3,769,359 Users
  • 2,252,957 Discussions
  • 7,875,002 Comments

Discussions

Comparing schedule to actual time

BeefStu
BeefStu Member Posts: 284 Blue Ribbon

I have some code where I am trying to compare actual time(time a person swipes their access card) to a schedule, in order to determine if the person was early, late, on time or never reached that location. 


The schedule  consists of a bunch of start and end dates, one for each location. 


That schedule is then assigned to an employee. 


In my test CASE below I have one employee (employee_id=1) and one schedule (schedule_id=22).


For the schedule I have 3 location rows (1,2,3) and for the access_history(time employee swipe their card) I have 4 rows for Aug 23 (with the following times 10:57:20,11:01:53, 11:11:13 and 11:15:13)


As you can see there isn't a 1-1 correlation between the schedule and the history as a person's card could be swiped more than once at a location. Every time its swiped an access_history row is added. 


 Most of the output below is correct except I'm missing a row.


My problem is for employee_id=1, location_id=1, access_date=

'2020/08/23 10:57:20' is not showing up in the output as 'E' (early)..


Current output:

SCHEDULE_ID    SCHEDULE_DATE    EMPLOYEE_ID    LOCATION_ID    START_DATE    END_DATE    ACCESS_DATE    STATUS
22    08232021 00:00:00    1    1    08232021 11:00:00    08232021 11:04:00    08232021 11:01:53    G
22    08232021 00:00:00    1    2    08232021 11:10:00    08232021 11:13:00    08232021 11:11:13    G
22    08232021 00:00:00    1    2    08232021 11:10:00    08232021 11:13:00    08232021 11:15:13    L
22    08232021 00:00:00    1    3    08232021 11:16:00    08232021 11:19:00     -     N

Expected output

SCHEDULE_ID    SCHEDULE_DATE    EMPLOYEE_ID    LOCATION_ID    START_DATE    END_DATE    ACCESS_DATE    STATUS

22    08232021 00:00:00    1    1    08232021 11:00:00    08232021 11:04:00    08232021 10:57:20    E
22    08232021 00:00:00    1    1    08232021 11:00:00    08232021 11:04:00    08232021 11:01:53    G
22    08232021 00:00:00    1    2    08232021 11:10:00    08232021 11:13:00    08232021 11:11:13    G
22    08232021 00:00:00    1    2    08232021 11:10:00    08232021 11:13:00    08232021 11:15:13    L
22    08232021 00:00:00    1    3    08232021 11:16:00    08232021 11:19:00     -     N


I don't understand why this row isn't appearing and was hoping someone can explain the issue and suggest a way to resolve it.


BTW I'm running this on live SQL so there can be a common environment.


To all that respond, thanks in advance for your patience,time and expertise. 


Test case:
ALTER SESSION SET NLS_DATE_FORMAT = 'MMDDYYYY HH24:MI:SS';


create table access_history(
      seq_num integer  GENERATED BY DEFAULT AS IDENTITY (START WITH 1) NOT NULL,
       employee_id NUMBER(6), 
       card_num varchar2(10),
       location_id number(4),
       access_date date,
       processed NUMBER(1) default 0
    );

        create table schedule_assignment(
       schedule_id number(4),
       schedule_date DATE,
       employee_id NUMBER(6) DEFAULT 0,
constraint chk_midnight check (schedule_date=trunc(schedule_date, 'dd')),
          constraint sa_pk primary key (schedule_id, schedule_date)
  );

INSERT into schedule_assignment (
    schedule_id,
    schedule_date,
    employee_id
)
VALUES 
(22, TO_DATE('2021/08/23 00:00:00', 'yyyy/mm/dd hh24:mi:ss'), 1);

create table schedule(
       schedule_id NUMBER(4),
       location_id number(4),
       start_date DATE,
       end_date DATE,
         CONSTRAINT start_min check (start_date=trunc(start_date,'MI')),   
       CONSTRAINT end_min check (end_date=trunc(end_date,'MI')),
 CONSTRAINT end_gt_start CHECK (end_date >= start_date),
CONSTRAINT same_day CHECK (TRUNC(end_date) = TRUNC(start_date))
      );

   
   insert into schedule(
     schedule_id,
     location_id,
     start_date,
     end_date 
     )
        VALUES     (22,1,TO_DATE('2021/08/23 11:00:00', 'yyyy/mm/dd hh24:mi:ss'),TO_DATE('2021/08/23 11:04:00', 'yyyy/mm/dd hh24:mi:ss'));

   insert into schedule(
        schedule_id,
         location_id,
         start_date,
         end_date 
       )
       VALUES      (22,2,TO_DATE('2021/08/23 11:10:00', 'yyyy/mm/dd hh24:mi:ss'),
TO_DATE('2021/08/23 11:13:00', 'yyyy/mm/dd hh24:mi:ss'));

    insert into schedule(
      schedule_id,
      location_id,
      start_date,
      end_date 
     )
       VALUES        (22,3,TO_DATE('2021/08/23 11:16:00', 'yyyy/mm/dd hh24:mi:ss'),TO_DATE('2021/08/23 11:19:00', 'yyyy/mm/dd hh24:mi:ss'));

insert into access_history(
employee_id,
card_num,
location_id,  access_date)
VALUES
  (1, 'XXX',1,
TO_DATE('2020/08/23 10:57:20', 'yyyy/mm/dd hh24:mi:ss'));


insert into access_history(
employee_id,
card_num,
location_id,  access_date)
VALUES
  (1, 'XXX', 1,
TO_DATE('2021/08/23 11:01:53', 'yyyy/mm/dd hh24:mi:ss'));

insert into access_history(
employee_id,
card_num,
location_id,  access_date)
values
  (1, 'XXX', 2,
TO_DATE('2021/08/23 11:11:13', 'yyyy/mm/dd hh24:mi:ss'));

insert into access_history(
employee_id,
card_num,
location_id,  access_date)
values
  (1, 'XXX', 2,
TO_DATE('2021/08/23 11:15:13', 'yyyy/mm/dd hh24:mi:ss'));

WITH  sch  AS
(

    SELECT  s.schedule_id, 
sa.schedule_date,
sa.employee_id,
s.location_id, s.start_date,
s.end_date
    ,       ROW_NUMBER () OVER ( PARTITION BY  s.schedule_id,  s.location_id, TRUNC (s.start_date)
        ORDER BY      s.start_date
   )      AS rn
    ,       TRUNC (s.start_date)                AS this_day
    ,       TRUNC (s.start_date) + 1            AS next_day
    ,             s.start_date - INTERVAL '2' MINUTE  AS early_date
    ,       
s.end_date + INTERVAL '2' MINUTE  AS late_date    FROM    schedule s
-- need JOIN to get employee_id 
JOIN schedule_assignment sa
           ON s.schedule_id = sa.schedule_id
          AND TRUNC(s.start_date) = sa.schedule_date
),    ah    AS
(

    SELECT  a.employee_id, a.location_id, a.access_date

    ,       ROW_NUMBER () OVER ( PARTITION BY  a.employee_id, a.location_id, TRUNC (a.access_date)

          ORDER BY      a.access_date

    )     AS rn
    FROM    access_history a
--  WHERE   ...  -- if you need any filtering, put it here
) select  sch.schedule_id,
sch.schedule_date,
sch.employee_id,
sch.location_id, sch.start_date, 
sch.end_date,   ah.access_date,
CASE
                            WHEN  ah.access_date  <  sch.early_date       THEN  'E'

WHEN  
ah.access_date >=
sch.early_date AND
ah.access_date  <= sch.late_date        THEN  'G'

WHEN  ah.access_date  > sch.late_date        THEN  'L'
                                          WHEN  ah.access_date  IS NULL                THEN  'N'
          END  AS status


FROM sch
LEFT OUTER JOIN                ah  ON   ah.employee_id  = sch.employee_id
                                   AND  ah.location_id  = sch.location_id
                                   AND  ah.access_date  >= sch.this_day
                                   AND  ah.access_date  <  sch.next_day 
ORDER BY  
sch.employee_id, sch.start_date;


Tagged:

Best Answer

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,223 Red Diamond
    Accepted Answer

    Hi, @BeefStu

     Before I started adding multiple employees and schedules I wanted to keep the test CASE as small and compact as possible until i was satisfied it was working. 

    Wrong approach. A way that works for one employee and one schedule might be very inefficient, or might not work at at all, for multiple employees and/or multiple schedules. If you want to keep the sample data and results simple, then why don't you go all the way, that is, test it with 0 employees and 0 schedules. That will make it very easy to get the correct results.


    The row that I believe is missing shares the same employee_id, location_id but a different access_date (before the first schedule boundary) and I thought that would be considered early.

    As you said, the missing row has a different access_date. Did you notice that the row in ah is a full year earlier than the row in sch? If you really want to join those two anyway, then remove these lines

                      AND ah.access_date >= sch.this_day

                      AND ah.access_date < sch.next_day 

    from the join condition in the main query.

    BeefStu

Answers

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,223 Red Diamond

    Hi, @BeefStu

    Thanks for posting the sample data and desired results. Don't forget to explain how you get the desired results from the data. What does each row of the output represent? Why should the missing row be included?

    Could you possible have two (or even more) employees, and/or two or more schedules? Say you had employee_id=2 who was assigned to schedule_id=33; wouldn't it be important that the results for employee_id=1 didn't get mixed up with the results for employee_id = 2? If so, then include at least two employees and at least two schedules in the sample data and results. It rarely makes sense for all rows of the sample data to have the same value in any column.

    Explain how your current query works. For example, why do you compute the rn columns?

  • BeefStu
    BeefStu Member Posts: 284 Blue Ribbon

    @Frank Kulash First, thanks for responding. Before I started adding multiple employees and schedules I wanted to keep the test CASE as small and compact as possible until i was satisfied it was working. If I couldn't get it working correctly with 1 schedule and 1 employee_id I doubt it will work with more than one. Once I'm satisfied I plan on adding more data for testing.


    The rn aren't being used. When this project was being designed someone thought there would be a 1-1 relationship between the schedule and the history and they didn't keep in mind what happened if an employee swipe their card twice at the same location. That's what I was trying to simulate with the missing entry in the output.


    As you can see from the output I have a row for  employee_id=1 and location_id=1 and an access_date of 11:01:53, which has a status ='G' (good) because it fell between the schedule boundaries. The row that I believe is missing shares the same employee_id, location_id but a different access_date (before the first schedule boundary) and I thought that would be considered early.


    A little further down  I have a row for employee_id=1 location_id=2 that is (L) late. Why does that row appear and not the early entry?

    It's likely that I am not understanding the code but I can't figure out what the problem is and why that early entry row isn't appearing, I thought it should. That's where I could use some SQL expertise. Thanks for your time and effort

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,223 Red Diamond
    Accepted Answer

    Hi, @BeefStu

     Before I started adding multiple employees and schedules I wanted to keep the test CASE as small and compact as possible until i was satisfied it was working. 

    Wrong approach. A way that works for one employee and one schedule might be very inefficient, or might not work at at all, for multiple employees and/or multiple schedules. If you want to keep the sample data and results simple, then why don't you go all the way, that is, test it with 0 employees and 0 schedules. That will make it very easy to get the correct results.


    The row that I believe is missing shares the same employee_id, location_id but a different access_date (before the first schedule boundary) and I thought that would be considered early.

    As you said, the missing row has a different access_date. Did you notice that the row in ah is a full year earlier than the row in sch? If you really want to join those two anyway, then remove these lines

                      AND ah.access_date >= sch.this_day

                      AND ah.access_date < sch.next_day 

    from the join condition in the main query.

    BeefStu
  • BeefStu
    BeefStu Member Posts: 284 Blue Ribbon

    @Frank Kulash a damm type on my part ugggh!! Not being a SQL developer, I thought the problem lied in the code and it never occurred to me to check 4 rows of data. I guess wasting my whole Sunday (day off) is all on me. Thanks for the help. Now I can load up the data and test away.