7 Replies Latest reply: Apr 21, 2011 10:17 PM by Frank Kulash RSS

    Stuck - Need help writing SQL logic

    856245
      Heres the scenario ... I've got a table storing date ranges which overlap each other. I need to breakup up the dates and to show a timeline. The catch is that I'm trying to do this in SQL, specifically I'm using Oracle 10g – so can also use the Oracle PL/SQL to write the logic if need be.

      Lets assume I have a table storing job information for employees in my organisation.
      JOB_ID | EMPLOYEE_NUMBER | JOB_TITLE | START_DATE | END_DATE
      1 | XYZ | Forklift Driver |10-JAN-2011 10-DEC-2020
      2 | XYZ | Supervisor |20-JAN-2013 15-DEC-2013

      In the above example, employee XYZ originally joined the company to be a forklift driver but then temporarily fills a supervisor role that becomes vacant i.e. XYZ is on secondment.

      I want to split the dates to show a timeline of employment like this.
      JOB_ID | START_DATE | END_DATE
      1 | 10-JAN-2011 | 19-JAN-2013 << Starts original role
      2 | 20-JAN-2013 | 15-DEC-2013 << Gets seconded – supervisor
      1 | 16-DEC-2013 | 10-DEC-2020 << Back to normal role

      This is not the exact scenario for my problem but its a lot simpler to explain. I've also tried using the Oracle LEAD and LAG functions which get my quite close but no cigar. Because an employee can switch between roles many times throughout their employment, the solution needs to be dynamic rather then using hardcoded logic.

      Edited by: user12022738 on Apr 21, 2011 11:53 AM

      Edited by: user12022738 on Apr 21, 2011 11:53 AM

      Edited by: user12022738 on Apr 21, 2011 11:54 AM
        • 1. Re: Stuck - Need help writing SQL logic
          Dave Rabone
          This type of problem is tricky - and the solution depends on what else we can assume:

          For example: is

          3 | XYZ | Manager |20-MAR-2013 15-SEP-2013 possible
          or even
          3 | XYZ | Manager |20-MAR-2013 15-SEP-2014

          in other words, can we have more than one assignment active at a time, and is each assignment contained within a previous one.
          • 2. Re: Stuck - Need help writing SQL logic
            856245
            Dave, lets just assume that there will only be one active assignment here. The "active" assignment will be determined by using the latest starting "start date" as at the current day - if that makes sense. So as of today 21st Apr, the "active" assignment will be the forklift driver i.e. the employees original role.

            Each assignment doesn't necesarily need to be contained within another one. The example employee might originally join as a forklift driver on a fixed contract but then be given a contract extension which could be in a different role i.e. we create a new record.

            Regards,
            • 3. Re: Stuck - Need help writing SQL logic
              Frank Kulash
              Hi,

              Welcome to the forum!

              Whenever you post a question, please post CREATE TABLE and INSERT statements for your sample data, so that the people who want to help you can re-create the problem and test their ideas.
              For example:
              CREATE TABLE     job
              (     job_id          NUMBER          PRIMARY KEY
              ,     employee_number     VARCHAR2 (10)
              ,     job_title     VARCHAR2 (20)
              ,     start_date     DATE
              ,     end_date     DATE
              );
              
              INSERT INTO job (job_id, employee_number, job_title,           start_date,        end_date)
                   VALUES  (1,      'XYZ',           'Forklift Driver', DATE '2011-01-10', DATE '2020-12-10');
              INSERT INTO job (job_id, employee_number, job_title,           start_date,        end_date)
                   VALUES  (2,      'XYZ',           'Supervisor',      DATE '2013-01-20', DATE '2013-12-15');
              You can get the results you want using analytic functions. (I used ROW_NUMBER and LAG below.)
              WITH     unpivoted_data     AS
              (
                   SELECT     job_id, employee_number, job_title
                   ,      start_date     AS event_date
                   ,     ROW_NUMBER () OVER ( PARTITION BY  employee_number
                                       ORDER BY        start_date
                                     )     AS start_number
                   FROM     job
                        --
                  UNION ALL
                        --
                   SELECT     job_id, employee_number, job_title
                   ,      END_date     AS event_date
                   ,     0          AS start_number
                   FROM     job
              )
              ,     got_neighbor_data     AS
              (
                   SELECT     job_id
                   ,     employee_number     -- If wanted
                   ,     LAG (event_date)      OVER ( PARTITION BY  employee_number
                                            ORDER BY      event_date
                                             )     AS start_date
                   ,       event_date               AS end_date
                   ,       start_number
                   ,       LAG  (start_number) OVER ( PARTITION BY  employee_number
                                            ORDER BY      event_date
                                             )     AS prev_start_number
                   FROM       unpivoted_data
              )
              SELECT    job_id
              ,       employee_number     -- If wanted
              ,       start_date + CASE
                                WHEN  prev_start_number = 0
                                THEN  1
                                ELSE  0
                               END          AS start_date
              ,       end_date   - CASE
                                WHEN  start_number > 0
                                THEN  1
                                ELSE  0
                               END          AS end_date
              FROM       got_neighbor_data
              WHERE       start_number     != 1
              ORDER BY  employee_number
              ,       start_date
              ;
              Each row in your table will result in at least one row in the output, but it may result in two rows, if a given assignment temporarily interrupts another assignment. That's why this query starts off with a UNION to make two rows for every assingment: one showing the start_date and the other the end_date. start_number identifies whether the row indicates the start of an assignment (when start_numebr > 0) or the end of one (start_number = 0).
              In the sub-query got_neighbor_data, LAG gets the event_date from the previous row, so we have the beginning and ending dates of each tenure. We may need to fudge these by a day; for example, on December 15, XYZ ceased being a Supervisor, and went back to being a Forklift Driver. We don't want to show the same date twice, so we add 1 day to the start_date on the later line.
              user12022738 wrote:
              Each assignment doesn't necesarily need to be contained within another one. The example employee might originally join as a forklift driver on a fixed contract but then be given a contract extension which could be in a different role i.e. we create a new record.
              Okay, but, given that two assignments do overlap, will it always be the case that one is entirely contained inside the other. or can you have situations like this:
              INSERT INTO job (job_id, employee_number, job_title,           start_date,        end_date)
                   VALUES  (91,     'FOO',           'Forklift Driver', DATE '2011-01-01', DATE '2012-12-31');
              INSERT INTO job (job_id, employee_number, job_title,           start_date,        end_date)
                   VALUES  (92,     'FOO,           'Supervisor',      DATE '2012-01-01', DATE '2014-12-31');
              If that is possible, what output would you want?
              • 4. Re: Stuck - Need help writing SQL logic
                856245
                Thanks again Frank.

                Yes the dates don't need to be entirely contained, therefore that example that you posted is definitely valid. In this example I would expect the output to be like so
                JOB_ID | job_title | START_DATE | END_DATE
                91 | Forklift driver | 01-JAN-2011 | 31-DEC-2011
                92 | Supervisor | 01-JAN-2012 | 31-DEC-2014

                The biggest challenge is that in theory each row of output may result in an unlimited amount of rows, rather then just 2 rows as you mentioned.

                Consider the following timeline, this employee might originally start job A as a forklift driver but then later temporarily move into job role B, then back to the forklift driver. Then again, later down the line he might move into C - manager role, once completed he's resume the forklift driver role.

                |----------------------A-------------------------------|
                |----B----| |------C----|
                • 5. Re: Stuck - Need help writing SQL logic
                  856245
                  | |

                  Edited by: user12022738 on Apr 22, 2011 9:45 AM
                  • 6. Re: Stuck - Need help writing SQL logic
                    856245
                    | |

                    Edited by: user12022738 on Apr 22, 2011 9:45 AM
                    • 7. Re: Stuck - Need help writing SQL logic
                      Frank Kulash
                      Hi,
                      user12022738 wrote:
                      ... The biggest challenge is that in theory each row of output may result in an unlimited amount of rows, rather then just 2 rows as you mentioned...
                      Sorry, I must not understand the problem correctly. Can you try again to describe what you want? Post some sample data and results, including a case where an employee with N rows in the table has more than 2*N rows of output.
                      Will you ever have a situation where an employee has two different rows in the table with exactly the same start_date? If so, describe what happens, and include an example in your sample data and results.