1 2 Previous Next 19 Replies Latest reply: May 16, 2008 2:32 AM by NicloeiW RSS

    using analytical function to calculate concurrency between date range

    522382
      Folks,
      I'm trying to use analytical functions to come up with a query that gives me the
      concurrency of jobs executing between a date range.

      For example:
      JOB100 - started at 9AM - stopped at 11AM
      JOB200 - started at 10AM - stopped at 3PM
      JOB300 - started at 12PM - stopped at 2PM

      The query would tell me that JOB1 ran with a concurrency of 2 because JOB1 and JOB2
      were running started and finished within the same time. JOB2 ran with the concurrency
      of 3 because all jobs ran within its start and stop time. The output would look like this.

      JOB START STOP CONCURRENCY
      === ==== ==== =========
      100 9AM 11AM 2
      200 10AM 3PM 3
      300 12PM 2PM 2

      I've been looking at this post, and this one if very similar...
      Analytic functions using window date range

      Here is the sample data..

      CREATE TABLE TEST_JOB
      ( jobid NUMBER,
      created_time DATE,
      start_time DATE,
      stop_time DATE
      )
      /
      insert into TEST_JOB values (100, sysdate -1, to_date('05/04/08 09:00:00','MM/DD/YY hh24:mi:ss'), to_date('05/04/08 11:00:00','MM/DD/YY hh24:mi:ss'));
      insert into TEST_JOB values (200, sysdate -1, to_date('05/04/08 10:00:00','MM/DD/YY hh24:mi:ss'), to_date('05/04/08 13:00:00','MM/DD/YY hh24:mi:ss'));
      insert into TEST_JOB values (300, sysdate -1, to_date('05/04/08 12:00:00','MM/DD/YY hh24:mi:ss'), to_date('05/04/08 14:00:00','MM/DD/YY hh24:mi:ss'));


      select * from test_job;

      JOBID|CREATED_TIME |START_TIME |STOP_TIME
      ----------|--------------|--------------|--------------
      100|05/04/08 09:28|05/04/08 09:00|05/04/08 11:00
      200|05/04/08 09:28|05/04/08 10:00|05/04/08 13:00
      300|05/04/08 09:28|05/04/08 12:00|05/04/08 14:00

      Any help with this query would be greatly appreciated.

      thanks.
      -peter
        • 1. Re: using analytical function to calculate concurrency between date range
          Nicolas.Gasparotto
          LAG() or LEAD() should help you. Take a look in the documentation.

          Nicolas.
          • 2. Re: using analytical function to calculate concurrency between date range
            Frank Kulash
            Hi,

            If you have two date ranges b1-e1 and b2-e2 (b1 <= e1 and b2 <= e2), you can tell if they overlap by saying

            (b1 <= e2) AND (e1 >= b2)

            Use this condition in a self-join to see how many rows overlap with a given row.
            • 3. Re: using analytical function to calculate concurrency between date range
              522382
              I'm not so sure about that.
              From all the examples I've seen LAG and LEAD can give you the previous or next value
              in relation to a particular row. In my case I need to count how many records have a start_time and stop_time that fall within my current record's start_time and stop_time.
              I'll check it out further though to see if I can do some sort of aggregation the result of LAG and LEAD.

              thanks for your input.
              -peter
              • 4. Re: using analytical function to calculate concurrency between date range
                522382
                I should have mentioned this, but I was trying to use analytical functions to avoid the self join.
                -peter
                • 5. Re: using analytical function to calculate concurrency between date range
                  Nicolas.Gasparotto
                  You don't need auto-join with the functions I told earlier about.

                  Nicolas.
                  • 6. Re: using analytical function to calculate concurrency between date range
                    Anurag Tibrewal
                    Hi,

                    Too late

                    Regards

                    Message was edited by:
                    Anurag Tibrewal
                    • 7. Re: using analytical function to calculate concurrency between date range
                      Nicolas.Gasparotto
                      It was not clear enough from your first post. So, kindly post what should be the expected output with your sample input given above.

                      Nicolas.
                      • 8. Re: using analytical function to calculate concurrency between date range
                        Laurent Schneider
                        did you consider model?
                        select *
                        from test_job 
                        model
                        dimension by (start_time, stop_time)
                        measures (created_time,jobid,0 n)
                        (n[any,any]=
                          count(*)[start_time<=cv(start_time),stop_time>=cv(start_time)]+
                          count(*)[start_time between cv(start_time) and cv(stop_time),stop_time>cv(stop_time)]
                        ) 
                        • 9. Re: using analytical function to calculate concurrency between date range
                          Anurag Tibrewal
                          If the analytical query is not a constraint, does this give you the expected output
                          select JOBID,
                                 (select count(*)
                                    from TEST_JOB
                                   where START_TIME <= A.STOP_TIME OR STOP_TIME => A.START_TIME)
                            from TEST_JOB
                          • 10. Re: using analytical function to calculate concurrency between date range
                            522382
                            Nicolas,

                            The sample output might have not been obvious in my first post, but here it is again.

                            JOOB START STOP CONCURRENCY
                            === ==== ==== =========
                            100 9AM 11AM 2
                            200 10AM 3PM 3
                            300 12PM 2PM 2

                            Example, JOB 100 started at 9AM and completed at 11AM. That means that this job had a concurrency of 2 because job 200 started at 10AM which falls within the time that it took job 100 to complete (9AM -> 11AM).

                            JOB 200 started at 10AM and completd at 3PM, so it has a concurrency of 3 because the other 2 jobs were also running within this time.

                            hope this clears it up a bit.
                            -peter
                            • 11. Re: using analytical function to calculate concurrency between date range
                              522382
                              Laurent,
                              I have to say that I've never used model before, but from running the query the output is exactly what I was looking for. Very nice!

                              Anurag,
                              your query is essentially a self-join, but it will probably work too (haven't tested). I'm going to compare the model query against a self join to check for overall efficiency.

                              thanks.
                              -peter
                              • 12. Re: using analytical function to calculate concurrency between date range
                                Laurent Schneider
                                fine, the model should be more efficient as the table is selected only once.

                                Be sure you test limit case, for example if one period ends exactly when another start, etc... but it should be easy for you to correct it

                                Glad I made you interested in sql modelling :)
                                • 13. Re: using analytical function to calculate concurrency between date range
                                  522382
                                  after some checking the model rule wasn't working exactly as expected.

                                  I believe it's working right now. I'm posting a self-contained example for completeness sake.I use 2 functions to convert back and forth between epoch unix timestamps, so
                                  I'll post them here as well.

                                  Like I said I think this works okay, but any feedback is always appreciated.
                                  -peter

                                  CREATE OR REPLACE FUNCTION date_to_epoch(p_dateval IN DATE)
                                  RETURN NUMBER
                                  AS
                                  BEGIN
                                  return (p_dateval - to_date('01/01/1970','MM/DD/YYYY')) * (24 * 3600);
                                  END;
                                  /

                                  CREATE OR REPLACE FUNCTION epoch_to_date (p_epochval IN NUMBER DEFAULT 0)
                                  RETURN DATE
                                  AS
                                  BEGIN
                                  return to_date('01/01/1970','MM/DD/YYYY') + (( p_epochval) / (24 * 3600));
                                  END;
                                  /

                                  DROP TABLE TEST_MODEL3 purge;

                                  CREATE TABLE TEST_MODEL3
                                  ( jobid NUMBER,
                                  start_time NUMBER,
                                  end_time NUMBER);


                                  insert into TEST_MODEL3
                                  VALUES (300,date_to_epoch(to_date('05/07/2008 10:00','MM/DD/YYYY hh24:mi')),
                                  date_to_epoch(to_date('05/07/2008 19:00','MM/DD/YYYY hh24:mi')));
                                  insert into TEST_MODEL3
                                  VALUES (200,date_to_epoch(to_date('05/07/2008 09:00','MM/DD/YYYY hh24:mi')),
                                  date_to_epoch(to_date('05/07/2008 12:00','MM/DD/YYYY hh24:mi')));
                                  insert into TEST_MODEL3
                                  VALUES (400,date_to_epoch(to_date('05/07/2008 10:00','MM/DD/YYYY hh24:mi')),
                                  date_to_epoch(to_date('05/07/2008 14:00','MM/DD/YYYY hh24:mi')));
                                  insert into TEST_MODEL3
                                  VALUES (500,date_to_epoch(to_date('05/07/2008 11:00','MM/DD/YYYY hh24:mi')),
                                  date_to_epoch(to_date('05/07/2008 16:00','MM/DD/YYYY hh24:mi')));
                                  insert into TEST_MODEL3
                                  VALUES (600,date_to_epoch(to_date('05/07/2008 15:00','MM/DD/YYYY hh24:mi')),
                                  date_to_epoch(to_date('05/07/2008 22:00','MM/DD/YYYY hh24:mi')));
                                  insert into TEST_MODEL3
                                  VALUES (100,date_to_epoch(to_date('05/07/2008 09:00','MM/DD/YYYY hh24:mi')),
                                  date_to_epoch(to_date('05/07/2008 23:00','MM/DD/YYYY hh24:mi')));
                                  commit;

                                  SELECT jobid,
                                  epoch_to_date(start_time)start_time,
                                  epoch_to_date(end_time)end_time,
                                  n concurrency
                                  FROM TEST_MODEL3
                                  MODEL
                                  DIMENSION BY (start_time,end_time)
                                  MEASURES (jobid,0 n)
                                  (n[any,any]=
                                  count(*)[start_time<= cv(start_time),end_time>=cv(start_time)]+
                                  count(*)[start_time > cv(start_time) and start_time <= cv(end_time), end_time >= cv(start_time)]
                                  )
                                  ORDER BY start_time;

                                  The results look like this:

                                  JOBID|START_TIME|END_TIME |CONCURRENCY
                                  ----------|---------------|--------------|-------------------
                                  100|05/07/08 09:00|05/07/08 23:00| 6
                                  200|05/07/08 09:00|05/07/08 12:00| 5
                                  300|05/07/08 10:00|05/07/08 19:00| 6
                                  400|05/07/08 10:00|05/07/08 14:00| 5
                                  500|05/07/08 11:00|05/07/08 16:00| 6
                                  600|05/07/08 15:00|05/07/08 22:00| 4
                                  • 14. Re: using analytical function to calculate concurrency between date range
                                    522382
                                    For completeness sake, I will add something else that I ran into, which was
                                    the ORA-32638. In my example, it's very possible that you can have different jobid's that have a start/end time that is the same... so the DIMENSION BY will not be able to uniquely identify a cell .. which from the docs it seems that it is required.

                                    So for example, if I added another row into my test table...

                                    insert into TEST_MODEL3
                                    VALUES (700,date_to_epoch(to_date('05/07/2008 09:00','MM/DD/YYYY hh24:mi')),
                                    date_to_epoch(to_date('05/07/2008 23:00','MM/DD/YYYY hh24:mi')));

                                    this insert statement has the same start/end times as jobid 100. If I were to execute the
                                    above query, it fails with 'ORA-32638: Non unique addressing in MODEL dimension'

                                    In my example, if my DIMENSION BY includes jobid, then this would be able to
                                    uniquely identify a cell. So rewriting the query like this does work and I think it's correct.

                                    SELECT
                                    jobid,
                                    epoch_to_date(start_time)start_time,
                                    epoch_to_date(end_time)end_time,
                                    n concurrency
                                    FROM
                                    (
                                    SELECT
                                    jobid,
                                    start_time,
                                    end_time
                                    from TEST_MODEL3
                                    )
                                    MODEL
                                    DIMENSION BY (jobid,start_time,end_time)
                                    MEASURES (0 n)
                                    (n[any,any,any]=
                                    count(*)[any, start_time<= cv(start_time),end_time>=cv(start_time)]+
                                    count(*)[any, start_time > cv(start_time) and start_time <= cv(end_time), end_time >= cv(start_time)]
                                    )
                                    ORDER BY start_time;

                                    .. and the results are:

                                    JOBID|START_TIME |END_TIME |CONCURRENCY
                                    ----------|-------------------|-------------------|-------------------
                                    200|05/07/08 09:00|05/07/08 12:00| 6
                                    700|05/07/08 09:00|05/07/08 23:00| 7
                                    100|05/07/08 09:00|05/07/08 23:00| 7
                                    300|05/07/08 10:00|05/07/08 19:00| 7
                                    400|05/07/08 10:00|05/07/08 14:00| 6
                                    500|05/07/08 11:00|05/07/08 16:00| 7
                                    600|05/07/08 15:00|05/07/08 22:00| 5

                                    If my thought process is wrong, please let me know.
                                    thanks.
                                    1 2 Previous Next