8 Replies Latest reply on Sep 23, 2010 1:59 AM by Aketi Jyuuzou

    Obtain largest timespan where sum of values is below threshold

    Dan Scott Guest
      Hi,

      I asked this question in the general forum a while ago and it was suggested that I post here too.

      I'm trying to obtain the 'largest timespan where sum of values is below threshold'. More details available here:

      Obtain largest timespan where sum of values is below threshold

      Here's the sample data:
      create table outputs (output_date date, quantity number);
      
      insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 09.00.00', 'DD-MON-YYYY HH24.MI.SS'), 10);
      insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 12.00.00', 'DD-MON-YYYY HH24.MI.SS'), 100);
      insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 14.00.00', 'DD-MON-YYYY HH24.MI.SS'), 1);
      insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 18.00.00', 'DD-MON-YYYY HH24.MI.SS'), 1);
      insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 20.30.00', 'DD-MON-YYYY HH24.MI.SS'), 1);
      insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 22.00.00', 'DD-MON-YYYY HH24.MI.SS'), 10);
      insert into outputs (output_date, quantity) values (to_date ('01-AUG-3000 02.00.00', 'DD-MON-YYYY HH24.MI.SS'), 10);
      The output should be (for a threshold of 9):
      31-JUL-3000 14.00.00,     31-JUL-3000 20.30.00,     3,     3
      The first two columns are the start and end of the window, the final 2 columns are the total within the window (below the threshold) and the number of entries. As a separate, but related problem, I'd also like to obtain the largest gap between measurements within the window.

      Thanks,

      Dan Scott
        • 1. Re: Obtain largest timespan where sum of values is below threshold
          Frank Kulash
          Hi, Dan,

          Intersting problem!
          danscott wrote:
          Hi,

          I asked this question in the general forum a while ago and it was suggested that I post here too.
          Yes, this forum is where questions like this belong.
          If you do have to make a duplicate post, like this, mark the original as "Answered" right away, so all your answers will be in one place. You can edit your origianl message, saying "Moved to this forum:", with a link. (Did you just do that? I see that the original message is marked as "Answered" now.)
          ... Here's the sample data:
          Thanks for posting that! It's very helpful.

          Here's one way to do what you want, using analytic fucntions:
          VARIABLE     threshold_value      NUMBER;
          EXEC  :threshold_value := 9;
          
          
          WITH     got_running_totals     AS
          (
               SELECT     output_date, quantity
               ,     SUM (quantity) OVER (ORDER BY output_date)      AS running_total_quantity
               ,     ROW_NUMBER ()  OVER (ORDER BY output_date)     AS cnt
               FROM     outputs
          )
          ,     got_rnk               AS
          (
               SELECT     e.output_date                    AS start_date
               ,     l.output_date                    AS end_date
               ,     e.quantity + l.running_total_quantity
                            - e.running_total_quantity     AS total_quantity
               ,     1 + l.cnt
                      - e.cnt                    AS total_cnt
               ,     RANK () OVER (ORDER BY  e.quantity + l.running_total_quantity
                                              - e.running_total_quantity     DESC
                              )                         AS rnk
               FROM     got_running_totals     e     -- e is earlier, l is later
               JOIN     got_running_totals     l     ON   e.cnt      <= l.cnt
                                        AND  e.quantity + l.running_total_quantity
                                                          - e.running_total_quantity     
                                             < :threshold_value
          )
          SELECT  start_date
          ,     end_date
          ,     total_quantity
          ,     total_cnt
          FROM     got_rnk
          WHERE     rnk     = 1
          ;
          I'm sure you can do this with MODEL, too, and also with CONNECT BY. I'm not sure they'd be any more efficient, or any simpler, though.

          In tje sub-query called got_rnk, we need the same formula (x + y - z) 3 times. You might want to add another sub-query just to compute that once.
          ... As a separate, but related problem, I'd also like to obtain the largest gap between measurements within the window.
          Sorry, I don't understand this part. Post the results you want from the sample data you've already given, and an explanation of how you get those results.

          Edited by: Frank Kulash on Aug 27, 2010 5:43 PM

          Edited by: Frank Kulash on Aug 28, 2010 9:52 AM

          As originally posted above, this is finding the group of consecutvie rows with the greatest toal quantity less than the threshold.
          To find the longest time span (tha is, the greatest difference between dates) change the computation of rnk (in sub-query got_rnk) to
          ...     ,     RANK () OVER (ORDER BY l.output_date - e.output_date          DESC
                              )                         AS rnk
          • 2. Re: Obtain largest timespan where sum of values is below threshold
            672680
            Below is from your previous post:
            threshold '14' would give:
            31-JUL-3000 14.00.00, 31-JUL-3000 22.00.00, 13, 4
            for, threshold 14, shuldn`t the biggest time span is ? :
            31-JUL-3000 14.00.00, 01-AUG-3000 02.00.00, 13, 4
            • 3. Re: Obtain largest timespan where sum of values is below threshold
              Aketi Jyuuzou
              Thanks.
              This question lets me say "I like recursive with clause B-) "
              with t(output_date,quantity) as(
              select to_date('31-07-3000 09.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
              select to_date('31-07-3000 12.00.00','DD-MM-YYYY HH24.MI.SS'),100 from dual union
              select to_date('31-07-3000 14.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
              select to_date('31-07-3000 18.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
              select to_date('31-07-3000 20.30.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
              select to_date('31-07-3000 22.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
              select to_date('01-08-3000 02.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual),
              tmp(output_date,quantity,rn) as(
              select output_date,quantity,
              Row_Number() over(order by output_date)
                from t),
              rec(rootOutput_date,output_date,rn,sumVal,cnt) as(
              select output_date,output_date,rn,quantity,1
                from tmp
               where quantity < 9 /*threshold*/
              union all
              select a.rootOutput_date,b.output_date,b.rn,
              a.sumVal+b.quantity,a.cnt+1
                from rec a,tmp b
               where a.rn+1=b.rn
                 and a.sumVal+b.quantity < 9 /*threshold*/)
              select to_char(rootOutput_date,'YYYY/MM/DD HH24:MI') as rootOutput_date,
              to_char(output_date,'YYYY/MM/DD HH24:MI') as output_date,sumVal,cnt
              from (select rootOutput_date,output_date,sumVal,cnt,
                    max(sumVal) over() as maxSumVal
                    from rec)
              where sumVal = maxSumVal;
              
              rootOutput_date   output_date       sumVal  cnt
              ----------------  ----------------  ------  ---
              3000/07/31 14:00  3000/07/31 20:30       3    3
              • 4. Re: Obtain largest timespan where sum of values is below threshold
                Aketi Jyuuzou
                I like Join :D

                There is my article of OTN-Japan which is related Join.
                http://www.oracle.com/technetwork/jp/articles/otnj-sql-image1-308625-ja.html
                with t(output_date,quantity) as(
                select to_date('31-07-3000 09.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
                select to_date('31-07-3000 12.00.00','DD-MM-YYYY HH24.MI.SS'),100 from dual union
                select to_date('31-07-3000 14.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
                select to_date('31-07-3000 18.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
                select to_date('31-07-3000 20.30.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
                select to_date('31-07-3000 22.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
                select to_date('01-08-3000 02.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual)
                select to_char(staD,'YYYY/MM/DD HH24:MI') as staD,
                to_char(endD,'YYYY/MM/DD HH24:MI') as endD,sumVal,cnt
                from (select a.output_date as staD,b.output_date as endD,
                      sum(c.quantity) as sumVal,count(*) as cnt,
                      max(sum(c.quantity)) over() as maxSumVal
                        from t a,t b,t c
                       where a.output_date < b.output_date
                         and c.output_date between a.output_date and b.output_date
                      group by a.output_date,b.output_date
                      having sum(c.quantity) < 9 /*threshold*/)
                where sumVal = maxSumVal;
                
                staD              endD              sumVal  cnt
                ----------------  ----------------  ------  ---
                3000/07/31 14:00  3000/07/31 20:30       3    3
                • 5. Re: Obtain largest timespan where sum of values is below threshold
                  Dan Scott Guest
                  Thanks, excellent solutions! Much better than my 48 'copy and paste's. :)

                  Just to clarify my initial additional request, I was asking for the largest timespan between subsequent entries. i.e. for the '3' threshold, the largest time gap is 4 hrs, from '31-07-3000 14.00.00' to '31-07-3000 18.00.00','DD-MM-YYYY HH24.MI.SS'. Now that you have given me a working solution for the hardest part, I can probably work this out for myself, using LEAD and a MAX analytic function.

                  @PhoenixBai - it depends on the (my) definition....

                  I just want the time difference between the first and last measurements. I guess equally, the 'largest time span' could be from

                  31-07-3000 12.00.01

                  to

                  31-07-3000 21.59.59

                  i.e. 1 second after the '100' measurement until 1 second before the '10' measurement.

                  Thanks,

                  Dan
                  • 6. Re: Obtain largest timespan where sum of values is below threshold
                    Frank Kulash
                    Hi, Dan,

                    The query I posted earlier computed the total_quantity, total_cnt and timespan from two rows, including the values from each of the endpoints.
                    If I understand you, you want to count the timpapan between two rows, including the enpoints, bu to include the qunatity and count only from the rwos strictly between the endpoints, not counting either one.

                    Here's how to modify my query to do that:
                    WITH     got_running_totals     AS
                    (
                         SELECT     output_date, quantity
                         ,     SUM (quantity) OVER (ORDER BY output_date)      AS running_total_quantity
                         ,     ROW_NUMBER ()  OVER (ORDER BY output_date)     AS cnt
                         FROM     outputs
                    )
                    ,     got_rnk               AS
                    (
                         SELECT     e.output_date               AS start_date
                         ,     l.output_date               AS end_date
                         ,     ( l.running_total_quantity                              -- Changed
                              - e.running_total_quantity
                              ) - l.quantity               AS total_quantity
                         ,     ( 1 + l.cnt                                        -- Changed
                                  - e.cnt
                              ) - 2                    AS total_cnt
                         ,     RANK () OVER (ORDER BY l.output_date - e.output_date          DESC
                                        )                    AS rnk
                         FROM     got_running_totals     e     -- e is earlier, l is later
                         JOIN     got_running_totals     l     ON   e.cnt      < l.cnt          -- Changed
                                                  AND  ( l.running_total_quantity          -- Changed
                                                          - e.running_total_quantity     
                                                       ) - l.quantity     <= :threshold_value
                    )
                    SELECT  start_date
                    ,     end_date
                    ,     total_quantity
                    ,     total_cnt
                    ,     (end_date - start_date) * 24     AS timespan_hours                    -- Added
                    FROM     got_rnk
                    WHERE     rnk     = 1
                    ;
                    4 expressions were changed and one was added, as shown by the comments.
                    Output, for :threshold_value = 3:
                    `                                           TOTAL_ TOTAL TIMESPAN
                    START_DATE           END_DATE             QUANTITY  _CNT   _HOURS
                    -------------------- -------------------- -------- ----- --------
                    31-JUL-3000 12.00.00 31-JUL-3000 22.00.00        3     3       10
                    As you said, you could get the same results by calling some more functions, such as LEAD, but there's no need to.
                    • 7. Re: Obtain largest timespan where sum of values is below threshold
                      Dan Scott Guest
                      Hi,

                      I think there is some confusion here. Sorry.

                      I'm looking for the 'largest gap between subsequent values within the window'. i.e. The window with threshold 3 has a window from 31-JUL-3000 12.00.00 to 31-JUL-3000 20.30.00. Within that window, the largest gap between measurements is from 14.00 to 18.00 or 4 hrs. So the overall output would be:
                                                                 TOTAL_  TOTAL  MAX_GAP
                      START_DATE           END_DATE             QUANTITY  _CNT   _HOURS
                      -------------------- -------------------- -------- ----- --------
                      31-JUL-3000 12.00.00 31-JUL-3000 22.00.00        3     3       4
                      I think I have worked this out for myself:
                      WITH got_running_totals AS (
                        SELECT     output_date, quantity,
                          SUM (quantity)      OVER (ORDER BY output_date) AS running_total_quantity,
                          ROW_NUMBER ()       OVER (ORDER BY output_date) AS cnt,
                          (LEAD (output_date)  OVER (ORDER BY output_date) - output_date) * 24 AS hours_to_next_output
                        FROM     outputs
                      )
                      --select * from got_running_totals;
                      , got_rnk AS (
                        SELECT
                          e.output_date AS start_date,
                          l.output_date AS end_date,
                          e.quantity + l.running_total_quantity
                            - e.running_total_quantity AS total_quantity,
                          1 + l.cnt
                            - e.cnt AS total_cnt,
                          RANK () OVER (ORDER BY  e.quantity + l.running_total_quantity
                            - e.running_total_quantity     DESC) AS rnk,
                          MAX (e.hours_to_next_output) OVER (ORDER BY  e.quantity + l.running_total_quantity
                            - e.running_total_quantity     DESC) AS largest_gap
                        FROM got_running_totals e -- e is earlier, l is later
                        JOIN got_running_totals l ON e.cnt <= l.cnt
                         AND e.quantity + l.running_total_quantity
                                - e.running_total_quantity < :threshold_value
                      )
                      --select * from got_rnk;
                      SELECT start_date
                      ,     end_date
                      ,     total_quantity
                      ,     total_cnt
                      , largest_gap
                      FROM     got_rnk
                      WHERE     rnk     = 1
                      ;
                      Obviously, ordering is not important for MAX, and I will change this to a partition when I include my additional ID fields

                      Most of my previous reply, the text after '@PhoenixBai', was directed at the poster of the third response to this thread, who was asking about the window definition.

                      Sorry for the confusion.

                      Dan
                      • 8. Re: Obtain largest timespan where sum of values is below threshold
                        Aketi Jyuuzou
                        Your additional request lets me say "I like recursive with clause B-) "
                        with t(output_date,quantity) as(
                        select to_date('31-07-3000 09.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
                        select to_date('31-07-3000 12.00.00','DD-MM-YYYY HH24.MI.SS'),100 from dual union
                        select to_date('31-07-3000 14.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
                        select to_date('31-07-3000 18.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
                        select to_date('31-07-3000 20.30.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
                        select to_date('31-07-3000 22.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
                        select to_date('01-08-3000 02.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual),
                        tmp(output_date,quantity,rn) as(
                        select output_date,quantity,
                        Row_Number() over(order by output_date)
                          from t),
                        rec(rootOutput_date,output_date,rn,sumVal,cnt,maxDiff) as(
                        select output_date,output_date,rn,quantity,1,0
                          from tmp
                         where quantity < 9 /*threshold*/
                        union all
                        select a.rootOutput_date,b.output_date,b.rn,
                        a.sumVal+b.quantity,a.cnt+1,
                        greatest(a.maxDiff,(b.output_date-a.output_date)*24)
                          from rec a,tmp b
                         where a.rn+1=b.rn
                           and a.sumVal+b.quantity < 9 /*threshold*/)
                        select to_char(rootOutput_date,'YYYY/MM/DD HH24:MI') as rootOutput_date,
                        to_char(output_date,'YYYY/MM/DD HH24:MI') as output_date,sumVal,cnt,maxDiff
                        from (select rootOutput_date,output_date,sumVal,cnt,maxDiff,
                              max(sumVal) over() as maxSumVal
                              from rec)
                        where sumVal = maxSumVal;
                        
                        rootOutput_date   output_date       sumVal  cnt  maxDiff
                        ----------------  ----------------  ------  ---  -------
                        3000/07/31 14:00  3000/07/31 20:30       3    3        4