8 Replies Latest reply on Sep 23, 2010 1:59 AM by Aketi Jyuuzou

# Obtain largest timespan where sum of values is below threshold

Hi,

I asked this question in the general forum a while ago and it was suggested that I post here too.

I'm trying to obtain the 'largest timespan where sum of values is below threshold'. More details available here:

Obtain largest timespan where sum of values is below threshold

Here's the sample data:
``````create table outputs (output_date date, quantity number);

insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 09.00.00', 'DD-MON-YYYY HH24.MI.SS'), 10);
insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 12.00.00', 'DD-MON-YYYY HH24.MI.SS'), 100);
insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 14.00.00', 'DD-MON-YYYY HH24.MI.SS'), 1);
insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 18.00.00', 'DD-MON-YYYY HH24.MI.SS'), 1);
insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 20.30.00', 'DD-MON-YYYY HH24.MI.SS'), 1);
insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 22.00.00', 'DD-MON-YYYY HH24.MI.SS'), 10);
insert into outputs (output_date, quantity) values (to_date ('01-AUG-3000 02.00.00', 'DD-MON-YYYY HH24.MI.SS'), 10);``````
The output should be (for a threshold of 9):
``31-JUL-3000 14.00.00,     31-JUL-3000 20.30.00,     3,     3``
The first two columns are the start and end of the window, the final 2 columns are the total within the window (below the threshold) and the number of entries. As a separate, but related problem, I'd also like to obtain the largest gap between measurements within the window.

Thanks,

Dan Scott
• ###### 1. Re: Obtain largest timespan where sum of values is below threshold
Hi, Dan,

Intersting problem!
danscott wrote:
Hi,

I asked this question in the general forum a while ago and it was suggested that I post here too.
Yes, this forum is where questions like this belong.
If you do have to make a duplicate post, like this, mark the original as "Answered" right away, so all your answers will be in one place. You can edit your origianl message, saying "Moved to this forum:", with a link. (Did you just do that? I see that the original message is marked as "Answered" now.)
... Here's the sample data:
Thanks for posting that! It's very helpful.

Here's one way to do what you want, using analytic fucntions:
``````VARIABLE     threshold_value      NUMBER;
EXEC  :threshold_value := 9;

WITH     got_running_totals     AS
(
SELECT     output_date, quantity
,     SUM (quantity) OVER (ORDER BY output_date)      AS running_total_quantity
,     ROW_NUMBER ()  OVER (ORDER BY output_date)     AS cnt
FROM     outputs
)
,     got_rnk               AS
(
SELECT     e.output_date                    AS start_date
,     l.output_date                    AS end_date
,     e.quantity + l.running_total_quantity
- e.running_total_quantity     AS total_quantity
,     1 + l.cnt
- e.cnt                    AS total_cnt
,     RANK () OVER (ORDER BY  e.quantity + l.running_total_quantity
- e.running_total_quantity     DESC
)                         AS rnk
FROM     got_running_totals     e     -- e is earlier, l is later
JOIN     got_running_totals     l     ON   e.cnt      <= l.cnt
AND  e.quantity + l.running_total_quantity
- e.running_total_quantity
< :threshold_value
)
SELECT  start_date
,     end_date
,     total_quantity
,     total_cnt
FROM     got_rnk
WHERE     rnk     = 1
;``````
I'm sure you can do this with MODEL, too, and also with CONNECT BY. I'm not sure they'd be any more efficient, or any simpler, though.

In tje sub-query called got_rnk, we need the same formula (x + y - z) 3 times. You might want to add another sub-query just to compute that once.
... As a separate, but related problem, I'd also like to obtain the largest gap between measurements within the window.
Sorry, I don't understand this part. Post the results you want from the sample data you've already given, and an explanation of how you get those results.

Edited by: Frank Kulash on Aug 27, 2010 5:43 PM

Edited by: Frank Kulash on Aug 28, 2010 9:52 AM

As originally posted above, this is finding the group of consecutvie rows with the greatest toal quantity less than the threshold.
To find the longest time span (tha is, the greatest difference between dates) change the computation of rnk (in sub-query got_rnk) to
``````...     ,     RANK () OVER (ORDER BY l.output_date - e.output_date          DESC
)                         AS rnk``````
• ###### 2. Re: Obtain largest timespan where sum of values is below threshold
Below is from your previous post:
``````threshold '14' would give:
31-JUL-3000 14.00.00, 31-JUL-3000 22.00.00, 13, 4``````
for, threshold 14, shuldn`t the biggest time span is ? :
``31-JUL-3000 14.00.00, 01-AUG-3000 02.00.00, 13, 4``
• ###### 3. Re: Obtain largest timespan where sum of values is below threshold
Thanks.
This question lets me say "I like recursive with clause B-) "
``````with t(output_date,quantity) as(
select to_date('31-07-3000 09.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
select to_date('31-07-3000 12.00.00','DD-MM-YYYY HH24.MI.SS'),100 from dual union
select to_date('31-07-3000 14.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 18.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 20.30.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 22.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
select to_date('01-08-3000 02.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual),
tmp(output_date,quantity,rn) as(
select output_date,quantity,
Row_Number() over(order by output_date)
from t),
rec(rootOutput_date,output_date,rn,sumVal,cnt) as(
select output_date,output_date,rn,quantity,1
from tmp
where quantity < 9 /*threshold*/
union all
select a.rootOutput_date,b.output_date,b.rn,
a.sumVal+b.quantity,a.cnt+1
from rec a,tmp b
where a.rn+1=b.rn
and a.sumVal+b.quantity < 9 /*threshold*/)
select to_char(rootOutput_date,'YYYY/MM/DD HH24:MI') as rootOutput_date,
to_char(output_date,'YYYY/MM/DD HH24:MI') as output_date,sumVal,cnt
from (select rootOutput_date,output_date,sumVal,cnt,
max(sumVal) over() as maxSumVal
from rec)
where sumVal = maxSumVal;

rootOutput_date   output_date       sumVal  cnt
----------------  ----------------  ------  ---
3000/07/31 14:00  3000/07/31 20:30       3    3``````
• ###### 4. Re: Obtain largest timespan where sum of values is below threshold
I like Join :D

There is my article of OTN-Japan which is related Join.
http://www.oracle.com/technetwork/jp/articles/otnj-sql-image1-308625-ja.html
``````with t(output_date,quantity) as(
select to_date('31-07-3000 09.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
select to_date('31-07-3000 12.00.00','DD-MM-YYYY HH24.MI.SS'),100 from dual union
select to_date('31-07-3000 14.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 18.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 20.30.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 22.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
select to_date('01-08-3000 02.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual)
select to_char(staD,'YYYY/MM/DD HH24:MI') as staD,
to_char(endD,'YYYY/MM/DD HH24:MI') as endD,sumVal,cnt
from (select a.output_date as staD,b.output_date as endD,
sum(c.quantity) as sumVal,count(*) as cnt,
max(sum(c.quantity)) over() as maxSumVal
from t a,t b,t c
where a.output_date < b.output_date
and c.output_date between a.output_date and b.output_date
group by a.output_date,b.output_date
having sum(c.quantity) < 9 /*threshold*/)
where sumVal = maxSumVal;

staD              endD              sumVal  cnt
----------------  ----------------  ------  ---
3000/07/31 14:00  3000/07/31 20:30       3    3``````
• ###### 5. Re: Obtain largest timespan where sum of values is below threshold
Thanks, excellent solutions! Much better than my 48 'copy and paste's. :)

Just to clarify my initial additional request, I was asking for the largest timespan between subsequent entries. i.e. for the '3' threshold, the largest time gap is 4 hrs, from '31-07-3000 14.00.00' to '31-07-3000 18.00.00','DD-MM-YYYY HH24.MI.SS'. Now that you have given me a working solution for the hardest part, I can probably work this out for myself, using LEAD and a MAX analytic function.

@PhoenixBai - it depends on the (my) definition....

I just want the time difference between the first and last measurements. I guess equally, the 'largest time span' could be from

31-07-3000 12.00.01

to

31-07-3000 21.59.59

i.e. 1 second after the '100' measurement until 1 second before the '10' measurement.

Thanks,

Dan
• ###### 6. Re: Obtain largest timespan where sum of values is below threshold
Hi, Dan,

The query I posted earlier computed the total_quantity, total_cnt and timespan from two rows, including the values from each of the endpoints.
If I understand you, you want to count the timpapan between two rows, including the enpoints, bu to include the qunatity and count only from the rwos strictly between the endpoints, not counting either one.

Here's how to modify my query to do that:
``````WITH     got_running_totals     AS
(
SELECT     output_date, quantity
,     SUM (quantity) OVER (ORDER BY output_date)      AS running_total_quantity
,     ROW_NUMBER ()  OVER (ORDER BY output_date)     AS cnt
FROM     outputs
)
,     got_rnk               AS
(
SELECT     e.output_date               AS start_date
,     l.output_date               AS end_date
,     ( l.running_total_quantity                              -- Changed
- e.running_total_quantity
) - l.quantity               AS total_quantity
,     ( 1 + l.cnt                                        -- Changed
- e.cnt
) - 2                    AS total_cnt
,     RANK () OVER (ORDER BY l.output_date - e.output_date          DESC
)                    AS rnk
FROM     got_running_totals     e     -- e is earlier, l is later
JOIN     got_running_totals     l     ON   e.cnt      < l.cnt          -- Changed
AND  ( l.running_total_quantity          -- Changed
- e.running_total_quantity
) - l.quantity     <= :threshold_value
)
SELECT  start_date
,     end_date
,     total_quantity
,     total_cnt
,     (end_date - start_date) * 24     AS timespan_hours                    -- Added
FROM     got_rnk
WHERE     rnk     = 1
;``````
4 expressions were changed and one was added, as shown by the comments.
Output, for :threshold_value = 3:
```````                                           TOTAL_ TOTAL TIMESPAN
START_DATE           END_DATE             QUANTITY  _CNT   _HOURS
-------------------- -------------------- -------- ----- --------
31-JUL-3000 12.00.00 31-JUL-3000 22.00.00        3     3       10``````
As you said, you could get the same results by calling some more functions, such as LEAD, but there's no need to.
• ###### 7. Re: Obtain largest timespan where sum of values is below threshold
Hi,

I think there is some confusion here. Sorry.

I'm looking for the 'largest gap between subsequent values within the window'. i.e. The window with threshold 3 has a window from 31-JUL-3000 12.00.00 to 31-JUL-3000 20.30.00. Within that window, the largest gap between measurements is from 14.00 to 18.00 or 4 hrs. So the overall output would be:
``````                                           TOTAL_  TOTAL  MAX_GAP
START_DATE           END_DATE             QUANTITY  _CNT   _HOURS
-------------------- -------------------- -------- ----- --------
31-JUL-3000 12.00.00 31-JUL-3000 22.00.00        3     3       4``````
I think I have worked this out for myself:
``````WITH got_running_totals AS (
SELECT     output_date, quantity,
SUM (quantity)      OVER (ORDER BY output_date) AS running_total_quantity,
ROW_NUMBER ()       OVER (ORDER BY output_date) AS cnt,
(LEAD (output_date)  OVER (ORDER BY output_date) - output_date) * 24 AS hours_to_next_output
FROM     outputs
)
--select * from got_running_totals;
, got_rnk AS (
SELECT
e.output_date AS start_date,
l.output_date AS end_date,
e.quantity + l.running_total_quantity
- e.running_total_quantity AS total_quantity,
1 + l.cnt
- e.cnt AS total_cnt,
RANK () OVER (ORDER BY  e.quantity + l.running_total_quantity
- e.running_total_quantity     DESC) AS rnk,
MAX (e.hours_to_next_output) OVER (ORDER BY  e.quantity + l.running_total_quantity
- e.running_total_quantity     DESC) AS largest_gap
FROM got_running_totals e -- e is earlier, l is later
JOIN got_running_totals l ON e.cnt <= l.cnt
AND e.quantity + l.running_total_quantity
- e.running_total_quantity < :threshold_value
)
--select * from got_rnk;
SELECT start_date
,     end_date
,     total_quantity
,     total_cnt
, largest_gap
FROM     got_rnk
WHERE     rnk     = 1
;``````
Obviously, ordering is not important for MAX, and I will change this to a partition when I include my additional ID fields

Most of my previous reply, the text after '@PhoenixBai', was directed at the poster of the third response to this thread, who was asking about the window definition.

Sorry for the confusion.

Dan
• ###### 8. Re: Obtain largest timespan where sum of values is below threshold
Your additional request lets me say "I like recursive with clause B-) "
``````with t(output_date,quantity) as(
select to_date('31-07-3000 09.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
select to_date('31-07-3000 12.00.00','DD-MM-YYYY HH24.MI.SS'),100 from dual union
select to_date('31-07-3000 14.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 18.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 20.30.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 22.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
select to_date('01-08-3000 02.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual),
tmp(output_date,quantity,rn) as(
select output_date,quantity,
Row_Number() over(order by output_date)
from t),
rec(rootOutput_date,output_date,rn,sumVal,cnt,maxDiff) as(
select output_date,output_date,rn,quantity,1,0
from tmp
where quantity < 9 /*threshold*/
union all
select a.rootOutput_date,b.output_date,b.rn,
a.sumVal+b.quantity,a.cnt+1,
greatest(a.maxDiff,(b.output_date-a.output_date)*24)
from rec a,tmp b
where a.rn+1=b.rn
and a.sumVal+b.quantity < 9 /*threshold*/)
select to_char(rootOutput_date,'YYYY/MM/DD HH24:MI') as rootOutput_date,
to_char(output_date,'YYYY/MM/DD HH24:MI') as output_date,sumVal,cnt,maxDiff
from (select rootOutput_date,output_date,sumVal,cnt,maxDiff,
max(sumVal) over() as maxSumVal
from rec)
where sumVal = maxSumVal;

rootOutput_date   output_date       sumVal  cnt  maxDiff
----------------  ----------------  ------  ---  -------
3000/07/31 14:00  3000/07/31 20:30       3    3        4``````