Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Obtain largest timespan where sum of values is below threshold

Dan Scott GuestAug 27 2010 — edited Mar 8 2011
Hi,

I asked this question in the general forum a while ago and it was suggested that I post here too.

I'm trying to obtain the 'largest timespan where sum of values is below threshold'. More details available here:

1119232

Here's the sample data:
create table outputs (output_date date, quantity number);

insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 09.00.00', 'DD-MON-YYYY HH24.MI.SS'), 10);
insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 12.00.00', 'DD-MON-YYYY HH24.MI.SS'), 100);
insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 14.00.00', 'DD-MON-YYYY HH24.MI.SS'), 1);
insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 18.00.00', 'DD-MON-YYYY HH24.MI.SS'), 1);
insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 20.30.00', 'DD-MON-YYYY HH24.MI.SS'), 1);
insert into outputs (output_date, quantity) values (to_date ('31-JUL-3000 22.00.00', 'DD-MON-YYYY HH24.MI.SS'), 10);
insert into outputs (output_date, quantity) values (to_date ('01-AUG-3000 02.00.00', 'DD-MON-YYYY HH24.MI.SS'), 10);
The output should be (for a threshold of 9):
31-JUL-3000 14.00.00,	31-JUL-3000 20.30.00,	3,	3
The first two columns are the start and end of the window, the final 2 columns are the total within the window (below the threshold) and the number of entries. As a separate, but related problem, I'd also like to obtain the largest gap between measurements within the window.

Thanks,

Dan Scott

Comments

Frank Kulash
Hi, Dan,

Intersting problem!
danscott wrote:
Hi,

I asked this question in the general forum a while ago and it was suggested that I post here too.
Yes, this forum is where questions like this belong.
If you do have to make a duplicate post, like this, mark the original as "Answered" right away, so all your answers will be in one place. You can edit your origianl message, saying "Moved to this forum:", with a link. (Did you just do that? I see that the original message is marked as "Answered" now.)
... Here's the sample data:
Thanks for posting that! It's very helpful.

Here's one way to do what you want, using analytic fucntions:
VARIABLE	threshold_value	 NUMBER;
EXEC  :threshold_value := 9;


WITH	got_running_totals	AS
(
	SELECT	output_date, quantity
	,	SUM (quantity) OVER (ORDER BY output_date) 	AS running_total_quantity
	,	ROW_NUMBER ()  OVER (ORDER BY output_date)	AS cnt
	FROM	outputs
)
,	got_rnk			AS
(
	SELECT	e.output_date				AS start_date
	,	l.output_date				AS end_date
	,	e.quantity + l.running_total_quantity
			   - e.running_total_quantity	AS total_quantity
	,	1 + l.cnt
		  - e.cnt				AS total_cnt
	,	RANK () OVER (ORDER BY  e.quantity + l.running_total_quantity
			   			   - e.running_total_quantity	DESC
			     )			     	AS rnk
	FROM	got_running_totals	e	-- e is earlier, l is later
	JOIN	got_running_totals	l	ON   e.cnt	 <= l.cnt
						AND  e.quantity + l.running_total_quantity
			   			     		- e.running_total_quantity	
							< :threshold_value
)
SELECT  start_date
,	end_date
,	total_quantity
,	total_cnt
FROM	got_rnk
WHERE	rnk	= 1
;
I'm sure you can do this with MODEL, too, and also with CONNECT BY. I'm not sure they'd be any more efficient, or any simpler, though.

In tje sub-query called got_rnk, we need the same formula (x + y - z) 3 times. You might want to add another sub-query just to compute that once.
... As a separate, but related problem, I'd also like to obtain the largest gap between measurements within the window.
Sorry, I don't understand this part. Post the results you want from the sample data you've already given, and an explanation of how you get those results.

Edited by: Frank Kulash on Aug 27, 2010 5:43 PM

Edited by: Frank Kulash on Aug 28, 2010 9:52 AM

As originally posted above, this is finding the group of consecutvie rows with the greatest toal quantity less than the threshold.
To find the longest time span (tha is, the greatest difference between dates) change the computation of rnk (in sub-query got_rnk) to
...	,	RANK () OVER (ORDER BY l.output_date - e.output_date		DESC
			     )			     	AS rnk
672680
Below is from your previous post:
threshold '14' would give:
31-JUL-3000 14.00.00, 31-JUL-3000 22.00.00, 13, 4
for, threshold 14, shuldn`t the biggest time span is ? :
31-JUL-3000 14.00.00, 01-AUG-3000 02.00.00, 13, 4
Aketi Jyuuzou
Thanks.
This question lets me say "I like recursive with clause B-) "
with t(output_date,quantity) as(
select to_date('31-07-3000 09.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
select to_date('31-07-3000 12.00.00','DD-MM-YYYY HH24.MI.SS'),100 from dual union
select to_date('31-07-3000 14.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 18.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 20.30.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 22.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
select to_date('01-08-3000 02.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual),
tmp(output_date,quantity,rn) as(
select output_date,quantity,
Row_Number() over(order by output_date)
  from t),
rec(rootOutput_date,output_date,rn,sumVal,cnt) as(
select output_date,output_date,rn,quantity,1
  from tmp
 where quantity < 9 /*threshold*/
union all
select a.rootOutput_date,b.output_date,b.rn,
a.sumVal+b.quantity,a.cnt+1
  from rec a,tmp b
 where a.rn+1=b.rn
   and a.sumVal+b.quantity < 9 /*threshold*/)
select to_char(rootOutput_date,'YYYY/MM/DD HH24:MI') as rootOutput_date,
to_char(output_date,'YYYY/MM/DD HH24:MI') as output_date,sumVal,cnt
from (select rootOutput_date,output_date,sumVal,cnt,
      max(sumVal) over() as maxSumVal
      from rec)
where sumVal = maxSumVal;

rootOutput_date   output_date       sumVal  cnt
----------------  ----------------  ------  ---
3000/07/31 14:00  3000/07/31 20:30       3    3
Aketi Jyuuzou
I like Join :D

There is my article of OTN-Japan which is related Join.
http://www.oracle.com/technetwork/jp/articles/otnj-sql-image1-308625-ja.html
with t(output_date,quantity) as(
select to_date('31-07-3000 09.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
select to_date('31-07-3000 12.00.00','DD-MM-YYYY HH24.MI.SS'),100 from dual union
select to_date('31-07-3000 14.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 18.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 20.30.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 22.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
select to_date('01-08-3000 02.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual)
select to_char(staD,'YYYY/MM/DD HH24:MI') as staD,
to_char(endD,'YYYY/MM/DD HH24:MI') as endD,sumVal,cnt
from (select a.output_date as staD,b.output_date as endD,
      sum(c.quantity) as sumVal,count(*) as cnt,
      max(sum(c.quantity)) over() as maxSumVal
        from t a,t b,t c
       where a.output_date < b.output_date
         and c.output_date between a.output_date and b.output_date
      group by a.output_date,b.output_date
      having sum(c.quantity) < 9 /*threshold*/)
where sumVal = maxSumVal;

staD              endD              sumVal  cnt
----------------  ----------------  ------  ---
3000/07/31 14:00  3000/07/31 20:30       3    3
Dan Scott Guest
Thanks, excellent solutions! Much better than my 48 'copy and paste's. :)

Just to clarify my initial additional request, I was asking for the largest timespan between subsequent entries. i.e. for the '3' threshold, the largest time gap is 4 hrs, from '31-07-3000 14.00.00' to '31-07-3000 18.00.00','DD-MM-YYYY HH24.MI.SS'. Now that you have given me a working solution for the hardest part, I can probably work this out for myself, using LEAD and a MAX analytic function.

@PhoenixBai - it depends on the (my) definition....

I just want the time difference between the first and last measurements. I guess equally, the 'largest time span' could be from

31-07-3000 12.00.01

to

31-07-3000 21.59.59

i.e. 1 second after the '100' measurement until 1 second before the '10' measurement.

Thanks,

Dan
Frank Kulash
Hi, Dan,

The query I posted earlier computed the total_quantity, total_cnt and timespan from two rows, including the values from each of the endpoints.
If I understand you, you want to count the timpapan between two rows, including the enpoints, bu to include the qunatity and count only from the rwos strictly between the endpoints, not counting either one.

Here's how to modify my query to do that:
WITH	got_running_totals	AS
(
	SELECT	output_date, quantity
	,	SUM (quantity) OVER (ORDER BY output_date) 	AS running_total_quantity
	,	ROW_NUMBER ()  OVER (ORDER BY output_date)	AS cnt
	FROM	outputs
)
,	got_rnk			AS
(
	SELECT	e.output_date			AS start_date
	,	l.output_date			AS end_date
	,	( l.running_total_quantity						-- Changed
		- e.running_total_quantity
		) - l.quantity			AS total_quantity
	,	( 1 + l.cnt								-- Changed
		    - e.cnt
		) - 2				AS total_cnt
	,	RANK () OVER (ORDER BY l.output_date - e.output_date		DESC
			     )		     	AS rnk
	FROM	got_running_totals	e	-- e is earlier, l is later
	JOIN	got_running_totals	l	ON   e.cnt	 < l.cnt		-- Changed
						AND  ( l.running_total_quantity		-- Changed
			   			     - e.running_total_quantity	
						     ) - l.quantity	<= :threshold_value
)
SELECT  start_date
,	end_date
,	total_quantity
,	total_cnt
,	(end_date - start_date) * 24	AS timespan_hours				-- Added
FROM	got_rnk
WHERE	rnk	= 1
;
4 expressions were changed and one was added, as shown by the comments.
Output, for :threshold_value = 3:
`                                           TOTAL_ TOTAL TIMESPAN
START_DATE           END_DATE             QUANTITY  _CNT   _HOURS
-------------------- -------------------- -------- ----- --------
31-JUL-3000 12.00.00 31-JUL-3000 22.00.00        3     3       10
As you said, you could get the same results by calling some more functions, such as LEAD, but there's no need to.
Dan Scott Guest
Hi,

I think there is some confusion here. Sorry.

I'm looking for the 'largest gap between subsequent values within the window'. i.e. The window with threshold 3 has a window from 31-JUL-3000 12.00.00 to 31-JUL-3000 20.30.00. Within that window, the largest gap between measurements is from 14.00 to 18.00 or 4 hrs. So the overall output would be:
                                           TOTAL_  TOTAL  MAX_GAP
START_DATE           END_DATE             QUANTITY  _CNT   _HOURS
-------------------- -------------------- -------- ----- --------
31-JUL-3000 12.00.00 31-JUL-3000 22.00.00        3     3       4
I think I have worked this out for myself:
WITH got_running_totals AS (
  SELECT	output_date, quantity,
    SUM (quantity)      OVER (ORDER BY output_date) AS running_total_quantity,
    ROW_NUMBER ()       OVER (ORDER BY output_date) AS cnt,
    (LEAD (output_date)  OVER (ORDER BY output_date) - output_date) * 24 AS hours_to_next_output
  FROM	outputs
)
--select * from got_running_totals;
, got_rnk AS (
  SELECT
    e.output_date AS start_date,
    l.output_date AS end_date,
    e.quantity + l.running_total_quantity
      - e.running_total_quantity AS total_quantity,
    1 + l.cnt
      - e.cnt AS total_cnt,
    RANK () OVER (ORDER BY  e.quantity + l.running_total_quantity
      - e.running_total_quantity	DESC) AS rnk,
    MAX (e.hours_to_next_output) OVER (ORDER BY  e.quantity + l.running_total_quantity
      - e.running_total_quantity	DESC) AS largest_gap
  FROM got_running_totals e -- e is earlier, l is later
  JOIN got_running_totals l ON e.cnt <= l.cnt
   AND e.quantity + l.running_total_quantity
          - e.running_total_quantity < :threshold_value
)
--select * from got_rnk;
SELECT start_date
,	end_date
,	total_quantity
,	total_cnt
, largest_gap
FROM	got_rnk
WHERE	rnk	= 1
;
Obviously, ordering is not important for MAX, and I will change this to a partition when I include my additional ID fields

Most of my previous reply, the text after '@PhoenixBai', was directed at the poster of the third response to this thread, who was asking about the window definition.

Sorry for the confusion.

Dan
Aketi Jyuuzou
Your additional request lets me say "I like recursive with clause B-) "
with t(output_date,quantity) as(
select to_date('31-07-3000 09.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
select to_date('31-07-3000 12.00.00','DD-MM-YYYY HH24.MI.SS'),100 from dual union
select to_date('31-07-3000 14.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 18.00.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 20.30.00','DD-MM-YYYY HH24.MI.SS'),  1 from dual union
select to_date('31-07-3000 22.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual union
select to_date('01-08-3000 02.00.00','DD-MM-YYYY HH24.MI.SS'), 10 from dual),
tmp(output_date,quantity,rn) as(
select output_date,quantity,
Row_Number() over(order by output_date)
  from t),
rec(rootOutput_date,output_date,rn,sumVal,cnt,maxDiff) as(
select output_date,output_date,rn,quantity,1,0
  from tmp
 where quantity < 9 /*threshold*/
union all
select a.rootOutput_date,b.output_date,b.rn,
a.sumVal+b.quantity,a.cnt+1,
greatest(a.maxDiff,(b.output_date-a.output_date)*24)
  from rec a,tmp b
 where a.rn+1=b.rn
   and a.sumVal+b.quantity < 9 /*threshold*/)
select to_char(rootOutput_date,'YYYY/MM/DD HH24:MI') as rootOutput_date,
to_char(output_date,'YYYY/MM/DD HH24:MI') as output_date,sumVal,cnt,maxDiff
from (select rootOutput_date,output_date,sumVal,cnt,maxDiff,
      max(sumVal) over() as maxSumVal
      from rec)
where sumVal = maxSumVal;

rootOutput_date   output_date       sumVal  cnt  maxDiff
----------------  ----------------  ------  ---  -------
3000/07/31 14:00  3000/07/31 20:30       3    3        4
1 - 8
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Sep 27 2010
Added on Aug 27 2010
8 comments
1,712 views