## Forum Stats

• 3,836,963 Users
• 2,262,207 Discussions

Discussions

# Median function in 10gR2

Member Posts: 2
edited Feb 26, 2010 5:46PM
I am seeing some unexpected behavior using the median function applied to a set of distinct values returned by a subquery. I didnt find an explanation by searching the web; hence the question here. These queries are on a 10gR2 database.

In both cases below, the select distinct should return 1,2,3,4,5; for a median of 3.

However the median is different when the 1,2 are repeated

This query returns 2 as median -

SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob
FROM dual
UNION ALL
SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob FROM dual) sub);

This query returns 3 as the median.

SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob FROM dual) sub);

What am I missing about the median or the distinct function to cause this different behavior? My expectation was to get the same median - 3 - in both cases since the subquery that the median uses returns the same values in both.

Thanks in advance for any insight on this.

• Member, Moderator Posts: 42,209 Red Diamond
edited Feb 26, 2010 3:06PM
Hi,

Welcome to the forum!

Sorry, I can't explain that. I agree with you; it's odd. It seems like the first query should return 3.
The analytic MEDIAN function does:
```SELECT DISTINCT         MEDIAN (prob) OVER (PARTITION BY 1)
FROM 	(   SELECT DISTINCT (prob) AS prob
FROM   (   SELECT  1 AS prob	FROM dual	UNION ALL
SELECT  2 AS prob	FROM dual	UNION ALL
SELECT  3 AS prob	FROM dual	UNION ALL
SELECT  4 AS prob	FROM dual	UNION ALL
SELECT  5 AS prob	FROM dual	UNION ALL
SELECT  1 AS prob	FROM dual	UNION ALL
SELECT  2 AS prob 	FROM dual
) sub
);```
It also retuns 3 if you do a GROUP BY instead of SELECT DISTINCT in the anonymous in-line view:
```SELECT	MEDIAN (prob)
FROM	(   SELECT    prob
FROM      sub
GROUP BY  prob
);```
Edited by: Frank Kulash on Feb 26, 2010 3:05 PM
• Member Posts: 748 Bronze Badge
edited Feb 26, 2010 3:04PM
Oops, sorry I didn't notice the DISTINCT in the subquery.

isotope

Edited by: isotope on Feb 26, 2010 12:03 PM
• Member Posts: 2
Thanks Frank.
Using the analytic median gives me the solution to fix this issue.
• Member Posts: 8,567 Bronze Crown
My guess would be that it is a bug, not sure if it is a known one, but I can reproduce it in 11.1.0.7.

From the documentation:

"The result of MEDIAN is computed by first ordering the rows. Using N as the number of rows in the group, Oracle calculates the row number (RN) of interest with the formula RN = (1 + (0.5*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN)."

What I think is going on is that the distinct causes a hash unique to be performed, but somewhere median (or the optimizer) is assuming that the rows are already ordered because of the distinct so is not doing its own sort.
```SQL> SELECT distinct prob
2        FROM (SELECT 1 AS prob FROM dual UNION ALL
3              SELECT 2 FROM dual UNION ALL
4              SELECT 3 FROM dual UNION ALL
5              SELECT 4 FROM dual UNION ALL
6              SELECT 5 FROM dual UNION ALL
7              SELECT 1 FROM dual UNION ALL
8              SELECT 2 FROM dual);

PROB
----------
5
1
2
3
4```
So, it correctly calculates RN as 3, but the value at row 3 is 2.

The analytic version probably forces a sort, and in the group by version, median (or the optimizer) probably recognizes that a hash algorithm would be used for the grouping.
• Member Posts: 1,490
edited Feb 26, 2010 5:21PM
I think it's correct, actually. I'm not going by the formula quoted from Oracle docs below, I'm going by the general mathematical formula posted [http://easycalculation.com/statistics/learn-median.php|here.] (What the hell am I doing wrong with links? They aren't links!

Anyway. When the count of numbers is odd, the formula is to take the (n+1)/2 number of the ordered sequence.

Sequence 1:
`1,2,3,4,5,1,2         1,1,2,2,3,4,5   (7+1)/2=4.  # at pos. 4 = 2`
(7+1)/2 = 8/2 = 4. The 4th number in the ordered set it 2. Which is what you got.

Sequence 2:
`1,2,3,4,5               1,2,3,4,5      (5+1)/2 = 3.  # at pos. 3 = 3`
(5+1)/2 = 6/2 = 3. The 3rd number in the ordered set happens to be 3. Which is also what you got.

So it works as mathematically expected, no?
• Member Posts: 2,087
So it works as mathematically expected, no?
No, because the Op is executing a DISTINCT before the MEDIAN, hence the set of values on which Oracle has to calculate the MEDIAN is

1,2,3,4,5

in both cases...

And the result must be 3 in both cases...

Max
http://oracleitalia.wordpress.com
• Member Posts: 1,490
Ah! You are correct! Which I guess makes me, "less than correct". But if you take the median ...
• Member Posts: 1,386
Hi,
Another interesting query
```SELECT median(prob)
FROM (SELECT DISTINCT prob AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION
SELECT 2 AS prob
FROM dual
UNION
SELECT 3 AS prob
FROM dual
UNION
SELECT 4 AS prob
FROM dual
UNION
SELECT 5 AS prob
FROM dual
UNION
SELECT 1 AS prob
FROM dual
UNION
SELECT 2 AS prob FROM dual) ORDER BY prob)```
Remove UNION ALL and you get 3 again

Cheers!!!
Bhushan
• Member Posts: 8,424 Bronze Crown
Looks indeed as a bug: Once you materialize the inner view you get the expected result:
```SQL> with t as (select /*+ materialize */
distinct *
from (select 1 as prob from dual
union all
select 2 as prob from dual
union all
select 3 as prob from dual
union all
select 4 as prob from dual
union all
select 5 as prob from dual
union all
select 1 as prob from dual
union all
select 2 as prob from dual))
select median (prob)
from t
/
MEDIAN(PROB)
------------
3
1 row selected.```
This discussion has been closed.