Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Median function in 10gR2

756666Feb 26 2010 — edited Feb 26 2010
I am seeing some unexpected behavior using the median function applied to a set of distinct values returned by a subquery. I didnt find an explanation by searching the web; hence the question here. These queries are on a 10gR2 database.

In both cases below, the select distinct should return 1,2,3,4,5; for a median of 3.

However the median is different when the 1,2 are repeated

This query returns 2 as median -

SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob
FROM dual
UNION ALL
SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob FROM dual) sub);

This query returns 3 as the median.

SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob FROM dual) sub);

What am I missing about the median or the distinct function to cause this different behavior? My expectation was to get the same median - 3 - in both cases since the subquery that the median uses returns the same values in both.

Thanks in advance for any insight on this.

Comments

Frank Kulash
Hi,

Welcome to the forum!

Sorry, I can't explain that. I agree with you; it's odd. It seems like the first query should return 3.
The analytic MEDIAN function does:
SELECT DISTINCT         MEDIAN (prob) OVER (PARTITION BY 1)
FROM 	(   SELECT DISTINCT (prob) AS prob
	    FROM   (   SELECT  1 AS prob	FROM dual	UNION ALL
	    	       SELECT  2 AS prob	FROM dual	UNION ALL
		       SELECT  3 AS prob	FROM dual	UNION ALL
		       SELECT  4 AS prob	FROM dual	UNION ALL
		       SELECT  5 AS prob	FROM dual	UNION ALL
		       SELECT  1 AS prob	FROM dual	UNION ALL
		       SELECT  2 AS prob 	FROM dual
		   ) sub
	);
It also retuns 3 if you do a GROUP BY instead of SELECT DISTINCT in the anonymous in-line view:
SELECT	MEDIAN (prob)
FROM	(   SELECT    prob
	    FROM      sub
	    GROUP BY  prob
	);
Edited by: Frank Kulash on Feb 26, 2010 3:05 PM
isotope
Oops, sorry I didn't notice the DISTINCT in the subquery.

isotope

Edited by: isotope on Feb 26, 2010 12:03 PM
756666
Thanks Frank.
Using the analytic median gives me the solution to fix this issue.
John Spencer
My guess would be that it is a bug, not sure if it is a known one, but I can reproduce it in 11.1.0.7.

From the documentation:

"The result of MEDIAN is computed by first ordering the rows. Using N as the number of rows in the group, Oracle calculates the row number (RN) of interest with the formula RN = (1 + (0.5*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN)."

What I think is going on is that the distinct causes a hash unique to be performed, but somewhere median (or the optimizer) is assuming that the rows are already ordered because of the distinct so is not doing its own sort.
SQL> SELECT distinct prob
  2        FROM (SELECT 1 AS prob FROM dual UNION ALL
  3              SELECT 2 FROM dual UNION ALL
  4              SELECT 3 FROM dual UNION ALL
  5              SELECT 4 FROM dual UNION ALL
  6              SELECT 5 FROM dual UNION ALL
  7              SELECT 1 FROM dual UNION ALL
  8              SELECT 2 FROM dual);
 
      PROB
----------
         5
         1
         2
         3
         4
So, it correctly calculates RN as 3, but the value at row 3 is 2.

The analytic version probably forces a sort, and in the group by version, median (or the optimizer) probably recognizes that a hash algorithm would be used for the grouping.
Gaff
I think it's correct, actually. I'm not going by the formula quoted from Oracle docs below, I'm going by the general mathematical formula posted [http://easycalculation.com/statistics/learn-median.php|here.] (What the hell am I doing wrong with links? They aren't links!

Anyway. When the count of numbers is odd, the formula is to take the (n+1)/2 number of the ordered sequence.

Sequence 1:
1,2,3,4,5,1,2         1,1,2,2,3,4,5   (7+1)/2=4.  # at pos. 4 = 2
(7+1)/2 = 8/2 = 4. The 4th number in the ordered set it 2. Which is what you got.





Sequence 2:
1,2,3,4,5               1,2,3,4,5      (5+1)/2 = 3.  # at pos. 3 = 3
(5+1)/2 = 6/2 = 3. The 3rd number in the ordered set happens to be 3. Which is also what you got.


So it works as mathematically expected, no?
730428
So it works as mathematically expected, no?
No, because the Op is executing a DISTINCT before the MEDIAN, hence the set of values on which Oracle has to calculate the MEDIAN is

1,2,3,4,5

in both cases...

And the result must be 3 in both cases...

Max
http://oracleitalia.wordpress.com
Gaff
Ah! You are correct! Which I guess makes me, "less than correct". But if you take the median ...
728534
Hi,
Another interesting query
SELECT median(prob)
FROM (SELECT DISTINCT prob AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION 
SELECT 2 AS prob
FROM dual
UNION 
SELECT 3 AS prob
FROM dual
UNION 
SELECT 4 AS prob
FROM dual
UNION 
SELECT 5 AS prob
FROM dual
UNION 
SELECT 1 AS prob
FROM dual
UNION 
SELECT 2 AS prob FROM dual) ORDER BY prob)
Remove UNION ALL and you get 3 again :)

Cheers!!!
Bhushan
MichaelS
Looks indeed as a bug: Once you materialize the inner view you get the expected result:
SQL> with t as (select /*+ materialize */
                 distinct *
             from (select 1 as prob from dual
                   union all
                   select 2 as prob from dual
                   union all
                   select 3 as prob from dual
                   union all
                   select 4 as prob from dual
                   union all
                   select 5 as prob from dual
                   union all
                   select 1 as prob from dual
                   union all
                   select 2 as prob from dual))
select median (prob)
  from t
/
MEDIAN(PROB)
------------
           3
1 row selected.
1 - 9
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Mar 26 2010
Added on Feb 26 2010
9 comments
1,878 views