Discussions
Categories
 197K All Categories
 2.5K Data
 546 Big Data Appliance
 1.9K Data Science
 450.8K Databases
 221.9K General Database Discussions
 3.8K Java and JavaScript in the Database
 31 Multilingual Engine
 552 MySQL Community Space
 479 NoSQL Database
 7.9K Oracle Database Express Edition (XE)
 3.1K ORDS, SODA & JSON in the Database
 556 SQLcl
 4K SQL Developer Data Modeler
 187.2K SQL & PL/SQL
 21.4K SQL Developer
 296.4K Development
 17 Developer Projects
 139 Programming Languages
 293.1K Development Tools
 110 DevOps
 3.1K QA/Testing
 646.1K Java
 28 Java Learning Subscription
 37K Database Connectivity
 161 Java Community Process
 105 Java 25
 22.1K Java APIs
 138.2K Java Development Tools
 165.3K Java EE (Java Enterprise Edition)
 19 Java Essentials
 162 Java 8 Questions
 86K Java Programming
 81 Java Puzzle Ball
 65.1K New To Java
 1.7K Training / Learning / Certification
 13.8K Java HotSpot Virtual Machine
 94.3K Java SE
 13.8K Java Security
 205 Java User Groups
 24 JavaScript  Nashorn
 Programs
 475 LiveLabs
 39 Workshops
 10.2K Software
 6.7K Berkeley DB Family
 3.5K JHeadstart
 5.7K Other Languages
 2.3K Chinese
 175 Deutsche Oracle Community
 1.1K Español
 1.9K Japanese
 233 Portuguese
Median function in 10gR2
756666
Member Posts: 2
I am seeing some unexpected behavior using the median function applied to a set of distinct values returned by a subquery. I didnt find an explanation by searching the web; hence the question here. These queries are on a 10gR2 database.
In both cases below, the select distinct should return 1,2,3,4,5; for a median of 3.
However the median is different when the 1,2 are repeated
This query returns 2 as median 
SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob
FROM dual
UNION ALL
SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob FROM dual) sub);
This query returns 3 as the median.
SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob FROM dual) sub);
What am I missing about the median or the distinct function to cause this different behavior? My expectation was to get the same median  3  in both cases since the subquery that the median uses returns the same values in both.
Thanks in advance for any insight on this.
In both cases below, the select distinct should return 1,2,3,4,5; for a median of 3.
However the median is different when the 1,2 are repeated
This query returns 2 as median 
SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob
FROM dual
UNION ALL
SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob FROM dual) sub);
This query returns 3 as the median.
SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob FROM dual) sub);
What am I missing about the median or the distinct function to cause this different behavior? My expectation was to get the same median  3  in both cases since the subquery that the median uses returns the same values in both.
Thanks in advance for any insight on this.
Answers

Hi,
Welcome to the forum!
Sorry, I can't explain that. I agree with you; it's odd. It seems like the first query should return 3.
The analytic MEDIAN function does:SELECT DISTINCT MEDIAN (prob) OVER (PARTITION BY 1) FROM ( SELECT DISTINCT (prob) AS prob FROM ( SELECT 1 AS prob FROM dual UNION ALL SELECT 2 AS prob FROM dual UNION ALL SELECT 3 AS prob FROM dual UNION ALL SELECT 4 AS prob FROM dual UNION ALL SELECT 5 AS prob FROM dual UNION ALL SELECT 1 AS prob FROM dual UNION ALL SELECT 2 AS prob FROM dual ) sub );
It also retuns 3 if you do a GROUP BY instead of SELECT DISTINCT in the anonymous inline view:SELECT MEDIAN (prob) FROM ( SELECT prob FROM sub GROUP BY prob );
Edited by: Frank Kulash on Feb 26, 2010 3:05 PM 
Oops, sorry I didn't notice the DISTINCT in the subquery.
isotope
Edited by: isotope on Feb 26, 2010 12:03 PM 
Thanks Frank.
Using the analytic median gives me the solution to fix this issue. 
My guess would be that it is a bug, not sure if it is a known one, but I can reproduce it in 11.1.0.7.
From the documentation:
"The result of MEDIAN is computed by first ordering the rows. Using N as the number of rows in the group, Oracle calculates the row number (RN) of interest with the formula RN = (1 + (0.5*(N1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN)."
What I think is going on is that the distinct causes a hash unique to be performed, but somewhere median (or the optimizer) is assuming that the rows are already ordered because of the distinct so is not doing its own sort.SQL> SELECT distinct prob 2 FROM (SELECT 1 AS prob FROM dual UNION ALL 3 SELECT 2 FROM dual UNION ALL 4 SELECT 3 FROM dual UNION ALL 5 SELECT 4 FROM dual UNION ALL 6 SELECT 5 FROM dual UNION ALL 7 SELECT 1 FROM dual UNION ALL 8 SELECT 2 FROM dual); PROB  5 1 2 3 4
So, it correctly calculates RN as 3, but the value at row 3 is 2.
The analytic version probably forces a sort, and in the group by version, median (or the optimizer) probably recognizes that a hash algorithm would be used for the grouping. 
I think it's correct, actually. I'm not going by the formula quoted from Oracle docs below, I'm going by the general mathematical formula posted [http://easycalculation.com/statistics/learnmedian.phphere.] (What the hell am I doing wrong with links? They aren't links!
Anyway. When the count of numbers is odd, the formula is to take the (n+1)/2 number of the ordered sequence.
Sequence 1:1,2,3,4,5,1,2 1,1,2,2,3,4,5 (7+1)/2=4. # at pos. 4 = 2
(7+1)/2 = 8/2 = 4. The 4th number in the ordered set it 2. Which is what you got.
Sequence 2:1,2,3,4,5 1,2,3,4,5 (5+1)/2 = 3. # at pos. 3 = 3
(5+1)/2 = 6/2 = 3. The 3rd number in the ordered set happens to be 3. Which is also what you got.
So it works as mathematically expected, no? 
So it works as mathematically expected, no?No, because the Op is executing a DISTINCT before the MEDIAN, hence the set of values on which Oracle has to calculate the MEDIAN is
1,2,3,4,5
in both cases...
And the result must be 3 in both cases...
Max
http://oracleitalia.wordpress.com 
Ah! You are correct! Which I guess makes me, "less than correct". But if you take the median ...

Hi,
Another interesting querySELECT median(prob) FROM (SELECT DISTINCT prob AS prob FROM (SELECT 1 AS prob FROM dual UNION SELECT 2 AS prob FROM dual UNION SELECT 3 AS prob FROM dual UNION SELECT 4 AS prob FROM dual UNION SELECT 5 AS prob FROM dual UNION SELECT 1 AS prob FROM dual UNION SELECT 2 AS prob FROM dual) ORDER BY prob)
Remove UNION ALL and you get 3 again
Cheers!!!
Bhushan 
Looks indeed as a bug: Once you materialize the inner view you get the expected result:
SQL> with t as (select /*+ materialize */ distinct * from (select 1 as prob from dual union all select 2 as prob from dual union all select 3 as prob from dual union all select 4 as prob from dual union all select 5 as prob from dual union all select 1 as prob from dual union all select 2 as prob from dual)) select median (prob) from t / MEDIAN(PROB)  3 1 row selected.
This discussion has been closed.