Discussions
Categories
- 197K All Categories
- 2.5K Data
- 546 Big Data Appliance
- 1.9K Data Science
- 450.8K Databases
- 221.9K General Database Discussions
- 3.8K Java and JavaScript in the Database
- 31 Multilingual Engine
- 552 MySQL Community Space
- 479 NoSQL Database
- 7.9K Oracle Database Express Edition (XE)
- 3.1K ORDS, SODA & JSON in the Database
- 556 SQLcl
- 4K SQL Developer Data Modeler
- 187.2K SQL & PL/SQL
- 21.4K SQL Developer
- 296.4K Development
- 17 Developer Projects
- 139 Programming Languages
- 293.1K Development Tools
- 111 DevOps
- 3.1K QA/Testing
- 646.1K Java
- 28 Java Learning Subscription
- 37K Database Connectivity
- 161 Java Community Process
- 105 Java 25
- 22.1K Java APIs
- 138.2K Java Development Tools
- 165.3K Java EE (Java Enterprise Edition)
- 19 Java Essentials
- 162 Java 8 Questions
- 86K Java Programming
- 81 Java Puzzle Ball
- 65.1K New To Java
- 1.7K Training / Learning / Certification
- 13.8K Java HotSpot Virtual Machine
- 94.3K Java SE
- 13.8K Java Security
- 205 Java User Groups
- 24 JavaScript - Nashorn
- Programs
- 475 LiveLabs
- 39 Workshops
- 10.2K Software
- 6.7K Berkeley DB Family
- 3.5K JHeadstart
- 5.7K Other Languages
- 2.3K Chinese
- 175 Deutsche Oracle Community
- 1.1K Español
- 1.9K Japanese
- 233 Portuguese
Median function in 10gR2

756666
Member Posts: 2
I am seeing some unexpected behavior using the median function applied to a set of distinct values returned by a subquery. I didnt find an explanation by searching the web; hence the question here. These queries are on a 10gR2 database.
In both cases below, the select distinct should return 1,2,3,4,5; for a median of 3.
However the median is different when the 1,2 are repeated
This query returns 2 as median -
SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob
FROM dual
UNION ALL
SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob FROM dual) sub);
This query returns 3 as the median.
SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob FROM dual) sub);
What am I missing about the median or the distinct function to cause this different behavior? My expectation was to get the same median - 3 - in both cases since the subquery that the median uses returns the same values in both.
Thanks in advance for any insight on this.
In both cases below, the select distinct should return 1,2,3,4,5; for a median of 3.
However the median is different when the 1,2 are repeated
This query returns 2 as median -
SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob
FROM dual
UNION ALL
SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob FROM dual) sub);
This query returns 3 as the median.
SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob FROM dual) sub);
What am I missing about the median or the distinct function to cause this different behavior? My expectation was to get the same median - 3 - in both cases since the subquery that the median uses returns the same values in both.
Thanks in advance for any insight on this.
Answers
-
Hi,
Welcome to the forum!
Sorry, I can't explain that. I agree with you; it's odd. It seems like the first query should return 3.
The analytic MEDIAN function does:SELECT DISTINCT MEDIAN (prob) OVER (PARTITION BY 1) FROM ( SELECT DISTINCT (prob) AS prob FROM ( SELECT 1 AS prob FROM dual UNION ALL SELECT 2 AS prob FROM dual UNION ALL SELECT 3 AS prob FROM dual UNION ALL SELECT 4 AS prob FROM dual UNION ALL SELECT 5 AS prob FROM dual UNION ALL SELECT 1 AS prob FROM dual UNION ALL SELECT 2 AS prob FROM dual ) sub );
It also retuns 3 if you do a GROUP BY instead of SELECT DISTINCT in the anonymous in-line view:SELECT MEDIAN (prob) FROM ( SELECT prob FROM sub GROUP BY prob );
Edited by: Frank Kulash on Feb 26, 2010 3:05 PM -
Oops, sorry I didn't notice the DISTINCT in the subquery.
isotope
Edited by: isotope on Feb 26, 2010 12:03 PM -
Thanks Frank.
Using the analytic median gives me the solution to fix this issue. -
My guess would be that it is a bug, not sure if it is a known one, but I can reproduce it in 11.1.0.7.
From the documentation:
"The result of MEDIAN is computed by first ordering the rows. Using N as the number of rows in the group, Oracle calculates the row number (RN) of interest with the formula RN = (1 + (0.5*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN)."
What I think is going on is that the distinct causes a hash unique to be performed, but somewhere median (or the optimizer) is assuming that the rows are already ordered because of the distinct so is not doing its own sort.SQL> SELECT distinct prob 2 FROM (SELECT 1 AS prob FROM dual UNION ALL 3 SELECT 2 FROM dual UNION ALL 4 SELECT 3 FROM dual UNION ALL 5 SELECT 4 FROM dual UNION ALL 6 SELECT 5 FROM dual UNION ALL 7 SELECT 1 FROM dual UNION ALL 8 SELECT 2 FROM dual); PROB ---------- 5 1 2 3 4
So, it correctly calculates RN as 3, but the value at row 3 is 2.
The analytic version probably forces a sort, and in the group by version, median (or the optimizer) probably recognizes that a hash algorithm would be used for the grouping. -
I think it's correct, actually. I'm not going by the formula quoted from Oracle docs below, I'm going by the general mathematical formula posted [http://easycalculation.com/statistics/learn-median.php|here.] (What the hell am I doing wrong with links? They aren't links!
Anyway. When the count of numbers is odd, the formula is to take the (n+1)/2 number of the ordered sequence.
Sequence 1:1,2,3,4,5,1,2 1,1,2,2,3,4,5 (7+1)/2=4. # at pos. 4 = 2
(7+1)/2 = 8/2 = 4. The 4th number in the ordered set it 2. Which is what you got.
Sequence 2:1,2,3,4,5 1,2,3,4,5 (5+1)/2 = 3. # at pos. 3 = 3
(5+1)/2 = 6/2 = 3. The 3rd number in the ordered set happens to be 3. Which is also what you got.
So it works as mathematically expected, no? -
So it works as mathematically expected, no?No, because the Op is executing a DISTINCT before the MEDIAN, hence the set of values on which Oracle has to calculate the MEDIAN is
1,2,3,4,5
in both cases...
And the result must be 3 in both cases...
Max
http://oracleitalia.wordpress.com -
Ah! You are correct! Which I guess makes me, "less than correct". But if you take the median ...
-
Hi,
Another interesting querySELECT median(prob) FROM (SELECT DISTINCT prob AS prob FROM (SELECT 1 AS prob FROM dual UNION SELECT 2 AS prob FROM dual UNION SELECT 3 AS prob FROM dual UNION SELECT 4 AS prob FROM dual UNION SELECT 5 AS prob FROM dual UNION SELECT 1 AS prob FROM dual UNION SELECT 2 AS prob FROM dual) ORDER BY prob)
Remove UNION ALL and you get 3 again
Cheers!!!
Bhushan -
Looks indeed as a bug: Once you materialize the inner view you get the expected result:
SQL> with t as (select /*+ materialize */ distinct * from (select 1 as prob from dual union all select 2 as prob from dual union all select 3 as prob from dual union all select 4 as prob from dual union all select 5 as prob from dual union all select 1 as prob from dual union all select 2 as prob from dual)) select median (prob) from t / MEDIAN(PROB) ------------ 3 1 row selected.
This discussion has been closed.