Discussions
Categories
- 196.8K All Categories
- 2.2K Data
- 238 Big Data Appliance
- 1.9K Data Science
- 450.2K Databases
- 221.7K General Database Discussions
- 3.8K Java and JavaScript in the Database
- 31 Multilingual Engine
- 550 MySQL Community Space
- 478 NoSQL Database
- 7.9K Oracle Database Express Edition (XE)
- 3K ORDS, SODA & JSON in the Database
- 544 SQLcl
- 4K SQL Developer Data Modeler
- 187K SQL & PL/SQL
- 21.3K SQL Developer
- 295.8K Development
- 17 Developer Projects
- 138 Programming Languages
- 292.5K Development Tools
- 107 DevOps
- 3.1K QA/Testing
- 646K Java
- 28 Java Learning Subscription
- 37K Database Connectivity
- 154 Java Community Process
- 105 Java 25
- 22.1K Java APIs
- 138.1K Java Development Tools
- 165.3K Java EE (Java Enterprise Edition)
- 17 Java Essentials
- 160 Java 8 Questions
- 86K Java Programming
- 80 Java Puzzle Ball
- 65.1K New To Java
- 1.7K Training / Learning / Certification
- 13.8K Java HotSpot Virtual Machine
- 94.3K Java SE
- 13.8K Java Security
- 204 Java User Groups
- 24 JavaScript - Nashorn
- Programs
- 436 LiveLabs
- 38 Workshops
- 10.2K Software
- 6.7K Berkeley DB Family
- 3.5K JHeadstart
- 5.7K Other Languages
- 2.3K Chinese
- 171 Deutsche Oracle Community
- 1.1K Español
- 1.9K Japanese
- 232 Portuguese
Median function in 10gR2

756666
Member Posts: 2
I am seeing some unexpected behavior using the median function applied to a set of distinct values returned by a subquery. I didnt find an explanation by searching the web; hence the question here. These queries are on a 10gR2 database.
In both cases below, the select distinct should return 1,2,3,4,5; for a median of 3.
However the median is different when the 1,2 are repeated
This query returns 2 as median -
SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob
FROM dual
UNION ALL
SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob FROM dual) sub);
This query returns 3 as the median.
SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob FROM dual) sub);
What am I missing about the median or the distinct function to cause this different behavior? My expectation was to get the same median - 3 - in both cases since the subquery that the median uses returns the same values in both.
Thanks in advance for any insight on this.
In both cases below, the select distinct should return 1,2,3,4,5; for a median of 3.
However the median is different when the 1,2 are repeated
This query returns 2 as median -
SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob
FROM dual
UNION ALL
SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob FROM dual) sub);
This query returns 3 as the median.
SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob FROM dual) sub);
What am I missing about the median or the distinct function to cause this different behavior? My expectation was to get the same median - 3 - in both cases since the subquery that the median uses returns the same values in both.
Thanks in advance for any insight on this.
Answers
-
Hi,
Welcome to the forum!
Sorry, I can't explain that. I agree with you; it's odd. It seems like the first query should return 3.
The analytic MEDIAN function does:SELECT DISTINCT MEDIAN (prob) OVER (PARTITION BY 1) FROM ( SELECT DISTINCT (prob) AS prob FROM ( SELECT 1 AS prob FROM dual UNION ALL SELECT 2 AS prob FROM dual UNION ALL SELECT 3 AS prob FROM dual UNION ALL SELECT 4 AS prob FROM dual UNION ALL SELECT 5 AS prob FROM dual UNION ALL SELECT 1 AS prob FROM dual UNION ALL SELECT 2 AS prob FROM dual ) sub );
It also retuns 3 if you do a GROUP BY instead of SELECT DISTINCT in the anonymous in-line view:SELECT MEDIAN (prob) FROM ( SELECT prob FROM sub GROUP BY prob );
Edited by: Frank Kulash on Feb 26, 2010 3:05 PM -
Oops, sorry I didn't notice the DISTINCT in the subquery.
isotope
Edited by: isotope on Feb 26, 2010 12:03 PM -
Thanks Frank.
Using the analytic median gives me the solution to fix this issue. -
My guess would be that it is a bug, not sure if it is a known one, but I can reproduce it in 11.1.0.7.
From the documentation:
"The result of MEDIAN is computed by first ordering the rows. Using N as the number of rows in the group, Oracle calculates the row number (RN) of interest with the formula RN = (1 + (0.5*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN)."
What I think is going on is that the distinct causes a hash unique to be performed, but somewhere median (or the optimizer) is assuming that the rows are already ordered because of the distinct so is not doing its own sort.SQL> SELECT distinct prob 2 FROM (SELECT 1 AS prob FROM dual UNION ALL 3 SELECT 2 FROM dual UNION ALL 4 SELECT 3 FROM dual UNION ALL 5 SELECT 4 FROM dual UNION ALL 6 SELECT 5 FROM dual UNION ALL 7 SELECT 1 FROM dual UNION ALL 8 SELECT 2 FROM dual); PROB ---------- 5 1 2 3 4
So, it correctly calculates RN as 3, but the value at row 3 is 2.
The analytic version probably forces a sort, and in the group by version, median (or the optimizer) probably recognizes that a hash algorithm would be used for the grouping. -
I think it's correct, actually. I'm not going by the formula quoted from Oracle docs below, I'm going by the general mathematical formula posted [http://easycalculation.com/statistics/learn-median.php|here.] (What the hell am I doing wrong with links? They aren't links!
Anyway. When the count of numbers is odd, the formula is to take the (n+1)/2 number of the ordered sequence.
Sequence 1:1,2,3,4,5,1,2 1,1,2,2,3,4,5 (7+1)/2=4. # at pos. 4 = 2
(7+1)/2 = 8/2 = 4. The 4th number in the ordered set it 2. Which is what you got.
Sequence 2:1,2,3,4,5 1,2,3,4,5 (5+1)/2 = 3. # at pos. 3 = 3
(5+1)/2 = 6/2 = 3. The 3rd number in the ordered set happens to be 3. Which is also what you got.
So it works as mathematically expected, no? -
So it works as mathematically expected, no?No, because the Op is executing a DISTINCT before the MEDIAN, hence the set of values on which Oracle has to calculate the MEDIAN is
1,2,3,4,5
in both cases...
And the result must be 3 in both cases...
Max
http://oracleitalia.wordpress.com -
Ah! You are correct! Which I guess makes me, "less than correct". But if you take the median ...
-
Hi,
Another interesting querySELECT median(prob) FROM (SELECT DISTINCT prob AS prob FROM (SELECT 1 AS prob FROM dual UNION SELECT 2 AS prob FROM dual UNION SELECT 3 AS prob FROM dual UNION SELECT 4 AS prob FROM dual UNION SELECT 5 AS prob FROM dual UNION SELECT 1 AS prob FROM dual UNION SELECT 2 AS prob FROM dual) ORDER BY prob)
Remove UNION ALL and you get 3 again
Cheers!!!
Bhushan -
Looks indeed as a bug: Once you materialize the inner view you get the expected result:
SQL> with t as (select /*+ materialize */ distinct * from (select 1 as prob from dual union all select 2 as prob from dual union all select 3 as prob from dual union all select 4 as prob from dual union all select 5 as prob from dual union all select 1 as prob from dual union all select 2 as prob from dual)) select median (prob) from t / MEDIAN(PROB) ------------ 3 1 row selected.
This discussion has been closed.