Forum Stats

  • 3,837,640 Users
  • 2,262,276 Discussions
  • 7,900,337 Comments

Discussions

Median function in 10gR2

756666
756666 Member Posts: 2
edited Feb 26, 2010 5:46PM in SQL & PL/SQL
I am seeing some unexpected behavior using the median function applied to a set of distinct values returned by a subquery. I didnt find an explanation by searching the web; hence the question here. These queries are on a 10gR2 database.

In both cases below, the select distinct should return 1,2,3,4,5; for a median of 3.

However the median is different when the 1,2 are repeated

This query returns 2 as median -

SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob
FROM dual
UNION ALL
SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob FROM dual) sub);

This query returns 3 as the median.

SELECT median(prob)
FROM (SELECT DISTINCT (prob) AS prob
FROM (SELECT 1 AS prob
FROM dual
UNION ALL
SELECT 2 AS prob
FROM dual
UNION ALL
SELECT 3 AS prob
FROM dual
UNION ALL
SELECT 4 AS prob
FROM dual
UNION ALL
SELECT 5 AS prob FROM dual) sub);

What am I missing about the median or the distinct function to cause this different behavior? My expectation was to get the same median - 3 - in both cases since the subquery that the median uses returns the same values in both.

Thanks in advance for any insight on this.

Answers

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,219 Red Diamond
    edited Feb 26, 2010 3:06PM
    Hi,

    Welcome to the forum!

    Sorry, I can't explain that. I agree with you; it's odd. It seems like the first query should return 3.
    The analytic MEDIAN function does:
    SELECT DISTINCT         MEDIAN (prob) OVER (PARTITION BY 1)
    FROM 	(   SELECT DISTINCT (prob) AS prob
    	    FROM   (   SELECT  1 AS prob	FROM dual	UNION ALL
    	    	       SELECT  2 AS prob	FROM dual	UNION ALL
    		       SELECT  3 AS prob	FROM dual	UNION ALL
    		       SELECT  4 AS prob	FROM dual	UNION ALL
    		       SELECT  5 AS prob	FROM dual	UNION ALL
    		       SELECT  1 AS prob	FROM dual	UNION ALL
    		       SELECT  2 AS prob 	FROM dual
    		   ) sub
    	);
    It also retuns 3 if you do a GROUP BY instead of SELECT DISTINCT in the anonymous in-line view:
    SELECT	MEDIAN (prob)
    FROM	(   SELECT    prob
    	    FROM      sub
    	    GROUP BY  prob
    	);
    Edited by: Frank Kulash on Feb 26, 2010 3:05 PM
    Frank Kulash
  • isotope
    isotope Member Posts: 748 Bronze Badge
    edited Feb 26, 2010 3:04PM
    Oops, sorry I didn't notice the DISTINCT in the subquery.

    isotope

    Edited by: isotope on Feb 26, 2010 12:03 PM
  • 756666
    756666 Member Posts: 2
    Thanks Frank.
    Using the analytic median gives me the solution to fix this issue.
  • John Spencer
    John Spencer Member Posts: 8,567 Bronze Crown
    My guess would be that it is a bug, not sure if it is a known one, but I can reproduce it in 11.1.0.7.

    From the documentation:

    "The result of MEDIAN is computed by first ordering the rows. Using N as the number of rows in the group, Oracle calculates the row number (RN) of interest with the formula RN = (1 + (0.5*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN)."

    What I think is going on is that the distinct causes a hash unique to be performed, but somewhere median (or the optimizer) is assuming that the rows are already ordered because of the distinct so is not doing its own sort.
    SQL> SELECT distinct prob
      2        FROM (SELECT 1 AS prob FROM dual UNION ALL
      3              SELECT 2 FROM dual UNION ALL
      4              SELECT 3 FROM dual UNION ALL
      5              SELECT 4 FROM dual UNION ALL
      6              SELECT 5 FROM dual UNION ALL
      7              SELECT 1 FROM dual UNION ALL
      8              SELECT 2 FROM dual);
     
          PROB
    ----------
             5
             1
             2
             3
             4
    So, it correctly calculates RN as 3, but the value at row 3 is 2.

    The analytic version probably forces a sort, and in the group by version, median (or the optimizer) probably recognizes that a hash algorithm would be used for the grouping.
  • Gaff
    Gaff Member Posts: 1,490
    edited Feb 26, 2010 5:21PM
    I think it's correct, actually. I'm not going by the formula quoted from Oracle docs below, I'm going by the general mathematical formula posted [http://easycalculation.com/statistics/learn-median.php|here.] (What the hell am I doing wrong with links? They aren't links!

    Anyway. When the count of numbers is odd, the formula is to take the (n+1)/2 number of the ordered sequence.

    Sequence 1:
    1,2,3,4,5,1,2         1,1,2,2,3,4,5   (7+1)/2=4.  # at pos. 4 = 2
    (7+1)/2 = 8/2 = 4. The 4th number in the ordered set it 2. Which is what you got.





    Sequence 2:
    1,2,3,4,5               1,2,3,4,5      (5+1)/2 = 3.  # at pos. 3 = 3
    (5+1)/2 = 6/2 = 3. The 3rd number in the ordered set happens to be 3. Which is also what you got.


    So it works as mathematically expected, no?
  • 730428
    730428 Member Posts: 2,087
    So it works as mathematically expected, no?
    No, because the Op is executing a DISTINCT before the MEDIAN, hence the set of values on which Oracle has to calculate the MEDIAN is

    1,2,3,4,5

    in both cases...

    And the result must be 3 in both cases...

    Max
    http://oracleitalia.wordpress.com
  • Gaff
    Gaff Member Posts: 1,490
    Ah! You are correct! Which I guess makes me, "less than correct". But if you take the median ...
  • 728534
    728534 Member Posts: 1,386
    Hi,
    Another interesting query
    SELECT median(prob)
    FROM (SELECT DISTINCT prob AS prob
    FROM (SELECT 1 AS prob
    FROM dual
    UNION 
    SELECT 2 AS prob
    FROM dual
    UNION 
    SELECT 3 AS prob
    FROM dual
    UNION 
    SELECT 4 AS prob
    FROM dual
    UNION 
    SELECT 5 AS prob
    FROM dual
    UNION 
    SELECT 1 AS prob
    FROM dual
    UNION 
    SELECT 2 AS prob FROM dual) ORDER BY prob)
    Remove UNION ALL and you get 3 again :)

    Cheers!!!
    Bhushan
  • MichaelS
    MichaelS Member Posts: 8,424 Bronze Crown
    Looks indeed as a bug: Once you materialize the inner view you get the expected result:
    SQL> with t as (select /*+ materialize */
                     distinct *
                 from (select 1 as prob from dual
                       union all
                       select 2 as prob from dual
                       union all
                       select 3 as prob from dual
                       union all
                       select 4 as prob from dual
                       union all
                       select 5 as prob from dual
                       union all
                       select 1 as prob from dual
                       union all
                       select 2 as prob from dual))
    select median (prob)
      from t
    /
    MEDIAN(PROB)
    ------------
               3
    1 row selected.
This discussion has been closed.