Forum Stats

  • 3,854,698 Users
  • 2,264,408 Discussions
  • 7,905,761 Comments

Discussions

ANY_VALUE set function

Kevan Gelling
Kevan Gelling Member Posts: 33 Green Ribbon
edited May 19, 2016 6:26AM in Database Ideas - Ideas

In the example below, I want to display the department number, department name and number of employees in each department.  I should only need to group by DEPTNO, but I have to include DNAME in my GROUP BY clause in order to avoid an "ORA-00979: not a GROUP BY expression" erro

SELECT  d.deptno

      , d.dname 

      , COUNT(*)

FROM    dept d

  JOIN  emp e  ON e.deptno = d.deptno

GROUP BY d.deptno

       , d.dname ;

But including DNAME in the GROUP BY can lead to a worse execution plan (as the optimiser may need to look past the DEPTNO index to check DNAME values).  I also think, it can be misleading - it implies that I want employee totals for each combination of department number and name.

A workaround is to use MIN or MAX (as shown), but I think this can be misleading and still requires superfluous work to be done by Oracle to determine the minimum or maximum value.

SELECT  d.deptno

      , MAX( d.dname )  AS dname

      , COUNT(*)

FROM    dept d

  JOIN  emp e  ON e.deptno = d.deptno

GROUP BY d.deptno ;

---------------------

An ANY_VALUE()  function would remove the optimiser redundancy (i.e. a DNAME grouping or min/max sorting) and allow for more readable code.

SELECT  d.deptno

      , ANY_VALUE( d.dname )  AS dname

      , COUNT(*)

FROM    dept d

  JOIN  emp e  ON e.deptno = d.deptno

GROUP BY d.deptno ;

Kevan GellingApexBineSven W.tonibony7ulohmannThorsten KettnerPeter Hraškogaverillblessed DBAcormacoberxLothar FlatzJeffrey KempMartin PreissWilliam RobertsonJose Carlos Pavón
19 votes

Active · Last Updated

«1

Comments

  • Jeffrey Kemp
    Jeffrey Kemp Member Posts: 193 Bronze Badge

    If performance was an issue in this case, wouldn't you just use a subquery, e.g.:

    SELECT  d.deptno

          , d.dname

       , e.c

    FROM    dept d

    JOIN (

      SELECT  e.deptno

       , COUNT(*) as c

      FROM    emp e

      GROUP BY e.deptno

      ) e

    ON e.deptno = d.deptno;

    BEDEblessed DBA
  • Kevan Gelling
    Kevan Gelling Member Posts: 33 Green Ribbon

    @Jeff

    That's a good option for that particular test case, but there are other scenarios (see below) where it's not so easy to find an optimal solution.


    De-normalised data (e.g. a Datawarehouse dimension table):

    SELECT  e.deptno
          , MAX( e.dname )  AS dname
          , COUNT(*)
    FROM    emp_dim e
    GROUP BY e.deptno ;


    Intersect table:

    SELECT  d.deptno
          , MAX( d.dname )  AS dname
          , s.stcode
          , MAX( s.stname ) AS stname
          , COUNT(*)
    FROM    dept d
      JOIN  emp e   ON e.deptno = d.deptno
      JOIN  addr a  ON a.addrno = e.addrno
      JOIN  state s ON s.stcode  = a.stcode
    GROUP BY d.deptno

           , s.stcode ;

  • ApexBine
    ApexBine Member Posts: 153 Silver Badge

    Good idea!

    When you know more about the data than the optimizer (which sometimes is the case) this would be a really helpful time saver.

    Thorsten Kettner
  • Sven W.
    Sven W. Member Posts: 10,551 Gold Crown

    So ANY_VALUE would be like a shorter version of the KEEP (DENSE_RANK ...) syntax?

    https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions056.htm

    So if you want the number of employees per state and just the name of one of the departments, where they are working.

    Instead of this (untested),

    SELECT  s.stcode    

          , s.stname

          , MAX( d.dname ) KEEP (DENSE_RANK first order by d.dname desc) AS dname
          , COUNT(*)
    FROM    dept d
      JOIN  emp e   ON e.deptno = d.deptno
      JOIN  addr a  ON a.addrno = e.addrno
      JOIN  state s ON s.stcode  = a.stcode
    GROUP BY s.stcode, s.stname ;

    you would like to do

    SELECT  s.stcode    

          , s.stname

          , ANY_VALUE( d.dname ) AS dname
          , COUNT(*)
    FROM    dept d
      JOIN  emp e   ON e.deptno = d.deptno
      JOIN  addr a  ON a.addrno = e.addrno
      JOIN  state s ON s.stcode  = a.stcode
    GROUP BY s.stcode, s.stname ;

    BEDE
  • Kevan Gelling
    Kevan Gelling Member Posts: 33 Green Ribbon

    So ANY_VALUE would be like a shorter version of the KEEP (DENSE_RANK ...) syntax?

    https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions056.htm

    So if you want the number of employees per state and just the name of one of the departments, where they are working.

    Instead of this (untested),

    SELECT  s.stcode    

          , s.stname

          , MAX( d.dname ) KEEP (DENSE_RANK first order by d.dname desc) AS dname
          , COUNT(*)
    FROM    dept d
      JOIN  emp e   ON e.deptno = d.deptno
      JOIN  addr a  ON a.addrno = e.addrno
      JOIN  state s ON s.stcode  = a.stcode
    GROUP BY s.stcode, s.stname ;

    you would like to do

    SELECT  s.stcode    

          , s.stname

          , ANY_VALUE( d.dname ) AS dname
          , COUNT(*)
    FROM    dept d
      JOIN  emp e   ON e.deptno = d.deptno
      JOIN  addr a  ON a.addrno = e.addrno
      JOIN  state s ON s.stcode  = a.stcode
    GROUP BY s.stcode, s.stname ;

    MAX() is the shorter version MAX() KEEP ( DENSE_RANK ).  Both require a sort and both, arguably, are misleading (as a maximum value isn't required).

    ANY_VALUE doesn't require a sort and, hopefully, is self-documenting

    ApexBineSven W.
  • Sven W.
    Sven W. Member Posts: 10,551 Gold Crown

    MAX() is the shorter version MAX() KEEP ( DENSE_RANK ).  Both require a sort and both, arguably, are misleading (as a maximum value isn't required).

    ANY_VALUE doesn't require a sort and, hopefully, is self-documenting

    Kevan Gelling wrote:MAX() is the shorter version MAX() KEEP ( DENSE_RANK ). Both require a sort and both, arguably, are misleading (as a maximum value isn't required).ANY_VALUE doesn't require a sort and, hopefully, is self-documenting

    Not sure if I got you right here. For the KEEP syntax the important bit is not the function at the beginning, but instead the ordering.

    keep (dense_rank FIRST ... ORDER BY DESC ) 

    FIRST + order by ASC = MIN

    FIRST + order by DESC = MAX

    LAST + order by ASC = MAX

    LAST+ order by DESC = MIN

    The difference to a normal MAX function would be that more complex scenarios are possible and we are allowed to access data from one column based upon the min/max value of another column. I admit my example didn't show that possibility. However we can even allow to order by some random value
    e.g.

    MAX( d.dname ) KEEP (DENSE_RANK first order by DBMS_RANDOM.VALUE) AS dname

    I totally agree with the sort. Thats why I think the ANY_VALUE function would be usefull.

  • Kevan Gelling
    Kevan Gelling Member Posts: 33 Green Ribbon
    Kevan Gelling wrote:MAX() is the shorter version MAX() KEEP ( DENSE_RANK ). Both require a sort and both, arguably, are misleading (as a maximum value isn't required).ANY_VALUE doesn't require a sort and, hopefully, is self-documenting

    Not sure if I got you right here. For the KEEP syntax the important bit is not the function at the beginning, but instead the ordering.

    keep (dense_rank FIRST ... ORDER BY DESC ) 

    FIRST + order by ASC = MIN

    FIRST + order by DESC = MAX

    LAST + order by ASC = MAX

    LAST+ order by DESC = MIN

    The difference to a normal MAX function would be that more complex scenarios are possible and we are allowed to access data from one column based upon the min/max value of another column. I admit my example didn't show that possibility. However we can even allow to order by some random value
    e.g.

    MAX( d.dname ) KEEP (DENSE_RANK first order by DBMS_RANDOM.VALUE) AS dname

    I totally agree with the sort. Thats why I think the ANY_VALUE function would be usefull.

    Maybe I should have said "MAX() is similar to MAX() KEEP ( DENSE_RANK ).  Both require a sort ..."

  • Thorsten Kettner
    Thorsten Kettner Member Posts: 44 Blue Ribbon

    I like the idea.

    Not for the given query, though, because if Oracle would be standard-compliant here, it would suffice to group by deptno, as dname is functionally dependent on deptno. So I would rather like to see Oracle detecting functional dependencies and become standard-compliant in this regard.

    But there are other cases where we don’t care which value to grab. Let’s say we read order details, looking for all products that were printed like '%trousers%', but want to show each product just once. Then we don’t care that product 12345 was printed 'grey trousers' in one order and 'women’s black trousers' in another. We could use MIN(dname) or MAX(dname) to get one of the strings, but why make the DBMS compare all entries, when we just don’t care which of the strings it returns? ANY_VALUE(dname) would tell the DBMS exactly that.

  • blessed DBA
    blessed DBA Member Posts: 218

    I stick with Sven.W

  • Lothar Flatz
    Lothar Flatz Member Posts: 687 Silver Badge

    So ANY_VALUE would be like a shorter version of the KEEP (DENSE_RANK ...) syntax?

    https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions056.htm

    So if you want the number of employees per state and just the name of one of the departments, where they are working.

    Instead of this (untested),

    SELECT  s.stcode    

          , s.stname

          , MAX( d.dname ) KEEP (DENSE_RANK first order by d.dname desc) AS dname
          , COUNT(*)
    FROM    dept d
      JOIN  emp e   ON e.deptno = d.deptno
      JOIN  addr a  ON a.addrno = e.addrno
      JOIN  state s ON s.stcode  = a.stcode
    GROUP BY s.stcode, s.stname ;

    you would like to do

    SELECT  s.stcode    

          , s.stname

          , ANY_VALUE( d.dname ) AS dname
          , COUNT(*)
    FROM    dept d
      JOIN  emp e   ON e.deptno = d.deptno
      JOIN  addr a  ON a.addrno = e.addrno
      JOIN  state s ON s.stcode  = a.stcode
    GROUP BY s.stcode, s.stname ;

    HI Swen,

    as any_value does not require a sort , therefore it should be applicable to datatypes like clob.