9 Replies Latest reply: Jul 29, 2011 2:34 PM by Frank Kulash RSS

    "FIRST_VALUE() OVER () vs MAX() DENSE_RANK FIRST OVER()

    799879
      I am trying to determine the differences between using the analytic function first_value vs using the DENSE_RANK FIRST OVER() style analytic function.

      In particular, when is it appropriate to use one vs the other? Truthfully the DENSE_RANK FIRST OVER() seems like an much more flexible version of the first_value function.

      Any enlightenment is much appreciated.
        • 1. Re: "FIRST_VALUE() OVER () vs MAX() DENSE_RANK FIRST OVER()
          Frank Kulash
          Hi,
          Claytonian wrote:
          I am trying to determine the differences between using the analytic function first_value vs using the DENSE_RANK FIRST OVER() style analytic function.

          In particular, when is it appropriate to use one vs the other? Truthfully the DENSE_RANK FIRST OVER() seems like an much more flexible version of the first_value function. Those functions are actually quite different.
          DENSE_RANK tells you how the current row compares to other rows in some measure, but it says nothing about what that measure is. For example if
          DENSE_RANK () OVER (ORDER BY sal)
          returns 2, that might tell you that this row has the 2nd lowest sal. It doesn't say if the lowest sal is 1 lower, or 100 lower, or .001 lower.

          FIRST_VALUE tells you what some measure was at the beiginning of a period. It says nothing about how that compares to other values, though, of course, you may use the value in comparisons. A common use of FIRST_VALUE is
          SELECT       balance
          ,       balance - FIRST_VALUE (balance) OVER ( PARTITION BY  account_no
                                                  ORDER BY      transaction_date
                                          )       AS diff
          ...
          which calculates how much balnace has changed between the frist row and the current row.
          Claytonian wrote:
          ... Truthfully the DENSE_RANK FIRST OVER() seems like an much more flexible version of the first_value function.

          Any enlightenment is much appreciated.
          Can you shed a little light on the statement above? Have you found a problem that can be solved using FIRST_VALUE, but it's simpler to use DENSE_RANK instead? That may be; the two functions were designed for different purposes. There are other problems that can be solved using DENSE_RANK, but are simpler using FIRST_VALUE instead.

          I don't use DENSE_RANK much; it just doesn't seem like the best tool for many of my problems, but it sure is handy when I need it.
          I do, however, use DENSE_RANK's close relatives RANK and ROW_NUMBER very often.
          • 2. Re: "FIRST_VALUE() OVER () vs MAX() DENSE_RANK FIRST OVER()
            Solomon Yakobson
            Don't confuse analytic with aggregate.

            SY.
            • 3. Re: "FIRST_VALUE() OVER () vs MAX() DENSE_RANK FIRST OVER()
              bluefrog
              Also,
              DENSE_RANK computes the rank of a row in an ordered group of rows and returns the rank as a NUMBER.

              DENSE_RANK returns a number whereas FIRST_VALUE returns the actual value. You can choose to ignore
              NULLS with FIRST_VALUE and LAST_VALUE, but not DENSE_RANK.

              From what I've gleaned from http://www.morganslibrary.org/reference/analytic_functions.html
              specifying a bound to the ORDER BY can be done with the FIRST/LAST_VALUE and a few other functions.
              functions, but not the DENSE_RANK. Its shown as "<analytic clause>" in morgans library.

              DENSE_RANK can syntactically precede an ORDER BY or used to return the ranking number.

              The similarity is that they are both based on an order. They are different though.
              • 4. Re: "FIRST_VALUE() OVER () vs MAX() DENSE_RANK FIRST OVER()
                799879
                Let me rephrase the question with an example to explain what I am asking about

                Take the expression
                                  FIRST_VALUE(act_inst.activity_instance_id) 
                                    OVER (
                                      PARTITION BY
                                          act_inst.process_instance_id
                                        , act.activity_name
                                      ORDER BY
                                          act_inst.activity_ready_date ASC
                                        , act_inst.foreign_activity_instance_id ASC
                                    ) activity_instance_id
                and compare it to
                                  MIN(act_inst.activity_instance_id) 
                                    KEEP (DENSE_RANK FIRST ORDER BY
                                          act_inst.activity_ready_date ASC
                                        , act_inst.foreign_activity_instance_id ASC)
                                    OVER (PARTITION BY
                                          act_inst.process_instance_id
                                        , act.activity_name) activity_instance_id2
                they both return the exact same thing.

                so the question holds, what is the difference between these two analytic functions and when should I use one vs the other?

                Edited by: Claytonian on Jul 29, 2011 10:13 AM

                Edited by: Claytonian on Jul 29, 2011 10:50 AM
                • 5. Re: "FIRST_VALUE() OVER () vs MAX() DENSE_RANK FIRST OVER()
                  Frank Kulash
                  Hi
                  Claytonian wrote:
                  Let me rephrase the question with an example to explain what I am asking about

                  Take the expression
                  -----
                  FIRST_VALUE(act_inst.activity_instance_id)
                  OVER (
                  PARTITION BY
                  act_inst.process_instance_id
                  , act.activity_name
                  ORDER BY
                  act_inst.activity_ready_date ASC
                  , act_inst.foreign_activity_instance_id ASC
                  ) activity_instance_id
                  -----
                  and compare it to
                  -----
                  MIN(act_inst.activity_instance_id)
                  KEEP (DENSE_RANK FIRST ORDER BY
                  act_inst.activity_ready_date ASC
                  , act_inst.foreign_activity_instance_id ASC)
                  OVER (PARTITION BY
                  act_inst.process_instance_id
                  , act.activity_name) activity_instance_id2
                  -----

                  they both return the exact same thing.
                  They both return errors when I try to run them.
                  If you want to talk about your act and act_inst tables, then post CREATE TABLE and INSERT statments for your sample data.
                  so the question holds, what is the difference between these two analytic functions and when should I use one vs the other?

                  Edited by: Claytonian on Jul 29, 2011 10:13 AM
                  What I said about DENSE_RANK earlier appled to the analytic fucntion, e.g.
                  DENSE_RANK () OVER (PARTITION BY deptno ORDER BY ename)
                  If I had read your initial message more carefully, I would have seen that. Sorry.
                  In your 2nd example, you're not using that function; you're using FIRST fucntion. FIRST is always accompanied by some aggeregate function (in this case, MIN) and the keyword DENSE_RANK. This is is not the same as the DENSE_RANK analytic function.

                  As to your question: I personally find the first way, using FIRST_VALUE clearer, but I won't be maintaining your code. if you, and the people who will have to maintain your code, like the second way for some reason, then use it.
                  • 6. Re: "FIRST_VALUE() OVER () vs MAX() DENSE_RANK FIRST OVER()
                    799879
                    So to clarify,

                    there is not really any difference between the FIRST_VALUE analytic function and the FIRST analytic function and it comes down to to code preference/maintainability?
                    • 7. Re: "FIRST_VALUE() OVER () vs MAX() DENSE_RANK FIRST OVER()
                      Frank Kulash
                      Hi,
                      Claytonian wrote:
                      So to clarify,

                      there is not really any difference between the FIRST_VALUE analytic function and the FIRST analytic function and it comes down to to code preference/maintainability?
                      I'm just guessing that there no performance difference. I haven't seen any studies, or done any experiments to back that up.

                      The only difference between the analytic FIRST and FIRST_VALUE value functions is how you call them. You were very careful to mention that you're talking about the analytic functions. That's good.
                      There's a big difference between the aggregate FIRST and FIRST_VALUE value functions: the aggregate FIRST function exists, but there is no aggregate FIRST_VALUE function. If you want to see the x that is associated with the lowest y in an aggregate query, then use FIRST.
                      • 8. Re: "FIRST_VALUE() OVER () vs MAX() DENSE_RANK FIRST OVER()
                        799879
                        Thanks for all the excellent information! I haven't seen any studies to back it up, and my non-scientific analysis shows no difference in performance. They generate identical explain plans(no surprise) and execute about the same speed. Unfortunately in my environment I do not have DB permissions to do a real analysis(e.g. autotrace or something similar that provides # of dbreads, consistent gets, etc...).

                        And you are very correct on a world of difference when talking about aggregate functions.

                        Thanks again.
                        • 9. Very Fine Point
                          Frank Kulash
                          Hi,

                          I just thought of a potential difference, that would only matter if there are ties in the ORDER BY.
                          If there are several rows with an equal claim to being the first row (however you've defined "first") the aggregate function that you specified along with FIRST determines which of those values to return. That needn't be the value on any one of those rows; you could specify AVG or STRAGG.
                          When there is a tie using FIRST_VALUE, it seems to return the MIN of the contenders. I don't see anything in the documentation to guarantee that; it could be returning any one of the values arbitrarily, and the experiments I've been doing all happened to return the MIN value.

                          For example, using the scott.emp table, say we want to find, for each job, the sal of first employee, in order by deptno. There is a 4-way tie among salesmen for who is the "first", since all 4 rows have the same deptno.
                          SELECT       job
                          ,       deptno
                          ,       ename
                          ,       sal
                          ,       MAX (sal) KEEP ( DENSE_RANK  FIRST ORDER BY deptno)
                                                   OVER ( PARTITION BY  job )     AS f
                          ,       FIRST_VALUE (sal) OVER ( PARTITION BY  job
                                                             ORDER BY        deptno
                                                 ,            sal     DESC
                                               )                    AS fvfo
                          ,       FIRST_VALUE (sal) OVER ( PARTITION BY  job
                                                             ORDER BY        deptno
                                               )                    AS fv
                          FROM       scott.emp
                          ORDER BY  job
                          ,            deptno
                          ,       sal
                          ;
                          Output:
                          JOB       DEPTNO ENAME         SAL      F   FVFO     FV
                          --------- ------ ---------- ------ ------ ------ ------
                          ANALYST       20 FORD         3000   3000   3000   3000
                          ANALYST       20 SCOTT        3000   3000   3000   3000
                          
                          CLERK         10 MILLER       1300   1300   1300   1300
                          CLERK         20 SMITH         800   1300   1300   1300
                          CLERK         20 ADAMS        1100   1300   1300   1300
                          CLERK         30 JAMES         950   1300   1300   1300
                          
                          MANAGER       10 CLARK        2450   2450   2450   2450
                          MANAGER       20 JONES        2975   2450   2450   2450
                          MANAGER       30 BLAKE        2850   2450   2450   2450
                          
                          PRESIDENT     10 KING         5000   5000   5000   5000
                          
                          SALESMAN      30 MARTIN       1250   1600   1600   1250
                          SALESMAN      30 WARD         1250   1600   1600   1250
                          SALESMAN      30 TURNER       1500   1600   1600   1250
                          SALESMAN      30 ALLEN        1600   1600   1600   1250
                          Column f is derived using the analytic FIRST fucntion, along with MAX (not MIN, as in your example). That means, when it sees that there are 4 rows that all have job='SALESMAN' and deptno=30, and that 30 is the lowest deptno, it is guaranteed to bring back the largest of the sal values from those 4 rows.
                          Column fv shows what you get from FIRST_VALUE when there is a tie. Notice that, only in the case of job='SALESMAN', this is different from column f. If we wanted to be sure FIRST_VALUE was returning the same value as MAX ... FIRST, then we could add sal itself to the analytic ORDER BY clause, as in column fvfo.