SQL & PL/SQL

need example of analytic function single pass

OU_230 — Sep 27 2017 (edited Sep 27 2017)

Could someone please explain what a single-pass vs. multi-pass operation is, and how an analytic function can make a SQL statement single-pass to improve performance?

Comments

BluShadow

Ok, let's start by asking you what you think the answers to those questions are.  Then we can gauge your understanding before we answer.

OU_230

:-)   I think single-pass means accessing the table once, and multi-pass means accessing the table multiple times.

mathguy

Single pass vs multipass: that's the correct definition.

Then:

In the example below, I use the EMP table in the SCOTT schema, which exists on many (most?) Oracle installations. If you don't have it, consider any employee table in any of the schemas you have access to, or build your own. You need four columns: DEPTNO, EMPNO, ENAME and SAL (obvious meaning).

If there were no analytic functions, how would you solve the following problem?

Show all rows (DEPTNO, EMPNO, ENAME, SAL) where the salary is highest WITHIN THE DEPARTMENT.  Note: there may be ties for "highest salary in the department"; in those cases, show all employees with the highest salary.

You could find the max in each department and then do a join; or, more efficiently, a semi-join (using an IN condition with a subquery).

Then how would you solve the same problem using analytic functions? I would suggest using the analytic version of MAX().

Then run both queries - convince yourself you get the same answer - and look at the execution plans. You will see that the solution without analytic functions must read the base table twice; the one with analytic functions only needs to read the table once.
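If it helps to get started, here is a rough sketch of what the two versions might look like, assuming the standard EMP columns (DEPTNO, EMPNO, ENAME, SAL); adapt the table and column names to whatever table you actually use:

-- Two-pass version: compute each department's max salary, then semi-join back to the table
SELECT  deptno, empno, ename, sal
FROM    emp
WHERE   (deptno, sal) IN (
                           SELECT    deptno, MAX (sal)
                           FROM      emp
                           GROUP BY  deptno
                         );

-- Single-pass version: analytic MAX per department, filtered in an inline view
-- (the analytic value cannot be referenced directly in WHERE, so it is wrapped in a subquery)
SELECT  deptno, empno, ename, sal
FROM    (
          SELECT  deptno, empno, ename, sal
          ,       MAX (sal) OVER (PARTITION BY deptno)  AS dept_max_sal
          FROM    emp
        )
WHERE   sal = dept_max_sal;

Run each one with EXPLAIN PLAN and see how many times EMP appears in each plan.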

Frank Kulash

Hi,

Let's take an example using the scott.emp table (called scott_emp on my system).

Say we want to get some information about individual employees, and some information about departments as a whole (e.g., the total sal for each department), like this:

    DEPTNO ENAME             SAL   DEPT_SAL
---------- ---------- ---------- ----------
        10 CLARK            2450       8750
        10 KING             5000       8750
        10 MILLER           1300       8750
        20 ADAMS            1100      10875
        20 FORD             3000      10875
        20 JONES            2975      10875
        20 SCOTT            3000      10875
        20 SMITH             800      10875
        30 ALLEN            1600       9400
        30 BLAKE            2850       9400
        30 JAMES             950       9400
        30 MARTIN           1250       9400
        30 TURNER           1500       9400
        30 WARD             1250       9400

Here's one way you might get those results:

WITH    dept_summary    AS
(
    SELECT    deptno
    ,         SUM (sal)  AS dept_sal
    FROM      scott_emp
    GROUP BY  deptno
)
SELECT    e.deptno, e.ename, e.sal
,         d.dept_sal
FROM      scott_emp     e
JOIN      dept_summary  d  ON   d.deptno  = e.deptno
ORDER BY  e.deptno, e.ename;

Let's look at the explain plan to see how that query was executed.  That is:

EXPLAIN PLAN FOR
WITH    dept_summary    AS
...;

SELECT  *
FROM    TABLE (dbms_xplan.display);

The output is

---------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |    70 |  2240 |     4   (0)| 00:00:01 |
|   1 |  SORT GROUP BY      |           |    70 |  2240 |     4   (0)| 00:00:01 |
|*  2 |   HASH JOIN         |           |    70 |  2240 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| SCOTT_EMP |    14 |   350 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| SCOTT_EMP |    14 |    98 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPTNO"="E"."DEPTNO")

As you can see, the scott_emp table was scanned completely 2 times.

We can get the same results using the analytic SUM function, like this:

SELECT    deptno, ename, sal
,         SUM (sal) OVER (PARTITION BY  deptno)
             AS dept_sal
FROM      scott_emp  m
ORDER BY  deptno, ename;

But look at the explain plan for this query:

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |    14 |   182 |     2   (0)| 00:00:01 |
|   1 |  WINDOW SORT       |           |    14 |   182 |     2   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SCOTT_EMP |    14 |   182 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------

It only needs 1 full table scan.  The analytic query computes the SUM while it is reading each row, so it doesn't need to make a second table scan.

OU_230

Perfect! Thanks.

mathguy

Lol - I see; you would rather have someone else do the whole homework for you, rather than just help you with it. Did you try what I suggested, though?

OU_230

I do not have the SCOTT schema. I was trying to do it on livesql.oracle.com but was not able to get explain plan to work. I am creating tables in my own database to try it out now.

mathguy

You tried something on livesql...   I am not sure if you can get explain plan from there?  In any case, I assume you have access to a database, and you use a front-end like SQL*Plus or perhaps SQL Developer or Toad? Are you able to run EXPLAIN PLAN on your database, using your front-end application?
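If you are using SQL*Plus (or any front end that runs plain SQL), either of the following should work; adjust to your tool:

-- Option 1: EXPLAIN PLAN, then display the stored plan
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM user_tables;          -- put your own query here

SELECT  *
FROM    TABLE (dbms_xplan.display);

-- Option 2: SQL*Plus autotrace (shows the plan plus run-time statistics after each query)
SET AUTOTRACE ON
SET TIMING ON

(Option 2 needs the PLUSTRACE role or equivalent privileges, so it may not be available everywhere.)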

Do you have (access to) the HR schema? Many things we do in the SCOTT schema for illustration can be done in the HR schema. I believe scripts to install HR are available (for free) online, from Oracle itself; I am not sure if the same is true for SCOTT. Especially if you are going to use this forum in the future, it will be very helpful to have easy access to both.

OU_230

I do not have access to HR or SCOTT, so I tried with one of the tables I have. I can see the multi-pass vs. single-pass difference when using the analytic function, but I do not see any performance benefit.

Both complete in 10 seconds. I added a WHERE condition to both queries because they were taking more than a minute otherwise.

The multi-pass operation has slightly higher consistent gets, but the single-pass SQL with the analytic function has a sort (memory).

Why is there no performance benefit?

-- single-pass operation with analytic function
SQL> select dsclr_pk, flng_pk, indvl_pk, max(flng_pk) over (partition by dsclr_pk) max_flng_pk from dsclr_flng where dsclr_pk > 7941071
SQL> /

43859 rows selected.

Elapsed: 00:00:10.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2087123764

------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |    11 |   176 |     3   (0)| 00:00:01 |
|   1 |  WINDOW BUFFER                |                |    11 |   176 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID | DSCLR_FLNG     |    11 |   176 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | XIE1DSCLR_FLNG |    11 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("DSCLR_PK">7941071)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      42097  consistent gets
          0  physical reads
          0  redo size
    1371477  bytes sent via SQL*Net to client
      32526  bytes received via SQL*Net from client
       2925  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      43859  rows processed

-- multi-pass operation
SQL> select df.dsclr_pk, df.flng_pk, df.indvl_pk, dmf.max_flng_pk from dsclr_flng df join
  2  (select dsclr_pk, max(flng_pk) max_flng_pk from dsclr_flng where dsclr_pk > 7941071
  3  group by dsclr_pk) dmf on (df.dsclr_pk = dmf.dsclr_pk);

43859 rows selected.

Elapsed: 00:00:10.24

Execution Plan
----------------------------------------------------------
Plan hash value: 4237910102

-------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |    11 |   462 |     4   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                  |                |    11 |   462 |     4   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                 |                |    11 |   462 |     4   (0)| 00:00:01 |
|   3 |    VIEW                        |                |     1 |    26 |     1   (0)| 00:00:01 |
|   4 |     HASH GROUP BY              |                |     1 |    12 |     1   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | XPKDSCLR_FLNG  |    11 |   132 |     1   (0)| 00:00:01 |
|*  6 |    INDEX RANGE SCAN            | XIE1DSCLR_FLNG |    11 |       |     1   (0)| 00:00:01 |
|   7 |   TABLE ACCESS BY INDEX ROWID  | DSCLR_FLNG     |    11 |   176 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("DSCLR_PK">7941071)
   6 - access("DF"."DSCLR_PK"="DMF"."DSCLR_PK")

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      56132  consistent gets
          0  physical reads
          0  redo size
    1380915  bytes sent via SQL*Net to client
      32526  bytes received via SQL*Net from client
       2925  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      43859  rows processed

SQL>

mathguy

In the first query, using analytic functions, you forgot to add the last filter: you are selecting all 43,859 rows with that DSCLR_PK condition, instead of just the 11 rows returned by the second query. Then you show an explain plan, seemingly for the same query (it only has the condition on DSCLR_PK in the predicate section), but it only shows 11 rows returned. Something is wrong there.

In any case:

If you want to create a table for testing (assuming you have enough privileges in your "sandbox"), you can create your own copy of the table, with fewer rows total, with something like

CREATE TABLE TEST_TBL AS
    SELECT DSCLR_PK, FLNG_PK, INDVL_PK
    FROM   DSCLR_FLNG
    WHERE  DSCLR_PK > 7941071;

and then you can re-create the indexes you had on the original table. Now you can test on TEST_TBL and you will not have to select from a (presumably much) larger table. With your current setup it is possible that most of the time goes into selecting the 43k rows to begin with, which will be the same for both queries.
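For example, something along these lines (the index name and column below are only illustrative guesses based on the XIE1DSCLR_FLNG range scan in your plans; re-create whatever indexes the original table really has, then gather statistics on the copy):

-- illustrative only: an index on DSCLR_PK similar to the one used in the plans above
CREATE INDEX test_tbl_dsclr_pk_ix ON test_tbl (dsclr_pk);

-- refresh optimizer statistics on the new table
EXEC dbms_stats.gather_table_stats (ownname => USER, tabname => 'TEST_TBL')

Then run both versions of your query against TEST_TBL and compare the plans and timings.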

OU_230

mathguy,

Why are you saying the 2nd query returned 11 rows?

Although the estimated 'Rows' in the explain plan is 11 for both queries, the 'rows selected' count as well as the autotrace statistic 'rows processed' show 43859. In addition, the predicate information for both explain plans has "access("DSCLR_PK">7941071)".

I have the privilege to create tables.

" With your current setup it is possible that most of the time goes into selecting the 43k rows to begin with, which will be the same for both queries."

          Are you saying most of the time is spent on FETCH rather than EXECUTE? Wouldn't the multi-pass SQL with no analytic function access the 43K rows TWICE, so it should be slower?

Post Details

Locked on Oct 25 2017
Added on Sep 27 2017
11 comments
359 views