
need example of analytic function single pass

OU_230
OU_230 Member Posts: 83 Blue Ribbon
edited Sep 27, 2017 7:05PM in SQL & PL/SQL

Could someone please explain what single-pass vs. multi-pass operations are, and how an analytic function can make a SQL statement single-pass to improve performance?


Answers

  • BluShadow
    BluShadow Member, Moderator Posts: 41,491 Red Diamond
    edited Sep 27, 2017 10:31AM

Ok, let's start by asking what you think the answers to those questions are.  Then we can gauge your understanding before we answer.

  • OU_230
    OU_230 Member Posts: 83 Blue Ribbon
    edited Sep 27, 2017 11:52AM

    :-)   I think single pass is accessing the table once, multi-pass is accessing the table multiple times

  • mathguy
    mathguy Member Posts: 10,156 Blue Diamond
    edited Sep 27, 2017 12:11PM

    Single pass vs multipass: that's the correct definition.

    Then:

    In the example below, I use the EMP table in the SCOTT schema, which exists on many (most?) Oracle installations. If you don't have it, consider any employee table in any of the schemas you have access to, or build your own. You need four columns: DEPTNO, EMPNO, ENAME and SAL (obvious meaning).

    If there were no analytic functions, how would you solve the following problem?

    Show all rows (DEPTNO, EMPNO, ENAME, SAL) where the salary is highest WITHIN THE DEPARTMENT.  Note: there may be ties for "highest salary in the department"; in those cases, show all employees with the highest salary.

    You could find the max in each department and then do a join; or, more efficiently, a semi-join (using an IN condition with a subquery).

    Then how would you solve the same problem using analytic functions? I would suggest using the analytic version of MAX().

    Then run both queries - convince yourself you get the same answer - and look at the execution plans. You will see that the solution without analytic functions must read the base table twice; the one with analytic functions only needs to read the table once.
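    If you want to see the two approaches side by side outside of Oracle, here is a small self-contained sketch using Python's sqlite3 module (SQLite 3.25+ supports window functions). The table name and the sample data are invented for illustration; this is a sketch of the exercise, not the poster's exact solution:

```python
import sqlite3

# In-memory stand-in for SCOTT.EMP with a tie for "highest salary" in dept 20.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, empno INTEGER, ename TEXT, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(10, 7839, "KING", 5000), (10, 7934, "MILLER", 1300),
     (20, 7788, "SCOTT", 3000), (20, 7902, "FORD", 3000),
     (20, 7369, "SMITH", 800)],
)

# Multi-pass: find the max per department, then join back to the detail rows.
multi_pass = conn.execute("""
    SELECT e.deptno, e.empno, e.ename, e.sal
    FROM emp e
    JOIN (SELECT deptno, MAX(sal) AS max_sal FROM emp GROUP BY deptno) m
      ON m.deptno = e.deptno AND m.max_sal = e.sal
    ORDER BY e.deptno, e.empno
""").fetchall()

# Single-pass: the analytic MAX is computed while the table is scanned once,
# then an outer query keeps only the rows that match it.
single_pass = conn.execute("""
    SELECT deptno, empno, ename, sal
    FROM (SELECT deptno, empno, ename, sal,
                 MAX(sal) OVER (PARTITION BY deptno) AS top_sal
          FROM emp)
    WHERE sal = top_sal
    ORDER BY deptno, empno
""").fetchall()

assert multi_pass == single_pass   # ties in dept 20 (SCOTT and FORD) appear in both
print(single_pass)
# [(10, 7839, 'KING', 5000), (20, 7788, 'SCOTT', 3000), (20, 7902, 'FORD', 3000)]
```

    SQLite's EXPLAIN QUERY PLAN output is much coarser than Oracle's, so to compare the actual access paths (two scans vs. one) you still want to run the queries on an Oracle database as described above.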

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,226 Red Diamond
    edited Sep 27, 2017 12:18PM

    Hi,

    Let's take an example using the scott.emp table (called scott_emp on my system).

    Say we want to get some information about individual employees, and some information about departments as a whole (e.g., the total sal for each department), like this:

        DEPTNO ENAME             SAL   DEPT_SAL
    ---------- ---------- ---------- ----------
            10 CLARK            2450       8750
            10 KING             5000       8750
            10 MILLER           1300       8750
            20 ADAMS            1100      10875
            20 FORD             3000      10875
            20 JONES            2975      10875
            20 SCOTT            3000      10875
            20 SMITH             800      10875
            30 ALLEN            1600       9400
            30 BLAKE            2850       9400
            30 JAMES             950       9400
            30 MARTIN           1250       9400
            30 TURNER           1500       9400
            30 WARD             1250       9400

    Here's one way you might get those results:

    WITH    dept_summary    AS
    (
        SELECT    deptno
        ,         SUM (sal)  AS dept_sal
        FROM      scott_emp
        GROUP BY  deptno
    )
    SELECT    e.deptno, e.ename, e.sal,
              d.dept_sal
    FROM      scott_emp     e
    JOIN      dept_summary  d  ON   d.deptno  = e.deptno
    ORDER BY  e.deptno, e.ename;

    Let's look at the explain plan to see how that query was executed.  That is

    EXPLAIN PLAN FOR
    WITH    dept_summary    AS
    ...;

    SELECT  *
    FROM    TABLE (dbms_xplan.display);

    The output is

    --------------------------------------------------------------------------------
    | Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT    |           |    70 |  2240 |     4   (0)| 00:00:01 |
    |   1 |  SORT GROUP BY      |           |    70 |  2240 |     4   (0)| 00:00:01 |
    |*  2 |   HASH JOIN         |           |    70 |  2240 |     4   (0)| 00:00:01 |
    |   3 |    TABLE ACCESS FULL| SCOTT_EMP |    14 |   350 |     2   (0)| 00:00:01 |
    |   4 |    TABLE ACCESS FULL| SCOTT_EMP |    14 |    98 |     2   (0)| 00:00:01 |
    --------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
       2 - access("DEPTNO"="E"."DEPTNO")

    As you can see, the scott_emp table was scanned in full twice.

    We can get the same results using the analytic SUM function, like this:

    SELECT    deptno, ename, sal,
              SUM (sal) OVER (PARTITION BY  deptno)
                  AS dept_sal
    FROM      scott_emp  m
    ORDER BY  deptno, ename;

    But look at the explain plan for this query:

    --------------------------------------------------------------------------------
    | Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT   |           |    14 |   182 |     2   (0)| 00:00:01 |
    |   1 |  WINDOW SORT       |           |    14 |   182 |     2   (0)| 00:00:01 |
    |   2 |   TABLE ACCESS FULL| SCOTT_EMP |    14 |   182 |     2   (0)| 00:00:01 |
    --------------------------------------------------------------------------------

    It only needs 1 full table scan.  The analytic query computes the SUM while it is reading each row, so it doesn't need to make a second table scan.
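    The same equivalence can be checked outside Oracle. Here is a hedged, self-contained sketch using Python's sqlite3 module (SQLite 3.25+ for window functions); the mini data set is a subset of the rows shown above, invented only to make the block runnable:

```python
import sqlite3

# Tiny stand-in for scott_emp; dept 10 matches the totals shown above (8750).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (deptno INTEGER, ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(10, "CLARK", 2450), (10, "KING", 5000), (10, "MILLER", 1300),
                  (30, "ALLEN", 1600), (30, "BLAKE", 2850)])

# Two passes: aggregate each department, then join the totals back to the rows.
join_version = conn.execute("""
    WITH dept_summary AS (
        SELECT deptno, SUM(sal) AS dept_sal FROM emp GROUP BY deptno
    )
    SELECT e.deptno, e.ename, e.sal, d.dept_sal
    FROM emp e
    JOIN dept_summary d ON d.deptno = e.deptno
    ORDER BY e.deptno, e.ename
""").fetchall()

# One pass: the analytic SUM is attached to every row during a single scan.
analytic_version = conn.execute("""
    SELECT deptno, ename, sal,
           SUM(sal) OVER (PARTITION BY deptno) AS dept_sal
    FROM emp
    ORDER BY deptno, ename
""").fetchall()

assert join_version == analytic_version
print(analytic_version[0])   # (10, 'CLARK', 2450, 8750)
```

    The results are identical row for row; only the amount of work the database does to produce them differs.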

  • OU_230
    OU_230 Member Posts: 83 Blue Ribbon
    edited Sep 27, 2017 12:27PM

    Perfect! Thanks.

  • mathguy
    mathguy Member Posts: 10,156 Blue Diamond
    edited Sep 27, 2017 12:33PM

    Lol - I see; you would rather have someone else do the whole homework for you, rather than just help you with it. Did you try what I suggested, though?

  • OU_230
    OU_230 Member Posts: 83 Blue Ribbon
    edited Sep 27, 2017 12:40PM

    I do not have the SCOTT schema.  I was trying to do it on livesql.oracle.com but was not able to get explain plan to work.  I am creating the tables in my own database to try it out now.

  • mathguy
    mathguy Member Posts: 10,156 Blue Diamond
    edited Sep 27, 2017 2:20PM

    You tried something on livesql...   I am not sure if you can get explain plan from there?  In any case, I assume you have access to a database, and you use a front-end like SQL*Plus or perhaps SQL Developer or Toad? Are you able to run EXPLAIN PLAN on your database, using your front-end application?

    Do you have (access to) the HR schema? Many things we do in the SCOTT schema for illustration can be done in the HR schema. I believe scripts to install HR are available (for free) online, from Oracle itself; I am not sure if the same is true for SCOTT. Especially if you are going to use this forum in the future, it will be very helpful to have easy access to both.

  • OU_230
    OU_230 Member Posts: 83 Blue Ribbon
    edited Sep 27, 2017 5:40PM

    I do not have access to HR or SCOTT, so I tried with one of the tables I have.  I can see the multi-pass vs. single-pass difference when using the analytic function, but I do not see any performance benefit.

    Both complete in 10 seconds.  I added a WHERE condition to both queries, as they were taking more than a minute.

    The multi-pass operation has slightly higher consistent gets, but the single-pass SQL with the analytic function has a sort (memory).

    Why is there no performance benefit?

    -- single pass operation with analytical function
    SQL> select dsclr_pk, flng_pk, indvl_pk,
         max(flng_pk) over (partition by dsclr_pk) max_flng_pk
         from dsclr_flng
         where dsclr_pk > 7941071
    SQL> /

    43859 rows selected.

    Elapsed: 00:00:10.00

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 2087123764

    -----------------------------------------------------------------------------------------------
    | Id  | Operation                     | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
    -----------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT              |                |    11 |   176 |     3   (0)| 00:00:01 |
    |   1 |  WINDOW BUFFER                |                |    11 |   176 |     3   (0)| 00:00:01 |
    |   2 |   TABLE ACCESS BY INDEX ROWID | DSCLR_FLNG     |    11 |   176 |     3   (0)| 00:00:01 |
    |*  3 |    INDEX RANGE SCAN           | XIE1DSCLR_FLNG |    11 |       |     1   (0)| 00:00:01 |
    -----------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
       3 - access("DSCLR_PK">7941071)

    Statistics
    ----------------------------------------------------------
              0  recursive calls
              0  db block gets
          42097  consistent gets
              0  physical reads
              0  redo size
        1371477  bytes sent via SQL*Net to client
          32526  bytes received via SQL*Net from client
           2925  SQL*Net roundtrips to/from client
              1  sorts (memory)
              0  sorts (disk)
          43859  rows processed

    -- multi pass operation
    SQL> select df.dsclr_pk, df.flng_pk, df.indvl_pk, dmf.max_flng_pk from dsclr_flng df join
      2  (select dsclr_pk, max(flng_pk) max_flng_pk from dsclr_flng where dsclr_pk > 7941071
      3  group by dsclr_pk) dmf on (df.dsclr_pk = dmf.dsclr_pk);

    43859 rows selected.

    Elapsed: 00:00:10.24

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 4237910102

    -----------------------------------------------------------------------------------------------
    | Id  | Operation                     | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
    -----------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT              |                |    11 |   462 |     4   (0)| 00:00:01 |
    |   1 |  NESTED LOOPS                 |                |    11 |   462 |     4   (0)| 00:00:01 |
    |   2 |   NESTED LOOPS                |                |    11 |   462 |     4   (0)| 00:00:01 |
    |   3 |    VIEW                       |                |     1 |    26 |     1   (0)| 00:00:01 |
    |   4 |     HASH GROUP BY             |                |     1 |    12 |     1   (0)| 00:00:01 |
    |*  5 |      INDEX RANGE SCAN         | XPKDSCLR_FLNG  |    11 |   132 |     1   (0)| 00:00:01 |
    |*  6 |    INDEX RANGE SCAN           | XIE1DSCLR_FLNG |    11 |       |     1   (0)| 00:00:01 |
    |   7 |   TABLE ACCESS BY INDEX ROWID | DSCLR_FLNG     |    11 |   176 |     3   (0)| 00:00:01 |
    -----------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
       5 - access("DSCLR_PK">7941071)
       6 - access("DF"."DSCLR_PK"="DMF"."DSCLR_PK")

    Statistics
    ----------------------------------------------------------
              0  recursive calls
              0  db block gets
          56132  consistent gets
              0  physical reads
              0  redo size
        1380915  bytes sent via SQL*Net to client
          32526  bytes received via SQL*Net from client
           2925  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
          43859  rows processed
    SQL>
  • mathguy
    mathguy Member Posts: 10,156 Blue Diamond
    edited Sep 27, 2017 6:19PM

    In the first query, using analytic functions, you are selecting all 43,859 rows that satisfy the DSCLR_PK condition, yet the explain plan you show for it (it only has the condition on DSCLR_PK in the predicate section) estimates just 11 rows. Something is wrong there.

    In any case:

    If you want to create a table for testing (assuming you have enough privileges in your "sandbox"), you can create your own copy of the table, with fewer rows total, with something like

    CREATE TABLE TEST_TBL AS
        SELECT DSCLR_PK, FLNG_PK, INDVL_PK
        FROM   DSCLR_FLNG
        WHERE  DSCLR_PK > 7941071;

    and then you can re-create the indexes you had on the original table. Now you can test on TEST_TBL and you will not have to select from a (presumably much) larger table. With your current setup it is possible that most of the time goes into selecting the 43k rows to begin with, which will be the same for both queries.

This discussion has been closed.