SQL & PL/SQL

need example of analytic function single pass

OU_230 — Sep 27 2017 (edited Sep 27 2017)

Could someone please explain what a single-pass vs. multi-pass operation is, and how an analytic function can make a SQL statement single-pass to improve performance?

Comments

BluShadow

Ok, let's start by asking you what you think the answers to those questions are.  Then we can gauge your understanding before we answer.

OU_230

:-)   I think single-pass means accessing the table once, and multi-pass means accessing the table multiple times.

mathguy

Single pass vs multipass: that's the correct definition.

Then:

In the example below, I use the EMP table in the SCOTT schema, which exists on many (most?) Oracle installations. If you don't have it, consider any employee table in any of the schemas you have access to, or build your own. You need four columns: DEPTNO, EMPNO, ENAME and SAL (obvious meaning).

If there were no analytic functions, how would you solve the following problem?

Show all rows (DEPTNO, EMPNO, ENAME, SAL) where the salary is highest WITHIN THE DEPARTMENT.  Note: there may be ties for "highest salary in the department"; in those cases, show all employees with the highest salary.

You could find the max in each department and then do a join; or, more efficiently, a semi-join (using an IN condition with a subquery).

Then how would you solve the same problem using analytic functions? I would suggest using the analytic version of MAX().

Then run both queries - convince yourself you get the same answer - and look at the execution plans. You will see that the solution without analytic functions must read the base table twice; the one with analytic functions only needs to read the table once.
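If it helps to get started, here is a rough sketch of what the two versions might look like, assuming the standard EMP columns (DEPTNO, EMPNO, ENAME, SAL); adapt the table and column names to whatever table you actually use:

-- Two-pass version: compute each department's max salary, then semi-join back to the table
SELECT  deptno, empno, ename, sal
FROM    emp
WHERE   (deptno, sal) IN (
                           SELECT    deptno, MAX (sal)
                           FROM      emp
                           GROUP BY  deptno
                         );

-- Single-pass version: analytic MAX per department, filtered in an inline view
-- (the analytic value cannot be referenced directly in WHERE, so it is wrapped in a subquery)
SELECT  deptno, empno, ename, sal
FROM    (
          SELECT  deptno, empno, ename, sal
          ,       MAX (sal) OVER (PARTITION BY deptno)  AS dept_max_sal
          FROM    emp
        )
WHERE   sal = dept_max_sal;

Run each one with EXPLAIN PLAN and see how many times EMP appears in each plan.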

Frank Kulash

Hi,

Let's take an example using the scott.emp table (called scott_emp on my system).

Say we want to get some information about individual employees, and some information about departments as a whole (e.g., the total sal for each department), like this:

    DEPTNO ENAME             SAL   DEPT_SAL
---------- ---------- ---------- ----------
        10 CLARK            2450       8750
        10 KING             5000       8750
        10 MILLER           1300       8750
        20 ADAMS            1100      10875
        20 FORD             3000      10875
        20 JONES            2975      10875
        20 SCOTT            3000      10875
        20 SMITH             800      10875
        30 ALLEN            1600       9400
        30 BLAKE            2850       9400
        30 JAMES             950       9400
        30 MARTIN           1250       9400
        30 TURNER           1500       9400
        30 WARD             1250       9400

Here's one way you might get those results:

WITH    dept_summary    AS
(
    SELECT    deptno
    ,         SUM (sal)  AS dept_sal
    FROM      scott_emp
    GROUP BY  deptno
)
SELECT    e.deptno, e.ename, e.sal
,         d.dept_sal
FROM      scott_emp     e
JOIN      dept_summary  d  ON   d.deptno  = e.deptno
ORDER BY  e.deptno, e.ename;

Let's look at the explain plan to see how that query was executed.  That is:

EXPLAIN PLAN FOR
WITH    dept_summary    AS
...;

SELECT  *
FROM    TABLE (dbms_xplan.display);

The output is

---------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |    70 |  2240 |     4   (0)| 00:00:01 |
|   1 |  SORT GROUP BY      |           |    70 |  2240 |     4   (0)| 00:00:01 |
|*  2 |   HASH JOIN         |           |    70 |  2240 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| SCOTT_EMP |    14 |   350 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| SCOTT_EMP |    14 |    98 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPTNO"="E"."DEPTNO")

As you can see, the scott_emp table was scanned completely 2 times.

We can get the same results using the analytic SUM function, like this:

SELECT    deptno, ename, sal
,         SUM (sal) OVER (PARTITION BY  deptno)
             AS dept_sal
FROM      scott_emp  m
ORDER BY  deptno, ename;

But look at the explain plan for this query:

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |    14 |   182 |     2   (0)| 00:00:01 |
|   1 |  WINDOW SORT       |           |    14 |   182 |     2   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SCOTT_EMP |    14 |   182 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------

It only needs 1 full table scan.  The analytic query computes the SUM while it is reading each row, so it doesn't need to make a second table scan.

OU_230

Perfect! Thanks.

mathguy

Lol - I see; you would rather have someone else do the whole homework for you, rather than just help you with it. Did you try what I suggested, though?

OU_230

I do not have the SCOTT schema. I was trying to do it on livesql.oracle.com but was not able to get explain plan to work. I am creating tables in my own database to try it out now.

mathguy

You tried something on livesql...   I am not sure if you can get explain plan from there?  In any case, I assume you have access to a database, and you use a front-end like SQL*Plus or perhaps SQL Developer or Toad? Are you able to run EXPLAIN PLAN on your database, using your front-end application?
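If you are using SQL*Plus (or any front end that runs plain SQL), either of the following should work; adjust to your tool:

-- Option 1: EXPLAIN PLAN, then display the stored plan
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM user_tables;          -- put your own query here

SELECT  *
FROM    TABLE (dbms_xplan.display);

-- Option 2: SQL*Plus autotrace (shows the plan plus run-time statistics after each query)
SET AUTOTRACE ON
SET TIMING ON

(Option 2 needs the PLUSTRACE role or equivalent privileges, so it may not be available everywhere.)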

Do you have (access to) the HR schema? Many things we do in the SCOTT schema for illustration can be done in the HR schema. I believe scripts to install HR are available (for free) online, from Oracle itself; I am not sure if the same is true for SCOTT. Especially if you are going to use this forum in the future, it will be very helpful to have easy access to both.

OU_230

I do not have access to HR or SCOTT, so I tried with one of the tables I have. I can see the multi-pass vs. single-pass difference when using the analytic function, but I do not see any performance benefit.

Both complete in 10 seconds. I added a WHERE condition to both queries because they were taking more than a minute otherwise.

The multi-pass operation has slightly higher consistent gets, but the single-pass SQL with the analytic function has a sort (memory).

Why is there no performance benefit?

-- single-pass operation with analytic function
SQL> select dsclr_pk, flng_pk, indvl_pk, max(flng_pk) over (partition by dsclr_pk) max_flng_pk from dsclr_flng where dsclr_pk > 7941071
SQL> /

43859 rows selected.

Elapsed: 00:00:10.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2087123764

------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |    11 |   176 |     3   (0)| 00:00:01 |
|   1 |  WINDOW BUFFER                |                |    11 |   176 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID | DSCLR_FLNG     |    11 |   176 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN           | XIE1DSCLR_FLNG |    11 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("DSCLR_PK">7941071)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      42097  consistent gets
          0  physical reads
          0  redo size
    1371477  bytes sent via SQL*Net to client
      32526  bytes received via SQL*Net from client
       2925  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      43859  rows processed

-- multi-pass operation
SQL> select df.dsclr_pk, df.flng_pk, df.indvl_pk, dmf.max_flng_pk from dsclr_flng df join
  2  (select dsclr_pk, max(flng_pk) max_flng_pk from dsclr_flng where dsclr_pk > 7941071
  3  group by dsclr_pk) dmf on (df.dsclr_pk = dmf.dsclr_pk);

43859 rows selected.

Elapsed: 00:00:10.24

Execution Plan
----------------------------------------------------------
Plan hash value: 4237910102

-------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                |    11 |   462 |     4   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                  |                |    11 |   462 |     4   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                 |                |    11 |   462 |     4   (0)| 00:00:01 |
|   3 |    VIEW                        |                |     1 |    26 |     1   (0)| 00:00:01 |
|   4 |     HASH GROUP BY              |                |     1 |    12 |     1   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | XPKDSCLR_FLNG  |    11 |   132 |     1   (0)| 00:00:01 |
|*  6 |    INDEX RANGE SCAN            | XIE1DSCLR_FLNG |    11 |       |     1   (0)| 00:00:01 |
|   7 |   TABLE ACCESS BY INDEX ROWID  | DSCLR_FLNG     |    11 |   176 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("DSCLR_PK">7941071)
   6 - access("DF"."DSCLR_PK"="DMF"."DSCLR_PK")

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      56132  consistent gets
          0  physical reads
          0  redo size
    1380915  bytes sent via SQL*Net to client
      32526  bytes received via SQL*Net from client
       2925  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      43859  rows processed

SQL>

mathguy

In the first query, using analytic functions, you forgot to add the last filter: you are selecting all 43,859 rows with that DSCLR_PK condition, instead of just the 11 rows returned by the second query. Then you show an explain plan, seemingly for the same query (it only has the condition on DSCLR_PK in the predicate section), but it only shows 11 rows returned. Something is wrong there.

In any case:

If you want to create a table for testing (assuming you have enough privileges in your "sandbox"), you can create your own copy of the table, with fewer rows total, with something like

CREATE TABLE TEST_TBL AS
    SELECT DSCLR_PK, FLNG_PK, INDVL_PK
    FROM   DSCLR_FLNG
    WHERE  DSCLR_PK > 7941071;

and then you can re-create the indexes you had on the original table. Now you can test on TEST_TBL and you will not have to select from a (presumably much) larger table. With your current setup it is possible that most of the time goes into selecting the 43k rows to begin with, which will be the same for both queries.
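For example, something along these lines (the index name and column below are only illustrative guesses based on the XIE1DSCLR_FLNG range scan in your plans; re-create whatever indexes the original table really has, then gather statistics on the copy):

-- illustrative only: an index on DSCLR_PK similar to the one used in the plans above
CREATE INDEX test_tbl_dsclr_pk_ix ON test_tbl (dsclr_pk);

-- refresh optimizer statistics on the new table
EXEC dbms_stats.gather_table_stats (ownname => USER, tabname => 'TEST_TBL')

Then run both versions of your query against TEST_TBL and compare the plans and timings.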

OU_230

mathguy,

Why are you saying the 2nd query returned 11 rows?

Although the estimated 'Rows' in the explain plan is 11 for both queries, the 'rows selected' count as well as the autotrace statistic 'rows processed' show 43859. In addition, the predicate information for both explain plans has "access("DSCLR_PK">7941071)".

I have the privilege to create tables.

" With your current setup it is possible that most of the time goes into selecting the 43k rows to begin with, which will be the same for both queries."

          Are you saying most of the time is spent on FETCH rather than EXECUTE? Wouldn't the multi-pass SQL with no analytic function access the 43K rows TWICE, so it should be slower?

Post Details

Locked on Oct 25 2017
Added on Sep 27 2017
11 comments
359 views