need example of analytic function single pass

Could someone please explain what single-pass vs. multi-pass operations are, and how an analytic function can make a SQL statement single-pass to improve performance?
Answers
-
Ok, let's start by asking you what you think the answer to those questions is. Then we can understand your understanding before we answer.
-
:-) I think single-pass means accessing the table once, and multi-pass means accessing the table multiple times.
-
Single-pass vs. multi-pass: that's the correct definition.
Then:
In the example below, I use the EMP table in the SCOTT schema, which exists on many (most?) Oracle installations. If you don't have it, consider any employee table in any of the schemas you have access to, or build your own. You need four columns: DEPTNO, EMPNO, ENAME and SAL (obvious meaning).
If there were no analytic functions, how would you solve the following problem?
Show all rows (DEPTNO, EMPNO, ENAME, SAL) where the salary is highest WITHIN THE DEPARTMENT. Note: there may be ties for "highest salary in the department"; in those cases, show all employees with the highest salary.
You could find the max in each department and then do a join; or, more efficiently, a semi-join (using an IN condition with a subquery).
Then how would you solve the same problem using analytic functions? I would suggest using the analytic version of MAX().
Then run both queries - convince yourself you get the same answer - and look at the execution plans. You will see that the solution without analytic functions must read the base table twice; the one with analytic functions only needs to read the table once.
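For reference, here is a rough sketch of the two approaches just described, using the EMP columns mentioned above (DEPTNO, EMPNO, ENAME, SAL); treat it as one possible shape of the queries rather than the only answer:

-- Without analytic functions: a semi-join against a per-department MAX
SELECT deptno, empno, ename, sal
FROM   emp
WHERE  (deptno, sal) IN (
           SELECT   deptno, MAX (sal)
           FROM     emp
           GROUP BY deptno
       );

-- With the analytic version of MAX: the table is read only once
SELECT deptno, empno, ename, sal
FROM   (
           SELECT  deptno, empno, ename, sal
           ,       MAX (sal) OVER (PARTITION BY deptno)  AS max_sal
           FROM    emp
       )
WHERE  sal = max_sal;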
-
Hi,
Let's take an example using the scott.emp table (called scott_emp on my system).
Say we want to get some information about individual employees, and some information about departments as a whole (e.g., the total sal for each department), like this:
    DEPTNO ENAME             SAL   DEPT_SAL
---------- ---------- ---------- ----------
        10 CLARK            2450       8750
        10 KING             5000       8750
        10 MILLER           1300       8750
        20 ADAMS            1100      10875
        20 FORD             3000      10875
        20 JONES            2975      10875
        20 SCOTT            3000      10875
        20 SMITH             800      10875
        30 ALLEN            1600       9400
        30 BLAKE            2850       9400
        30 JAMES             950       9400
        30 MARTIN           1250       9400
        30 TURNER           1500       9400
        30 WARD             1250       9400
Here's one way you might get those results:
WITH dept_summary AS
(
    SELECT    deptno
    ,         SUM (sal)   AS dept_sal
    FROM      scott_emp
    GROUP BY  deptno
)
SELECT    e.deptno, e.ename, e.sal, d.dept_sal
FROM      scott_emp     e
JOIN      dept_summary  d  ON  d.deptno = e.deptno
ORDER BY  e.deptno, e.ename;
Let's look at the explain plan to see how that query was executed. That is:
EXPLAIN PLAN FOR
WITH dept_summary AS ...;

SELECT * FROM TABLE (dbms_xplan.display);
The output is
---------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |    70 |  2240 |     4  (0)| 00:00:01 |
|   1 |  SORT GROUP BY      |           |    70 |  2240 |     4  (0)| 00:00:01 |
|*  2 |   HASH JOIN         |           |    70 |  2240 |     4  (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| SCOTT_EMP |    14 |   350 |     2  (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| SCOTT_EMP |    14 |    98 |     2  (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DEPTNO"="E"."DEPTNO")
As you can see, the scott_emp table was scanned in full twice.
We can get the same results using the analytic SUM function, like this:
SELECT    deptno, ename, sal
,         SUM (sal) OVER (PARTITION BY deptno)  AS dept_sal
FROM      scott_emp  m
ORDER BY  deptno, ename;
But look at the explain plan for this query:
--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |    14 |   182 |     2  (0)| 00:00:01 |
|   1 |  WINDOW SORT       |           |    14 |   182 |     2  (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SCOTT_EMP |    14 |   182 |     2  (0)| 00:00:01 |
--------------------------------------------------------------------------------
It needs only one full table scan. The analytic query computes the SUM as it reads the rows, so it doesn't need a second scan of the table.
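If you want to see the same effect on your own system, one possible way (a sketch only, assuming SQL*Plus and the scott_emp table from above) is to turn on autotrace and compare the plans and the "consistent gets" of the two queries:

SET AUTOTRACE TRACEONLY EXPLAIN STATISTICS

-- run the join version (WITH dept_summary ...) here,
-- then run the analytic version, and compare the number of
-- TABLE ACCESS FULL steps and the "consistent gets" figure

SET AUTOTRACE OFF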
-
Perfect! Thanks.
-
Lol - I see; you would rather have someone else do the whole homework for you, rather than just help you with it. Did you try what I suggested, though?
-
I do not have the SCOTT schema. I was trying to do it on livesql.oracle.com but was not able to get EXPLAIN PLAN to work. I am creating tables in my own database to try it out now.
-
You tried something on livesql... I am not sure if you can get explain plan from there? In any case, I assume you have access to a database, and you use a front-end like SQL*Plus or perhaps SQL Developer or Toad? Are you able to run EXPLAIN PLAN on your database, using your front-end application?
Do you have (access to) the HR schema? Many things we do in the SCOTT schema for illustration can be done in the HR schema. I believe scripts to install HR are available (for free) online, from Oracle itself; I am not sure if the same is true for SCOTT. Especially if you are going to use this forum in the future, it will be very helpful to have easy access to both.
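If neither SCOTT nor HR is available, a minimal stand-in table is easy to build by hand. The following is only a sketch (the table name my_emp and the rows are made up for illustration, loosely based on the classic SCOTT.EMP data):

CREATE TABLE my_emp (
    deptno  NUMBER (2),
    empno   NUMBER (4),
    ename   VARCHAR2 (10),
    sal     NUMBER (7, 2)
);

INSERT INTO my_emp VALUES (10, 7839, 'KING',  5000);
INSERT INTO my_emp VALUES (10, 7782, 'CLARK', 2450);
INSERT INTO my_emp VALUES (20, 7788, 'SCOTT', 3000);
INSERT INTO my_emp VALUES (20, 7902, 'FORD',  3000);
INSERT INTO my_emp VALUES (30, 7698, 'BLAKE', 2850);
COMMIT;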
-
I do not have access to HR or SCOTT, so I tried it on one of the tables I have. I can see the multi-pass vs. single-pass difference when I use the analytic function, but I do not see any performance benefit.
Both queries complete in about 10 seconds. I added a WHERE condition to both queries because they were taking more than a minute otherwise.
The multi-pass operation has slightly higher consistent gets, but the single-pass SQL with the analytic function shows a sort (memory).
Why is there no performance benefit?
-- single pass operation with analytical function
SQL> select dsclr_pk, flng_pk, indvl_pk, max(flng_pk) over (partition by dsclr_pk) max_flng_pk from dsclr_flng where dsclr_pk > 7941071
SQL> /

43859 rows selected.

Elapsed: 00:00:10.00

Execution Plan
----------------------------------------------------------
Plan hash value: 2087123764

-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |    11 |   176 |     3   (0)| 00:00:01 |
|   1 |  WINDOW BUFFER               |                |    11 |   176 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| DSCLR_FLNG     |    11 |   176 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | XIE1DSCLR_FLNG |    11 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("DSCLR_PK">7941071)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      42097  consistent gets
          0  physical reads
          0  redo size
    1371477  bytes sent via SQL*Net to client
      32526  bytes received via SQL*Net from client
       2925  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
      43859  rows processed

-- multi pass operation
SQL> select df.dsclr_pk, df.flng_pk, df.indvl_pk, dmf.max_flng_pk from dsclr_flng df join
  2  (select dsclr_pk, max(flng_pk) max_flng_pk from dsclr_flng where dsclr_pk > 7941071
  3  group by dsclr_pk ) dmf on (df.dsclr_pk = dmf.dsclr_pk);

43859 rows selected.

Elapsed: 00:00:10.24

Execution Plan
----------------------------------------------------------
Plan hash value: 4237910102

-----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                |    11 |   462 |     4   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |                |    11 |   462 |     4   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |                |    11 |   462 |     4   (0)| 00:00:01 |
|   3 |    VIEW                       |                |     1 |    26 |     1   (0)| 00:00:01 |
|   4 |     HASH GROUP BY             |                |     1 |    12 |     1   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN         | XPKDSCLR_FLNG  |    11 |   132 |     1   (0)| 00:00:01 |
|*  6 |    INDEX RANGE SCAN           | XIE1DSCLR_FLNG |    11 |       |     1   (0)| 00:00:01 |
|   7 |   TABLE ACCESS BY INDEX ROWID | DSCLR_FLNG     |    11 |   176 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("DSCLR_PK">7941071)
   6 - access("DF"."DSCLR_PK"="DMF"."DSCLR_PK")

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      56132  consistent gets
          0  physical reads
          0  redo size
    1380915  bytes sent via SQL*Net to client
      32526  bytes received via SQL*Net from client
       2925  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      43859  rows processed

SQL>
-
In the first query, using analytic functions, you forgot to add the last filter - you are selecting all 48,539 rows with that DSCLR_PK condition, instead of just the 11 rows returned by the second query. Then you show an explain plan, seemingly for the same query (it only has the condition on DSCLR_PK in the predicate section), but it only shows 11 rows returned. Something is wrong there.
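If the "last filter" means keeping only the rows with the highest FLNG_PK per DSCLR_PK (analogous to the highest-salary problem above), the analytic version would need an outer filter, roughly like this (a sketch only, reusing your column names; adjust to whatever the real requirement is):

SELECT dsclr_pk, flng_pk, indvl_pk
FROM   (
           SELECT  dsclr_pk, flng_pk, indvl_pk
           ,       MAX (flng_pk) OVER (PARTITION BY dsclr_pk)  AS max_flng_pk
           FROM    dsclr_flng
           WHERE   dsclr_pk > 7941071
       )
WHERE  flng_pk = max_flng_pk;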
In any case:
If you want to create a table for testing (assuming you have enough privileges in your "sandbox"), you can create your own copy of the table, with fewer rows total, with something like
CREATE TABLE TEST_TBL AS
SELECT DSCLR_PK, FLNG_PK, INDVL_PK
FROM DSCLR_FLNG
WHERE DSCLR_PK > 7941071;
and then you can re-create the indexes you had on the original table. Now you can test on TEST_TBL and you will not have to select from a (presumably much) larger table. With your current setup it is possible that most of the time goes into selecting the 43k rows to begin with, which will be the same for both queries.
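After creating TEST_TBL, one possible way to re-create an index and refresh the optimizer statistics is sketched below (the index name and column list are made up; use the definitions from your original table):

CREATE INDEX test_tbl_ix1 ON test_tbl (dsclr_pk);   -- hypothetical index name and columns

BEGIN
    -- gather fresh statistics so the optimizer has accurate figures for TEST_TBL
    DBMS_STATS.GATHER_TABLE_STATS (ownname => USER, tabname => 'TEST_TBL');
END;
/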