Discussions
Categories
- 197K All Categories
- 2.5K Data
- 546 Big Data Appliance
- 1.9K Data Science
- 450.8K Databases
- 221.9K General Database Discussions
- 3.8K Java and JavaScript in the Database
- 31 Multilingual Engine
- 552 MySQL Community Space
- 479 NoSQL Database
- 7.9K Oracle Database Express Edition (XE)
- 3.1K ORDS, SODA & JSON in the Database
- 555 SQLcl
- 4K SQL Developer Data Modeler
- 187.2K SQL & PL/SQL
- 21.4K SQL Developer
- 296.3K Development
- 17 Developer Projects
- 139 Programming Languages
- 293K Development Tools
- 110 DevOps
- 3.1K QA/Testing
- 646.1K Java
- 28 Java Learning Subscription
- 37K Database Connectivity
- 158 Java Community Process
- 105 Java 25
- 22.1K Java APIs
- 138.2K Java Development Tools
- 165.3K Java EE (Java Enterprise Edition)
- 19 Java Essentials
- 162 Java 8 Questions
- 86K Java Programming
- 81 Java Puzzle Ball
- 65.1K New To Java
- 1.7K Training / Learning / Certification
- 13.8K Java HotSpot Virtual Machine
- 94.3K Java SE
- 13.8K Java Security
- 205 Java User Groups
- 24 JavaScript - Nashorn
- Programs
- 468 LiveLabs
- 39 Workshops
- 10.2K Software
- 6.7K Berkeley DB Family
- 3.5K JHeadstart
- 5.7K Other Languages
- 2.3K Chinese
- 175 Deutsche Oracle Community
- 1.1K Español
- 1.9K Japanese
- 233 Portuguese
LEAD and LAG taking hours on a very large table!
Answers
-
Hi Jonathan:
LAG Function is slow when using ignore nulls (Doc ID 2811596.1)
I am not sure why it says "This needs enhancement with the design or algorithm". I have no idea why it is using NTH_VALUE to begin with. LAST_VALUE (as I suggested in this topic)
LAST_VALUE(CODE ignore nulls) OVER(ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) CODE
seems like a simple to implement solution. And iterestingly enough LEAD IGNORE NULLS is using NTH_VALUE too but it works fast.
SY.
-
Thanks,
I found it a few minutes after I posted the question - I'd been too sophisticated with my choice of search terms but finally got it with "lag slow".
I wrote an alternative version using the first_value() function rather than last_value() - and I think gives more of a clue about why Oracle may do a rewrite to nth_value() -- first_value() is just nth_value(xxx, 1); so if you can get rid of lead(), lag() and first_value(), last_value() by creating a single nth_value() function that's a good thing. (last_value() being first_value() in the opposite order, of course).
I'm writing a blog note to explain the transformation and symmetry, and ask why the symmetry (in processing requirements, at leasat) breaks when lead() and lag() are turned into nth_value() - if I can describe it clearly enough I might figure out why it happens.
Regards
Jonathan Lewis
-
Ah, I completely missed that LEAD/LAG second parameter indicates offset which explains why Oracle uses NTH_VALUE. However it doesn't explain why LAG that expands into
NVL(NTH_VALUE("A1"."CODE",1) FROM LAST IGNORE NULLS OVER(ORDER BY "A1"."ID" ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),NULL)
is slow while LEAD that expands intoNVL(NTH_VALUE("A1"."CODE",1) IGNORE NULLS OVER(ORDER BY "A1"."ID" ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING),NULL)
is fast.For whatever reason I though LEAD IGNORE NULLS was fast - not true. It is same as LAG - slow.
SY.
-
A simple hypothesis to explain the lead/lag difference.
The LEAD is following the pre-ordered data set and carries on in the same direction to the next (and next and next ...) row to find the targeted value.
The LAG may have been written to go to the beginning of the partition and copy out the the non-null data in order before scanning it in reverse to find the targeted value.
Possible test case - create a data set where almost all the CODE values are null except for the very first and very last row of each partition to see if that affects the relative performance.
Regards
Jonathan Lewis
Update - just tested this, in my model setting ed_ucc_dt (the prev/next thing) to null in every row. At this point the lag() and lead() calculations took the same time. Have you changed your data if you're now finding lag and lead behaving the same, they
-
@Solomon Yakobson and @Jonathan Lewis
When our IT DBA opened an SR with Oracle Support, they shared the same document and simply said that it won't get fixed anytime soon. So, good luck! They did not provide any other alternatives, they did not mention to use FIRST_VALUE / LAST_VALUE or use DBMS_PARALLEL_EXECUTE or anything else, despite our organization paying heavily for the Support contract.
-
@Jonathan Lewis Please share your blog article when it becomes available, thanks!
-
Blog note now published: https://jonathanlewis.wordpress.com/2022/05/05/lag-lead-slow/
Regards
Jonathan Lewis
-
Jonathan,
What we know is (tested on 19C):
Test table:
DROP TABLE A PURGE / CREATE TABLE A AS SELECT LEVEL ID, CASE WHEN MOD(LEVEL,3) != 0 THEN 'CODE' || LEVEL END CODE FROM DUAL CONNECT BY LEVEL < 20001 /
LAG IGNORE NULLS performance:
DROP TABLE TEST_LAG PURGE / CREATE TABLE TEST_LAG AS SELECT ID, CODE, LAG(CODE,5) IGNORE NULLS OVER(ORDER BY ID) CODE5 FROM A / Table created. Elapsed: 00:00:40.67 SQL>
LEAD IGNORE NULLS performance:
DROP TABLE TEST_LEAD PURGE / CREATE TABLE TEST_LEAD AS SELECT ID, CODE, LEAD(CODE,5) IGNORE NULLS OVER(ORDER BY ID) CODE5 FROM A / Table created. Elapsed: 00:00:00.09 SQL>
As we can see LAG IGNORE NULLS is slow while LEAD IGNORE NULLS isn't. Proposed workaround - convert LAG into LEAD by changing order to descending and nulls first (to make it generic - not needed if order by column is not null):
DROP TABLE TEST_LAG_VIA_LEAD PURGE / CREATE TABLE TEST_LAG_VIA_LEAD AS SELECT ID, CODE, LEAD(CODE,5) IGNORE NULLS OVER ( ORDER BY ID DESC NULLS FIRST) CODE5 FROM A / Table created. Elapsed: 00:00:00.11 SQL>
Now we compare both tables:
SELECT TEST_LAG.*, COUNT(*) OVER() CNT FROM TEST_LAG MINUS SELECT TEST_LAG_VIA_LEAD.*, COUNT(*) OVER() CNT FROM TEST_LAG_VIA_LEAD / no rows selected Elapsed: 00:00:00.06 SQL>
Everyone is welcome to poke holes in my proposed workaround.
SY.
-
Solomon,
The drawback to that workaround is that if you want lag() and lead() in a single query then you have two different over() clauses, and will have to (window) sort the data twice. (And the second sort will have be carrying the columns created by the first sort).
set serveroutput off CREATE TABLE TEST_LAG_VIA_LEAD AS SELECT ID, CODE, LEAD(CODE,5) IGNORE NULLS OVER ( ORDER BY ID DESC NULLS FIRST) CODE5_lag, LEAD(CODE,5) IGNORE NULLS OVER ( ORDER BY ID NULLS FIRST) CODE5_lead FROM A / select * from table(dbms_xplan.display_cursor); ------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------ | 0 | CREATE TABLE STATEMENT | | | | 28 (100)| | | 1 | LOAD AS SELECT | TEST_LAG_VIA_LEAD | | | | | | 2 | OPTIMIZER STATISTICS GATHERING | | 20000 | 214K| 21 (24)| 00:00:01 | | 3 | WINDOW SORT | | 20000 | 214K| 21 (24)| 00:00:01 | | 4 | WINDOW SORT | | 20000 | 214K| 21 (24)| 00:00:01 | | 5 | TABLE ACCESS FULL | A | 20000 | 214K| 16 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------------
Regards
Jonathan Lewis
-
Agreed, but that still will be way faster than what we have now :)
SQL> CREATE TABLE TEST_LAG_VIA_LEAD 2 AS 3 SELECT ID, 4 CODE, 5 LEAD(CODE,5) IGNORE NULLS OVER ( ORDER BY ID DESC NULLS FIRST) CODE5_lag, 6 LEAD(CODE,5) IGNORE NULLS OVER ( ORDER BY ID NULLS FIRST) CODE5_lead 7 FROM A 8 / Table created. Elapsed: 00:00:00.15 SQL>
SY.