Discussions
Categories
- 197K All Categories
- 2.5K Data
- 546 Big Data Appliance
- 1.9K Data Science
- 450.8K Databases
- 221.9K General Database Discussions
- 3.8K Java and JavaScript in the Database
- 31 Multilingual Engine
- 552 MySQL Community Space
- 479 NoSQL Database
- 7.9K Oracle Database Express Edition (XE)
- 3.1K ORDS, SODA & JSON in the Database
- 555 SQLcl
- 4K SQL Developer Data Modeler
- 187.2K SQL & PL/SQL
- 21.4K SQL Developer
- 296.3K Development
- 17 Developer Projects
- 139 Programming Languages
- 293K Development Tools
- 110 DevOps
- 3.1K QA/Testing
- 646.1K Java
- 28 Java Learning Subscription
- 37K Database Connectivity
- 158 Java Community Process
- 105 Java 25
- 22.1K Java APIs
- 138.2K Java Development Tools
- 165.3K Java EE (Java Enterprise Edition)
- 19 Java Essentials
- 162 Java 8 Questions
- 86K Java Programming
- 81 Java Puzzle Ball
- 65.1K New To Java
- 1.7K Training / Learning / Certification
- 13.8K Java HotSpot Virtual Machine
- 94.3K Java SE
- 13.8K Java Security
- 205 Java User Groups
- 24 JavaScript - Nashorn
- Programs
- 468 LiveLabs
- 39 Workshops
- 10.2K Software
- 6.7K Berkeley DB Family
- 3.5K JHeadstart
- 5.7K Other Languages
- 2.3K Chinese
- 175 Deutsche Oracle Community
- 1.1K Español
- 1.9K Japanese
- 233 Portuguese
LEAD and LAG taking hours on a very large table!
Answers
-
The relative performance of lag() and lead() depends on a variety of things - most significantly the sizes of the partitions defined by the OVER() clause. Your example falls into a "worst case" area with just one partition of 20,000 rows. In a test case I set up with 20 rows per partition the ratio was was roughly 2.5 to 1, and with 40 rows per partition it was roughly 4.5 to 1.
The fraction of nulls that have to be ignored to acquire the final values also makes a difference - in a test case where every row had a null for the column where nulls were to be ignored the lead() and lag() did very similar amounts of work. (That was an extreme case I was using to test a hypothesis, not an attempt to create a realistic model.)
In the case of the OP with what looked as if it was going to be a very small number of rows per partition but a very large (156M row) table the effect of sorting twice (especially after adding an extra 10 columns or so derived columns for the second sort) could easily outweight the penalty of the lag() overhead.
The OP, of course, was conveniently using lag(,1)/lead(,1) which could be translated into last_value() / first_value() pairs, bypassing the problem completely lagging with offsets greater than one might force a difficult choice between two sorts, a slow lag, or complex stratgies of minimum sizing for intermediate results generating a join result.
Regards
Jonathan Lewis
-
pssst.... you may want to quickly edit to correct Solomon's surname in your article. Think "Solomon son of Jacob" rather than "Solomon sen of Jacob" 😉
-
-
@Jonathan Lewis You also tagged a different Sam than myself. Regardless, I read your blog and it reflects accurately what we have been experiencing, however, do you have any connections with the appropriate folks at Oracle who could actually fix this design flaw (bug)?
-
I've fixed the tag - not sure how I missed the right one when I typed the whole thing in. sorry.
Unfortunately I don't have a contact that would have a direct interest in this one. I can mention it to someone who might know who to pass it on to; even so I got the impression that it was going to be much work to change the algorithm (given the number of people who might benefit). Since the problem has been around since 11g (at least) and hasn't been fixed in 21c I suspect it's been given a very low priority.
Regards
Jonathan Lewis
-
If you raise an SR and quote the MOS document ID, and reference the blog note, it might get this topic re-raised at Oracle. If anyone else who has been reading the thread and has an interest in seeing the fix does the same that might get enough attention for the algorithm to be revisited.
Regards
Jonathan Lewis
-
Just curious did you manage to run the updated code.
Was there any improvement in time.
Thanks
Alvinder