Forum Stats

  • 3,838,121 Users
  • 2,262,333 Discussions
  • 7,900,518 Comments

Discussions

LEAD and LAG taking hours on a very large table!

123457»

Answers

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 10,004 Blue Diamond


    The relative performance of lag() and lead() depends on a variety of things - most significantly the sizes of the partitions defined by the OVER() clause. Your example falls into a "worst case" area with just one partition of 20,000 rows. In a test case I set up with 20 rows per partition the ratio was was roughly 2.5 to 1, and with 40 rows per partition it was roughly 4.5 to 1.

    The fraction of nulls that have to be ignored to acquire the final values also makes a difference - in a test case where every row had a null for the column where nulls were to be ignored the lead() and lag() did very similar amounts of work. (That was an extreme case I was using to test a hypothesis, not an attempt to create a realistic model.)

    In the case of the OP with what looked as if it was going to be a very small number of rows per partition but a very large (156M row) table the effect of sorting twice (especially after adding an extra 10 columns or so derived columns for the second sort) could easily outweight the penalty of the lag() overhead.

    The OP, of course, was conveniently using lag(,1)/lead(,1) which could be translated into last_value() / first_value() pairs, bypassing the problem completely lagging with offsets greater than one might force a difficult choice between two sorts, a slow lag, or complex stratgies of minimum sizing for intermediate results generating a join result.

    Regards

    Jonathan Lewis

  • BluShadow
    BluShadow Member, Moderator Posts: 42,121 Red Diamond


    pssst.... you may want to quickly edit to correct Solomon's surname in your article. Think "Solomon son of Jacob" rather than "Solomon sen of Jacob" 😉

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 10,004 Blue Diamond

    @BluShadow


    Thank you, now corrected


    Regards

    Jonathan Lewis

    BluShadow
  • Sam_P
    Sam_P Member Posts: 155 Blue Ribbon

    @Jonathan Lewis You also tagged a different Sam than myself. Regardless, I read your blog and it reflects accurately what we have been experiencing, however, do you have any connections with the appropriate folks at Oracle who could actually fix this design flaw (bug)?

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 10,004 Blue Diamond

    I've fixed the tag - not sure how I missed the right one when I typed the whole thing in. sorry.

    Unfortunately I don't have a contact that would have a direct interest in this one. I can mention it to someone who might know who to pass it on to; even so I got the impression that it was going to be much work to change the algorithm (given the number of people who might benefit). Since the problem has been around since 11g (at least) and hasn't been fixed in 21c I suspect it's been given a very low priority.

    Regards

    Jonathan Lewis

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 10,004 Blue Diamond

    @Sam_P

    If you raise an SR and quote the MOS document ID, and reference the blog note, it might get this topic re-raised at Oracle. If anyone else who has been reading the thread and has an interest in seeing the fix does the same that might get enough attention for the algorithm to be revisited.

    Regards

    Jonathan Lewis

  • alvinder
    alvinder Member Posts: 437 Silver Badge

    @Sam_P

    Just curious did you manage to run the updated code.

    Was there any improvement in time.

    Thanks

    Alvinder