Forum Stats

  • 3,838,140 Users
  • 2,262,335 Discussions
  • 7,900,523 Comments

Discussions

LEAD and LAG taking hours on a very large table!

12346

Answers

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,585 Red Diamond

    Hi Jonathan:

    LAG Function is slow when using ignore nulls (Doc ID 2811596.1)

    I am not sure why it says "This needs enhancement with the design or algorithm". I have no idea why it is using NTH_VALUE to begin with. LAST_VALUE (as I suggested in this topic)

    LAST_VALUE(CODE ignore nulls) OVER(ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) CODE
    

    seems like a simple to implement solution. And iterestingly enough LEAD IGNORE NULLS is using NTH_VALUE too but it works fast.

    SY.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 10,004 Blue Diamond

    Thanks,

    I found it a few minutes after I posted the question - I'd been too sophisticated with my choice of search terms but finally got it with "lag slow".

    I wrote an alternative version using the first_value() function rather than last_value() - and I think gives more of a clue about why Oracle may do a rewrite to nth_value() -- first_value() is just nth_value(xxx, 1); so if you can get rid of lead(), lag() and first_value(), last_value() by creating a single nth_value() function that's a good thing. (last_value() being first_value() in the opposite order, of course).

    I'm writing a blog note to explain the transformation and symmetry, and ask why the symmetry (in processing requirements, at leasat) breaks when lead() and lag() are turned into nth_value() - if I can describe it clearly enough I might figure out why it happens.

    Regards

    Jonathan Lewis

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,585 Red Diamond
    edited May 4, 2022 4:53PM

    Ah, I completely missed that LEAD/LAG second parameter indicates offset which explains why Oracle uses NTH_VALUE. However it doesn't explain why LAG that expands into

    NVL(NTH_VALUE("A1"."CODE",1) FROM LAST IGNORE NULLS
    OVER(ORDER BY "A1"."ID" ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),NULL)
    

    is slow while LEAD that expands into

    NVL(NTH_VALUE("A1"."CODE",1) IGNORE NULLS
    OVER(ORDER BY "A1"."ID" ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING),NULL)
    

    is fast.

    For whatever reason I though LEAD IGNORE NULLS was fast - not true. It is same as LAG - slow.

    SY.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 10,004 Blue Diamond
    edited May 4, 2022 5:20PM

    A simple hypothesis to explain the lead/lag difference.

    The LEAD is following the pre-ordered data set and carries on in the same direction to the next (and next and next ...) row to find the targeted value.

    The LAG may have been written to go to the beginning of the partition and copy out the the non-null data in order before scanning it in reverse to find the targeted value.

    Possible test case - create a data set where almost all the CODE values are null except for the very first and very last row of each partition to see if that affects the relative performance.

    Regards

    Jonathan Lewis


    Update - just tested this, in my model setting ed_ucc_dt (the prev/next thing) to null in every row. At this point the lag() and lead() calculations took the same time. Have you changed your data if you're now finding lag and lead behaving the same, they

  • Sam_P
    Sam_P Member Posts: 155 Blue Ribbon
    edited May 4, 2022 5:01PM

    @Solomon Yakobson and @Jonathan Lewis

    When our IT DBA opened an SR with Oracle Support, they shared the same document and simply said that it won't get fixed anytime soon. So, good luck! They did not provide any other alternatives, they did not mention to use FIRST_VALUE / LAST_VALUE or use DBMS_PARALLEL_EXECUTE or anything else, despite our organization paying heavily for the Support contract.

  • Sam_P
    Sam_P Member Posts: 155 Blue Ribbon
    edited May 4, 2022 5:15PM

    @Jonathan Lewis Please share your blog article when it becomes available, thanks!

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 10,004 Blue Diamond
    edited May 6, 2022 7:13PM
  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,585 Red Diamond

    Jonathan,

    What we know is (tested on 19C):

    Test table:

    DROP TABLE A PURGE
    /
    CREATE TABLE A
      AS
        SELECT  LEVEL ID,
                CASE
                  WHEN MOD(LEVEL,3) != 0 THEN 'CODE' || LEVEL
                END CODE
          FROM  DUAL
         CONNECT BY LEVEL < 20001
    /
    

    LAG IGNORE NULLS performance:

    DROP TABLE TEST_LAG PURGE
    /
    CREATE TABLE TEST_LAG
      AS
        SELECT  ID,
                CODE,
                LAG(CODE,5) IGNORE NULLS OVER(ORDER BY ID) CODE5
          FROM  A
    /
    
    Table created.
    
    Elapsed: 00:00:40.67
    SQL>
    

    LEAD IGNORE NULLS performance:

    DROP TABLE TEST_LEAD PURGE
    /
    CREATE TABLE TEST_LEAD
      AS
        SELECT  ID,
                CODE,
                LEAD(CODE,5) IGNORE NULLS OVER(ORDER BY ID) CODE5
          FROM  A
    /
    
    Table created.
    
    Elapsed: 00:00:00.09
    SQL>
    

    As we can see LAG IGNORE NULLS is slow while LEAD IGNORE NULLS isn't. Proposed workaround - convert LAG into LEAD by changing order to descending and nulls first (to make it generic - not needed if order by column is not null):

    DROP TABLE TEST_LAG_VIA_LEAD PURGE
    /
    CREATE TABLE TEST_LAG_VIA_LEAD
      AS
    SELECT  ID,
            CODE,
            LEAD(CODE,5) IGNORE NULLS OVER ( ORDER BY ID DESC NULLS FIRST) CODE5
      FROM  A
    /
    
    Table created.
    
    Elapsed: 00:00:00.11
    SQL>
    

    Now we compare both tables:

     SELECT  TEST_LAG.*,
             COUNT(*) OVER() CNT
       FROM  TEST_LAG
    MINUS
     SELECT  TEST_LAG_VIA_LEAD.*,
             COUNT(*) OVER() CNT
       FROM  TEST_LAG_VIA_LEAD
    /
    
    no rows selected
    
    Elapsed: 00:00:00.06
    SQL>
    

    Everyone is welcome to poke holes in my proposed workaround.

    SY.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 10,004 Blue Diamond

    Solomon,

    The drawback to that workaround is that if you want lag() and lead() in a single query then you have two different over() clauses, and will have to (window) sort the data twice. (And the second sort will have be carrying the columns created by the first sort).

    set serveroutput off
    
    CREATE TABLE TEST_LAG_VIA_LEAD
      AS
    SELECT  ID,
            CODE,
            LEAD(CODE,5) IGNORE NULLS OVER ( ORDER BY ID DESC NULLS FIRST) CODE5_lag,
            LEAD(CODE,5) IGNORE NULLS OVER ( ORDER BY ID      NULLS FIRST) CODE5_lead
      FROM  A
    /
    
    select * from table(dbms_xplan.display_cursor);
    
    ------------------------------------------------------------------------------------------------------
    | Id  | Operation                        | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------------------------
    |   0 | CREATE TABLE STATEMENT           |                   |       |       |    28 (100)|          |
    |   1 |  LOAD AS SELECT                  | TEST_LAG_VIA_LEAD |       |       |            |          |
    |   2 |   OPTIMIZER STATISTICS GATHERING |                   | 20000 |   214K|    21  (24)| 00:00:01 |
    |   3 |    WINDOW SORT                   |                   | 20000 |   214K|    21  (24)| 00:00:01 |
    |   4 |     WINDOW SORT                  |                   | 20000 |   214K|    21  (24)| 00:00:01 |
    |   5 |      TABLE ACCESS FULL           | A                 | 20000 |   214K|    16   (0)| 00:00:01 |
    ------------------------------------------------------------------------------------------------------
    

    Regards

    Jonathan Lewis

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,585 Red Diamond

    Agreed, but that still will be way faster than what we have now :)

    SQL> CREATE TABLE TEST_LAG_VIA_LEAD
      2    AS
      3  SELECT  ID,
      4          CODE,
      5          LEAD(CODE,5) IGNORE NULLS OVER ( ORDER BY ID DESC NULLS FIRST) CODE5_lag,
      6          LEAD(CODE,5) IGNORE NULLS OVER ( ORDER BY ID      NULLS FIRST) CODE5_lead
      7    FROM  A
      8  /
    
    Table created.
    
    Elapsed: 00:00:00.15
    SQL>
    

    SY.