Forum Stats

  • 3,838,143 Users
  • 2,262,336 Discussions
  • 7,900,525 Comments

Discussions

LEAD and LAG taking hours on a very large table!

24567

Answers

  • Sam_P
    Sam_P Member Posts: 155 Blue Ribbon

    Thanks @Jonathan Lewis for sticking by.

    My apologies in advance if I misunderstand the information you're asking -- by "length" do you mean this info below?


    Please let me know how do I get you the information you need, if the above is not what you're after.

    What do you mean when you say "unique number" column? Do you mean something like an "ID" column with auto-generated unique values that would act as a Primary Key in this intermediate table?

    Just an FYI -- the ROW_NUM column in my screenshots in earlier posts is not needed as it was only used for my own testing purposes. There is no physical column called ROW_NUM.

    A few things to note:-

    1. Sequence Number columns ED_UCC_SEQNUM and DAD_ACUTE_SEQNUM contain UNIQUE values.

    2. ED_UCC_SEQNUM produces 2 additional PREV + NEXT aliased columns.

    3. DAD_ACUTE_SEQNUM produces 2 additional PREV + NEXT aliased columns.

    4. Date columns VISIT_DATE_DT, ED_UCC_DT, DAD_ACUTE_DISP_DT and DAD_ACUTE_ADMIT_DT are physical table columns being referred to in the SELECT query.

    5. VISIT_DATE_DT does not produce any aliased (calculated) column but is being referred to in OVER (...ORDER BY clause).

    6. ED_UCC_DT produces 2 additional PREV + NEXT aliased columns.

    7. DAD_ACUTE_DISP_DT produces only 1 PREV aliased column captioned PREV_DAD_DISP_AC_DT.

    8. DAD_ACUTE_ADMIT_DT produces only 1 NEXT aliased column captioned NEXT_DAD_DISP_AC_DT.

    9. ULIMORE is a non-unique VARCHAR2 column that does not produce any aliased columns.


    So, in this intermediate table that I'd create, would I need the following minimum columns, in this order:-

    ("ID", ULIMORE, VISIT_DATE_DT, ED_UCC_DT, ED_UCC_SEQNUM,DAD_ACUTE_DISP_DT, DAD_ACUTE_ADMIT_DT, DAD_ACUTE_SEQNUM)

    OR without the VISIT_DATE_DT column, as:

    ("ID", ULIMORE, ED_UCC_DT, ED_UCC_SEQNUM,DAD_ACUTE_DISP_DT, DAD_ACUTE_ADMIT_DT, DAD_ACUTE_SEQNUM)


    Thanks!

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 10,004 Blue Diamond

    Yes, the avg_row_len is a good enough indicator for a rough calculation.

    Unique number column - you have two of them: ED_UCC_SEQNUM and on DAD_ACUTE_SEQNUM, if either is declared not null then that can be used as the unique key to join the generated table back to tb_source.

    If you create the table to hold only one of the unique numbers

    • One of (1) above

    the 4 dates (2 x next/prev)

    • (6), (7) and (8) above - but I misread dad_acute as begin next/prev on the same date

    and the 4 sequence number (2 x next prev)

    • (2) and (3) above


    This is all the generated data from the leads and lags, and one column that is a unique identifier to join to the original TB_SOURCE table. That's all you would need in the intermediate table. The fact that your query used other columns to generate these results is irrelevant so long as you can join back correctly to get the rest of the row when you need it.

    I missed some overheads on my first estimate - allowing some guesswork for numeric sizes and memory overheads for sorting 1 number @ 5 bytes, 4 numbers at 3 bytes, 4 dates @ 7 bytes, 2 bytes per column o/head, 4 bytes per row o/head: 67 bytes per row for sorting that's about 9 GB memory. The table will be a little smaller (8GB) because the column overhead is only 1 byte per column but the row overhead will be 5 bytes. (To save space, create the table with pctfree = 0 - which might be a point worth mentioning for lots of other intermediate tables).

    Regards

    Jonathan Lewis

  • Sam_P
    Sam_P Member Posts: 155 Blue Ribbon
    edited Apr 4, 2022 1:17PM

    Thanks @Jonathan Lewis .

    To give you more column details in the TB_SOURCE table:-


    Notes:

    1. The unique number columns ED_UCC_SEQNUM and DAD_ACUTE_SEQNUM are not declared as NOT NULL because the end user had produced this table using CTAS (by reading it from a prior table in a chain) and no further constraints were added to this table. These 2 columns are only unique in terms of their DISTINCT values but they do not span ALL the rows (155,093,878), as per the image above. The closest column which comes close is probably SEQNUM but even, it doesn't span ALL the rows. Does this matter whether we choose to include ED_UCC_SEQNUM or DAD_ACUTE_SEQNUM column in TB_INTERMEDIATE to act as a unique key to join back to TB_SOURCE?


    2. The current size of the TB_SOURCE table is:-


    3. Just so that I don't mess things up when creating the TB_INTERMEDIATE, should I create it as normal HEAP table or as an IOT table?

    Once you respond to this, I'll post a table creation statement for your review before I create the empty table and start populating it with 156M rows.


    4. I have never worked with an IOT table but reading up on it yesterday, it looks like that I have to specify which columns are to be marked as "key" columns and which columns are to be marked as "overflow" columns in the table creation statement.

    Once you respond to this, I'll post a table creation statement for your review before I create the empty table and start populating it with 156M rows.


    Many thanks!

  • Sam_P
    Sam_P Member Posts: 155 Blue Ribbon

    Just wanted to provide a quick update @Jonathan Lewis .

    I spoke with the Data Scientist today and he mentioned that this intermediate table must include, at the very minimum, 3 columns: SEQNUM, ULIMORE, VISIT_DATE_DT as these 3 together do uniquely identify a record in TB_SOURCE table, hence, he would need it also in the TB_INTERMEDIATE in order for him to do the JOIN or look back up to TB_SOURCE. I did verify that these 3 composite columns do result into no duplicates and the SUM of the count of rows returned for all these 3 columns combined does equal to the exact NUM_ROWS in TB_SOURCE.


    I then proceeded to issue this CTAS:-

    CREATE TABLE TB_INTERMEDIATE 
    PCTFREE 0 
    PARALLEL 8 
    NOLOGGING 
    AS 
    SELECT /*+ FULL(p) PARALLEL(8) */ 
        p.SEQNUM,
        p.ULIMORE,
        p.VISIT_DATE_DT,
         
        LAG(p.ED_UCC_DT      IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
        LEAD(p.ED_UCC_DT     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
         
        LAG(p.ED_UCC_SEQNUM    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
        LEAD(p.ED_UCC_SEQNUM   IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
         
        LAG(p.DAD_ACUTE_DISP_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
        LEAD(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
         
        LAG(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
        LEAD(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM
    
    FROM
        TB_SOURCE p
    ;
    
    


    Notes:-

    1. I have explicitly specified PCTFREE 0 above based on your recommendation.

    2. There are 8 parallel threads running at the moment.

    3. The above CTAS is still running after 4.5 hrs have already passed.

    4. Here's the execution plan (while it's still running):-


  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,586 Red Diamond
    edited Apr 5, 2022 9:27AM

    As I already mentioned - you need index to a) trade large table full scan for way smaller indedx full scan b) which is even more important trade costly window sort for less costly window buffer. Compare:

    SQL> DROP TABLE TB_SOURCE PURGE
      2  /
    
    Table dropped.
    
    SQL> CREATE TABLE TB_SOURCE(
      2                         SEQNUM            VARCHAR2(18),
      3                         ULIMORE           VARCHAR2(15),
      4                         VISIT_DATE_DT     DATE,
      5                         ED_UCC_DT         DATE,
      6                         ED_UCC_SEQNUM     VARCHAR2(18),
      7                         DAD_ACUTE_DISP_DT DATE,
      8                         DAD_ACUTE_SEQNUM  VARCHAR2(18)
      9                        )
     10  /
    
    Table created.
    
    SQL> EXPLAIN PLAN FOR
      2  CREATE TABLE TB_INTERMEDIATE
      3  PCTFREE 0
      4  PARALLEL 8
      5  NOLOGGING
      6  AS
      7  SELECT /*+ PARALLEL(8) */
      8      p.SEQNUM,
      9      p.ULIMORE,
     10      p.VISIT_DATE_DT,
     11      LAG(p.ED_UCC_DT          IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
     12      LEAD(p.ED_UCC_DT         IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
     13      LAG(p.ED_UCC_SEQNUM      IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
     14      LEAD(p.ED_UCC_SEQNUM     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
     15      LAG(p.DAD_ACUTE_DISP_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
     16      LEAD(p.DAD_ACUTE_DISP_DT IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
     17      LAG(p.DAD_ACUTE_SEQNUM   IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
     18      LEAD(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM
     19  FROM
     20      TB_SOURCE p
     21  /
    
    Explained.
    
    SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)
      2  /
    
    PLAN_TABLE_OUTPUT
    ---------------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 516057336
    
    -----------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
    -----------------------------------------------------------------------------------------------------------------------------------
    |   0 | CREATE TABLE STATEMENT             |                 |     1 |    69 |     3  (34)| 00:00:01 |        |      |            |
    |   1 |  PX COORDINATOR                    |                 |       |       |            |          |        |      |            |
    |   2 |   PX SEND QC (RANDOM)              | :TQ10001        |     1 |    69 |     3  (34)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
    |   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| TB_INTERMEDIATE |       |       |            |          |  Q1,01 | PCWP |            |
    |   4 |     OPTIMIZER STATISTICS GATHERING |                 |     1 |    69 |     3  (34)| 00:00:01 |  Q1,01 | PCWP |            |
    |   5 |      WINDOW SORT                   |                 |     1 |    69 |     3  (34)| 00:00:01 |  Q1,01 | PCWP |            |
    |   6 |       PX RECEIVE                   |                 |     1 |    69 |     2   (0)| 00:00:01 |  Q1,01 | PCWP |            |
    |   7 |        PX SEND HASH                | :TQ10000        |     1 |    69 |     2   (0)| 00:00:01 |  Q1,00 | P->P | HASH       |
    |   8 |         PX BLOCK ITERATOR          |                 |     1 |    69 |     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
    |   9 |          TABLE ACCESS FULL         | TB_SOURCE       |     1 |    69 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
    -----------------------------------------------------------------------------------------------------------------------------------
    
    Note
    -----
       - dynamic statistics used: dynamic sampling (level=2)
       - Degree of Parallelism is 8 because of hint
    
    21 rows selected.
    
    SQL> CREATE INDEX TB_SOURCE_IDX1
      2    ON TB_SOURCE(
      3                 ULIMORE,
      4                 VISIT_DATE_DT,
      5                 SEQNUM,
      6                 ED_UCC_DT,
      7                 ED_UCC_SEQNUM,
      8                 DAD_ACUTE_DISP_DT,
      9                 DAD_ACUTE_SEQNUM,
     10                 1 -- we need FBI since all the above index columns are NULLable
     11                )
     12  /
    
    Index created.
    
    SQL> EXPLAIN PLAN FOR
      2  CREATE TABLE TB_INTERMEDIATE
      3  PCTFREE 0
      4  PARALLEL 8
      5  NOLOGGING
      6  AS
      7  SELECT /*+ PARALLEL(8) */
      8      p.SEQNUM,
      9      p.ULIMORE,
     10      p.VISIT_DATE_DT,
     11      LAG(p.ED_UCC_DT          IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
     12      LEAD(p.ED_UCC_DT         IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
     13      LAG(p.ED_UCC_SEQNUM      IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
     14      LEAD(p.ED_UCC_SEQNUM     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
     15      LAG(p.DAD_ACUTE_DISP_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
     16      LEAD(p.DAD_ACUTE_DISP_DT IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
     17      LAG(p.DAD_ACUTE_SEQNUM   IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
     18      LEAD(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM
     19  FROM
     20      TB_SOURCE p
     21  /
    
    Explained.
    
    SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)
      2  /
    
    PLAN_TABLE_OUTPUT
    ---------------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 4041818352
    
    -----------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
    -----------------------------------------------------------------------------------------------------------------------------------
    |   0 | CREATE TABLE STATEMENT             |                 |     1 |    69 |     2   (0)| 00:00:01 |        |      |            |
    |   1 |  PX COORDINATOR                    |                 |       |       |            |          |        |      |            |
    |   2 |   PX SEND QC (RANDOM)              | :TQ10001        |     1 |    69 |     1   (0)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
    |   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| TB_INTERMEDIATE |       |       |            |          |  Q1,01 | PCWP |            |
    |   4 |     OPTIMIZER STATISTICS GATHERING |                 |     1 |    69 |     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |
    |   5 |      PX RECEIVE                    |                 |     1 |    69 |     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |
    |   6 |       PX SEND ROUND-ROBIN          | :TQ10000        |     1 |    69 |     1   (0)| 00:00:01 |  Q1,00 | S->P | RND-ROBIN  |
    |   7 |        PX SELECTOR                 |                 |       |       |            |          |  Q1,00 | SCWC |            |
    |   8 |         WINDOW BUFFER              |                 |     1 |    69 |     1   (0)| 00:00:01 |  Q1,00 | SCWC |            |
    |   9 |          INDEX FULL SCAN           | TB_SOURCE_IDX1  |     1 |    69 |     1   (0)| 00:00:01 |  Q1,00 | SCWP |            |
    -----------------------------------------------------------------------------------------------------------------------------------
    
    Note
    -----
       - dynamic statistics used: dynamic sampling (level=2)
       - Degree of Parallelism is 8 because of hint
    
    21 rows selected.
    
    SQL>
    

    And, you don't need intermediate table.

    SY.

    P.S. You have LAG/LEAD sets for all columns except LAG for DAD_ACUTE_DISP_DT but LEAD for DAD_ACUTE_ADMIT_DT. I assumed it was a typo and changed LEAD to DAD_ACUTE_DISP_DT. Add DAD_ACUTE_ADMIT_DT to index if it wasn't a typo.

  • Sam_P
    Sam_P Member Posts: 155 Blue Ribbon
    edited Apr 5, 2022 6:28PM

    Thanks @Solomon Yakobson . It wasn't a typo but they're 2 different columns, which I added to the index creation statement, as shown:-

    --This took 14 mins
    CREATE INDEX IX_COMPOSITE_COLUMNS ON TB_SOURCE
    (
        SEQNUM,
        ULIMORE,
        VISIT_DATE_DT,     
        ED_UCC_DT,
        ED_UCC_SEQNUM,
        DAD_ACUTE_DISP_DT,
        DAD_ACUTE_ADMIT_DT,
        DAD_ACUTE_SEQNUM
    ) 
    TABLESPACE "ANALYTICS_INDEX" 
    PARALLEL 8 
    NOLOGGING 
    COMPUTE STATISTICS
    ;
    /
    

    I then proceeded to run an EXPLAIN PLAN:-

    EXPLAIN PLAN FOR 
    CREATE TABLE TB_TARGET 
    PCTFREE 0 
    PARALLEL 8 
    NOLOGGING 
    AS 
    SELECT /*+ PARALLEL(8) */ 
        p.SEQNUM,
        p.ULIMORE,
        p.VISIT_DATE_DT,
         
        LAG(p.ED_UCC_DT      IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
        LEAD(p.ED_UCC_DT     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
         
        LAG(p.ED_UCC_SEQNUM    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
        LEAD(p.ED_UCC_SEQNUM   IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
         
        LAG(p.DAD_ACUTE_DISP_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
        LEAD(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
         
        LAG(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
        LEAD(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM
    
    FROM
        TB_SOURCE p
    ;
    /
    


    Oracle's optimizer is choosing to do a FULL TABLE SCAN and it is not picking up the index. I also explicitly added an an INDEX hint in the SELECT but still, it is NOT picking up the index.


  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,586 Red Diamond
    edited Apr 5, 2022 7:02PM

    You missed:

    1 -- we need FBI since all the above index columns are NULLable.

    Optimizer can't use index when all index columns are NULLable sine rows where all such columns are NULL aren't indexed. That's why we need to add something that's not null. I added constant 1:

    SQL> CREATE INDEX TB_SOURCE_IDX1
      2    ON TB_SOURCE(
      3                 ULIMORE,
      4                 VISIT_DATE_DT,
      5                 SEQNUM,
      6                 ED_UCC_DT,
      7                 ED_UCC_SEQNUM,
      8                 DAD_ACUTE_DISP_DT,
      9                 DAD_ACUTE_SEQNUM,
     10                 1 -- we need FBI since all the above index columns are NULLable
     11                )
     12  /
    

    SY.

  • Sam_P
    Sam_P Member Posts: 155 Blue Ribbon
    edited Apr 5, 2022 8:00PM

    Thanks @Solomon Yakobson , I totally missed that!

    I recreated the index with FBI constant (Btw, I didn't know this technique of creating a Function-Based Index). Here's the EXPLAIN PLAN:-


    I am now executing the CTAS and will wait for it to finish and will let you know how long it takes.

    Thanks!

  • alvinder
    alvinder Member Posts: 437 Silver Badge
    edited Apr 6, 2022 2:02AM


    Another way to get the this done quickly. 155M rows are not that many.


    I have noticed that doing something on the big table it is better to do it using dbms_parallel_execute.

    I have assumed ULIMORE    as a  number but if it is a string you may still be able to use this code. 
    I haven't tested on string 
    




    DROP TABLE TB_SOURCE PURGE;
    
    CREATE TABLE TB_SOURCE(
                  SEQNUM      VARCHAR2(18),
                  ULIMORE      VARCHAR2(15),
                  VISIT_DATE_DT   DATE,
                  ED_UCC_DT     DATE,
                  ED_UCC_SEQNUM   VARCHAR2(18),
                  DAD_ACUTE_DISP_DT DATE,
                  DAD_ACUTE_SEQNUM VARCHAR2(18)
                  )
     /
    --Creating sample data.              
    INSERT INTO tb_source
      SELECT
        ROWNUM
       , round(
          dbms_random.value(
            1
           , 1000
          )
        )
       , sysdate - ROWNUM
       , sysdate - ROWNUM
       , ROWNUM
       , sysdate - ROWNUM
       , ROWNUM
      FROM
        dual
      CONNECT BY
        level < 10000;     
    
           
    
     CREATE TABLE TB_SOURCE_final(
                  SEQNUM      VARCHAR2(18),
                  ULIMORE      VARCHAR2(15),
                  VISIT_DATE_DT   DATE,
                  ED_UCC_DT     DATE,
                  ED_UCC_SEQNUM   VARCHAR2(18),
                  DAD_ACUTE_DISP_DT DATE,
                  DAD_ACUTE_SEQNUM VARCHAR2(18),
                  ED_UCC_DT_prev     DATE,
                  ED_UCC_DT_next     DATE,
                  ED_UCC_SEQNUM_prev   VARCHAR2(18),
                  ED_UCC_SEQNUM_next   VARCHAR2(18),
                  DAD_ACUTE_DISP_DT_prev DATE,
                   DAD_ACUTE_DISP_DT_next DATE,
                  DAD_ACUTE_SEQNUM_prev VARCHAR2(18),
                   DAD_ACUTE_SEQNUM_next VARCHAR2(18)
                 )
     /
     
     
     create index tb_source_ix1 on tb_source(
                  ULIMORE      ,
                  VISIT_DATE_DT   ,
                  SEQNUM,
                  ED_UCC_DT     ,
                  ED_UCC_SEQNUM   ,
                  DAD_ACUTE_DISP_DT ,
                  DAD_ACUTE_SEQNUM  
                  ) parallel(degree 8);
    Alter index tb_source_ix1 noparallel;
       
             
     
    CREATE OR REPLACE PROCEDURE insert_parallel_execute AS
      l_chunk_sql VARCHAR2(1000);
      l_sql_stmt VARCHAR2(32000);
      l_try    NUMBER;
      l_status  NUMBER;
    BEGIN
      BEGIN
        dbms_parallel_execute.drop_task('insert_aps');
      EXCEPTION
        WHEN OTHERS THEN
          NULL;
      END;
     -- Create the TASK
      dbms_parallel_execute.create_task('insert_aps');
    
     -- Chunk the table by MANAGER_ID
      l_chunk_sql := 'WITH t AS (
      SELECT DISTINCT
        ulimore
      FROM
        tb_source
    ), t1 AS (
      SELECT
        ulimore
       , NTILE(100)
         OVER(
          ORDER BY
            ulimore
         ) bucket
      FROM
        t
    )
    SELECT
      bucket
     , MIN(ulimore)start_id
     , MAX(ulimore) end_id
    FROM
      t1
    GROUP BY
      bucket;';
      dbms_parallel_execute.create_chunks_by_sql(
                           'insert_aps'
                          , l_chunk_sql
                          , false
      );
    
     -- Execute the DML in parallel
     --  the WHERE clause contain a condition on manager_id, which is the chunk
     --  column. In this case, grouping rows is by manager_id.
      l_sql_stmt := ' insert into TB_SOURCE_final
     SELECT 
        p.SEQNUM,
        p.ULIMORE,
        p.VISIT_DATE_DT,
        LAG(p.ED_UCC_DT     IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS PREV_ED_UCC_DT,
        LEAD(p.ED_UCC_DT     IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS NEXT_ED_UCC_DT,
        LAG(p.ED_UCC_SEQNUM   IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS PREV_ED_UCC_SEQNUM,
        LEAD(p.ED_UCC_SEQNUM   IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS NEXT_ED_UCC_SEQNUM,
        LAG(p.DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS PREV_DAD_DISP_AC_DT,
        LEAD(p.DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS NEXT_DAD_DISP_AC_DT,
        LAG(p.DAD_ACUTE_SEQNUM  IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS PREV_DAD_DISP_AC_SEQNUM,
        LEAD(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS NEXT_DAD_DISP_AC_SEQNUM
      FROM
        TB_SOURCE p
        where ulimore between :start_id and :end_id ';
      dbms_parallel_execute.run_task(
                     'insert_aps'
                    , l_sql_stmt
                    , dbms_sql.native
                    , parallel_level => 15
      );
    END insert_parallel_execute;
    /
    
    
    
    begin
    insert_parallel_execute;
    end;
    /
    
    
    you can monitor this in a different session.
    
    
    SELECT status, COUNT(*)
    FROM   user_parallel_execute_chunks
    WHERE task_name = 'insert_aps'
    GROUP BY status
    ORDER BY status;
    


      

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,586 Red Diamond

    Show us how you created index. I have a feeling it wasn't created right. It MUST start with same column as in PARTITION BY analytic functions, otherwise it will not trade WINDOW SORT (cost = 2, %cpu = 50 ) for less costly WINDOWS BUFFER (cost = 1,%cpu = 0). Compare plans (SORT operation) when ULIMORE is index first column and when it is not:

    SQL> CREATE TABLE TB_SOURCE(
      2                         SEQNUM             VARCHAR2(18),
      3                         ULIMORE            VARCHAR2(15),
      4                         VISIT_DATE_DT      DATE,
      5                         ED_UCC_DT          DATE,
      6                         ED_UCC_SEQNUM      VARCHAR2(18),
      7                         DAD_ACUTE_DISP_DT  DATE,
      8                         DAD_ACUTE_ADMIT_DT DATE,
      9                         DAD_ACUTE_SEQNUM   VARCHAR2(18)
     10                        )
     11  /
    
    Table created.
    
    SQL> CREATE INDEX TB_SOURCE_IDX1
      2    ON TB_SOURCE(
      3                 ULIMORE,
      4                 VISIT_DATE_DT,
      5                 SEQNUM,
      6                 ED_UCC_DT,
      7                 ED_UCC_SEQNUM,
      8                 DAD_ACUTE_DISP_DT,
      9                 DAD_ACUTE_ADMIT_DT,
     10                 DAD_ACUTE_SEQNUM,
     11                 1
     12                )
     13  /
    
    Index created.
    
    SQL> EXPLAIN PLAN FOR
      2  CREATE TABLE TB_INTERMEDIATE
      3  PCTFREE 0
      4  PARALLEL 8
      5  NOLOGGING
      6  AS
      7  SELECT /*+ PARALLEL(8) */
      8      p.SEQNUM,
      9      p.ULIMORE,
     10      p.VISIT_DATE_DT,
     11      LAG(p.ED_UCC_DT            IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
     12      LEAD(p.ED_UCC_DT           IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
     13      LAG(p.ED_UCC_SEQNUM        IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
     14      LEAD(p.ED_UCC_SEQNUM       IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
     15      LAG(p.DAD_ACUTE_DISP_DT    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
     16      LEAD(p.DAD_ACUTE_ADMIT_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
     17      LAG(p.DAD_ACUTE_SEQNUM     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
     18      LEAD(p.DAD_ACUTE_SEQNUM    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM
     19  FROM
     20      TB_SOURCE p
     21  /
    
    Explained.
    
    SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)
      2  /
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 4041818352
    
    -----------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
    -----------------------------------------------------------------------------------------------------------------------------------
    |   0 | CREATE TABLE STATEMENT             |                 |     1 |    78 |     2   (0)| 00:00:01 |        |      |            |
    |   1 |  PX COORDINATOR                    |                 |       |       |            |          |        |      |            |
    |   2 |   PX SEND QC (RANDOM)              | :TQ10001        |     1 |    78 |     1   (0)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
    |   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| TB_INTERMEDIATE |       |       |            |          |  Q1,01 | PCWP |            |
    |   4 |     OPTIMIZER STATISTICS GATHERING |                 |     1 |    78 |     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |
    |   5 |      PX RECEIVE                    |                 |     1 |    78 |     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |
    |   6 |       PX SEND ROUND-ROBIN          | :TQ10000        |     1 |    78 |     1   (0)| 00:00:01 |  Q1,00 | S->P | RND-ROBIN  |
    |   7 |        PX SELECTOR                 |                 |       |       |            |          |  Q1,00 | SCWC |            |
    |   8 |         WINDOW BUFFER              |                 |     1 |    78 |     1   (0)| 00:00:01 |  Q1,00 | SCWC |            |
    |   9 |          INDEX FULL SCAN           | TB_SOURCE_IDX1  |     1 |    78 |     1   (0)| 00:00:01 |  Q1,00 | SCWP |            |
    -----------------------------------------------------------------------------------------------------------------------------------
    
    Note
    -----
       - dynamic statistics used: dynamic sampling (level=2)
       - Degree of Parallelism is 8 because of hint
    
    21 rows selected.
    
    SQL> DROP INDEX TB_SOURCE_IDX1
      2  /
    
    Index dropped.
    
    SQL> CREATE INDEX TB_SOURCE_IDX1
      2    ON TB_SOURCE(
      3                 VISIT_DATE_DT,
      4                 ULIMORE,
      5                 SEQNUM,
      6                 ED_UCC_DT,
      7                 ED_UCC_SEQNUM,
      8                 DAD_ACUTE_DISP_DT,
      9                 DAD_ACUTE_ADMIT_DT,
     10                 DAD_ACUTE_SEQNUM,
     11                 1
     12                )
     13  /
    
    Index created.
    
    SQL> EXPLAIN PLAN FOR
      2  CREATE TABLE TB_INTERMEDIATE
      3  PCTFREE 0
      4  PARALLEL 8
      5  NOLOGGING
      6  AS
      7  SELECT /*+ PARALLEL(8) */
      8      p.SEQNUM,
      9      p.ULIMORE,
     10      p.VISIT_DATE_DT,
     11      LAG(p.ED_UCC_DT            IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
     12      LEAD(p.ED_UCC_DT           IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
     13      LAG(p.ED_UCC_SEQNUM        IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
     14      LEAD(p.ED_UCC_SEQNUM       IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
     15      LAG(p.DAD_ACUTE_DISP_DT    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
     16      LEAD(p.DAD_ACUTE_ADMIT_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
     17      LAG(p.DAD_ACUTE_SEQNUM     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
     18      LEAD(p.DAD_ACUTE_SEQNUM    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM
     19  FROM
     20      TB_SOURCE p
     21  /
    
    Explained.
    
    SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)
      2  /
    
    PLAN_TABLE_OUTPUT
    --------------------------------------------------------------------------------------------------------------------------------------------
    Plan hash value: 4289522319
    
    -----------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
    -----------------------------------------------------------------------------------------------------------------------------------
    |   0 | CREATE TABLE STATEMENT             |                 |     1 |    78 |     3  (34)| 00:00:01 |        |      |            |
    |   1 |  PX COORDINATOR                    |                 |       |       |            |          |        |      |            |
    |   2 |   PX SEND QC (RANDOM)              | :TQ10001        |     1 |    78 |     2  (50)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
    |   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| TB_INTERMEDIATE |       |       |            |          |  Q1,01 | PCWP |            |
    |   4 |     OPTIMIZER STATISTICS GATHERING |                 |     1 |    78 |     2  (50)| 00:00:01 |  Q1,01 | PCWP |            |
    |   5 |      PX RECEIVE                    |                 |     1 |    78 |     2  (50)| 00:00:01 |  Q1,01 | PCWP |            |
    |   6 |       PX SEND ROUND-ROBIN          | :TQ10000        |     1 |    78 |     2  (50)| 00:00:01 |  Q1,00 | S->P | RND-ROBIN  |
    |   7 |        PX SELECTOR                 |                 |       |       |            |          |  Q1,00 | SCWC |            |
    |   8 |         WINDOW SORT                |                 |     1 |    78 |     2  (50)| 00:00:01 |  Q1,00 | SCWC |            |
    |   9 |          INDEX FULL SCAN           | TB_SOURCE_IDX1  |     1 |    78 |     1   (0)| 00:00:01 |  Q1,00 | SCWP |            |
    -----------------------------------------------------------------------------------------------------------------------------------
    
    Note
    -----
       - dynamic statistics used: dynamic sampling (level=2)
       - Degree of Parallelism is 8 because of hint
    
    21 rows selected.
    
    SQL>
    

    SY.