SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

LEAD and LAG taking hours on a very large table!

Sam_PApr 1 2022 — edited Apr 3 2022

Hello,
I have a source table (partitioned and indexed) with approx. 156 Million records. I am trying to create a target table using CTAS method by reading records from the source table with a simple SELECT query that does simple LEAD/LAG operations on a date-column and it is taking over 8 hrs!!
Upon searching online, one developer recommended to try the MATCH_RECOGNIZE technique as it is claimed to run faster than the Analytic SQL functions on very large tables. I have not tested this yet as I need help in converting the SQL query into the MATCH_RECOGNIZE statement, which would result in the same result-set.
CREATE TABLE TB_TARGET
PARALLEL 8
NOLOGGING
AS
SELECT /*+ PARALLEL_INDEX(8) PARALLEL(8) */
PAT_ID,
VISIT_DATE_DT,
ROW_NUMBER() OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS ROW_NUM,
ED_UCC_DT,
LAG(ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS PREV_ED_UCC_DT,
LEAD(ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS NEXT_ED_UCC_DT
FROM TB_SOURCE; --Very large table!

The result set for a single PAT_ID record (100123456) looks like this:-

Questions:-
1. How do I speed this query up when it runs over 156 Million records with multiple LEAD and LAG operations added into the above query on other columns in addition to the above column?
2. If the fastest solution is to use MATCH_RECOGNIZE, could you please help me with composing a SQL query statement such that it yields the same results as shown above (along with NULLs)?
3. Any other solution is welcomed as long as I can reduce the query execution time to an acceptable duration other than 8 hrs.
Thank you in advance.

This post has been answered by alvinder on Apr 6 2022

Jump to Answer

Frank Kulash

Hi, @sam-p
Whenever you have a question, please post a little sample data (CREATE TABLE and INSERT statements for all tables involved, relevant columns only) so the people who want to help you can re-create the problem and test their ideas. Also post the exact results you want from that data, and explain why you want those results from that data. Always post your complete Oracle version (e.g. 18.4.0.0.0).
with multiple LEAD and LAG operations added into the above query on other columns in addition to the above column?
To how many columns (besides ed_ucc_dt) will you want to apply LAG and LEAD? Include one in your sample data and results
LAG(ED_UCC_DT) IGNORE NULLS OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS PREV_ED_UCC_DT,
Since you're concerned about NULLS, include examples of NULLs in your sample data and results.
Is the combination (pat_id, visit_date_dt) unique? If not, include examples of ties in your sample data, results and explanation.
Given that you need to generate the ROW_NUM column, analytic functions might be the best you can do. LAG and LEAD can always be replaced FIRST_VALUE or LAST_VALUE, but I'm not sure they will be any faster.

Solomon Yakobson

In order to do LEAD/LAG oracle needs to fetch ALL rows. Then it needs to sort it. Check execution plan. Most likely sort doesn't fit in memory and time is spent on disk sort. Having index on PAT_ID, VISIT_DATE_DT or better PAT_ID, VISIT_DATE_DT, ED_UCC_DT could improve sort. Also, how table is partitioned? If it is partitioned by PAT_ID then paralllel could also improve performance.
SY.

Sam_P

@frank-kulash
My apologies I didn't attach sample data as it is Healthcare data and private in nature, so I'll have to spend more time to de-identify to create a sample data.
Oracle version is "Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.11.0.0.0".
Combination (pat_id, visit_date_dt) is not unique. ROW_NUM() is not required per say, it was only for my own purposes.

@solomon-yakobson
I think you're on to something as I'm also leaning towards the same suspicion that the LEAD/LAG do require a lot of sorting area in PGA to be able to churn through a large table like this. I'll see if more RAM can be allocated to PGA.
There is a partition on column VISIT_DATE_DT (interval range partition) which stores by YEAR and there is a non-unique LOCAL index created on column PAT_ID and some other columns as well.

Rocky

Hi,
Adding to @solomon-yakobson point - you can also think of doing it in parallel through using dbms_parallel_execute package.

Sam_P

@rocky10
Thanks, I’ve never used that package before. Ideally, I’d want to break the parallel tasks based on the Partition column VISIT_DATE_DT yearly values for distrubuted parallel processing for further performance gain.
Is this doable with this package?
Do you have an example of how to use this package?

Jonathan Lewis

Unless you have a global index that starts with (pat_id, visit_date_dt) and includes all the columns you're going to use to create the result table Oracle will have to sort 156M rows, including all the columns that you want to copy from the source table. (The same basic requirement would be true for a solution based on match_recognize).
Because your table is range-partitioned by visit_date_dt the dbms_parallel_execute() package will not help as it tries to handle the table in (contiguous) chunks, but your requirement is to pick all the rows for a patient from lots of different parts of the table.
Do you want the target table partitioned in any way as this may dictate the optimal solution.
You've got two options for using an increased volume of concurrent resources
a) execute the statement as a parallel statement - Oracle should be smart enough to do parallel tablescans and distribute the data by range on pat_id so that each PX server process sorts a subset of the data.
b) create some simple code to do an "insert as select ... where pat_id between {parameter1} and {parameter2}, the start up a set processes that use non-overlapping ranges that cover the entire range of pat_id. Probably best if the execution path starts with a tablescan in each case.
Following on from (b) - before you do anything else - check the execution plan for the statement you're using at present. It's possible that Oracle is trying to use your index on pat_id to minimise sorting and spending all its time on random I/Os when (possibly) it should just be doing a brute force tablescan.
Regards
Jonathan Lewis

Sam_P

Thanks @jonathan-lewis2 for helping me navigate through this issue. I am a big fan of yours and I'm honored that you have taken notice to this issue and are willing to help. I truly appreciate it, sir! :)
Just to give you a bit of a context first -- we're working in an Analytics team with Data Scientists wanting to analyze the data (one of them wanted to create a separate table with LEAD/LAG so that they can proceed with their analysis but their query lasted for over 8 hrs and the DBA had to kill it). Since this Data Scientist couldn't proceed any further, I am trying to help to resolve this issue, however, I'm not the DBA, but I do have some privileges that enable me to perform some administration tasks in Oracle.
The ETL team, which is separate from our team, performs monthly data refreshes to thousands of source tables in Oracle, which then gets exposed to us as VIEWS from a given schema in Oracle, for us to consider and query from these source VIEWS and create our own tables (mostly using CTAS to create either a full or a subset of the data from the source VIEWS) in our own "working" schemas, which then act as "source tables" for us in our relevant "working" schemas.
Currently, in our analysis "working" schema, the Data Scientist has created a "source table" (based on a subset of data from the source VIEWS), and this "source table" (i.e. TB_SOURCE) is the one that has 156M records. Currently, this table DOES NOT have any table partitions, however, it does contain a number of individually created (separate) indexes based on individual columns, as shown:-
Notes:
(a) The column "ULIMORE" in the image above is actually what I wrote in this post as "PAT_ID" but this column does not contain unique values, as some of the patients come back for recurring visits.
(b) There is no PRIMARY KEY constraint or any other constraints on this table.
(c) At the moment, this TB_SOURCE table gets destroyed (dropped) and re-created from scratch, every month, by the Data Scientist, to capture new data that was refreshed and made available in source VIEWS. Because of this current practice and the fact that this table is very large, plus the fact that every month, we get approx. 5M new records, we need a new approach.
(d) For this new approach, my plan is to create a new table, partition it and re-create the LOCAL indexes as the original TB_SOURCE table (as shown above in the image) and keep this new partitioned table as "steady" going forward, without the Data Scientist dropping and recreating it every month. Then once the performance and optimization actions are performed on this new partitioned table to be able to run the SQL query in a reasonable time, I was planning to drop the existing TB_SOURCE table and rename the new partitioned table (TB_SOURCE_PARTITIONED) as the new TB_SOURCE. Then, I was going to have the Data Scientist use a parallel MERGE statement to keep this table populated with newly changed (refreshed) data from the source VIEWS, every month.
(e) The above columns were chosen to be indexed because these columns participate in one way or the other in the SELECT below, in addition to all the columns from the TB_SOURCE table:-
--This is taking over 8 hrs!
CREATE TABLE tb_target
PARALLEL 8
NOLOGGING
AS
SELECT /*+ PARALLEL_INDEX(8) PARALLEL(8) */
s.*, --all columns from the original table plus additional ones below
LAG(ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS PREV_ED_UCC_DT,
LEAD(ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS NEXT_ED_UCC_DT,
LAG(ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS PREV_ED_UCC_SEQNUM,
LEAD(ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS NEXT_ED_UCC_SEQNUM,
LAG(DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS PREV_DAD_DISP_AC_DT,
LEAD(DAD_ACUTE_ADMIT_DT IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_DT,
LAG(DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS PREV_DAD_DISP_AC_SEQNUM,
LEAD(DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_SEQNUM
FROM tb_source s;

Because I did not want to perform any maintenance on the existing TB_SOURCE table and potentially disrupt any downstream reporting based off of this table, I chose to create a separate, new, partitioned table, TB_SOURCE_PARTITIONED and populated it with 156M records (using a direct-path load INSERT /*+ APPEND PARALLEL(8) */ INTO TB_SOURCE_PARTITIONED SELECT /*+ PARALLEL_INDEX(8) PARALLEL(8) */ * FROM TB_SOURCE; --This took 17 mins to load 156M records).
I then proceeded to create the same indexes in this partitioned table as they are in original TB_SOURCE but could not create UNIQUE LOCAL indexes on columns ED_UCC_SEQNUM and on DAD_ACUTE_SEQNUM as it threw an error saying something to the effect that UNIQUE columns must be part of the partition key or something similar. So, I just created NON-UNIQUE indexes instead and all of the indexes created are LOCAL indices on the partitioned table, as shown:-

I then gathered full table and index stats and then tried running the above CTAS (but reading FROM tb_source_partitioned s;) to see if there was any improvement but sadly, it was taking longer than 4 hrs based on my testing and I had to manually cancel the query. Unfortunately, partitions did not help.

A few questions:-
1. Do you think that my use of the PARALLEL_INDEX(8) hint in the above CTAS is forcing Oracle to use indexes instead of the full table scan?
2. Should I keep the TB_SOURCE_PARTITIONED table or drop it?
3. Should I keep using the non-partitioned TB_SOURCE and use MERGE to populate it going forward?

There is no requirement for the TARGET table to be partitioned but the Data Scientist wants to create additional follow-up tables based off of this TB_TARGET to perform subsequent analysis until the final analysis tables are created to be used for final reporting.
I am looking forward to performing the next steps based on your recommendations and instructions.
Thank you once again for your help.

Jonathan Lewis

If someone hired me to address this question I'd spend a couple of hours talking to your data scientist about what they wanted to do with the data before I made any suggestions about how to add an ordering column to a data set of 156M rows. Try to imagine how many exchanges of notes that's going to turn into if I have have to read your replies and write more questions asking for clarification - I don't have that much time free.
A couple of points:
If the tb_source table is recreated each month why not include the analytic sort in its creation - a brute force, one-off create is probably the most efficient way to prepare the data set.
You said the tb_source was a subset of the data - that suggests there may be data in the table this month that won't be needed next month, or maybe next year. Have you considered the possibility of having to remove data.
A MERGE command requires uniqueness on the "ON" columns of the new data otherwise you risk a run-time error (about a "stable set of rows") and Oracle rolling back the update so far. If you do manage to deal with the uniqueness how are you going to avoid updating every row in the 156MB or conversely, how are you going to ensure that the rows (and only those rows) that need an update are updated.
Even if you manage to get a working MERGE to add data, how are you going to set the relevant row_number, lag/lead date columns for the new data populated. And what if a new row for a pat_id has a visit date that is earlier than the highest existing vist date for that pat_id - it probably "shouldn't" happen, but probably will happen.
If you don't know what the end-user is going to do with the data and you don't really understand how partitioning works then you shouldn't be using partitioning (at least not until you know why it might make the end-user's queries run significantly more efficiently).

I have a couple of ideas for design strategies that MIGHT be appropriate for the end-user's requirements, but there's no point in saying anything about any of them without knowing what the end-user hopes to do with the data. The strongest indication we have to date is that the end-user is interested in follow patient histories - which suggests that for efficiency purposes you need to think about putting all the data for a given patient into the smallest number of blocks possible: i.e. think about (hash) partitioning by pat_id or creating an index organized table based around finding a unique index starting with pat_id (or cluster by pat_id, but that's difficult to do well).

Regards
Jonathan Lewis

Sam_P

Points noted @jonathan-lewis2 .
I'll spend more time tomorrow with the end-user to understand how this table is being used and queried downstream.
About TB_SOURCE being a subset of the source data, I looked at his SQL scripts and looks like he's running CTAS from 6 different source VIEWS and creates their corresponding 6 tables as RAW tables. These tables contain ALL the records from their corresponding source VIEWS, not a subset. I stand corrected.
He then creates a ton of intermediate tables to perform various transformations from the RAW tables (like CASTs, handling NULLS, joins, adding more columns to standardize, etc.) in his scripts. Once the transformations are done with their own standardized tables with similar column names and data-types, he then consolidates them (UNION) into a single table, which is acting as this TB_SOURCE for this project purposes.
However, it doesn't stop here, as part of his project and the requirements are not always provided by his clients in advance but rather, the client (a medical group) may pose different questions as the project progresses and more insights are found by the Data Scientist. As a result, he creates more intermediate tables downstream (reading from TB_SOURCE) in order to answer many different types of questions/scenarios posed by his client. So, a table is most likely to be dependent on the prior table in a chain of many project-related tables to get to the final few tables that are then read by the reporting/dashboarding tool to bring it all together visually.
Just an FYI - as of yesterday evening at 9PM, I ran the CTAS (against TB_SOURCE directly) without the PARALLEL_INDEX hint (just kept the PARALLEL hint) and checked the execution plan and the optimizer chose to perform a FULL TABLE SCAN. I kept my laptop running all night and when I checked as of this morning at 9AM, the CTAS query was still running! I had to manually cancel it again.
Also, I have dropped the TB_SOURCE_PARTITIONED until I get more information from the end-user. I'll read up on HASH partitioning and on IOTs. Thank you for your suggestions :)
I'll provide more details soon.

Jonathan Lewis

A quick bit of arithmetic.
If you create the table to hold only one of the unique numbers, the 4 dates (2 x next/prev) and the 4 sequence number (2 x next prev) you will have done a sort carrying: unique number, ulimore, ed_ucc_dt, ed_ucc_seqnum, dad_acute_disp_dt, dad_acute_seqnum, which will be in the order of 45 bytes per entry, for a total of about 7GB of data - so when you try including the whole row (and you haven't said anything about its length) it's not surprising that the sort will take a long time.
If it's necessary to stick with the current sequence of event you could consider creating a table from just the columns I've listed above, then creating the final table by forcing a parallel hash join between this 7GB data set and the tb_source dataset, maybe partitioning the result by HASH 16 or 32 ways on the ulimore (and running with the same degree of parallelism as there are partitions - any maybe creating the intermediate 7GB table with the same partitioning clause to help optimize the final join and create.
Regards
Jonathan Lewis

Sam_P

Thanks @jonathan-lewis2 for sticking by.
My apologies in advance if I misunderstand the information you're asking -- by "length" do you mean this info below?

Please let me know how do I get you the information you need, if the above is not what you're after.
What do you mean when you say "unique number" column? Do you mean something like an "ID" column with auto-generated unique values that would act as a Primary Key in this intermediate table?
Just an FYI -- the ROW_NUM column in my screenshots in earlier posts is not needed as it was only used for my own testing purposes. There is no physical column called ROW_NUM.
A few things to note:-
1. Sequence Number columns ED_UCC_SEQNUM and DAD_ACUTE_SEQNUM contain UNIQUE values.
2. ED_UCC_SEQNUM produces 2 additional PREV + NEXT aliased columns.
3. DAD_ACUTE_SEQNUM produces 2 additional PREV + NEXT aliased columns.
4. Date columns VISIT_DATE_DT, ED_UCC_DT, DAD_ACUTE_DISP_DT and DAD_ACUTE_ADMIT_DT are physical table columns being referred to in the SELECT query.
5. VISIT_DATE_DT does not produce any aliased (calculated) column but is being referred to in OVER (...ORDER BY clause).
6. ED_UCC_DT produces 2 additional PREV + NEXT aliased columns.
7. DAD_ACUTE_DISP_DT produces only 1 PREV aliased column captioned PREV_DAD_DISP_AC_DT.
8. DAD_ACUTE_ADMIT_DT produces only 1 NEXT aliased column captioned NEXT_DAD_DISP_AC_DT.
9. ULIMORE is a non-unique VARCHAR2 column that does not produce any aliased columns.

So, in this intermediate table that I'd create, would I need the following minimum columns, in this order:-
("ID", ULIMORE, VISIT_DATE_DT, ED_UCC_DT, ED_UCC_SEQNUM,DAD_ACUTE_DISP_DT, DAD_ACUTE_ADMIT_DT, DAD_ACUTE_SEQNUM)
OR without the VISIT_DATE_DT column, as:
("ID", ULIMORE, ED_UCC_DT, ED_UCC_SEQNUM,DAD_ACUTE_DISP_DT, DAD_ACUTE_ADMIT_DT, DAD_ACUTE_SEQNUM)

Thanks!

Jonathan Lewis

Yes, the avg_row_len is a good enough indicator for a rough calculation.
Unique number column - you have two of them: ED_UCC_SEQNUM and on DAD_ACUTE_SEQNUM, if either is declared not null then that can be used as the unique key to join the generated table back to tb_source.
If you create the table to hold only one of the unique numbers
One of (1) above
the 4 dates (2 x next/prev)
(6), (7) and (8) above - but I misread dad_acute as begin next/prev on the same date
and the 4 sequence number (2 x next prev)
(2) and (3) above

This is all the generated data from the leads and lags, and one column that is a unique identifier to join to the original TB_SOURCE table. That's all you would need in the intermediate table. The fact that your query used other columns to generate these results is irrelevant so long as you can join back correctly to get the rest of the row when you need it.

I missed some overheads on my first estimate - allowing some guesswork for numeric sizes and memory overheads for sorting 1 number @ 5 bytes, 4 numbers at 3 bytes, 4 dates @ 7 bytes, 2 bytes per column o/head, 4 bytes per row o/head: 67 bytes per row for sorting that's about 9 GB memory. The table will be a little smaller (8GB) because the column overhead is only 1 byte per column but the row overhead will be 5 bytes. (To save space, create the table with pctfree = 0 - which might be a point worth mentioning for lots of other intermediate tables).
Regards
Jonathan Lewis

Sam_P

Thanks @jonathan-lewis2 .
To give you more column details in the TB_SOURCE table:-

Notes:
1. The unique number columns ED_UCC_SEQNUM and DAD_ACUTE_SEQNUM are not declared as NOT NULL because the end user had produced this table using CTAS (by reading it from a prior table in a chain) and no further constraints were added to this table. These 2 columns are only unique in terms of their DISTINCT values but they do not span ALL the rows (155,093,878), as per the image above. The closest column which comes close is probably SEQNUM but even, it doesn't span ALL the rows. Does this matter whether we choose to include ED_UCC_SEQNUM or DAD_ACUTE_SEQNUM column in TB_INTERMEDIATE to act as a unique key to join back to TB_SOURCE?

2. The current size of the TB_SOURCE table is:-

3. Just so that I don't mess things up when creating the TB_INTERMEDIATE, should I create it as normal HEAP table or as an IOT table?
Once you respond to this, I'll post a table creation statement for your review before I create the empty table and start populating it with 156M rows.

4. I have never worked with an IOT table but reading up on it yesterday, it looks like that I have to specify which columns are to be marked as "key" columns and which columns are to be marked as "overflow" columns in the table creation statement.
Once you respond to this, I'll post a table creation statement for your review before I create the empty table and start populating it with 156M rows.

Many thanks!

Sam_P

Just wanted to provide a quick update [Jonathan Lewis](/ords/forums/user/Jonathan Lewis) .
I spoke with the Data Scientist today and he mentioned that this intermediate table must include, at the very minimum, 3 columns: SEQNUM, ULIMORE, VISIT_DATE_DT as these 3 together do uniquely identify a record in TB_SOURCE table, hence, he would need it also in the TB_INTERMEDIATE in order for him to do the JOIN or look back up to TB_SOURCE. I did verify that these 3 composite columns do result into no duplicates and the SUM of the count of rows returned for all these 3 columns combined does equal to the exact NUM_ROWS in TB_SOURCE.

I then proceeded to issue this CTAS:-

CREATE TABLE TB_INTERMEDIATE 
PCTFREE 0 
PARALLEL 8 
NOLOGGING 
AS 
SELECT /*+ FULL(p) PARALLEL(8) */ 
    p.SEQNUM,
    p.ULIMORE,
    p.VISIT_DATE_DT,
     
    LAG(p.ED_UCC_DT      IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
    LEAD(p.ED_UCC_DT     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
     
    LAG(p.ED_UCC_SEQNUM    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
    LEAD(p.ED_UCC_SEQNUM   IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
     
    LAG(p.DAD_ACUTE_DISP_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
    LEAD(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
     
    LAG(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
    LEAD(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM

FROM
    TB_SOURCE p
;

Notes:-
1. I have explicitly specified PCTFREE 0 above based on your recommendation.
2. There are 8 parallel threads running at the moment.
3. The above CTAS is still running after 4.5 hrs have already passed.
4. Here's the execution plan (while it's still running):-

Solomon Yakobson

As I already mentioned - you need index to a) trade large table full scan for way smaller indedx full scan b) which is even more important trade costly window sort for less costly window buffer. Compare:

SQL> DROP TABLE TB_SOURCE PURGE
  2  /

Table dropped.

SQL> CREATE TABLE TB_SOURCE(
  2                         SEQNUM            VARCHAR2(18),
  3                         ULIMORE           VARCHAR2(15),
  4                         VISIT_DATE_DT     DATE,
  5                         ED_UCC_DT         DATE,
  6                         ED_UCC_SEQNUM     VARCHAR2(18),
  7                         DAD_ACUTE_DISP_DT DATE,
  8                         DAD_ACUTE_SEQNUM  VARCHAR2(18)
  9                        )
 10  /

Table created.

SQL> EXPLAIN PLAN FOR
  2  CREATE TABLE TB_INTERMEDIATE
  3  PCTFREE 0
  4  PARALLEL 8
  5  NOLOGGING
  6  AS
  7  SELECT /*+ PARALLEL(8) */
  8      p.SEQNUM,
  9      p.ULIMORE,
 10      p.VISIT_DATE_DT,
 11      LAG(p.ED_UCC_DT          IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
 12      LEAD(p.ED_UCC_DT         IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
 13      LAG(p.ED_UCC_SEQNUM      IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
 14      LEAD(p.ED_UCC_SEQNUM     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
 15      LAG(p.DAD_ACUTE_DISP_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
 16      LEAD(p.DAD_ACUTE_DISP_DT IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
 17      LAG(p.DAD_ACUTE_SEQNUM   IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
 18      LEAD(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM
 19  FROM
 20      TB_SOURCE p
 21  /

Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 516057336

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT             |                 |     1 |    69 |     3  (34)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                    |                 |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10001        |     1 |    69 |     3  (34)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| TB_INTERMEDIATE |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |                 |     1 |    69 |     3  (34)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      WINDOW SORT                   |                 |     1 |    69 |     3  (34)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX RECEIVE                   |                 |     1 |    69 |     2   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   7 |        PX SEND HASH                | :TQ10000        |     1 |    69 |     2   (0)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   8 |         PX BLOCK ITERATOR          |                 |     1 |    69 |     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   9 |          TABLE ACCESS FULL         | TB_SOURCE       |     1 |    69 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - Degree of Parallelism is 8 because of hint

21 rows selected.

SQL> CREATE INDEX TB_SOURCE_IDX1
  2    ON TB_SOURCE(
  3                 ULIMORE,
  4                 VISIT_DATE_DT,
  5                 SEQNUM,
  6                 ED_UCC_DT,
  7                 ED_UCC_SEQNUM,
  8                 DAD_ACUTE_DISP_DT,
  9                 DAD_ACUTE_SEQNUM,
 10                 1 -- we need FBI since all the above index columns are NULLable
 11                )
 12  /

Index created.

SQL> EXPLAIN PLAN FOR
  2  CREATE TABLE TB_INTERMEDIATE
  3  PCTFREE 0
  4  PARALLEL 8
  5  NOLOGGING
  6  AS
  7  SELECT /*+ PARALLEL(8) */
  8      p.SEQNUM,
  9      p.ULIMORE,
 10      p.VISIT_DATE_DT,
 11      LAG(p.ED_UCC_DT          IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
 12      LEAD(p.ED_UCC_DT         IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
 13      LAG(p.ED_UCC_SEQNUM      IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
 14      LEAD(p.ED_UCC_SEQNUM     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
 15      LAG(p.DAD_ACUTE_DISP_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
 16      LEAD(p.DAD_ACUTE_DISP_DT IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
 17      LAG(p.DAD_ACUTE_SEQNUM   IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
 18      LEAD(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM
 19  FROM
 20      TB_SOURCE p
 21  /

Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4041818352

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT             |                 |     1 |    69 |     2   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                    |                 |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10001        |     1 |    69 |     1   (0)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| TB_INTERMEDIATE |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |                 |     1 |    69 |     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE                    |                 |     1 |    69 |     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX SEND ROUND-ROBIN          | :TQ10000        |     1 |    69 |     1   (0)| 00:00:01 |  Q1,00 | S->P | RND-ROBIN  |
|   7 |        PX SELECTOR                 |                 |       |       |            |          |  Q1,00 | SCWC |            |
|   8 |         WINDOW BUFFER              |                 |     1 |    69 |     1   (0)| 00:00:01 |  Q1,00 | SCWC |            |
|   9 |          INDEX FULL SCAN           | TB_SOURCE_IDX1  |     1 |    69 |     1   (0)| 00:00:01 |  Q1,00 | SCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - Degree of Parallelism is 8 because of hint

21 rows selected.

SQL>

And, you don't need intermediate table.
SY.
P.S. You have LAG/LEAD sets for all columns except LAG for DAD_ACUTE_DISP_DT but LEAD for DAD_ACUTE_ADMIT_DT. I assumed it was a typo and changed LEAD to DAD_ACUTE_DISP_DT. Add DAD_ACUTE_ADMIT_DT to index if it wasn't a typo.

Sam_P

Thanks [Solomon Yakobson](/ords/forums/user/Solomon Yakobson) . It wasn't a typo but they're 2 different columns, which I added to the index creation statement, as shown:-

--This took 14 mins
CREATE INDEX IX_COMPOSITE_COLUMNS ON TB_SOURCE
(
    SEQNUM,
    ULIMORE,
    VISIT_DATE_DT,     
    ED_UCC_DT,
    ED_UCC_SEQNUM,
    DAD_ACUTE_DISP_DT,
    DAD_ACUTE_ADMIT_DT,
    DAD_ACUTE_SEQNUM
) 
TABLESPACE "ANALYTICS_INDEX" 
PARALLEL 8 
NOLOGGING 
COMPUTE STATISTICS
;
/

I then proceeded to run an EXPLAIN PLAN:-

EXPLAIN PLAN FOR 
CREATE TABLE TB_TARGET 
PCTFREE 0 
PARALLEL 8 
NOLOGGING 
AS 
SELECT /*+ PARALLEL(8) */ 
    p.SEQNUM,
    p.ULIMORE,
    p.VISIT_DATE_DT,
     
    LAG(p.ED_UCC_DT      IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
    LEAD(p.ED_UCC_DT     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
     
    LAG(p.ED_UCC_SEQNUM    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
    LEAD(p.ED_UCC_SEQNUM   IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
     
    LAG(p.DAD_ACUTE_DISP_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
    LEAD(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
     
    LAG(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
    LEAD(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM

FROM
    TB_SOURCE p
;
/

Oracle's optimizer is choosing to do a FULL TABLE SCAN and it is not picking up the index. I also explicitly added an an INDEX hint in the SELECT but still, it is NOT picking up the index.

Solomon Yakobson

You missed:
1 -- we need FBI since all the above index columns are NULLable.
Optimizer can't use index when all index columns are NULLable sine rows where all such columns are NULL aren't indexed. That's why we need to add something that's not null. I added constant 1:

SQL> CREATE INDEX TB_SOURCE_IDX1
  2    ON TB_SOURCE(
  3                 ULIMORE,
  4                 VISIT_DATE_DT,
  5                 SEQNUM,
  6                 ED_UCC_DT,
  7                 ED_UCC_SEQNUM,
  8                 DAD_ACUTE_DISP_DT,
  9                 DAD_ACUTE_SEQNUM,
 10                 1 -- we need FBI since all the above index columns are NULLable
 11                )
 12  /

SY.

Sam_P

Thanks @solomon-yakobson , I totally missed that!
I recreated the index with FBI constant (Btw, I didn't know this technique of creating a Function-Based Index). Here's the EXPLAIN PLAN:-

I am now executing the CTAS and will wait for it to finish and will let you know how long it takes.
Thanks!

alvinder

Another way to get the this done quickly. 155M rows are not that many.

I have noticed that doing something on the big table it is better to do it using dbms_parallel_execute.

I have assumed ULIMORE    as a  number but if it is a string you may still be able to use this code. 
I haven't tested on string

DROP TABLE TB_SOURCE PURGE;

CREATE TABLE TB_SOURCE(
              SEQNUM      VARCHAR2(18),
              ULIMORE      VARCHAR2(15),
              VISIT_DATE_DT   DATE,
              ED_UCC_DT     DATE,
              ED_UCC_SEQNUM   VARCHAR2(18),
              DAD_ACUTE_DISP_DT DATE,
              DAD_ACUTE_SEQNUM VARCHAR2(18)
              )
 /
--Creating sample data.              
INSERT INTO tb_source
  SELECT
    ROWNUM
   , round(
      dbms_random.value(
        1
       , 1000
      )
    )
   , sysdate - ROWNUM
   , sysdate - ROWNUM
   , ROWNUM
   , sysdate - ROWNUM
   , ROWNUM
  FROM
    dual
  CONNECT BY
    level < 10000;     

       

 CREATE TABLE TB_SOURCE_final(
              SEQNUM      VARCHAR2(18),
              ULIMORE      VARCHAR2(15),
              VISIT_DATE_DT   DATE,
              ED_UCC_DT     DATE,
              ED_UCC_SEQNUM   VARCHAR2(18),
              DAD_ACUTE_DISP_DT DATE,
              DAD_ACUTE_SEQNUM VARCHAR2(18),
              ED_UCC_DT_prev     DATE,
              ED_UCC_DT_next     DATE,
              ED_UCC_SEQNUM_prev   VARCHAR2(18),
              ED_UCC_SEQNUM_next   VARCHAR2(18),
              DAD_ACUTE_DISP_DT_prev DATE,
               DAD_ACUTE_DISP_DT_next DATE,
              DAD_ACUTE_SEQNUM_prev VARCHAR2(18),
               DAD_ACUTE_SEQNUM_next VARCHAR2(18)
             )
 /
 
 
 create index tb_source_ix1 on tb_source(
              ULIMORE      ,
              VISIT_DATE_DT   ,
              SEQNUM,
              ED_UCC_DT     ,
              ED_UCC_SEQNUM   ,
              DAD_ACUTE_DISP_DT ,
              DAD_ACUTE_SEQNUM  
              ) parallel(degree 8);
Alter index tb_source_ix1 noparallel;
   
         
 
CREATE OR REPLACE PROCEDURE insert_parallel_execute AS
  l_chunk_sql VARCHAR2(1000);
  l_sql_stmt VARCHAR2(32000);
  l_try    NUMBER;
  l_status  NUMBER;
BEGIN
  BEGIN
    dbms_parallel_execute.drop_task('insert_aps');
  EXCEPTION
    WHEN OTHERS THEN
      NULL;
  END;
 -- Create the TASK
  dbms_parallel_execute.create_task('insert_aps');

 -- Chunk the table by MANAGER_ID
  l_chunk_sql := 'WITH t AS (
  SELECT DISTINCT
    ulimore
  FROM
    tb_source
), t1 AS (
  SELECT
    ulimore
   , NTILE(100)
     OVER(
      ORDER BY
        ulimore
     ) bucket
  FROM
    t
)
SELECT
  bucket
 , MIN(ulimore)start_id
 , MAX(ulimore) end_id
FROM
  t1
GROUP BY
  bucket;';
  dbms_parallel_execute.create_chunks_by_sql(
                       'insert_aps'
                      , l_chunk_sql
                      , false
  );

 -- Execute the DML in parallel
 --  the WHERE clause contain a condition on manager_id, which is the chunk
 --  column. In this case, grouping rows is by manager_id.
  l_sql_stmt := ' insert into TB_SOURCE_final
 SELECT 
    p.SEQNUM,
    p.ULIMORE,
    p.VISIT_DATE_DT,
    LAG(p.ED_UCC_DT     IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS PREV_ED_UCC_DT,
    LEAD(p.ED_UCC_DT     IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS NEXT_ED_UCC_DT,
    LAG(p.ED_UCC_SEQNUM   IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS PREV_ED_UCC_SEQNUM,
    LEAD(p.ED_UCC_SEQNUM   IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS NEXT_ED_UCC_SEQNUM,
    LAG(p.DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS PREV_DAD_DISP_AC_DT,
    LEAD(p.DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS NEXT_DAD_DISP_AC_DT,
    LAG(p.DAD_ACUTE_SEQNUM  IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS PREV_DAD_DISP_AC_SEQNUM,
    LEAD(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS NEXT_DAD_DISP_AC_SEQNUM
  FROM
    TB_SOURCE p
    where ulimore between :start_id and :end_id ';
  dbms_parallel_execute.run_task(
                 'insert_aps'
                , l_sql_stmt
                , dbms_sql.native
                , parallel_level => 15
  );
END insert_parallel_execute;
/



begin
insert_parallel_execute;
end;
/


you can monitor this in a different session.


SELECT status, COUNT(*)
FROM   user_parallel_execute_chunks
WHERE task_name = 'insert_aps'
GROUP BY status
ORDER BY status;

Solomon Yakobson

Show us how you created index. I have a feeling it wasn't created right. It MUST start with same column as in PARTITION BY analytic functions, otherwise it will not trade WINDOW SORT (cost = 2, %cpu = 50 ) for less costly WINDOWS BUFFER (cost = 1,%cpu = 0). Compare plans (SORT operation) when ULIMORE is index first column and when it is not:

SQL> CREATE TABLE TB_SOURCE(
  2                         SEQNUM             VARCHAR2(18),
  3                         ULIMORE            VARCHAR2(15),
  4                         VISIT_DATE_DT      DATE,
  5                         ED_UCC_DT          DATE,
  6                         ED_UCC_SEQNUM      VARCHAR2(18),
  7                         DAD_ACUTE_DISP_DT  DATE,
  8                         DAD_ACUTE_ADMIT_DT DATE,
  9                         DAD_ACUTE_SEQNUM   VARCHAR2(18)
 10                        )
 11  /

Table created.

SQL> CREATE INDEX TB_SOURCE_IDX1
  2    ON TB_SOURCE(
  3                 ULIMORE,
  4                 VISIT_DATE_DT,
  5                 SEQNUM,
  6                 ED_UCC_DT,
  7                 ED_UCC_SEQNUM,
  8                 DAD_ACUTE_DISP_DT,
  9                 DAD_ACUTE_ADMIT_DT,
 10                 DAD_ACUTE_SEQNUM,
 11                 1
 12                )
 13  /

Index created.

SQL> EXPLAIN PLAN FOR
  2  CREATE TABLE TB_INTERMEDIATE
  3  PCTFREE 0
  4  PARALLEL 8
  5  NOLOGGING
  6  AS
  7  SELECT /*+ PARALLEL(8) */
  8      p.SEQNUM,
  9      p.ULIMORE,
 10      p.VISIT_DATE_DT,
 11      LAG(p.ED_UCC_DT            IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
 12      LEAD(p.ED_UCC_DT           IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
 13      LAG(p.ED_UCC_SEQNUM        IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
 14      LEAD(p.ED_UCC_SEQNUM       IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
 15      LAG(p.DAD_ACUTE_DISP_DT    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
 16      LEAD(p.DAD_ACUTE_ADMIT_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
 17      LAG(p.DAD_ACUTE_SEQNUM     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
 18      LEAD(p.DAD_ACUTE_SEQNUM    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM
 19  FROM
 20      TB_SOURCE p
 21  /

Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)
  2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4041818352

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT             |                 |     1 |    78 |     2   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                    |                 |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10001        |     1 |    78 |     1   (0)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| TB_INTERMEDIATE |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |                 |     1 |    78 |     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE                    |                 |     1 |    78 |     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX SEND ROUND-ROBIN          | :TQ10000        |     1 |    78 |     1   (0)| 00:00:01 |  Q1,00 | S->P | RND-ROBIN  |
|   7 |        PX SELECTOR                 |                 |       |       |            |          |  Q1,00 | SCWC |            |
|   8 |         WINDOW BUFFER              |                 |     1 |    78 |     1   (0)| 00:00:01 |  Q1,00 | SCWC |            |
|   9 |          INDEX FULL SCAN           | TB_SOURCE_IDX1  |     1 |    78 |     1   (0)| 00:00:01 |  Q1,00 | SCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - Degree of Parallelism is 8 because of hint

21 rows selected.

SQL> DROP INDEX TB_SOURCE_IDX1
  2  /

Index dropped.

SQL> CREATE INDEX TB_SOURCE_IDX1
  2    ON TB_SOURCE(
  3                 VISIT_DATE_DT,
  4                 ULIMORE,
  5                 SEQNUM,
  6                 ED_UCC_DT,
  7                 ED_UCC_SEQNUM,
  8                 DAD_ACUTE_DISP_DT,
  9                 DAD_ACUTE_ADMIT_DT,
 10                 DAD_ACUTE_SEQNUM,
 11                 1
 12                )
 13  /

Index created.

SQL> EXPLAIN PLAN FOR
  2  CREATE TABLE TB_INTERMEDIATE
  3  PCTFREE 0
  4  PARALLEL 8
  5  NOLOGGING
  6  AS
  7  SELECT /*+ PARALLEL(8) */
  8      p.SEQNUM,
  9      p.ULIMORE,
 10      p.VISIT_DATE_DT,
 11      LAG(p.ED_UCC_DT            IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
 12      LEAD(p.ED_UCC_DT           IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
 13      LAG(p.ED_UCC_SEQNUM        IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
 14      LEAD(p.ED_UCC_SEQNUM       IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
 15      LAG(p.DAD_ACUTE_DISP_DT    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
 16      LEAD(p.DAD_ACUTE_ADMIT_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
 17      LAG(p.DAD_ACUTE_SEQNUM     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
 18      LEAD(p.DAD_ACUTE_SEQNUM    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM
 19  FROM
 20      TB_SOURCE p
 21  /

Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY)
  2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4289522319

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name            | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT             |                 |     1 |    78 |     3  (34)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                    |                 |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10001        |     1 |    78 |     2  (50)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| TB_INTERMEDIATE |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |                 |     1 |    78 |     2  (50)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE                    |                 |     1 |    78 |     2  (50)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX SEND ROUND-ROBIN          | :TQ10000        |     1 |    78 |     2  (50)| 00:00:01 |  Q1,00 | S->P | RND-ROBIN  |
|   7 |        PX SELECTOR                 |                 |       |       |            |          |  Q1,00 | SCWC |            |
|   8 |         WINDOW SORT                |                 |     1 |    78 |     2  (50)| 00:00:01 |  Q1,00 | SCWC |            |
|   9 |          INDEX FULL SCAN           | TB_SOURCE_IDX1  |     1 |    78 |     1   (0)| 00:00:01 |  Q1,00 | SCWP |            |
-----------------------------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - Degree of Parallelism is 8 because of hint

21 rows selected.

SQL>

SY.

Sam_P

Hi [Solomon Yakobson](/ords/forums/user/Solomon Yakobson) , I had created the index with the leading column being SEQNUM. Then ran the CTAS and it is taking forever.....over 4 hrs now. I had to manually cancel it, yet again.

I'll go ahead and drop the index and re-create it as follows:-

CREATE INDEX IX_COMPOSITE_COLUMNS ON TB_SOURCE
(
    ULIMORE,
    VISIT_DATE_DT,
    SEQNUM,
    ED_UCC_DT,
    ED_UCC_SEQNUM,
    DAD_ACUTE_DISP_DT,
    DAD_ACUTE_ADMIT_DT,
    DAD_ACUTE_SEQNUM,
    1 -- Since all the above columns are NULLable, we need to add a constant for the Function-Based Index to be created 
) 
TABLESPACE "ANALYTICS_INDEX" 
PARALLEL 8 
NOLOGGING 
COMPUTE STATISTICS
;
/

The EXPLAIN PLAN (it did not switch to WINDOW BUFFER):-

Sam_P

Hello @alvinder , thanks for sharing your code sample.
The ULIMORE column is a VARCHAR2, so your "bucket" query simply picks up the numeric values and ignores the rest of the string values.

Solomon Yakobson

Show CTAS statement.
SY.

Sam_P

[Solomon Yakobson](/ords/forums/user/Solomon Yakobson)

EXPLAIN PLAN FOR 
CREATE TABLE TB_TARGET
PCTFREE 0 
PARALLEL 8 
NOLOGGING 
AS 
SELECT /*+ PARALLEL(8) */ 
    p.SEQNUM,
    p.ULIMORE,
    p.VISIT_DATE_DT,
 
    LAG(p.ED_UCC_DT      IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
    LEAD(p.ED_UCC_DT     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
     
    LAG(p.ED_UCC_SEQNUM    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
    LEAD(p.ED_UCC_SEQNUM   IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
     
    LAG(p.DAD_ACUTE_DISP_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
    LEAD(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
     
    LAG(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
    LEAD(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM

FROM
    TB_SOURCE p
;
/

I even tried matching the columns like they were in the INDEX above but still the EXPLAIN PLAN does not perform a WINDOW BUFFER but rather, it shows WINDOW SORT:-

EXPLAIN PLAN FOR 
CREATE TABLE TB_TARGET
PCTFREE 0 
PARALLEL 8 
NOLOGGING 
AS 
SELECT /*+ PARALLEL(8) */ 
    p.ULIMORE,
    p.VISIT_DATE_DT,
    p.SEQNUM, 

    LAG(p.ED_UCC_DT      IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
    LEAD(p.ED_UCC_DT     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
     
    LAG(p.ED_UCC_SEQNUM    IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
    LEAD(p.ED_UCC_SEQNUM   IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
     
    LAG(p.DAD_ACUTE_DISP_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
    LEAD(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
     
    LAG(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
    LEAD(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM

FROM
    TB_SOURCE p
;
/

Solomon Yakobson

Post create TB_SOURCE table.
SY.

alvinder

Answer

Tested with varchar2 for Ulimore.
I am using ora_hash function to assign a number to a string.

This code works for me. I have created 100 buckets running 15 in parallel. You can go for more data. My sample data is 100000

DROP TABLE TB_SOURCE PURGE;


CREATE TABLE TB_SOURCE(
                           SEQNUM            VARCHAR2(18),
                           ULIMORE           VARCHAR2(15),
                           VISIT_DATE_DT     DATE,
                           ED_UCC_DT         DATE,
                           ED_UCC_SEQNUM     VARCHAR2(18),
                           DAD_ACUTE_DISP_DT DATE,
                           DAD_ACUTE_SEQNUM  VARCHAR2(18)
                           )
 /
--Creating sample data.                           
INSERT INTO tb_source
    SELECT
        ROWNUM
      , 
            dbms_random.string(
                'x'
              , 5
            )
        
      , sysdate - ROWNUM
      , sysdate - ROWNUM
      , ROWNUM
      , sysdate - ROWNUM
      , ROWNUM
      
    FROM
        dual
    CONNECT BY
        level < 100000;         


  truncate table tb_source_final;          


 CREATE TABLE TB_SOURCE_final(
                           SEQNUM            VARCHAR2(18),
                           ULIMORE           VARCHAR2(15),
                           VISIT_DATE_DT     DATE,
                           ED_UCC_DT_prev         DATE,
                            ED_UCC_DT_next         DATE,
                           ED_UCC_SEQNUM_prev     VARCHAR2(18),
                            ED_UCC_SEQNUM_next     VARCHAR2(18),
                           DAD_ACUTE_DISP_DT_prev DATE,
                             DAD_ACUTE_DISP_DT_next DATE,
                           DAD_ACUTE_SEQNUM_prev  VARCHAR2(18),
                             DAD_ACUTE_SEQNUM_next  VARCHAR2(18)
                          )
  /
 
 
 create index tb_source_ix1 on tb_source(
                            ora_hash(ulimore,155000000)           ,
                           VISIT_DATE_DT     ,
                           SEQNUM,
                           ED_UCC_DT         ,
                           ED_UCC_SEQNUM     ,
                           DAD_ACUTE_DISP_DT ,
                           DAD_ACUTE_SEQNUM  
                           ) parallel(degree 8);
Alter index tb_source_ix1 noparallel;

create or replace PROCEDURE insert_parallel_execute AS
    l_chunk_sql VARCHAR2(1000);
    l_sql_stmt  VARCHAR2(32000);
    l_try       NUMBER;
    l_status    NUMBER;
BEGIN
    BEGIN
        dbms_parallel_execute.drop_task('insert_aps');
    EXCEPTION
        WHEN OTHERS THEN
            NULL;
    END;
  -- Create the TASK
    dbms_parallel_execute.create_task('insert_aps');


  -- Chunk the table by MANAGER_ID
    l_chunk_sql := 'WITH t AS (
    SELECT DISTINCT
        ora_hash(ulimore,155000000,0) ulimore
    FROM
        tb_source
), t1 AS (
    SELECT
        ulimore
      , NTILE(100)
          OVER(
            ORDER BY
                ulimore
          ) bucket
    FROM
        t
)
SELECT
   MIN(ulimore)start_id
  , MAX(ulimore) end_id
FROM
    t1
GROUP BY
    bucket ';
    
    dbms_parallel_execute.create_chunks_by_sql(
                                              'insert_aps'
                                            , l_chunk_sql
                                            , false
    );


  -- Execute the DML in parallel
  --   the WHERE clause contain a condition on manager_id, which is the chunk
  --   column. In this case, grouping rows is by manager_id.
    l_sql_stmt := ' insert into  TB_SOURCE_final
  SELECT 
        p.SEQNUM,
        p.ULIMORE,
       p.VISIT_DATE_DT,
       LAG(p.ED_UCC_DT          IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
       LEAD(p.ED_UCC_DT         IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
       LAG(p.ED_UCC_SEQNUM      IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
       LEAD(p.ED_UCC_SEQNUM     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
       LAG(p.DAD_ACUTE_DISP_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
       LEAD(p.DAD_ACUTE_DISP_DT IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
       LAG(p.DAD_ACUTE_SEQNUM   IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
       LEAD(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM
   FROM
       TB_SOURCE p
       where ora_hash(ulimore,155000000) between  :start_id and :end_id';
             
    dbms_parallel_execute.run_task(
                                  'insert_aps'
                                , l_sql_stmt
                                , dbms_sql.native
                                , parallel_level => 15
    );
END insert_parallel_execute;
/

  select status,count(*) from user_parallel_execute_chunks
  where task_name = 'insert_aps'
  group by status;
  
<img src="https://objectstorage.us-phoenix-1.oraclecloud.com/p/BqK85Rn1zA5MP0vYiqbAdPgs7Z6OmMxw8SD3WCFVm5kY8uReidZ1KPIKkgJ1hCkG/n/axciphqpnohg/b/forums-legacy/o/uploads/OMDLXCDD3HVX/image.png" alt="image.png">

Marked as Answer by Sam_P · Apr 24 2022

Jonathan Lewis

Coming back to this after some time away.
Getting the unique identifying columns from the Data Scientist was a good move, and that's then the right query to generate the extra columns he needs. The plan shows an estimated 10G memory of an "optimal" i.e. in-memory sort, so my estimate of 9GB was in the right ballpark.
If this is taking 4.5 hours running parallel 8 the thing to check is HOW is the time being used. Are your PX slaves using 100% CPU, are they stalled waiting for CPU, is time going on interprocess communication with that hash distribution, are you spending a lot of time on I/O, and is this for tablescans, temp reads and writes, or reading undo. Is it possible that Oracle hasn't noticed a data skew and one process is doing almost all the sorting ... etc,
I've just published a script that will report all the session stats for the PX processes in a parallel query; I'm looking for the one that does the same for events and will publish that shortly. In 4.5 hours you should be able to get some good indications of why the CTAS is taking so long.

Regards
Jonathan Lewis

UPDATE: The script for parallel wait events is now online.

Sam_P

Thanks @jonathan-lewis2 . I had to manually cancel the CTAS after 4.5 hrs as I could not go on...
Thanks for your 2 scripts. If I understand correctly, I should run 3 separate windows (sessions) of Oracle SQLCl (command line) tool -- in one window, I start running the CTAS, followed by running your 2 separate scripts in the rest of the windows.
For this test, I have done the following:-
1. Created a duplicate copy of the source table with only the columns that we need and called it as TB_SOURCE_COPY_TRIMMED.
2. Columns SEQNUM, ULIMORE and VISIT_DATE_DT do not contain any NULL values.

3. Because we know from the Data Scientist that the composite columns (SEQNUM, ULIMORE, VISIT_DATE_DT) uniquely identify a record, I proceeded to create a UNIQUE index and a Primary Key constraint on TB_SOURCE_COPY_TRIMMED table. @jonathan-lewis2 @alvinder Please see the script attached.
4. Running the EXPLAIN PLAN shows:-

@jonathan-lewis2 I'll attach the px_stats.log and px_wait.log files after a few hours pass-by.
01_TB_SOURCE_COPY_TRIMMED.txt (4.42 KB)

Solomon Yakobson

Again, compare plans:

SQL> CREATE UNIQUE INDEX UX_COMPOSITE_KEYS
  2  ON TB_SOURCE_COPY_TRIMMED
  3  (
  4          SEQNUM,
  5          ULIMORE,
  6          VISIT_DATE_DT
  7  );


Index created.


SQL> ALTER TABLE     TB_SOURCE_COPY_TRIMMED
  2  ADD CONSTRAINT  PK_COMPOSITE_KEYS
  3  PRIMARY KEY     (SEQNUM, ULIMORE, VISIT_DATE_DT)
  4  USING INDEX     UX_COMPOSITE_KEYS
  5  ;


Table altered.


SQL> set sqlbla on
SQL> EXPLAIN PLAN FOR
  2  CREATE TABLE TB_TARGET_WITH_LEAD_LAG
  3  PARALLEL 8
  4  NOLOGGING
  5  AS
  6  SELECT /*+ PARALLEL(8) */
  7          p.SEQNUM,
  8          p.ULIMORE,
  9          p.VISIT_DATE_DT,
 10
 11          LAG(p.ED_UCC_DT           IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS PREV_ED_UCC_DT,
 12          LEAD(p.ED_UCC_DT          IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS NEXT_ED_UCC_DT,
 13
 14          LAG(p.ED_UCC_SEQNUM       IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS PREV_ED_UCC_SEQNUM,
 15          LEAD(p.ED_UCC_SEQNUM      IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS NEXT_ED_UCC_SEQNUM,
 16
 17          LAG(p.DAD_ACUTE_DISP_DT   IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS PREV_DAD_DISP_AC_DT,
 18          LEAD(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS NEXT_DAD_DISP_AC_DT,
 19
 20          LAG(p.DAD_ACUTE_SEQNUM    IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS PREV_DAD_DISP_AC_SEQNUM,
 21          LEAD(p.DAD_ACUTE_SEQNUM   IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS NEXT_DAD_DISP_AC_SEQNUM
 22
 23  FROM
 24          TB_SOURCE_COPY_TRIMMED p
 25  ;


Explained.


SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1332415219


-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT             |                         |     1 |    78 |     3  (34)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                    |                         |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10001                |     1 |    78 |     3  (34)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| TB_TARGET_WITH_LEAD_LAG |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |                         |     1 |    78 |     3  (34)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      WINDOW SORT                   |                         |     1 |    78 |     3  (34)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX RECEIVE                   |                         |     1 |    78 |     2   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   7 |        PX SEND HASH                | :TQ10000                |     1 |    78 |     2   (0)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   8 |         PX BLOCK ITERATOR          |                         |     1 |    78 |     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|   9 |          TABLE ACCESS FULL         | TB_SOURCE_COPY_TRIMMED  |     1 |    78 |     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
-------------------------------------------------------------------------------------------------------------------------------------------


Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - Degree of Parallelism is 8 because of hint


21 rows selected.


SQL> CREATE UNIQUE INDEX UX2_COMPOSITE_KEYS
  2  ON TB_SOURCE_COPY_TRIMMED
  3  (
  4          ULIMORE,
  5          VISIT_DATE_DT,
  6          SEQNUM,
  7          ED_UCC_DT,
  8          ED_UCC_SEQNUM,
  9          DAD_ACUTE_DISP_DT,
 10          DAD_ACUTE_ADMIT_DT,
 11          DAD_ACUTE_SEQNUM
 12  );


Index created.


SQL> EXPLAIN PLAN FOR
  2  CREATE TABLE TB_TARGET_WITH_LEAD_LAG
  3  PARALLEL 8
  4  NOLOGGING
  5  AS
  6  SELECT /*+ PARALLEL(8) */
  7          p.SEQNUM,
  8          p.ULIMORE,
  9          p.VISIT_DATE_DT,
 10
 11          LAG(p.ED_UCC_DT           IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS PREV_ED_UCC_DT,
 12          LEAD(p.ED_UCC_DT          IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS NEXT_ED_UCC_DT,
 13
 14          LAG(p.ED_UCC_SEQNUM       IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS PREV_ED_UCC_SEQNUM,
 15          LEAD(p.ED_UCC_SEQNUM      IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS NEXT_ED_UCC_SEQNUM,
 16
 17          LAG(p.DAD_ACUTE_DISP_DT   IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS PREV_DAD_DISP_AC_DT,
 18          LEAD(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS NEXT_DAD_DISP_AC_DT,
 19
 20          LAG(p.DAD_ACUTE_SEQNUM    IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS PREV_DAD_DISP_AC_SEQNUM,
 21          LEAD(p.DAD_ACUTE_SEQNUM   IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS NEXT_DAD_DISP_AC_SEQNUM
 22
 23  FROM
 24          TB_SOURCE_COPY_TRIMMED p
 25  ;


Explained.


SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);


PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1877319916


-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT             |                         |     1 |    78 |     2   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR                    |                         |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)              | :TQ10001                |     1 |    78 |     1   (0)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HYBRID TSM/HWMB)| TB_TARGET_WITH_LEAD_LAG |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING |                         |     1 |    78 |     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE                    |                         |     1 |    78 |     1   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX SEND ROUND-ROBIN          | :TQ10000                |     1 |    78 |     1   (0)| 00:00:01 |  Q1,00 | S->P | RND-ROBIN  |
|   7 |        PX SELECTOR                 |                         |       |       |            |          |  Q1,00 | SCWC |            |
|   8 |         WINDOW BUFFER              |                         |     1 |    78 |     1   (0)| 00:00:01 |  Q1,00 | SCWC |            |
|   9 |          INDEX FULL SCAN           | UX2_COMPOSITE_KEYS      |     1 |    78 |     1   (0)| 00:00:01 |  Q1,00 | SCWP |            |
-------------------------------------------------------------------------------------------------------------------------------------------


Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - Degree of Parallelism is 8 because of hint


21 rows selected.


SQL>

SY.

Sam_P

@jonathan-lewis2 . Attached below are the log files.

px_stats.log (2.12 MB)px_waits.log (260.66 KB)

Sam_P

@solomon-yakobson , I had tried your "compare plans" test yesterday and I did see an INDEX FULL SCAN with WINDOW BUFFER, however, when I ran the CTAS with this plan, it kept on going for over 4 hrs. without any improvement as it still has to scan through ALL 156M records to be able to do LEAD/LAG.
@alvinder , I still haven't tried your new script for DBMS_PARALLEL_EXECUTE but will try it as soon as I can and will keep you posted. :)

Solomon Yakobson

You posted populate TB_SOURCE_COPY_TRIMMED table took ~ 17 mins. How long does it take to:

SET TIMING ON
BEGIN
    FOR V_REC IN (SELECT * FROM TB_SOURCE_COPY_TRIMMED ORDER BY ULIMORE,VISIT_DATE_DT) LOOP
      NULL;
    END LOOP;
END;
/

SY.

Sam_P

@solomon-yakobson It took 10 mins & 33 secs to run the above LOOP code block.

@jonathan-lewis2 More info for you as you analyze the log files I had provided in my earlier post. Apparently, we do not have "in-memory" enabled as it is a separate license -- hence, the bottleneck is likely to come from Disk I/O or CPU waits or some other reason, which you may be able to tell from the logs I had attached. Here are some params configured currently by our DBA:-

Here's our current SGA and PGA configuration:

Thanks!

Solomon Yakobson

Any chance you are on 11G? There was a bug which was fixed in 12C where LAG/LEAD with IGNORE NULLS was extremely slow:

SQL> CREATE TABLE a AS SELECT LEVEL ID ,decode(mod(LEVEL,10),1,'CODE'||LEVEL) CODE from dual CONNECT BY LEVEL<20001;

Table created.

SQL> CREATE TABLE TEST AS
  2  SELECT ID,
  3  LAG(CODE ignore nulls) OVER(ORDER BY ID ) CODE
  4  FROM a;

Table created.

Elapsed: 00:00:44.36
SQL> drop table test purge;

Table dropped.

Elapsed: 00:00:00.01
SQL> CREATE TABLE TEST AS
  2  SELECT ID,
  3  LAG(CODE) OVER(ORDER BY ID ) CODE
  4  FROM a;

Table created.

Elapsed: 00:00:00.06
SQL>

If so, use LAST_VALUE/FIRST_VALUE with IGNORE NULLS.
SY.

Sam_P

@solomon-yakobson
We're on "Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.11.0.0.0"

Solomon Yakobson

Ah, I thought it was fixed but it wasn't and in 19C it is even worse than in 12C.
12C:

SQL> select version from v$instance;

VERSION
-----------------
12.2.0.1.0

Elapsed: 00:00:00.01
SQL> CREATE TABLE a AS SELECT LEVEL ID ,decode(mod(LEVEL,10),1,'CODE'||LEVEL) CODE from dual CONNECT BY LEVEL<20001;

Table created.

Elapsed: 00:00:00.16
SQL> CREATE TABLE TEST_LAG AS
  2      SELECT ID,
  3      LAG(CODE ignore nulls) OVER(ORDER BY ID ) CODE
  4      FROM a;

Table created.

Elapsed: 00:00:23.04
SQL> CREATE TABLE TEST_LAST_VALUE AS
  2      SELECT ID,
  3      LAST_VALUE(CODE ignore nulls) OVER(ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) CODE
  4      FROM a;

Table created.

Elapsed: 00:00:00.04
SQL> SELECT * FROM TEST_LAG
  2  MINUS
  3  SELECT * FROM TEST_LAST_VALUE;

no rows selected

Elapsed: 00:00:00.01
SQL>

And 19C:

SQL> select version_full from v$instance;

VERSION_FULL
-----------------
19.13.0.0.0

Elapsed: 00:00:00.05
SQL> CREATE TABLE a AS SELECT LEVEL ID ,decode(mod(LEVEL,10),1,'CODE'||LEVEL) CODE from dual CONNECT BY LEVEL<20001;

Table created.

Elapsed: 00:00:00.16
SQL> CREATE TABLE TEST_LAG AS
  2      SELECT ID,
  3      LAG(CODE ignore nulls) OVER(ORDER BY ID ) CODE
  4      FROM a;

Table created.

Elapsed: 00:00:39.20
SQL> CREATE TABLE TEST_LAST_VALUE AS
  2      SELECT ID,
  3      LAST_VALUE(CODE ignore nulls) OVER(ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) CODE
  4      FROM a;

Table created.

Elapsed: 00:00:00.09
SQL> SELECT * FROM TEST_LAG
  2  MINUS
  3  SELECT * FROM TEST_LAST_VALUE;

no rows selected

Elapsed: 00:00:00.04
SQL>

Anyway, use LAST_VALUE/FIRST VALUE IGNORE NULLS instead. And, if you want to, open SR with Oracle Support.
SY.

Sam_P

[Solomon Yakobson](/ords/forums/user/Solomon Yakobson)
Thanks for creating a small test case to prove your point.
I am not too familiar with:

(... ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)

Simply changing to the following should be sufficient?

        FIRST_VALUE(p.ED_UCC_DT         IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS PREV_ED_UCC_DT,
        LAST_VALUE(p.ED_UCC_DT          IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS NEXT_ED_UCC_DT,
        
        FIRST_VALUE(p.ED_UCC_SEQNUM     IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS PREV_ED_UCC_SEQNUM,
        LAST_VALUE(p.ED_UCC_SEQNUM      IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS NEXT_ED_UCC_SEQNUM,
        
        FIRST_VALUE(p.DAD_ACUTE_DISP_DT IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS PREV_DAD_DISP_AC_DT,
        LAST_VALUE(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS NEXT_DAD_DISP_AC_DT,
        
        FIRST_VALUE(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS PREV_DAD_DISP_AC_SEQNUM,
        LAST_VALUE(p.DAD_ACUTE_SEQNUM   IGNORE NULLS)   OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)         AS NEXT_DAD_DISP_AC_SEQNUM

Solomon Yakobson

LAG(p.ED_UCC_DT IGNORE NULLS) OVER(PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)

Is same as

LAST_VALUE(p.ED_UCC_DT IGNORE NULLS) OVER(PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)

And

LEAD(p.ED_UCC_DT IGNORE NULLS) OVER(PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)

Is same as

FIRST_VALUE(p.ED_UCC_DT IGNORE NULLS) OVER(PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING))

However, I just checked and it appears LEAD IGNORE NULLS is fine:

SQL> CREATE TABLE TEST_LEAD
 2  AS
 3   SELECT ID,
 4       LEAD(CODE ignore nulls) OVER(ORDER BY ID ) CODE
 5    FROM a
 6 /

Table created.

Elapsed: 00:00:00.13
SQL> CREATE TABLE TEST_FIRST_VALUE
 2  AS
 3   SELECT ID,
 4       FIRST_VALUE(CODE ignore nulls) OVER(ORDER BY ID ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) CODE
 5    FROM a
 6 /

Table created.

Elapsed: 00:00:00.11
SQL>

SY.

Sam_P

Cool, thanks a lot @solomon-yakobson . I'll give a shot but our current server is most likely running out of RAM and as a result, it is overflowing the workloads into TEMP files, hence, excessively causing Disk I/O bottlenecks.

Sam_P

Just wanted to give a quick update to everyone -- I ended up testing and using the DBMS_PARALLEL_EXECUTE technique but with slightly different method of obtaining the ROWIDs for chunk processing. Thanks to @alvinder for showing an example on how to achieve this.
The whole target table with 156M rows was created in 2hrs and 44mins using the DBMS_PARALLEL_EXECUTE method with direct-path load INSERT query.
When we contacted Oracle Support, they acknowledged that the LAG function, when used especially with IGNORE NULLS clause is known to run forever. This is a known bug and there is no patch-fix for this at the moment in 19c.
Thank you everyone, truly appreciate all your help and support.

Rocky

@sam-p - just to add your point here also a link which has shown all the dbms_parllel_execute options and examples -
ORACLE-BASE - DBMS_PARALLEL_EXECUTE

1 person found this helpful

alvinder

@sam-p The whole target table with 156M rows was created in 2hrs and 44mins using the DBMS_PARALLEL_EXECUTE method with direct-path load INSERT query.
2hrs and 44mins.
If i understand correctly
What is direct path load in Oracle?
A direct path load calls on Oracle to lock tables and indexes at the start of the load and releases them when the load is finished. A conventional path load calls Oracle once for each array of rows to process a SQL INSERT statement. A direct path load uses multiblock asynchronous I/O for writes to the database files.
Shouldn't be using the direct-path load. That is the reason i created an index on all the columns that are used. That way you will be using index lookup's.
Basically if you are running 10 jobs in parallel the 9 will be waiting for the lock to be released. I might be wrong but that is what my understanding is.
Also try to have a bucket of a size that is acceptable.
What you can do is run the query in sql session and see how long it takes with different bucket size.
Glad to hear that your table got loaded.

Sam_P

@alvinder I tried your original code and it is taking over 4 hrs and it is still running. Is there a formula to determine an optimal number of buckets needed and an optimal number of parallel_level => NN?
The method I have tried and tested (that took 2hrs and 44 mins) used the CHUNK_BY_ROWID method, whereby, I determined an algorithm to determine a way to chunk ROWID ranges and store them in a table. Then, I used a user-defined framework whereby I loop over each ROWID range one-by-one in a serial fashion, not in a parallel manner.
There is an example in the documentation https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_PARALLEL_EXECUTE.html#GUID-F45C3B26-821D-4545-AB48-FD03E82ED201 under Executing Chunks in an User-defined Framework section.
I find that with this method, I can create a re-usable and dynamic procedure without having to figure out the indexes and use of ora_hash and determining an optimal number of buckets and parallel level number and make it so that, if any table is provided into the procedure's input parameter, it should be able to figure things out on its own. The CHUNK BY ROWID method was more suitable in this case. Given that it processes one chunk at a time (inserts) in a LOOP (without any parallel processing), I can safely use the APPEND hint to lock the table while one chunk is being processed in a serial fashion. While I agree, that if I were to use your method of processing in parallel, I would not use the APPEND hint for direct-path load.

alvinder

@sam-p How many rows are you getting when you do CHUNK_BY_ROWID.
So you are running one bucket at a time. The example you mentioned is running 10 in parallel.
Depending upon your server try 4-5 in parallel with smaller buckets. It is trail an error to get the optimum size.
Try having 150M/2000 as number of buckets and run 4-5 in parallel. See how you go.
Anyway if you have got it working and happy with it then it is fine.

Sam_P

@alvinder I'm getting around 3,120 chunks with ROWID ranges. Even though it processes each chunk serially, they go by pretty fast.
Are you saying to have 75,000 buckets (150,000,000 / 2000) with a parallel_level => 5?

alvinder

Yes you can try that. Anyway you are executing serially for around 50000 times. This way you will run it in parallel and since it is just index lookup it might be lot faster. No harm in trying.
Also i don't think it is possible to get the right results for Chunks_by_rowid in your case as all the values for

ULIMORE

Might not be in the same chunk. That way lead and lag will not work as there is a where condition for start_id and end_id.
If you don't want to use ora_hash you can create a bucket table.
Code is attached.

DROP TABLE TB_SOURCE PURGE;
CREATE TABLE TB_SOURCE(
              SEQNUM      VARCHAR2(18),
              ULIMORE      VARCHAR2(15),
              VISIT_DATE_DT   DATE,
              ED_UCC_DT     DATE,
              ED_UCC_SEQNUM   VARCHAR2(18),
              DAD_ACUTE_DISP_DT DATE,
              DAD_ACUTE_SEQNUM VARCHAR2(18)
              )
 /
--Creating sample data.              
INSERT INTO tb_source
  SELECT
    ROWNUM
   , 
      dbms_random.string(
        'x'
       , 5
      )
     
   , sysdate - ROWNUM
   , sysdate - ROWNUM
   , ROWNUM
   , sysdate - ROWNUM
   , ROWNUM
    
  FROM
    dual
  CONNECT BY
    level < 100000;     


 truncate table tb_source_final;      


 CREATE TABLE TB_SOURCE_final(
              SEQNUM      VARCHAR2(18),
              ULIMORE      VARCHAR2(15),
              VISIT_DATE_DT   DATE,
              ED_UCC_DT_prev     DATE,
              ED_UCC_DT_next     DATE,
              ED_UCC_SEQNUM_prev   VARCHAR2(18),
              ED_UCC_SEQNUM_next   VARCHAR2(18),
              DAD_ACUTE_DISP_DT_prev DATE,
               DAD_ACUTE_DISP_DT_next DATE,
              DAD_ACUTE_SEQNUM_prev VARCHAR2(18),
               DAD_ACUTE_SEQNUM_next VARCHAR2(18)
             )
 /
 
 Create table tb_bucket as
 WITH t AS (
  SELECT DISTINCT
    ulimore
  FROM
    tb_source
), t1 AS (
  SELECT
    ulimore
   , NTILE(75000)
     OVER(
      ORDER BY
        ulimore
     ) bucket
  FROM
    t
)
SELECT bucket start_id , bucket end_id, 
  MIN(ulimore) min_ulimore
 , MAX(ulimore) max_ulimore
FROM
  t1
GROUP BY
  bucket;
create index tb_bucket_ix1 on tb_bucket(start_id,end_id,min_ulimore,max_ulimore);
 
 create index tb_source_ix1 on tb_source(
              ulimore     ,
              VISIT_DATE_DT   ,
              SEQNUM,
              ED_UCC_DT     ,
              ED_UCC_SEQNUM   ,
              DAD_ACUTE_DISP_DT ,
              DAD_ACUTE_SEQNUM  
              ) parallel(degree 8);
Alter index tb_source_ix1 noparallel;

create or replace PROCEDURE insert_parallel_execute AS
  l_chunk_sql VARCHAR2(1000);
  l_sql_stmt VARCHAR2(32000);
  l_try    NUMBER;
  l_status  NUMBER;
BEGIN
  BEGIN
    dbms_parallel_execute.drop_task('insert_aps');
  EXCEPTION
    WHEN OTHERS THEN
      NULL;
  END;
 -- Create the TASK
  dbms_parallel_execute.create_task('insert_aps');


 -- Chunk the table by MANAGER_ID
  l_chunk_sql := 'select start_id , end_id from tb_bucket';
   
  dbms_parallel_execute.create_chunks_by_sql(
                       'insert_aps'
                      , l_chunk_sql
                      , false
  );


 -- Execute the DML in parallel
 --  the WHERE clause contain a condition on manager_id, which is the chunk
 --  column. In this case, grouping rows is by manager_id.
  l_sql_stmt := ' insert into TB_SOURCE_final
 SELECT 
    p.SEQNUM,
    p.ULIMORE,
    p.VISIT_DATE_DT,
    LAG(p.ED_UCC_DT     IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS PREV_ED_UCC_DT,
    LEAD(p.ED_UCC_DT     IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS NEXT_ED_UCC_DT,
    LAG(p.ED_UCC_SEQNUM   IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS PREV_ED_UCC_SEQNUM,
    LEAD(p.ED_UCC_SEQNUM   IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS NEXT_ED_UCC_SEQNUM,
    LAG(p.DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS PREV_DAD_DISP_AC_DT,
    LEAD(p.DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS NEXT_DAD_DISP_AC_DT,
    LAG(p.DAD_ACUTE_SEQNUM  IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS PREV_DAD_DISP_AC_SEQNUM,
    LEAD(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)   AS NEXT_DAD_DISP_AC_SEQNUM
  FROM
    TB_SOURCE p, tb_bucket
    where ulimore between min_ulimore and max_ulimore
    and start_id between :start_id and :end_id';
       
  dbms_parallel_execute.run_task(
                 'insert_aps'
                , l_sql_stmt
                , dbms_sql.native
                , parallel_level => 5
  );
END insert_parallel_execute;
/

 select status,count(*) from user_parallel_execute_chunks
 where task_name = 'insert_aps'
 group by status;

Sam_P

Thanks @alvinder for the code. Just wondering, why are you strictly relying on a single column ULIMORE, meanwhile, the composite columns (SEQNUM, ULIMORE, VISIT_DATE_DT) in this exact order, truly uniquely identify a record in that table. Would this change the code?

alvinder

Sam_P My reason is that you are partition by it.

PARTITION BY p.ULIMORE

So lets see an example like this
There are approximately 73K records between the 2 rowid's. If the chunk is 3000 records then this won't be picked up.
You need next and prev date for each ulimore order by visit_date_dt.
Hope this helps.

User_H3J7U

undefined (0 Bytes)LAG IGNORE NULLS is slower because it is calculated as nth_value.
21.3:

declare
  est clob;
begin
  dbms_utility.expand_sql_text(q'{
select id,
       LAG       (CODE             ) OVER(ORDER BY ID ) CODE1,
       LAG       (CODE ignore nulls) OVER(ORDER BY ID ) CODE2,
       LAST_VALUE(CODE ignore nulls) OVER(ORDER BY ID ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) CODE3
from a}', est);
  dbms_output.put_line(est);
end;
/

SELECT "A1"."ID" "ID",
   -- lag respect nulls

   DECODE(COUNT(*)                 OVER ( ORDER BY "A1"."ID" ROWS  BETWEEN 1 PRECEDING  AND 1 PRECEDING ),
       1, FIRST_VALUE("A1"."CODE") OVER ( ORDER BY "A1"."ID" ROWS  BETWEEN 1 PRECEDING  AND 1 PRECEDING ),
     NULL) "CODE1",

   -- lag ignore nulls
   NVL(   NTH_VALUE("A1"."CODE",1) FROM LAST IGNORE NULLS
                                   OVER ( ORDER BY "A1"."ID" ROWS  BETWEEN  UNBOUNDED  PRECEDING  AND 1 PRECEDING ),
          NULL) "CODE2",

   -- last_value ignore nulls
          LAST_VALUE("A1"."CODE" IGNORE NULLS) OVER ( ORDER BY "A1"."ID" ROWS  BETWEEN  UNBOUNDED  PRECEDING  AND 1 PRECEDING ) "CODE3"
FROM "2"."A" "A1";

1 person found this helpful

Jonathan Lewis

@solomon-yakobson
Good catch.
Interesting to see the expand_sql from @user-h3j7u on this one - the lead and lag conversions to nth_value(xxx,1) look totally symmetrical so one might expect their perfomance to be identical, but it looks as if Oracle must be doing something like stepping through rows for one but copying out the required range and then walking it for the other.
Do you have a MOS Doc Id that mentions this?
Regards
Jonathan Lewis

1 - 50

Added on Apr 1 2022

67 comments

5,731 views

8 mentions

SQL & PL/SQL

LEAD and LAG taking hours on a very large table!

Comments

Post Details