
SQL & PL/SQL


LEAD and LAG taking hours on a very large table!

Sam_P, Apr 1 2022 — edited Apr 3 2022

Hello,
I have a source table (partitioned and indexed) with approx. 156 million records. I am trying to create a target table with a CTAS statement that reads from the source table using a simple SELECT with LEAD/LAG operations on a date column, and it is taking over 8 hours!
While searching online, I saw one developer recommend trying the MATCH_RECOGNIZE technique, as it is claimed to run faster than the analytic SQL functions on very large tables. I have not tested this yet, as I need help converting the SQL query into a MATCH_RECOGNIZE statement that produces the same result set.
CREATE TABLE TB_TARGET
PARALLEL 8
NOLOGGING
AS
SELECT /*+ PARALLEL_INDEX(8) PARALLEL(8) */
       PAT_ID,
       VISIT_DATE_DT,
       ROW_NUMBER() OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS ROW_NUM,
       ED_UCC_DT,
       LAG(ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS PREV_ED_UCC_DT,
       LEAD(ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS NEXT_ED_UCC_DT
FROM   TB_SOURCE; -- very large table!
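
Note: as written, the PARALLEL_INDEX hint has no table or index argument, so the optimizer will simply ignore it; the statement-level PARALLEL(8) hint and the PARALLEL 8 clause on the CREATE are what actually request parallelism. A cleaned-up sketch of the same CTAS, keeping the degree of 8 from the original (an assumption to be validated against the available CPU, PGA and temp space), would look like this:

ALTER SESSION ENABLE PARALLEL DDL;   -- normally enabled by default; shown only to make the assumption explicit

CREATE TABLE TB_TARGET
PARALLEL 8
NOLOGGING
AS
SELECT /*+ PARALLEL(8) */            -- statement-level hint; a degree of 8 is assumed, not verified
       PAT_ID,
       VISIT_DATE_DT,
       ROW_NUMBER() OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS ROW_NUM,
       ED_UCC_DT,
       LAG(ED_UCC_DT IGNORE NULLS)  OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS PREV_ED_UCC_DT,
       LEAD(ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS NEXT_ED_UCC_DT
FROM   TB_SOURCE;

Dropping the malformed hint does not change the window-sort work over 156 million rows, so on its own it is unlikely to be the whole fix; it is worth checking the execution plan to confirm that the WINDOW SORT steps really run in parallel.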

The result set for a single PAT_ID (100123456) looks like this:
[image: sample output rows for PAT_ID 100123456, showing VISIT_DATE_DT, ROW_NUM, ED_UCC_DT, PREV_ED_UCC_DT and NEXT_ED_UCC_DT]
Questions:
1. How do I speed this query up when it runs over 156 million records, with multiple additional LEAD and LAG operations on other columns on top of the one shown above?
2. If the fastest solution is to use MATCH_RECOGNIZE, could you please help me compose a query that yields the same results as shown above (including the NULLs)?
3. Any other solution is welcome, as long as it brings the execution time down from 8 hours to something acceptable (one possible rewrite is sketched below).
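
A functionally equivalent formulation of the two IGNORE NULLS calls uses LAST_VALUE and FIRST_VALUE with explicit window frames. This is only a sketch and is not guaranteed to be faster, since it still performs ordered window computations per PAT_ID, but it spells out the "previous/next non-null date" logic and is useful for cross-checking the results of whatever MATCH_RECOGNIZE version is proposed:

SELECT PAT_ID,
       VISIT_DATE_DT,
       ROW_NUMBER() OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS ROW_NUM,
       ED_UCC_DT,
       -- last non-null ED_UCC_DT among the strictly preceding rows (same result as LAG ... IGNORE NULLS)
       LAST_VALUE(ED_UCC_DT IGNORE NULLS) OVER
           (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PREV_ED_UCC_DT,
       -- first non-null ED_UCC_DT among the strictly following rows (same result as LEAD ... IGNORE NULLS)
       FIRST_VALUE(ED_UCC_DT IGNORE NULLS) OVER
           (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT
            ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING) AS NEXT_ED_UCC_DT
FROM   TB_SOURCE;

The explicit frames (up to 1 PRECEDING, and from 1 FOLLOWING) are what preserve the strictly-before and strictly-after behaviour of the original LAG and LEAD calls.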
Thank you in advance.

This post has been answered by alvinder on Apr 6 2022

Comments

577516
Did you try:

Limit columns based on section values, or
Limit rows based on section values
in the pivot table?
RMach
Yes.

Each row is being shown because it is being used at some point in the pivot table. We tried adding "AA" IS NOT NULL filters to the column, but they prevented rows where "AA" was null from being displayed even though the other columns did have values.
577516
Can you let us know which columns are placed in the section area, which in the row area, and which in the columns area?
RMach
Sorry, I just edited the original message to be a little clearer.

In the end we are hoping to make a financial statement.
AA = Revenues
BB = Costs
First Formula = Gross Margin
Second Formula = Gross Margin %
CC = Pay
DD = Taxes

This is a simplified version; if we can get this to show what we want it to, we can figure the rest out.


Right now we are trying to do this via two different avenues, and neither is working. The first avenue is the one we have listed above. We think it is the most promising if we can get zero suppression to remove the rows of nulls.

The other path we are exploring is to just use the hierarchy as is.

Columns:
Measure Labels

Rows:
Acct-Lvl 1
Acct-Lvl 2

Measures:
Actual for Current Month

The problem with this path is that we cannot get Gross Margin and Gross Margin % in because they are formulas. They work on the first level (sort of; they are auto-sorted to the bottom of the pivot table), but when you add the second level, those formulas attempt to calculate for each secondary level with values, and the table becomes monstrous.



Is the problem only solvable within the cube itself?
RMach
Well, the person managing the database has altered how things work a bit in an attempt to fix this.

Thus the answer to the original question in this thread is that it can't be done without an alteration in the back end.
