Discussions
Categories
- 197.1K All Categories
- 2.5K Data
- 546 Big Data Appliance
- 1.9K Data Science
- 450.7K Databases
- 221.9K General Database Discussions
- 3.8K Java and JavaScript in the Database
- 31 Multilingual Engine
- 552 MySQL Community Space
- 479 NoSQL Database
- 7.9K Oracle Database Express Edition (XE)
- 3.1K ORDS, SODA & JSON in the Database
- 555 SQLcl
- 4K SQL Developer Data Modeler
- 187.2K SQL & PL/SQL
- 21.3K SQL Developer
- 296.3K Development
- 17 Developer Projects
- 139 Programming Languages
- 293K Development Tools
- 109 DevOps
- 3.1K QA/Testing
- 646.1K Java
- 28 Java Learning Subscription
- 37K Database Connectivity
- 158 Java Community Process
- 105 Java 25
- 22.1K Java APIs
- 138.2K Java Development Tools
- 165.3K Java EE (Java Enterprise Edition)
- 19 Java Essentials
- 162 Java 8 Questions
- 86K Java Programming
- 81 Java Puzzle Ball
- 65.1K New To Java
- 1.7K Training / Learning / Certification
- 13.8K Java HotSpot Virtual Machine
- 94.3K Java SE
- 13.8K Java Security
- 204 Java User Groups
- 24 JavaScript - Nashorn
- Programs
- 466 LiveLabs
- 39 Workshops
- 10.2K Software
- 6.7K Berkeley DB Family
- 3.5K JHeadstart
- 5.7K Other Languages
- 2.3K Chinese
- 175 Deutsche Oracle Community
- 1.1K Español
- 1.9K Japanese
- 233 Portuguese
LEAD and LAG taking hours on a very large table!
Answers
-
Hi @Solomon Yakobson , I had created the index with the leading column being SEQNUM. Then ran the CTAS and it is taking forever.....over 4 hrs now. I had to manually cancel it, yet again.
I'll go ahead and drop the index and re-create it as follows:-
CREATE INDEX IX_COMPOSITE_COLUMNS ON TB_SOURCE ( ULIMORE, VISIT_DATE_DT, SEQNUM, ED_UCC_DT, ED_UCC_SEQNUM, DAD_ACUTE_DISP_DT, DAD_ACUTE_ADMIT_DT, DAD_ACUTE_SEQNUM, 1 -- Since all the above columns are NULLable, we need to add a constant for the Function-Based Index to be created ) TABLESPACE "ANALYTICS_INDEX" PARALLEL 8 NOLOGGING COMPUTE STATISTICS ; /
The EXPLAIN PLAN (it did not switch to WINDOW BUFFER):-
-
Hello @alvinder , thanks for sharing your code sample.
The ULIMORE column is a VARCHAR2, so your "bucket" query simply picks up the numeric values and ignores the rest of the string values.
-
Show CTAS statement.
SY.
-
EXPLAIN PLAN FOR CREATE TABLE TB_TARGET PCTFREE 0 PARALLEL 8 NOLOGGING AS SELECT /*+ PARALLEL(8) */ p.SEQNUM, p.ULIMORE, p.VISIT_DATE_DT, LAG(p.ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_ED_UCC_DT, LEAD(p.ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_ED_UCC_DT, LAG(p.ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_ED_UCC_SEQNUM, LEAD(p.ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_ED_UCC_SEQNUM, LAG(p.DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_DAD_DISP_AC_DT, LEAD(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_DT, LAG(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_DAD_DISP_AC_SEQNUM, LEAD(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_SEQNUM FROM TB_SOURCE p ; /
I even tried matching the columns like they were in the INDEX above but still the EXPLAIN PLAN does not perform a WINDOW BUFFER but rather, it shows WINDOW SORT:-
EXPLAIN PLAN FOR CREATE TABLE TB_TARGET PCTFREE 0 PARALLEL 8 NOLOGGING AS SELECT /*+ PARALLEL(8) */ p.ULIMORE, p.VISIT_DATE_DT, p.SEQNUM, LAG(p.ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_ED_UCC_DT, LEAD(p.ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_ED_UCC_DT, LAG(p.ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_ED_UCC_SEQNUM, LEAD(p.ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_ED_UCC_SEQNUM, LAG(p.DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_DAD_DISP_AC_DT, LEAD(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_DT, LAG(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_DAD_DISP_AC_SEQNUM, LEAD(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_SEQNUM FROM TB_SOURCE p ; /
-
Post create TB_SOURCE table.
SY.
-
Tested with varchar2 for Ulimore.
I am using ora_hash function to assign a number to a string.
This code works for me. I have created 100 buckets running 15 in parallel. You can go for more data. My sample data is 100000
DROP TABLE TB_SOURCE PURGE; CREATE TABLE TB_SOURCE( SEQNUM VARCHAR2(18), ULIMORE VARCHAR2(15), VISIT_DATE_DT DATE, ED_UCC_DT DATE, ED_UCC_SEQNUM VARCHAR2(18), DAD_ACUTE_DISP_DT DATE, DAD_ACUTE_SEQNUM VARCHAR2(18) ) / --Creating sample data. INSERT INTO tb_source SELECT ROWNUM , dbms_random.string( 'x' , 5 ) , sysdate - ROWNUM , sysdate - ROWNUM , ROWNUM , sysdate - ROWNUM , ROWNUM FROM dual CONNECT BY level < 100000; truncate table tb_source_final; CREATE TABLE TB_SOURCE_final( SEQNUM VARCHAR2(18), ULIMORE VARCHAR2(15), VISIT_DATE_DT DATE, ED_UCC_DT_prev DATE, ED_UCC_DT_next DATE, ED_UCC_SEQNUM_prev VARCHAR2(18), ED_UCC_SEQNUM_next VARCHAR2(18), DAD_ACUTE_DISP_DT_prev DATE, DAD_ACUTE_DISP_DT_next DATE, DAD_ACUTE_SEQNUM_prev VARCHAR2(18), DAD_ACUTE_SEQNUM_next VARCHAR2(18) ) / create index tb_source_ix1 on tb_source( ora_hash(ulimore,155000000) , VISIT_DATE_DT , SEQNUM, ED_UCC_DT , ED_UCC_SEQNUM , DAD_ACUTE_DISP_DT , DAD_ACUTE_SEQNUM ) parallel(degree 8); Alter index tb_source_ix1 noparallel; create or replace PROCEDURE insert_parallel_execute AS l_chunk_sql VARCHAR2(1000); l_sql_stmt VARCHAR2(32000); l_try NUMBER; l_status NUMBER; BEGIN BEGIN dbms_parallel_execute.drop_task('insert_aps'); EXCEPTION WHEN OTHERS THEN NULL; END; -- Create the TASK dbms_parallel_execute.create_task('insert_aps'); -- Chunk the table by MANAGER_ID l_chunk_sql := 'WITH t AS ( SELECT DISTINCT ora_hash(ulimore,155000000,0) ulimore FROM tb_source ), t1 AS ( SELECT ulimore , NTILE(100) OVER( ORDER BY ulimore ) bucket FROM t ) SELECT MIN(ulimore)start_id , MAX(ulimore) end_id FROM t1 GROUP BY bucket '; dbms_parallel_execute.create_chunks_by_sql( 'insert_aps' , l_chunk_sql , false ); -- Execute the DML in parallel -- the WHERE clause contain a condition on manager_id, which is the chunk -- column. In this case, grouping rows is by manager_id. l_sql_stmt := ' insert into TB_SOURCE_final SELECT p.SEQNUM, p.ULIMORE, p.VISIT_DATE_DT, LAG(p.ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_ED_UCC_DT, LEAD(p.ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_ED_UCC_DT, LAG(p.ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_ED_UCC_SEQNUM, LEAD(p.ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_ED_UCC_SEQNUM, LAG(p.DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_DAD_DISP_AC_DT, LEAD(p.DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_DT, LAG(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_DAD_DISP_AC_SEQNUM, LEAD(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_SEQNUM FROM TB_SOURCE p where ora_hash(ulimore,155000000) between :start_id and :end_id'; dbms_parallel_execute.run_task( 'insert_aps' , l_sql_stmt , dbms_sql.native , parallel_level => 15 ); END insert_parallel_execute; / select status,count(*) from user_parallel_execute_chunks where task_name = 'insert_aps' group by status;
-
Coming back to this after some time away.
Getting the unique identifying columns from the Data Scientist was a good move, and that's then the right query to generate the extra columns he needs. The plan shows an estimated 10G memory of an "optimal" i.e. in-memory sort, so my estimate of 9GB was in the right ballpark.
If this is taking 4.5 hours running parallel 8 the thing to check is HOW is the time being used. Are your PX slaves using 100% CPU, are they stalled waiting for CPU, is time going on interprocess communication with that hash distribution, are you spending a lot of time on I/O, and is this for tablescans, temp reads and writes, or reading undo. Is it possible that Oracle hasn't noticed a data skew and one process is doing almost all the sorting ... etc,
I've just published a script that will report all the session stats for the PX processes in a parallel query; I'm looking for the one that does the same for events and will publish that shortly. In 4.5 hours you should be able to get some good indications of why the CTAS is taking so long.
Regards
Jonathan Lewis
UPDATE: The script for parallel wait events is now online.
-
Thanks @Jonathan Lewis . I had to manually cancel the CTAS after 4.5 hrs as I could not go on...
Thanks for your 2 scripts. If I understand correctly, I should run 3 separate windows (sessions) of Oracle SQLCl (command line) tool -- in one window, I start running the CTAS, followed by running your 2 separate scripts in the rest of the windows.
For this test, I have done the following:-
1. Created a duplicate copy of the source table with only the columns that we need and called it as TB_SOURCE_COPY_TRIMMED.
2. Columns SEQNUM, ULIMORE and VISIT_DATE_DT do not contain any NULL values.
3. Because we know from the Data Scientist that the composite columns (SEQNUM, ULIMORE, VISIT_DATE_DT) uniquely identify a record, I proceeded to create a UNIQUE index and a Primary Key constraint on TB_SOURCE_COPY_TRIMMED table. @Solomon Yakobson, @Jonathan Lewis @alvinder Please see the script attached.
4. Running the EXPLAIN PLAN shows:-
@Jonathan Lewis I'll attach the px_stats.log and px_wait.log files after a few hours pass-by.
-
Again, compare plans:
SQL> CREATE UNIQUE INDEX UX_COMPOSITE_KEYS 2 ON TB_SOURCE_COPY_TRIMMED 3 ( 4 SEQNUM, 5 ULIMORE, 6 VISIT_DATE_DT 7 ); Index created. SQL> ALTER TABLE TB_SOURCE_COPY_TRIMMED 2 ADD CONSTRAINT PK_COMPOSITE_KEYS 3 PRIMARY KEY (SEQNUM, ULIMORE, VISIT_DATE_DT) 4 USING INDEX UX_COMPOSITE_KEYS 5 ; Table altered. SQL> set sqlbla on SQL> EXPLAIN PLAN FOR 2 CREATE TABLE TB_TARGET_WITH_LEAD_LAG 3 PARALLEL 8 4 NOLOGGING 5 AS 6 SELECT /*+ PARALLEL(8) */ 7 p.SEQNUM, 8 p.ULIMORE, 9 p.VISIT_DATE_DT, 10 11 LAG(p.ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_ED_UCC_DT, 12 LEAD(p.ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_ED_UCC_DT, 13 14 LAG(p.ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_ED_UCC_SEQNUM, 15 LEAD(p.ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_ED_UCC_SEQNUM, 16 17 LAG(p.DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_DAD_DISP_AC_DT, 18 LEAD(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_DT, 19 20 LAG(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_DAD_DISP_AC_SEQNUM, 21 LEAD(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_SEQNUM 22 23 FROM 24 TB_SOURCE_COPY_TRIMMED p 25 ; Explained. SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 1332415219 ------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------------- | 0 | CREATE TABLE STATEMENT | | 1 | 78 | 3 (34)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 1 | 78 | 3 (34)| 00:00:01 | Q1,01 | P->S | QC (RAND) | | 3 | LOAD AS SELECT (HYBRID TSM/HWMB)| TB_TARGET_WITH_LEAD_LAG | | | | | Q1,01 | PCWP | | | 4 | OPTIMIZER STATISTICS GATHERING | | 1 | 78 | 3 (34)| 00:00:01 | Q1,01 | PCWP | | | 5 | WINDOW SORT | | 1 | 78 | 3 (34)| 00:00:01 | Q1,01 | PCWP | | | 6 | PX RECEIVE | | 1 | 78 | 2 (0)| 00:00:01 | Q1,01 | PCWP | | | 7 | PX SEND HASH | :TQ10000 | 1 | 78 | 2 (0)| 00:00:01 | Q1,00 | P->P | HASH | | 8 | PX BLOCK ITERATOR | | 1 | 78 | 2 (0)| 00:00:01 | Q1,00 | PCWC | | | 9 | TABLE ACCESS FULL | TB_SOURCE_COPY_TRIMMED | 1 | 78 | 2 (0)| 00:00:01 | Q1,00 | PCWP | | ------------------------------------------------------------------------------------------------------------------------------------------- Note ----- - dynamic statistics used: dynamic sampling (level=2) - Degree of Parallelism is 8 because of hint 21 rows selected. SQL> CREATE UNIQUE INDEX UX2_COMPOSITE_KEYS 2 ON TB_SOURCE_COPY_TRIMMED 3 ( 4 ULIMORE, 5 VISIT_DATE_DT, 6 SEQNUM, 7 ED_UCC_DT, 8 ED_UCC_SEQNUM, 9 DAD_ACUTE_DISP_DT, 10 DAD_ACUTE_ADMIT_DT, 11 DAD_ACUTE_SEQNUM 12 ); Index created. SQL> EXPLAIN PLAN FOR 2 CREATE TABLE TB_TARGET_WITH_LEAD_LAG 3 PARALLEL 8 4 NOLOGGING 5 AS 6 SELECT /*+ PARALLEL(8) */ 7 p.SEQNUM, 8 p.ULIMORE, 9 p.VISIT_DATE_DT, 10 11 LAG(p.ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_ED_UCC_DT, 12 LEAD(p.ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_ED_UCC_DT, 13 14 LAG(p.ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_ED_UCC_SEQNUM, 15 LEAD(p.ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_ED_UCC_SEQNUM, 16 17 LAG(p.DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_DAD_DISP_AC_DT, 18 LEAD(p.DAD_ACUTE_ADMIT_DT IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_DT, 19 20 LAG(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS PREV_DAD_DISP_AC_SEQNUM, 21 LEAD(p.DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_SEQNUM 22 23 FROM 24 TB_SOURCE_COPY_TRIMMED p 25 ; Explained. SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------------------------------- Plan hash value: 1877319916 ------------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | ------------------------------------------------------------------------------------------------------------------------------------------- | 0 | CREATE TABLE STATEMENT | | 1 | 78 | 2 (0)| 00:00:01 | | | | | 1 | PX COORDINATOR | | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | 1 | 78 | 1 (0)| 00:00:01 | Q1,01 | P->S | QC (RAND) | | 3 | LOAD AS SELECT (HYBRID TSM/HWMB)| TB_TARGET_WITH_LEAD_LAG | | | | | Q1,01 | PCWP | | | 4 | OPTIMIZER STATISTICS GATHERING | | 1 | 78 | 1 (0)| 00:00:01 | Q1,01 | PCWP | | | 5 | PX RECEIVE | | 1 | 78 | 1 (0)| 00:00:01 | Q1,01 | PCWP | | | 6 | PX SEND ROUND-ROBIN | :TQ10000 | 1 | 78 | 1 (0)| 00:00:01 | Q1,00 | S->P | RND-ROBIN | | 7 | PX SELECTOR | | | | | | Q1,00 | SCWC | | | 8 | WINDOW BUFFER | | 1 | 78 | 1 (0)| 00:00:01 | Q1,00 | SCWC | | | 9 | INDEX FULL SCAN | UX2_COMPOSITE_KEYS | 1 | 78 | 1 (0)| 00:00:01 | Q1,00 | SCWP | | ------------------------------------------------------------------------------------------------------------------------------------------- Note ----- - dynamic statistics used: dynamic sampling (level=2) - Degree of Parallelism is 8 because of hint 21 rows selected. SQL>
SY.
-