
LEAD and LAG taking hours on a very large table!

Sam_P
Sam_P Member Posts: 155 Blue Ribbon
edited Apr 3, 2022 5:17AM in SQL & PL/SQL

Hello,

I have a source table (partitioned and indexed) with approx. 156 million records. I am trying to create a target table using the CTAS method, reading records from the source table with a simple SELECT query that does LEAD/LAG operations on a date column, and it is taking over 8 hrs!!

While searching online, I saw one developer recommend trying the MATCH_RECOGNIZE technique, as it is claimed to run faster than analytic SQL functions on very large tables. I have not tested this yet, as I need help converting the SQL query into a MATCH_RECOGNIZE statement that would produce the same result set.

CREATE TABLE TB_TARGET
PARALLEL 8
NOLOGGING
AS
SELECT /*+ PARALLEL_INDEX(8) PARALLEL(8) */
    PAT_ID,
    VISIT_DATE_DT,
    ROW_NUMBER() OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS ROW_NUM,
    ED_UCC_DT,
    LAG(ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS PREV_ED_UCC_DT,
    LEAD(ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS NEXT_ED_UCC_DT
FROM TB_SOURCE; --Very large table!


The result set for a single PAT_ID record (100123456) looks like this:-


Questions:-

1. How do I speed up this query when it runs over 156 million records, with multiple LEAD and LAG operations on other columns added to the query in addition to the one shown above?

2. If the fastest solution is to use MATCH_RECOGNIZE, could you please help me compose a query that yields the same results as shown above (including the NULLs)?

3. Any other solution is welcome, as long as it reduces the query execution time from 8 hrs to an acceptable duration.

Thank you in advance.

Best Answer

  • alvinder
    alvinder Member Posts: 437 Silver Badge
    edited Apr 7, 2022 12:52AM Answer ✓

    Tested with VARCHAR2 for ULIMORE.

    I am using the ORA_HASH function to assign a number to each string.

    This code works for me. I created 100 buckets and ran 15 in parallel; you can test with more data. My sample data is 100,000 rows.


    DROP TABLE TB_SOURCE PURGE;

    CREATE TABLE TB_SOURCE(
        SEQNUM            VARCHAR2(18),
        ULIMORE           VARCHAR2(15),
        VISIT_DATE_DT     DATE,
        ED_UCC_DT         DATE,
        ED_UCC_SEQNUM     VARCHAR2(18),
        DAD_ACUTE_DISP_DT DATE,
        DAD_ACUTE_SEQNUM  VARCHAR2(18)
    )
    /

    --Creating sample data.
    INSERT INTO tb_source
        SELECT ROWNUM
             , dbms_random.string('x', 5)
             , sysdate - ROWNUM
             , sysdate - ROWNUM
             , ROWNUM
             , sysdate - ROWNUM
             , ROWNUM
        FROM dual
        CONNECT BY level < 100000;

    --Clears a previous run; on the first run just create the table below.
    truncate table tb_source_final;

    CREATE TABLE TB_SOURCE_FINAL(
        SEQNUM                  VARCHAR2(18),
        ULIMORE                 VARCHAR2(15),
        VISIT_DATE_DT           DATE,
        ED_UCC_DT_PREV          DATE,
        ED_UCC_DT_NEXT          DATE,
        ED_UCC_SEQNUM_PREV      VARCHAR2(18),
        ED_UCC_SEQNUM_NEXT      VARCHAR2(18),
        DAD_ACUTE_DISP_DT_PREV  DATE,
        DAD_ACUTE_DISP_DT_NEXT  DATE,
        DAD_ACUTE_SEQNUM_PREV   VARCHAR2(18),
        DAD_ACUTE_SEQNUM_NEXT   VARCHAR2(18)
    )
    /

    --Index on the hash of ULIMORE plus all referenced columns, so the chunked
    --range scans on ORA_HASH(ULIMORE, 155000000) can avoid visiting the table.
    create index tb_source_ix1 on tb_source(
        ora_hash(ulimore, 155000000),
        VISIT_DATE_DT,
        SEQNUM,
        ED_UCC_DT,
        ED_UCC_SEQNUM,
        DAD_ACUTE_DISP_DT,
        DAD_ACUTE_SEQNUM
    ) parallel(degree 8);
    Alter index tb_source_ix1 noparallel;
    
    create or replace PROCEDURE insert_parallel_execute AS
        l_chunk_sql VARCHAR2(1000);
        l_sql_stmt  VARCHAR2(32000);
        l_try       NUMBER;
        l_status    NUMBER;
    BEGIN
        BEGIN
            dbms_parallel_execute.drop_task('insert_aps');
        EXCEPTION
            WHEN OTHERS THEN
                NULL;
        END;
      -- Create the TASK
        dbms_parallel_execute.create_task('insert_aps');
    
    
      -- Chunk the table into 100 buckets of ORA_HASH(ULIMORE) values
        l_chunk_sql := 'WITH t AS (
        SELECT DISTINCT
            ora_hash(ulimore,155000000,0) ulimore
        FROM
            tb_source
    ), t1 AS (
        SELECT
            ulimore
          , NTILE(100)
              OVER(
                ORDER BY
                    ulimore
              ) bucket
        FROM
            t
    )
    SELECT
       MIN(ulimore) start_id
      , MAX(ulimore) end_id
    FROM
        t1
    GROUP BY
        bucket ';
        
        dbms_parallel_execute.create_chunks_by_sql(
                                                  'insert_aps'
                                                , l_chunk_sql
                                                , false
        );
    
    
      -- Execute the DML in parallel:
      --   the WHERE clause contains a condition on ORA_HASH(ULIMORE), which is the
      --   chunk column, so each chunk processes a disjoint set of ULIMORE values.
        l_sql_stmt := ' insert into  TB_SOURCE_final
      SELECT 
            p.SEQNUM,
            p.ULIMORE,
           p.VISIT_DATE_DT,
           LAG(p.ED_UCC_DT          IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_DT,
           LEAD(p.ED_UCC_DT         IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_DT,
           LAG(p.ED_UCC_SEQNUM      IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_ED_UCC_SEQNUM,
           LEAD(p.ED_UCC_SEQNUM     IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_ED_UCC_SEQNUM,
           LAG(p.DAD_ACUTE_DISP_DT  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_DT,
           LEAD(p.DAD_ACUTE_DISP_DT IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_DT,
           LAG(p.DAD_ACUTE_SEQNUM   IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS PREV_DAD_DISP_AC_SEQNUM,
           LEAD(p.DAD_ACUTE_SEQNUM  IGNORE NULLS)  OVER (PARTITION BY p.ULIMORE ORDER BY p.VISIT_DATE_DT)     AS NEXT_DAD_DISP_AC_SEQNUM
       FROM
           TB_SOURCE p
           where ora_hash(ulimore,155000000) between  :start_id and :end_id';
                 
        dbms_parallel_execute.run_task(
                                      'insert_aps'
                                    , l_sql_stmt
                                    , dbms_sql.native
                                    , parallel_level => 15
        );
    END insert_parallel_execute;
    /
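    -- To run it (a sketch; 'insert_aps' is the task name used in the procedure above):
    begin
        insert_parallel_execute;
    end;
    /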
    
      select status,count(*) from user_parallel_execute_chunks
      where task_name = 'insert_aps'
      group by status;
      
    
     

Answers

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,246 Red Diamond
    edited Apr 1, 2022 4:24PM

    Hi, @Sam_P

    Whenever you have a question, please post a little sample data (CREATE TABLE and INSERT statements for all tables involved, relevant columns only) so the people who want to help you can re-create the problem and test their ideas. Also post the exact results you want from that data, and explain why you want those results from that data. Always post your complete Oracle version (e.g. 18.4.0.0.0).

    with multiple LEAD and LAG operations added into the above query on other columns in addition to the above column?

    To how many columns (besides ed_ucc_dt) will you want to apply LAG and LEAD? Include one in your sample data and results.

     LAG(ED_UCC_DT) IGNORE NULLS OVER (PARTITION BY PAT_ID ORDER BY VISIT_DATE_DT) AS PREV_ED_UCC_DT,

    Since you're concerned about NULLS, include examples of NULLs in your sample data and results.

    Is the combination (pat_id, visit_date_dt) unique? If not, include examples of ties in your sample data, results and explanation.

    Given that you need to generate the ROW_NUM column, analytic functions might be the best you can do. LAG and LEAD can always be replaced by FIRST_VALUE or LAST_VALUE, but I'm not sure they will be any faster.
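    For example, a rough, untested sketch of that substitution, using explicit windows that exclude the current row so the results should line up with LAG/LEAD ... IGNORE NULLS:

     LAST_VALUE(ed_ucc_dt IGNORE NULLS) OVER (
             PARTITION BY pat_id ORDER BY visit_date_dt
             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
         ) AS prev_ed_ucc_dt,  -- last non-NULL value before the current row
     FIRST_VALUE(ed_ucc_dt IGNORE NULLS) OVER (
             PARTITION BY pat_id ORDER BY visit_date_dt
             ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
         ) AS next_ed_ucc_dt   -- first non-NULL value after the current row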

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 19,601 Red Diamond

    In order to do LEAD/LAG, Oracle needs to fetch ALL rows. Then it needs to sort them. Check the execution plan. Most likely the sort doesn't fit in memory and the time is spent on a disk sort. Having an index on (PAT_ID, VISIT_DATE_DT), or better (PAT_ID, VISIT_DATE_DT, ED_UCC_DT), could improve the sort. Also, how is the table partitioned? If it is partitioned by PAT_ID then parallel could also improve performance.
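    One way to check for a spilling sort (a sketch; it needs access to the V$ views and should be run while the statement is executing):

    SELECT sql_id,
           operation_type,
           ROUND(actual_mem_used / 1024 / 1024) AS mem_mb,
           ROUND(tempseg_size    / 1024 / 1024) AS temp_mb,  -- non-NULL means the work area spilled to temp
           number_passes                                      -- > 0 means a one-pass or multi-pass sort
    FROM   v$sql_workarea_active
    ORDER  BY tempseg_size DESC NULLS LAST;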

    SY.

  • Sam_P
    Sam_P Member Posts: 155 Blue Ribbon

    @Frank Kulash

    My apologies, I didn't attach sample data as it is healthcare data, private in nature, so I'll have to spend more time de-identifying it to create a sample data set.

    Oracle version is "Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.11.0.0.0".

    The combination (pat_id, visit_date_dt) is not unique. ROW_NUM is not required per se; it was only for my own purposes.


    @Solomon Yakobson

    I think you're on to something, as I'm leaning towards the same suspicion: the LEAD/LAG functions require a lot of sort area in the PGA to churn through a table this large. I'll see if more RAM can be allocated to the PGA.
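    (A sketch of how the current limits could be checked first, assuming access to V$PARAMETER; the ALTER SYSTEM value is purely illustrative and a DBA task:)

    SELECT name, display_value
    FROM   v$parameter
    WHERE  name IN ('pga_aggregate_target', 'pga_aggregate_limit');

    -- DBA only, illustrative value:
    -- ALTER SYSTEM SET pga_aggregate_target = 16G SCOPE = BOTH;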

    There is a partition on column VISIT_DATE_DT (interval range partition) which stores by YEAR and there is a non-unique LOCAL index created on column PAT_ID and some other columns as well.

  • Rocky
    Rocky Member Posts: 196 Bronze Badge

    Hi,

    Adding to @Solomon Yakobson's point - you could also think of doing it in parallel using the dbms_parallel_execute package.

  • Sam_P
    Sam_P Member Posts: 155 Blue Ribbon

    @Rocky

    Thanks, I’ve never used that package before. Ideally, I’d want to break the work into parallel tasks based on the yearly values of the partition column VISIT_DATE_DT, for distributed parallel processing and further performance gain.

    Is this doable with this package?

    Do you have an example of how to use this package?

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 10,010 Blue Diamond

    Unless you have a global index that starts with (pat_id, visit_date_dt) and includes all the columns you're going to use to create the result table, Oracle will have to sort 156M rows, including all the columns that you want to copy from the source table. (The same basic requirement would be true for a solution based on match_recognize.)

    Because your table is range-partitioned by visit_date_dt the dbms_parallel_execute() package will not help as it tries to handle the table in (contiguous) chunks, but your requirement is to pick all the rows for a patient from lots of different parts of the table.

    Do you want the target table partitioned in any way, as this may dictate the optimal solution?

    You've got two options for using an increased volume of concurrent resources:

    a) execute the statement as a parallel statement - Oracle should be smart enough to do parallel tablescans and distribute the data by range on pat_id so that each PX server process sorts a subset of the data.

    b) create some simple code to do an "insert as select ... where pat_id between {parameter1} and {parameter2}", then start up a set of processes that use non-overlapping ranges covering the entire range of pat_id (see the sketch below). Probably best if the execution path starts with a tablescan in each case.

    Following on from (b) - before you do anything else - check the execution plan for the statement you're using at present. It's possible that Oracle is trying to use your index on pat_id to minimise sorting and spending all its time on random I/Os when (possibly) it should just be doing a brute force tablescan.
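    A minimal sketch of (b) - untested; the range boundaries and hints are illustrative, the target table is assumed to already exist with matching columns, and each statement runs from its own session:

    -- Because the chunking column is pat_id, every row for a patient falls into
    -- exactly one chunk, so the LAG/LEAD results are unaffected by the split.
    -- A conventional (non-APPEND) insert is used so several sessions can load
    -- the same target table at the same time.
    INSERT INTO tb_target
    SELECT /*+ FULL(s) */
           s.*,
           LAG(s.ed_ucc_dt  IGNORE NULLS) OVER (PARTITION BY s.pat_id ORDER BY s.visit_date_dt) AS prev_ed_ucc_dt,
           LEAD(s.ed_ucc_dt IGNORE NULLS) OVER (PARTITION BY s.pat_id ORDER BY s.visit_date_dt) AS next_ed_ucc_dt
    FROM   tb_source s
    WHERE  s.pat_id BETWEEN 100000000 AND 100999999;  -- this session's (illustrative) range
    COMMIT;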

    Regards

    Jonathan Lewis

  • Sam_P
    Sam_P Member Posts: 155 Blue Ribbon
    edited Apr 3, 2022 5:21AM

    Thanks @Jonathan Lewis for helping me navigate through this issue. I am a big fan of yours and I'm honored that you have taken notice to this issue and are willing to help. I truly appreciate it, sir! :)

    Just to give you a bit of context first -- we work in an Analytics team with Data Scientists who want to analyze the data (one of them wanted to create a separate table with LEAD/LAG so that they could proceed with their analysis, but their query ran for over 8 hrs and the DBA had to kill it). Since this Data Scientist couldn't proceed any further, I am trying to help resolve the issue. I'm not the DBA, but I do have some privileges that let me perform some administration tasks in Oracle.

    The ETL team, which is separate from our team, performs monthly data refreshes on thousands of source tables in Oracle. These are exposed to us as VIEWS in a given schema; we query these source VIEWS and create our own tables (mostly using CTAS to copy either the full data or a subset of it) in our own "working" schemas, where they then act as "source tables" for us.

    Currently, in our analysis "working" schema, the Data Scientist has created a "source table" (based on a subset of data from the source VIEWS), and this "source table" (i.e. TB_SOURCE) is the one that has 156M records. This table DOES NOT currently have any partitions; however, it does have a number of individually created (separate) single-column indexes, as shown:-

    Notes:

    (a) The column "ULIMORE" in the image above is actually what I wrote in this post as "PAT_ID" but this column does not contain unique values, as some of the patients come back for recurring visits.

    (b) There is no PRIMARY KEY constraint or any other constraints on this table.

    (c) At the moment, this TB_SOURCE table gets destroyed (dropped) and re-created from scratch, every month, by the Data Scientist, to capture new data that was refreshed and made available in source VIEWS. Because of this current practice and the fact that this table is very large, plus the fact that every month, we get approx. 5M new records, we need a new approach.

    (d) For this new approach, my plan is to create a new table, partition it, re-create the indexes of the original TB_SOURCE table (shown above in the image) as LOCAL indexes, and keep this new partitioned table "steady" going forward, without the Data Scientist dropping and recreating it every month. Once the performance and optimization work on this new partitioned table lets the SQL query run in a reasonable time, I was planning to drop the existing TB_SOURCE table and rename the new partitioned table (TB_SOURCE_PARTITIONED) to TB_SOURCE. Then I was going to have the Data Scientist use a parallel MERGE statement to keep this table populated with newly changed (refreshed) data from the source VIEWS every month.

    (e) The above columns were chosen to be indexed because they participate in one way or another in the SELECT below, in addition to all the columns from the TB_SOURCE table:-

    --This is taking over 8 hrs!
    CREATE TABLE tb_target
    PARALLEL 8
    NOLOGGING
    AS
    SELECT /*+ PARALLEL_INDEX(8) PARALLEL(8) */
        s.*, --all columns from the original table plus the additional ones below
        LAG(ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS PREV_ED_UCC_DT,
        LEAD(ED_UCC_DT IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS NEXT_ED_UCC_DT,
        LAG(ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS PREV_ED_UCC_SEQNUM,
        LEAD(ED_UCC_SEQNUM IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS NEXT_ED_UCC_SEQNUM,
        LAG(DAD_ACUTE_DISP_DT IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS PREV_DAD_DISP_AC_DT,
        LEAD(DAD_ACUTE_ADMIT_DT IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_DT,
        LAG(DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS PREV_DAD_DISP_AC_SEQNUM,
        LEAD(DAD_ACUTE_SEQNUM IGNORE NULLS) OVER (PARTITION BY ULIMORE ORDER BY VISIT_DATE_DT) AS NEXT_DAD_DISP_AC_SEQNUM
    FROM tb_source s;


    Because I did not want to perform any maintenance on the existing TB_SOURCE table and potentially disrupt any downstream reporting based off of this table, I chose to create a separate, new, partitioned table, TB_SOURCE_PARTITIONED and populated it with 156M records (using a direct-path load INSERT /*+ APPEND PARALLEL(8) */ INTO TB_SOURCE_PARTITIONED SELECT /*+ PARALLEL_INDEX(8) PARALLEL(8) */ * FROM TB_SOURCE; --This took 17 mins to load 156M records).

    I then proceeded to create the same indexes on this partitioned table as on the original TB_SOURCE, but could not create UNIQUE LOCAL indexes on columns ED_UCC_SEQNUM and DAD_ACUTE_SEQNUM, as it threw an error to the effect that the unique key must include the partition key. So I created NON-UNIQUE indexes instead, and all of the indexes on the partitioned table are LOCAL, as shown:-


    I then gathered full table and index stats and tried running the above CTAS (but reading FROM tb_source_partitioned s) to see if there was any improvement. Sadly, it had been running for more than 4 hrs in my testing and I had to cancel the query manually. Unfortunately, partitioning did not help.


    A few questions:-

    1. Do you think that my use of the PARALLEL_INDEX(8) hint in the above CTAS is forcing Oracle to use indexes instead of the full table scan?

    2. Should I keep the TB_SOURCE_PARTITIONED table or drop it?

    3. Should I keep using the non-partitioned TB_SOURCE and use MERGE to populate it going forward?


    There is no requirement for the TARGET table to be partitioned but the Data Scientist wants to create additional follow-up tables based off of this TB_TARGET to perform subsequent analysis until the final analysis tables are created to be used for final reporting.

    I am looking forward to performing the next steps based on your recommendations and instructions.

    Thank you once again for your help.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 10,010 Blue Diamond

    If someone hired me to address this question I'd spend a couple of hours talking to your data scientist about what they wanted to do with the data before I made any suggestions about how to add an ordering column to a data set of 156M rows. Try to imagine how many exchanges of notes that's going to turn into if I have to read your replies and write more questions asking for clarification - I don't have that much time free.

    A couple of points:

    If the tb_source table is recreated each month, why not include the analytic sort in its creation? A brute-force, one-off create is probably the most efficient way to prepare the data set.

    You said the tb_source was a subset of the data - that suggests there may be data in the table this month that won't be needed next month, or maybe next year. Have you considered the possibility of having to remove data?

    A MERGE command requires uniqueness on the "ON" columns of the new data, otherwise you risk a run-time error (about a "stable set of rows") and Oracle rolling back the updates done so far. If you do manage to deal with the uniqueness, how are you going to avoid updating every one of the 156M rows, or conversely, how are you going to ensure that the rows (and only those rows) that need an update are updated?

    Even if you manage to get a working MERGE to add data, how are you going to populate the relevant row_number and lag/lead date columns for the new data? And what if a new row for a pat_id has a visit date that is earlier than the highest existing visit date for that pat_id - it probably "shouldn't" happen, but it probably will.

    If you don't know what the end-user is going to do with the data and you don't really understand how partitioning works then you shouldn't be using partitioning (at least not until you know why it might make the end-user's queries run significantly more efficiently).



    I have a couple of ideas for design strategies that MIGHT be appropriate for the end-user's requirements, but there's no point in saying anything about any of them without knowing what the end-user hopes to do with the data. The strongest indication we have to date is that the end-user is interested in following patient histories - which suggests that for efficiency purposes you need to think about putting all the data for a given patient into the smallest number of blocks possible: i.e. think about (hash) partitioning by pat_id, or creating an index-organized table based around finding a unique index starting with pat_id (or clustering by pat_id, but that's difficult to do well).
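    As a rough illustration of the hash-partitioning idea (untested sketch; the table name, partition count and degree are made up for the example):

    -- All rows for a given ulimore land in the same hash partition.
    CREATE TABLE tb_source_hash
    NOLOGGING
    PARTITION BY HASH (ulimore) PARTITIONS 32
    PARALLEL 8
    AS
    SELECT /*+ PARALLEL(8) */ *
    FROM   tb_source;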


    Regards

    Jonathan Lewis

  • Sam_P
    Sam_P Member Posts: 155 Blue Ribbon
    edited Apr 3, 2022 5:18PM

    Points noted @Jonathan Lewis .

    I'll spend more time tomorrow with the end-user to understand how this table is being used and queried downstream.

    About TB_SOURCE being a subset of the source data, I looked at his SQL scripts and it looks like he's running CTAS from 6 different source VIEWS and creating the corresponding 6 tables as RAW tables. These tables contain ALL the records from their corresponding source VIEWS, not a subset. I stand corrected.

    He then creates a ton of intermediate tables to perform various transformations from the RAW tables (like CASTs, handling NULLs, joins, adding more columns to standardize, etc.) in his scripts. Once the transformations are done and the standardized tables have similar column names and data types, he consolidates them (UNION) into a single table, which acts as the TB_SOURCE for this project's purposes.

    However, it doesn't stop here. As part of his project, the requirements are not always provided by his client in advance; rather, the client (a medical group) may pose different questions as the project progresses and more insights are found by the Data Scientist. As a result, he creates more intermediate tables downstream (reading from TB_SOURCE) in order to answer many different types of questions/scenarios posed by his client. So a table is most likely dependent on the prior table in a chain of many project-related tables, down to the final few tables that are then read by the reporting/dashboarding tool to bring it all together visually.

    Just an FYI - as of yesterday evening at 9PM, I ran the CTAS (against TB_SOURCE directly) without the PARALLEL_INDEX hint (just kept the PARALLEL hint) and checked the execution plan and the optimizer chose to perform a FULL TABLE SCAN. I kept my laptop running all night and when I checked as of this morning at 9AM, the CTAS query was still running! I had to manually cancel it again.

    Also, I have dropped the TB_SOURCE_PARTITIONED until I get more information from the end-user. I'll read up on HASH partitioning and on IOTs. Thank you for your suggestions :)

    I'll provide more details soon.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 10,010 Blue Diamond

    A quick bit of arithmetic.

    If you create the table to hold only one of the unique numbers, the 4 dates (2 x next/prev) and the 4 sequence numbers (2 x next/prev), you will have done a sort carrying: unique number, ulimore, ed_ucc_dt, ed_ucc_seqnum, dad_acute_disp_dt, dad_acute_seqnum, which will be in the order of 45 bytes per entry, for a total of about 7GB of data - so when you try including the whole row (and you haven't said anything about its length) it's not surprising that the sort takes a long time.

    If it's necessary to stick with the current sequence of events, you could consider creating a table from just the columns I've listed above, then creating the final table by forcing a parallel hash join between this 7GB data set and the tb_source data set, maybe partitioning the result by HASH 16 or 32 ways on ulimore (running with the same degree of parallelism as there are partitions), and maybe creating the intermediate 7GB table with the same partitioning clause to help optimize the final join and create.
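    A minimal sketch of that two-step idea (untested; it assumes SEQNUM uniquely identifies a row in tb_source, and the partition counts and degrees are illustrative):

    -- Step 1: sort only the narrow columns needed for the LAG/LEAD work.
    CREATE TABLE tb_lead_lag
    NOLOGGING
    PARTITION BY HASH (ulimore) PARTITIONS 16
    PARALLEL 16
    AS
    SELECT /*+ PARALLEL(16) */
           seqnum, ulimore,
           LAG(ed_ucc_dt  IGNORE NULLS) OVER (PARTITION BY ulimore ORDER BY visit_date_dt) AS prev_ed_ucc_dt,
           LEAD(ed_ucc_dt IGNORE NULLS) OVER (PARTITION BY ulimore ORDER BY visit_date_dt) AS next_ed_ucc_dt
           -- ... repeat for the other date/seqnum columns ...
    FROM   tb_source;

    -- Step 2: hash-join the narrow result back to the full rows on the unique key.
    CREATE TABLE tb_target
    NOLOGGING
    PARTITION BY HASH (ulimore) PARTITIONS 16
    PARALLEL 16
    AS
    SELECT /*+ PARALLEL(16) USE_HASH(s l) */
           s.*, l.prev_ed_ucc_dt, l.next_ed_ucc_dt
    FROM   tb_source  s
    JOIN   tb_lead_lag l ON l.seqnum = s.seqnum;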

    Regards

    Jonathan Lewis