Fastest way to batch delete data from a table with 1 billion rows

OraC
OraC Member Posts: 75 Blue Ribbon
edited Jan 30, 2021 8:41AM in SQL & PL/SQL

Hi,

I need some help deleting batches from a really large online transactions table (up to 1 billion records). I hope to delete around 9 million records daily, while 9 million more are being added daily. I have an off-peak window when customer usage is limited, so I can try to run this optimally, but I'm also conscious of not impacting customers too much by specifying too high a batch size below (10,000). It's Oracle 12.2 Standard Edition, so unfortunately partitioning is not an option. I've come up with the following, but it's just not deleting fast enough. The initial select seems to be OK; it's more about my loop. Is there a more efficient way of batching this?

DECLARE
  CURSOR cur IS
    SELECT /*+ index_ffs(a,P_ITD) parallel_index(a,P_ITD,4) */ C_ID
    FROM ITD a
    WHERE CREATED < '27-NOV-20 12.00.00.000000 AM';

  TYPE CII_TYPE IS TABLE OF NUMBER;
  CII_TYPE_TBL CII_TYPE;
BEGIN
  OPEN cur;
  LOOP
    FETCH cur BULK COLLECT INTO CII_TYPE_TBL LIMIT 10000;

    FORALL i IN 1..CII_TYPE_TBL.COUNT
      DELETE FROM ITD WHERE C_ID = CII_TYPE_TBL(i);

    COMMIT;
    EXIT WHEN cur%NOTFOUND;
  END LOOP;
  CLOSE cur;
END;
/

P_ITD is the primary key constraint on the ITD table on C_ID

CREATED is also indexed separately.

Thanks


Answers

  • RogerT
    RogerT Member Posts: 1,853 Gold Trophy

    Why are you reading 10,000 rows into a collection just to delete those rows afterwards from the table you read them from? If you want to delete in chunks, this could also be done with a delete statement only (looping as long as the delete statement deletes rows).
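
    e.g. something like this (untested sketch, reusing the ITD/CREATED names and batch size from your post):

        begin
          loop
            -- delete one chunk; rownum caps the batch size
            delete from ITD
            where CREATED < timestamp '2020-11-27 00:00:00'
              and rownum <= 10000;
            exit when sql%rowcount = 0;
            commit;
          end loop;
          commit;
        end;
        /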

    Why compare CREATED (hopefully a DATE column) with a string value?

    hth

  • Billy Verreynne
    Billy Verreynne Software Engineer Member Posts: 28,571 Red Diamond

    The SQL runs as multiple parallel processes.

    The PL/SQL process runs as a SINGLE serialised process. Bulk collect does not automagically make it run in parallel.

    So it is obvious that the PL/SQL process is the bottleneck. Bulk collect IS SLOW row-by-row processing.

    Eliminate PL/SQL serialisation. Maximise SQL parallelisation. Use a single SQL parallel DML to find and delete the relevant data.
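
    A sketch of that approach (untested; parallel DML must be enabled at session level, the degree of 4 is an assumption, and the ITD/CREATED names are taken from the question):

        alter session enable parallel dml;

        delete /*+ parallel(a 4) */ from ITD a
        where CREATED < timestamp '2020-11-27 00:00:00';

        commit;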

  • OraC
    OraC Member Posts: 75 Blue Ribbon

    Thanks RogerT and Billy! It's much appreciated.

    I have SECUREFILES in my table so I don't think it supports parallel delete, right?

    From some tests I've seen that once the list of C_IDs is compiled initially, the delete runs much faster, because the execution plan can just use the P_ITD primary key, which seems to be much faster than using the CREATED timestamp index scan.

    Taking these 2 things into account, what would you recommend?

    Any SQL or PL/SQL examples would help greatly. I'm not at all experienced in writing these things and I'm a bit stuck.

    Again, your help is very much appreciated.

    Thanks,

    O.

  • Billy Verreynne
    Billy Verreynne Software Engineer Member Posts: 28,571 Red Diamond

    See support Master Note: Overview of Oracle Large Objects (BasicFiles LOBs and SecureFiles LOBs) (Doc ID 1490228.1)

    SecureFiles LOBs 12c Enhancements

    Enable PDML Operations on SecureFiles

    Limitations have been removed in this release with regard to Parallel DML (PDML) support for SecureFiles LOBs. This feature allows SecureFiles to leverage the performance and scalability benefits of the PDML features of Oracle Database.

  • cormaco
    cormaco Member Posts: 1,685 Bronze Crown

    What about this:

    delete /*+ parallel */ from ITD
    where created < timestamp '2020-11-27 00:00:00'
    
  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,756 Gold Crown

    @Billy Verreynne


    "Limitations have been removed ... " but it doesn't say which ones. I've just done a simple test with parallel DML enabled and this is what the plan says:

    Note
    -----
      - Degree of Parallelism is 2 because of table property
      - PDML disabled because single fragment or non partitioned table used
    

    And it was telling the truth about run-time. The pass to identify rows ran parallel 2, then the rowids were passed to the co-ordinator for serial deletion.


    Regards

    Jonathan Lewis

  • Billy Verreynne
    Billy Verreynne Software Engineer Member Posts: 28,571 Red Diamond

    Unfortunately not the first and only misleading support note it seems... <sigh>

  • OraC
    OraC Member Posts: 75 Blue Ribbon

    Thanks all!

    Yeah, unfortunately in this link https://docs.oracle.com/en/database/oracle/oracle-database/12.2/vldbg/types-parallelism.html it states:

    "For non-partitioned tables with LOB columns, parallel INSERT operations are supported provided that the LOB columns are declared as SecureFiles LOBs. Parallel UPDATE, DELETE, and MERGE operations on such tables are not supported."

    Is the method I have shown above, looping using the C_IDs, really that inefficient then? When I try to delete in batches using CREATED, I find the optimizer often chooses an alternative execution plan which looks inefficient compared to the one using the primary key.

    @RogerT Thanks for your suggestion, is this what you mean?

    var deleteolderthan TIMESTAMP(6);
    var batch_size number;
    exec :deleteolderthan := '27-NOV-20 12.00.00.000000 AM';
    exec :batch_size := 10000;

    begin
      LOOP
        delete from ITD where CREATED < :deleteolderthan and rownum <= :batch_size;
        EXIT WHEN SQL%ROWCOUNT = 0;
        COMMIT;
      END LOOP;
    end;
    /

    1) Will this be more efficient and are there any additional improvements I can make to this to make it run even faster?

    2) The delete may take up to 4-5 hours to run, and by the end the table will have shrunk by 9 million records, potentially affecting other query performance. Will I achieve any benefit from updating statistics on the table or key indexes mid-execution?
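
    (By updating statistics I mean something along these lines, if that even makes sense mid-run; the ITD name is from my table above:)

        exec dbms_stats.gather_table_stats(ownname => user, tabname => 'ITD', cascade => true);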

    Any other suggestions very welcome.

    Thanks again,

    O

  • RogerT
    RogerT Member Posts: 1,853 Gold Trophy

    You should not compare

    a) delete using CREATED

    with

    b) delete using C_ID

    what you should compare is

    a) delete using CREATED

    with

    b) fill a collection using CREATED PLUS delete using C_ID


    so?
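
    i.e. time each variant under the same conditions, e.g. (untested sketch; the rollback is there so both runs see the same data, and the ITD/CREATED names come from the first post):

        declare
          t0 pls_integer := dbms_utility.get_time;  -- hundredths of a second
        begin
          -- variant (a): one chunk deleted directly via CREATED
          delete from ITD
          where CREATED < timestamp '2020-11-27 00:00:00'
            and rownum <= 10000;
          dbms_output.put_line('a: ' || (dbms_utility.get_time - t0) || ' cs');
          rollback;
        end;
        /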

  • OraC
    OraC Member Posts: 75 Blue Ribbon

    Thanks @RogerT

    What you've suggested is what I am trying now, but I'm still trying to find the optimal way to do both for comparison purposes. Is a cursor, a loop, or some other method typically more efficient for something like this?