
Purge Large Dataset

user12840908
user12840908 Member Posts: 35
edited Apr 11, 2017 1:50PM in General Database Discussions

Hi,

We have a table with approximately 120-125 million records, and we have to purge some old records, close to 35 million of them (the table is not date-partitioned).

It's a live production system. Of course we can't simply issue a single DELETE statement and delete all 35 million records in one go.

We plan to create a PL/SQL unit to delete the records in batches (let's say of size 100K).

Now the question is: of the two methods below, which will be better from a performance point of view?

Background:

Tables:

PURGE_TABLE (table to be purged, primary key REF_NO)

DELETE_ID - Column 1: REF_NO (list of REF_NO values to be deleted); Column 2: BATCH_NO

Option 1:

FOR I IN 1 .. TOTAL_BATCH
LOOP
  DELETE FROM PURGE_TABLE
   WHERE REF_NO IN (SELECT REF_NO FROM DELETE_ID WHERE BATCH_NO = I);
  COMMIT;
END LOOP;

Option 2:

FOR I IN 1 .. TOTAL_BATCH
LOOP
  -- TYPE_VAR is a collection of PURGE_TABLE.REF_NO%TYPE declared in the enclosing block
  SELECT REF_NO BULK COLLECT INTO TYPE_VAR
    FROM DELETE_ID
   WHERE BATCH_NO = I;

  FORALL J IN TYPE_VAR.FIRST .. TYPE_VAR.LAST
    DELETE FROM PURGE_TABLE WHERE REF_NO = TYPE_VAR(J);

  COMMIT;
END LOOP;

What will be the best approach?

Regards!


Best Answer

  • jgarry
    jgarry Member Posts: 13,842
    edited Apr 7, 2017 12:37PM Accepted Answer

    It is reasonable to worry about using up undo and subjecting other sessions to ORA-01555 risk.  Personally, I tend to keep a large undo just to avoid this worry in the rare situations where it applies, but I can understand others not having that luxury.  I would recommend the OP search for ORA-01555 on AskTom; there are several considerations.

    CTAS with some downtime might be the fastest, but if you really can't do that, there's nothing wrong with dribs and drabs, 100K at a time.  You may still be better off with straight SQL for each little drib, rather than PL/SQL, assuming you can get the optimizer to cooperate, which your testing should tell you.  You need to test your PL/SQL anyway for the same reason, plus the normal verification that it works.  Do you not have a test environment?

    Your option 2 would likely be slower and use the same resources as, or more than, a straight 35M-row delete, if I'm reading it correctly.  Not sure whether option 1 might wrongly decide on predicate pushing; that's where a plan would make things clearer.
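    (A minimal sketch of the "straight SQL for each drib" idea, not from the original answer, assuming the OP's table names and a 100K batch size:)

    -- Rerun from SQL*Plus or a shell loop until it reports "0 rows deleted"
    DELETE FROM purge_table
     WHERE ref_no IN (SELECT ref_no FROM delete_id)
       AND ROWNUM <= 100000;
    COMMIT;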


Answers

  • EdStevens
    EdStevens Member Posts: 28,519 Gold Crown
    edited Apr 7, 2017 7:50AM
    user12840908 wrote: Hi, We have a table with approximately 120-125 million records, and we have to purge some old records, close to 35 million of them (the table is not date-partitioned). It's a live production system. Of course we can't simply issue a single DELETE statement and delete all 35 million records in one go.

    Of course?

    Why can't you delete all 35 million in one go?

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Apr 7, 2017 7:52AM
    user12840908 wrote: Hi, We have a table with approximately 120-125 million records, and we have to purge some old records, close to 35 million of them (the table is not date-partitioned). It's a live production system. Of course we can't simply issue a single DELETE statement and delete all 35 million records in one go...

    Why not?

    You'd only be locking the rows that you want to delete; if other sessions are using them, then why would you want to delete them?

    -edit

    I see Ed was having the same thought.

  • user12840908
    user12840908 Member Posts: 35
    edited Apr 7, 2017 8:01AM

    Based on previous experience, it might not finish for hours, and it would end up generating lots of undo and redo that might impact production and other services.

    Regards!

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Apr 7, 2017 8:12AM
    user12840908 wrote: Based on previous experience, it might not finish for hours, and it would end up generating lots of undo and redo that might impact production and other services. Regards!

    You'll almost certainly generate more undo and more redo if you batch it.

    Batching will allow the undo you write to be reusable sooner, but is the amount generated really a problem? Have you checked how much it is?
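    (As a sketch, one way to check: the undo held by active transactions is exposed in v$transaction, queried from a second session while the delete runs.)

    -- used_ublk = undo blocks, used_urec = undo records held by each active transaction
    SELECT s.sid, s.username, t.used_ublk, t.used_urec
      FROM v$transaction t
      JOIN v$session s ON s.taddr = t.addr;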

    If you were to batch it, batch it using dbms_parallel_execute with a single session; https://oracle-base.com/articles/11g/dbms_parallel_execute_11gR2 has some sample code.
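    (A minimal sketch of that approach, assuming the OP's PURGE_TABLE and DELETE_ID names, the current schema as table owner, and roughly 100K-row chunks; see the linked article for the full pattern.)

    BEGIN
      DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => 'purge_task');

      -- Split PURGE_TABLE into rowid ranges of roughly 100K rows each
      DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(
        task_name   => 'purge_task',
        table_owner => USER,
        table_name  => 'PURGE_TABLE',
        by_row      => TRUE,
        chunk_size  => 100000);

      -- Each chunk is deleted and committed independently; parallel_level => 1 keeps it to a single job
      DBMS_PARALLEL_EXECUTE.RUN_TASK(
        task_name      => 'purge_task',
        sql_stmt       => 'DELETE FROM purge_table
                            WHERE rowid BETWEEN :start_id AND :end_id
                              AND ref_no IN (SELECT ref_no FROM delete_id)',
        language_flag  => DBMS_SQL.NATIVE,
        parallel_level => 1);

      DBMS_PARALLEL_EXECUTE.DROP_TASK(task_name => 'purge_task');
    END;
    /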

  • DeepC
    DeepC Member Posts: 158 Blue Ribbon
    edited Apr 7, 2017 9:16AM

    CTAS with NOLOGGING and PARALLEL will be the best approach here. I understand that the new table will have 90 million rows, but it's still better than a DELETE.
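    (A rough sketch of the CTAS idea, assuming a maintenance window; the PURGE_TABLE_KEEP name is made up, and index, constraint and grant rebuilds are omitted.)

    -- Keep only the rows that are not slated for deletion
    CREATE TABLE purge_table_keep NOLOGGING PARALLEL AS
      SELECT p.*
        FROM purge_table p
       WHERE NOT EXISTS (SELECT 1 FROM delete_id d WHERE d.ref_no = p.ref_no);

    -- After validating row counts: drop (or rename away) the old table, rename the new one in,
    -- then recreate indexes, constraints and grants.
    -- DROP TABLE purge_table;
    -- ALTER TABLE purge_table_keep RENAME TO purge_table;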

  • user12840908
    user12840908 Member Posts: 35
    edited Apr 7, 2017 10:37AM

    Thanks, yes, the amount will be the same, but we still cannot risk the database by issuing a single statement that deletes 35M rows.

  • EdStevens
    EdStevens Member Posts: 28,519 Gold Crown
    edited Apr 7, 2017 10:58AM
    user12840908 wrote: Thanks, yes, the amount will be the same, but we still cannot risk the database by issuing a single statement that deletes 35M rows.

    How does that risk the database? 

  • John Thorton
    John Thorton Member Posts: 14,493 Silver Crown
    edited Apr 7, 2017 11:00AM
    user12840908 wrote: Thanks, yes, the amount will be the same, but we still cannot risk the database by issuing a single statement that deletes 35M rows.

    Other sessions won't know the DELETE has been issued until after COMMIT has been issued.

    Nothing is "at risk" by issuing a single DELETE statement.

  • ddf_dba
    ddf_dba Member Posts: 1,398 Bronze Trophy
    edited Apr 7, 2017 11:04AM

    How does that 'solve' the 'problem'?

    David Fitzjarrell


This discussion has been closed.