SQL & PL/SQL


What is the best way to delete millions of rows from a large table?

Sevdimali IsayevJul 23 2018 — edited Jul 24 2018

Hi everyone,

I have a large table with more than 5 million rows.

The table is not partitioned.

On average, 50-60 rows are inserted into the table every second.

It has 12 columns (one BLOB and one CLOB) and three indexes.

what is the best way to delete rows from this table?

If any other information needed please let me know.

This post has been answered by BrunoVroman on Jul 23 2018

Comments

BEDE

The main concern should be: are there any foreign keys referencing the table from which you want to delete? If the answer is yes, then it's going to be really rough on the database. Otherwise a simple delete from x_table where ... should do.
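To check that concern first, a data-dictionary query along these lines can be used (TABLE_X and MY_SCHEMA are placeholders for your table and owner):

```sql
-- Find foreign keys in other tables that reference TABLE_X
select c.owner, c.table_name, c.constraint_name
from   all_constraints c
join   all_constraints r
       on  r.owner = c.r_owner
       and r.constraint_name = c.r_constraint_name
where  c.constraint_type = 'R'
and    r.table_name = 'TABLE_X'
and    r.owner = 'MY_SCHEMA';
```

If this returns no rows, nothing references the table and the delete itself is the only cost.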

If there are 5M rows to delete, how many rows will still remain in the table? If you delete more than, say, 25% of the rows in table_x, maybe it's better to do something like this:

create table table_x_del as select * from table_x where ...; -- the where condition should select all the rows you will keep

drop table table_x;

rename table_x_del to table_x;

Of course, this will not work on a system where table_x is to be accessed by users.

And there is one more thing that is important: do you have to delete all those records in one single transaction? If the answer is yes, then a simple delete from table_x where ... is the only solution.

Otherwise there may be variations like using dbms_parallel_execute.

Cookiemonster76

If you go the CTAS route remember it doesn't copy constraints or triggers.
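For example, after the CTAS and rename you would have to recreate those objects by hand; a sketch, with all names hypothetical:

```sql
-- CTAS copies data only: recreate indexes, constraints and grants afterwards
create index table_x_ix1 on table_x (created_date);

alter table table_x add constraint table_x_pk primary key (id);

grant select, insert on table_x to app_user;
```

Scripting these out from the original table (e.g. with DBMS_METADATA.GET_DDL) before dropping it is a prudent first step.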

BEDE

Right! Those are to be created afterwards. And the same goes for indexes.

Just to simplify, it would be better:

create table table_x_del as select * from table_x where ...; -- keep only the rows you want to retain

truncate table table_x;

insert into table_x select * from table_x_del;

drop table table_x_del;

Paulzip

If you are deleting a large percentage of the data, I'd probably consider CTAS approach, but you'll have to migrate constraints, triggers, indexes etc.

If you're not keen on this approach (which in some situations is perfectly justifiable) I would go the DBMS_PARALLEL_EXECUTE route.  It's precisely the sort of situation it was designed for,

Sevdimali Isayev

Hi, thanks for the reply.

I can't choose the CTAS method, because I have many applications that insert into this table; if I drop the table they will get exceptions, which is forbidden.

BrunoVroman

Hello,

About "CTAS": and privileges too. Be fully aware that the CTAS way means some application downtime (no new data should be inserted/modified in the table from before the CTAS starts until the rename is done).

Alternative: expdp with a filter on the rows to keep, then drop the table (if the expdp is successful) and impdp it back. (Obviously this also requires application downtime; the advantage of this alternative is that it takes care of constraints, indexes, privileges and triggers.)
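A sketch of that expdp/impdp variant with a row filter (the directory object, dump file name and predicate are assumptions, not from the original post):

```shell
# Export only the rows to keep, filtered with the QUERY parameter
expdp my_user/my_pwd directory=DP_DIR dumpfile=table_x.dmp \
      tables=TABLE_X query='TABLE_X:"WHERE created_date >= SYSDATE - 30"'

# After verifying the dump succeeded, in SQL*Plus:
#   drop table table_x purge;

# Re-import the table with its indexes, constraints, grants and triggers
impdp my_user/my_pwd directory=DP_DIR dumpfile=table_x.dmp tables=TABLE_X
```

The double-quoting of the QUERY clause is shell-dependent; a parameter file avoids the escaping headaches.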

Best regards,

Bruno Vroman.

Edited: clarification after Paulzip's remark.

Paulzip

Bruno.Vroman wrote:

Hello,

About "CTAS": and privileges too. Be fully aware that the CTAS way means some application downtime (no new data should be inserted/modified in the table before the CTAS starts).

Alternative: expdp with a filter on rows to keep, then drop table (if expdp is successful) and impdp it back.

Best regards,

Bruno Vroman.

Not going to work in a live system that cannot be offline.

Paulzip

Example of DBMS_PARALLEL_EXECUTE...

declare
  l_sql_stmt varchar2(1000);
  l_try      number := 0;
  l_status   number;
begin
  DBMS_PARALLEL_EXECUTE.CREATE_TASK('MYTASK');  -- Create the task
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('MYTASK', 'MY_SCHEMA', 'MY_TABLE', true, 100);  -- Chunk the table by ROWID

  -- Execute the DML in parallel
  l_sql_stmt := 'delete from MY_TABLE where rowid between :start_id and :end_id';  -- Add whatever other predicates you are filtering on in the where clause here
  DBMS_PARALLEL_EXECUTE.RUN_TASK('MYTASK', l_sql_stmt, DBMS_SQL.NATIVE, parallel_level => 10);

  l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('MYTASK');
  while (l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)  -- If there is an error, resume it at most 2 times
  loop
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.RESUME_TASK('MYTASK');
    l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('MYTASK');
  end loop;

  DBMS_PARALLEL_EXECUTE.DROP_TASK('MYTASK');  -- Done with processing; drop the task
  commit;
end;
/
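While (or after) the task runs, progress can be checked in the dictionary; for example:

```sql
-- One row per chunk; STATUS is e.g. UNASSIGNED, ASSIGNED,
-- PROCESSED or PROCESSED_WITH_ERROR
select status, count(*)
from   user_parallel_execute_chunks
where  task_name = 'MYTASK'
group  by status;
```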

Joerg.Sobottka

5 million rows isn't a large table. If 50-60 rows are added per second, just delete what you don't need anymore. In my opinion, if the LOBs are not very big it doesn't even make sense to shrink the table, as the space will be allocated again very soon.

BrunoVroman
Answer

Oh, by the way:

more than 5 million rows.  on average 50-60 rows inserted in every second

Hmm, 55 rows/second * 86,400 seconds/day = 4,752,000 rows per day... Does the table contain data for only about one day? And maybe (depending on which rows you have to delete, and whether you have to repeat the operation often, perhaps daily): the table seems a good candidate for partitioning. Maybe you can change "I have to delete millions of rows" into "I have to drop (a) partition(s)".
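For illustration, a daily interval-partitioned version might look like this (table and column names are hypothetical); the purge then becomes a near-instant metadata operation:

```sql
create table table_x_part (
  id           number,
  created_date date not null,
  payload      blob
  -- ... remaining columns
)
partition by range (created_date)
interval (numtodsinterval(1, 'DAY'))
(partition p_initial values less than (date '2018-07-01'));

-- Purging a day of old data is then a partition drop, not millions of deletes:
alter table table_x_part drop partition for (date '2018-07-01') update indexes;
```

The update indexes clause keeps global indexes usable while the partition is dropped.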

Best regards,

Bruno Vroman.

Marked as Answer by Sevdimali Isayev · Sep 27 2020
BEDE

Then you may have a problem if really many transactions have to take place on the same table. In that case, to avoid a possible deadlock, you may delete in batches, something like:

begin
  loop
    delete from x_table where ... and rownum <= 1000;
    exit when sql%rowcount = 0;
    commit;
  end loop;
  commit;
end;

That will take quite a long time, but it will certainly do the job.

Sevdimali Isayev

Thanks for the suggestion. I think we will choose partitioning the table.

Cookiemonster76

A process that only deletes from one table shouldn't ever cause a deadlock.

BEDE

But just how do you partition the table if several sessions need to access it? You must have exclusive use of the table for any DDL. So pick the right time when you can partition the table.

Sevdimali Isayev

I think they will accept a single one-off downtime for this situation.

Joerg.Sobottka

You can partition the table using DBMS_REDEFINITION. Then you don't have any downtime.
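A rough outline of such an online redefinition into a partitioned copy (all names are hypothetical, and error handling is omitted; TABLE_X_INT is a pre-created, partitioned interim table with the same columns):

```sql
-- Check the table can be redefined (by primary key, the default)
begin
  dbms_redefinition.can_redef_table('MY_SCHEMA', 'TABLE_X');
end;
/

declare
  l_errors pls_integer;
begin
  -- Start syncing TABLE_X into the partitioned interim table
  dbms_redefinition.start_redef_table('MY_SCHEMA', 'TABLE_X', 'TABLE_X_INT');

  -- Copy indexes, triggers, constraints and privileges across
  dbms_redefinition.copy_table_dependents('MY_SCHEMA', 'TABLE_X', 'TABLE_X_INT',
      dbms_redefinition.cons_orig_params,
      true, true, true, false, l_errors);

  -- Swap the tables; applications keep using the name TABLE_X
  dbms_redefinition.finish_redef_table('MY_SCHEMA', 'TABLE_X', 'TABLE_X_INT');
end;
/
```

The finish step takes only a brief exclusive lock, so inserts can continue almost uninterrupted.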

Sven W.

Cookiemonster76 wrote:

A process that only deletes from one table shouldn't ever cause a deadlock.

It shouldn't, but it can happen. Especially when the delete runs in parallel.

SeánMacGC

Sven W. wrote:

Cookiemonster76 wrote:

A process that only deletes from one table shouldn't ever cause a deadlock.

It shouldn't, but it can happen. Especially when the delete runs in parallel.

And particularly if there are bitmap indexes involved.

Cookiemonster76

If they've got bitmaps on a table that gets 50-60 new rows per second then they're likely in for a world of pain.

Mark D Powell

Sevdimali, "best" is a relative term. What is best for me may not be best for you. The best solution also depends on whether this is a one-time operation, or the first DELETE of all the old data with a regular DELETE process to take its place afterwards, etc. For initial purges, where the quantity of data is larger than what will normally be purged, and for one-time purges, I do not think speed of the DELETE should be a primary consideration. What really matters is that the purge process does not impact the system.

- -

The following is a simple but effective approach when you want to limit the impact. Use PL/SQL: write a cursor to select the PK of the target rows, loop through the cursor, delete by PK, and commit every N rows to limit UNDO usage. N is determined by the amount of UNDO you are willing to let this process consume. Consider putting a sleep between commits, or every X commits, if you need to allow the deleted data to age out of UNDO.
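A minimal sketch of that approach (the table, PK column, predicate, and batch size of 1000 are all placeholders; DBMS_LOCK.SLEEP needs an execute grant, and from 18c DBMS_SESSION.SLEEP can be used instead):

```sql
declare
  cursor c_purge is
    select id from table_x where created_date < sysdate - 30;
  l_count pls_integer := 0;
begin
  for r in c_purge loop
    delete from table_x where id = r.id;   -- delete by primary key
    l_count := l_count + 1;
    if mod(l_count, 1000) = 0 then
      commit;                              -- limit UNDO usage
      dbms_lock.sleep(1);                  -- throttle to reduce system impact
    end if;
  end loop;
  commit;
end;
/
```

The per-row delete is deliberately slow and gentle; raise the batch size or drop the sleep only if the system can absorb it.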

- -

IMHO -- Mark D Powell --

unknown-7404

what is the best way to delete rows from this table?

That depends: how many rows are being deleted? How many rows are in the table now?

The simple way: delete from myTable where ...

AndrewSayer

Sevdimali Isayev wrote:

I think they will accept a single one-off downtime for this situation.

What's your full version number? If you're on at least 12.2.0.1 then you can partition tables online using an alter table statement. Below that, you'd want to use dbms_redefinition.
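On 12.2+ that online conversion can be a single DDL statement; a sketch with hypothetical table and column names:

```sql
-- Convert the existing heap table to daily interval partitions,
-- without blocking concurrent DML
alter table table_x modify
  partition by range (created_date)
  interval (numtodsinterval(1, 'DAY'))
  (partition p_initial values less than (date '2018-07-01'))
  online;
```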

BTW, why do you need to delete the same number of rows that you insert per day in one go? You could just have a job that runs each hour and deletes the rows that need to be deleted (assuming the deletion is based on the age of the data and you can handle the granularity of an hour). Alternatively, what even is the problem with a straight delete statement? It shouldn't matter how long it takes; it only touches data you apparently don't care about, so nothing should be affected, right?

Mark D Powell

All, just a reminder that partitioning is an extra-cost option that most shops do not license.

- -

HTH -- Mark D Powell --


Post Details

Locked on Aug 21 2018
Added on Jul 23 2018
23 comments
656 views