What is the best way to delete millions of rows from a large table?

Sevdimali Isayev Member Posts: 123 Red Ribbon
edited Jul 24, 2018 11:47AM in SQL & PL/SQL

Hi everyone,

I have a large table with more than 5 million rows.

The table is not partitioned.

On average, 50-60 rows are inserted into it every second.

It has 12 columns (one BLOB and one CLOB) and three indexes.

What is the best way to delete rows from this table?

If any other information is needed, please let me know.

Answers

  • BEDE
    BEDE Oracle Developer Member Posts: 2,266 Gold Trophy
    edited Jul 23, 2018 8:17AM

    The main concern should be: are there any foreign keys referring to the table from which you want to delete? If the answer is yes, then it's going to be really rough on the database. Otherwise a simple delete from x_table where ... should do.

    If there are 5M rows to delete, how many rows will remain in the table? If you delete more than, say, 25% of the rows in table_x, maybe it's better to do something like below:

    create table table_x_del as select * from table_x where ...; -- the WHERE condition should select the rows you want to keep

    drop table table_x;

    rename table_x_del to table_x;

    Of course, this will not work on a system where table_x is to be accessed by users.

    And there is one more thing that is important: do you have to delete all those records in a single transaction? If the answer is yes, then a simple delete from table_x where ... is the only solution.

    Otherwise there may be variations like using dbms_parallel_execute.

  • Cookiemonster76
    Cookiemonster76 Member Posts: 3,410
    edited Jul 23, 2018 8:16AM

    If you go the CTAS route, remember it doesn't copy constraints or triggers.

  • BEDE
    BEDE Oracle Developer Member Posts: 2,266 Gold Trophy
    edited Jul 23, 2018 8:24AM

    Right! Those are to be created afterwards. And the same goes for indexes.

    Just to simplify, it would be better:

    create table table_x_del as select * from table_x where ...; -- select only the rows you want to keep

    truncate table table_x;

    insert into table_x select * from table_x_del;

    drop table table_x_del;

  • Paulzip
    Paulzip Member Posts: 8,370 Blue Diamond
    edited Jul 23, 2018 8:20AM

    If you are deleting a large percentage of the data, I'd probably consider the CTAS approach, but you'll have to migrate constraints, triggers, indexes, etc.

    If you're not keen on this approach (which in some situations is perfectly justifiable), I would go the DBMS_PARALLEL_EXECUTE route. It's precisely the sort of situation it was designed for.

  • Sevdimali Isayev
    Sevdimali Isayev Member Posts: 123 Red Ribbon
    edited Jul 23, 2018 8:24AM

    Hi, thanks for the reply.

    I am not able to choose the CTAS method, because I have many applications inserting into this table; if I drop the table they will get exceptions, which is forbidden.

  • BrunoVroman
    BrunoVroman Member Posts: 1,848 Silver Crown
    edited Jul 23, 2018 8:39AM

    Hello,

    about "CTAS": and privileges.  Be fully aware that the CTAS way means some application downtime (no new data should be inserted/modified in the table before the CTAS starts)

    Alternative: expdp with a filter on the rows to keep, then drop the table (if expdp is successful) and impdp it back. (Obviously this also requires application downtime; the advantage of this alternative is that it takes care of constraints, indexes, privileges, and triggers.)
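
    For illustration, a minimal sketch of that expdp/impdp route. The created_date filter, directory, and credentials are hypothetical, and quote-escaping of the QUERY clause varies by shell (a parameter file is often easier):

    # 1. Export only the rows you want to keep (hypothetical retention filter)
    expdp my_user/my_pwd directory=DATA_PUMP_DIR dumpfile=my_table_keep.dmp tables=MY_TABLE query='MY_TABLE:"WHERE created_date >= SYSDATE - 1"'

    # 2. After verifying the dump file, in SQL*Plus: drop table my_table purge;

    # 3. Import the kept rows back, along with indexes, constraints, triggers, and grants
    impdp my_user/my_pwd directory=DATA_PUMP_DIR dumpfile=my_table_keep.dmp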

    Best regards,

    Bruno Vroman.

    Edited: clarification after Paulzip's remark.

  • Paulzip
    Paulzip Member Posts: 8,370 Blue Diamond
    edited Jul 23, 2018 8:26AM
    Bruno.Vroman wrote: Hello, about "CTAS": and privileges. Be fully aware that the CTAS way means some application downtime (no new data should be inserted/modified in the table before the CTAS starts). Alternative: expdp with a filter on rows to keep, then drop table (if expdp is successful) and impdp it back. Best regards, Bruno Vroman.

    That's not going to work on a live system that cannot be taken offline.

  • Paulzip
    Paulzip Member Posts: 8,370 Blue Diamond
    edited Jul 23, 2018 8:28AM

    Example of DBMS_PARALLEL_EXECUTE...

    declare
      l_sql_stmt VARCHAR2(1000);
      l_try      NUMBER := 0;
      l_status   NUMBER;
    begin
      DBMS_PARALLEL_EXECUTE.CREATE_TASK('MYTASK');  -- Create the task
      DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('MYTASK', 'MY_SCHEMA', 'MY_TABLE', true, 100);  -- Chunk the table by ROWID
      -- Execute the DML in parallel
      l_sql_stmt := 'delete from MY_TABLE where rowid between :start_id and :end_id';  -- Add whatever other predicates you are filtering on in the where clause here
      DBMS_PARALLEL_EXECUTE.RUN_TASK('MYTASK', l_sql_stmt, DBMS_SQL.NATIVE, parallel_level => 10);
      l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('MYTASK');
      while (l_try < 2 and l_status != DBMS_PARALLEL_EXECUTE.FINISHED)  -- If there is an error, resume at most 2 times
      loop
        l_try := l_try + 1;
        DBMS_PARALLEL_EXECUTE.RESUME_TASK('MYTASK');
        l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('MYTASK');
      end loop;
      DBMS_PARALLEL_EXECUTE.DROP_TASK('MYTASK');  -- Done with processing; drop the task
      commit;
    end;
    /

  • Joerg.Sobottka
    Joerg.Sobottka Senior Consultant and Oracle Ace Member Posts: 595 Bronze Trophy
    edited Jul 23, 2018 8:31AM

    5 million rows isn't a large table. If 50-60 rows are added per second, just delete what you don't need anymore. In my opinion, if the LOBs are not very big, it doesn't even make sense to shrink the table, as the space will be allocated again very soon.

  • BrunoVroman
    BrunoVroman Member Posts: 1,848 Silver Crown
    edited Jul 23, 2018 8:47AM Accepted Answer

    Oh, by the way:

    more than 5 million rows. On average, 50-60 rows are inserted into it every second.

    Hmm, 55 rows/second * 86,400 seconds/day = 4,752,000 rows per day... Does the table contain data for only about one day? And (depending on which rows you have to delete, and whether you have to repeat the operation often, perhaps daily) the table seems a good candidate for partitioning: maybe you can change "I have to delete millions of rows" into "I have to drop (a) partition(s)".
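
    For illustration, a minimal sketch of that idea, assuming daily interval partitioning on a hypothetical created_date column (partitioning is a separately licensed option):

    -- Hypothetical partitioned layout: one partition per day
    create table my_table (
      id           number primary key,
      created_date date not null
      -- ... the other columns, including the BLOB and CLOB ...
    )
    partition by range (created_date)
    interval (numtodsinterval(1, 'DAY'))
    (partition p0 values less than (date '2018-07-01'));

    -- "Delete a day of data" then becomes a quick dictionary operation:
    alter table my_table drop partition for (date '2018-07-01') update indexes;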

    Best regards,

    Bruno Vroman.

  • BEDE
    BEDE Oracle Developer Member Posts: 2,266 Gold Trophy
    edited Jul 23, 2018 8:49AM

    Then you may have a problem if many transactions really have to take place on the same table at once. In order to avoid a possible deadlock, you may do something like:

    begin
      loop
        delete from x_table where ... and rownum <= 1000;
        exit when sql%rowcount = 0;
        commit;
      end loop;
    end;

    That will take quite a long time, but it will certainly do the job.

  • Sevdimali Isayev
    Sevdimali Isayev Member Posts: 123 Red Ribbon
    edited Jul 23, 2018 8:57AM

    Thanks for the suggestion. I think we will choose to partition the table.

  • Cookiemonster76
    Cookiemonster76 Member Posts: 3,410
    edited Jul 23, 2018 9:02AM

    A process that only deletes from one table shouldn't ever cause a deadlock.

  • BEDE
    BEDE Oracle Developer Member Posts: 2,266 Gold Trophy
    edited Jul 23, 2018 9:03AM

    But just how do you partition the table if several sessions need to access it? You must have exclusive use of the table for any DDL. So, to partition the table, pick the right time when you may do that.

  • Sevdimali Isayev
    Sevdimali Isayev Member Posts: 123 Red Ribbon
    edited Jul 23, 2018 9:06AM

    I think they will accept a one-time downtime for this situation.

  • Joerg.Sobottka
    Joerg.Sobottka Senior Consultant and Oracle Ace Member Posts: 595 Bronze Trophy
    edited Jul 23, 2018 9:14AM

    You can partition the table using DBMS_REDEFINITION. Then you don't have any downtime.
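
    A minimal sketch of the online redefinition steps, assuming a pre-created partitioned interim table MY_TABLE_INTERIM and a usable primary key (schema, table, and column names are placeholders):

    declare
      l_errors pls_integer;
    begin
      -- Verify the table can be redefined online using its primary key
      DBMS_REDEFINITION.CAN_REDEF_TABLE('MY_SCHEMA', 'MY_TABLE', DBMS_REDEFINITION.CONS_USE_PK);

      -- Start copying data into the partitioned interim table
      DBMS_REDEFINITION.START_REDEF_TABLE('MY_SCHEMA', 'MY_TABLE', 'MY_TABLE_INTERIM');

      -- Clone indexes, triggers, constraints, and grants onto the interim table
      DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS('MY_SCHEMA', 'MY_TABLE', 'MY_TABLE_INTERIM',
                                              num_errors => l_errors);

      -- Swap the two tables; MY_TABLE is now partitioned, with near-zero downtime
      DBMS_REDEFINITION.FINISH_REDEF_TABLE('MY_SCHEMA', 'MY_TABLE', 'MY_TABLE_INTERIM');
    end;
    /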

  • Sven W.
    Sven W. Member Posts: 10,511 Gold Crown
    edited Jul 23, 2018 9:18AM
    Cookiemonster76 wrote: A process that only deletes from one table shouldn't ever cause a deadlock.

    It shouldn't, but it can happen. Especially when the delete runs in parallel.

  • SeánMacGC
    SeánMacGC Member Posts: 2,914 Gold Trophy
    edited Jul 23, 2018 10:48AM
    Sven W. wrote: Cookiemonster76 wrote: A process that only deletes from one table shouldn't ever cause a deadlock. It shouldn't, but it can happen. Especially when the delete runs in parallel.

    And particularly if there are bitmap indexes involved.

  • Cookiemonster76
    Cookiemonster76 Member Posts: 3,410
    edited Jul 23, 2018 11:07AM

    If they've got bitmaps on a table that gets 50-60 new rows per second then they're likely in for a world of pain.

    SeánMacGC
  • Mark D Powell
    Mark D Powell Member Posts: 5,914 Blue Diamond
    edited Jul 23, 2018 11:44AM

    Sevdimali, "best" is a relative term.  What is best for me may no be best for you.  The best solution also depends on if this is a one-time operation, or the first DELETE of all the old data and a regular DELETE process will take its place, etc....  For initial purges where the quantity of data is larger than what will normally be purged and one-time purges I do not think speed of the DELETE should be a primary consideration.  Instead what really matters is that the purge process not impact the system.

    - -

    The following is a simple but effective approach when you want to limit the impact. Use PL/SQL: write a cursor to select the PK of the target rows, loop through the cursor, delete by PK, and commit every N rows to limit the UNDO usage. N is determined by the amount of UNDO you are willing to let this process consume. Consider putting a sleep between commits, or every X number of commits, if you need to allow the deleted data to age out of UNDO.
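
    A minimal sketch of that approach, assuming a numeric primary key id and a hypothetical age-based purge rule (the batch size N = 1000 and the sleep are illustrative; DBMS_LOCK.SLEEP needs an explicit execute grant):

    declare
      l_count pls_integer := 0;
    begin
      for r in (select id from my_table
                where created_date < sysdate - 30)  -- hypothetical purge criterion
      loop
        delete from my_table where id = r.id;
        l_count := l_count + 1;
        if mod(l_count, 1000) = 0 then  -- commit every N = 1000 rows to cap UNDO usage
          commit;
          dbms_lock.sleep(1);           -- optional pause between commits
        end if;
      end loop;
      commit;                           -- commit the final partial batch
    end;
    /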

    - -

    IMHO -- Mark D Powell --

  • Unknown
    edited Jul 23, 2018 1:15PM
    what is the best way to delete rows from this table?

    That depends: how many rows are being deleted? How many rows are in the table now?

    Simple way: delete from myTable where ...

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Jul 23, 2018 4:50PM
    Sevdimali Isayev wrote: I think they will accept a one-time downtime for this situation.

    What's your full version number? If you're on at least 12.2.0.1 then you can partition tables online using an alter table statement; see the sketch below. Below that version, you'd want to use dbms_redefinition.
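
    A minimal sketch of the 12.2+ online conversion, again assuming a hypothetical created_date column to partition by:

    -- 12.2.0.1 and later: convert the heap table to a partitioned one, online
    alter table my_table modify
      partition by range (created_date)
      interval (numtodsinterval(1, 'DAY'))
      (partition p0 values less than (date '2018-07-01'))
      online;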

    BTW, why do you need to delete the same number of rows that you insert per day in one go? You could just have a job that runs each hour and deletes the rows that need to be deleted (assuming the deletion is based on the age of the data and you can handle the granularity of an hour). Alternatively, what even is the problem with a straight delete statement? It shouldn't matter how long it takes - it's only touching data you apparently don't care about, so nothing should be affected, right?

  • Mark D Powell
    Mark D Powell Member Posts: 5,914 Blue Diamond
    edited Jul 24, 2018 11:47AM

    All, just a reminder that partitioning is an extra-cost option that most shops do not license.

    - -

    HTH -- Mark D Powell --

This discussion has been closed.