Bulk Deletes from a Large Table

Paul V.
Paul V. Member Posts: 42 Blue Ribbon
edited Jul 27, 2015 4:25PM in General Database Discussions

DB: 11.2.0.3

OS: Linux, kernel 2.6.18

I have the following three-column table:

Col1  not null number

Col2  not null number

Col3  not null number

Col1 is the primary key.  There is a unique index on Col2/Col3 combined.  There is a non-unique index on Col3.

The table has around 8 billion records and is not partitioned.

Once a month, I have to delete approximately 200,000,000 records.  I know this is a generic question, but I would appreciate any suggestions on the best way to accomplish this.  I have been deleting in chunks of about 5,000 records at a time, but that is taking longer than I would like.  Am I missing some obvious way to solve this problem?

For background, when this table/application was designed about eight years ago, we were never going to delete data from it.  Now there are some new "policies" concerning some of the data that resides here, causing this new delete requirement.
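
Roughly, the current monthly purge looks like the sketch below.  The real object names aren't shown here, so big_table (the 8-billion-row table) and purge_list (the table holding the Col2 values to remove) are made-up placeholders.

    -- Placeholder names: big_table is the 8-billion-row table, purge_list
    -- holds the Col2 values that have to go.  Deletes ~5,000 rows per commit.
    BEGIN
      LOOP
        DELETE FROM big_table t
         WHERE t.col2 IN (SELECT p.col2 FROM purge_list p)
           AND ROWNUM <= 5000;
        EXIT WHEN SQL%ROWCOUNT = 0;
        COMMIT;
      END LOOP;
      COMMIT;
    END;
    /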


Answers

  • Unknown
    edited May 14, 2015 10:39AM

    How is the decision made as to which rows get removed?

    > I have been deleting in chunks of about 5,000 records at a time

    Multiple COMMITs increase elapsed time.

  • Paul V.
    Paul V. Member Posts: 42 Blue Ribbon
    edited May 14, 2015 10:55AM

    Currently deletes are based on Col2 values found in a different table, but I could write some PL/SQL to utilize the primary key instead.
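
    Something along these lines is what I have in mind (all names below are placeholders; big_table is the large table, purge_list is the driving table, purge_keys is a work table): capture the Col1 keys once, then delete by primary key.

        -- Placeholder names throughout.  Step 1: capture the primary keys to purge.
        CREATE TABLE purge_keys AS
          SELECT t.col1
            FROM big_table t
            JOIN purge_list p ON p.col2 = t.col2;

        -- Step 2: drive the delete from the captured keys.
        DELETE FROM big_table t
         WHERE t.col1 IN (SELECT k.col1 FROM purge_keys k);

        COMMIT;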

  • Cobert
    Cobert Member Posts: 564 Silver Badge
    edited May 14, 2015 11:27AM

    At present, are you running a straightforward SQL delete, or row-by-row PL/SQL with context switching?  See this article from Oracle Magazine on bulk processing; it may be of some help: PLSQL 101

    Quite tricky with that many rows though!
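
    Very roughly, the bulk-processing pattern from that article applied here might look like the sketch below (big_table and purge_list are placeholder names, not from the original post).

        -- Placeholder names.  BULK COLLECT + FORALL limits SQL/PLSQL context
        -- switches to one round trip per 50,000-row batch.
        DECLARE
          TYPE t_ids IS TABLE OF big_table.col1%TYPE;
          l_ids t_ids;
          CURSOR c_doomed IS
            SELECT t.col1
              FROM big_table t
             WHERE t.col2 IN (SELECT p.col2 FROM purge_list p);
        BEGIN
          OPEN c_doomed;
          LOOP
            FETCH c_doomed BULK COLLECT INTO l_ids LIMIT 50000;
            EXIT WHEN l_ids.COUNT = 0;

            FORALL i IN 1 .. l_ids.COUNT
              DELETE FROM big_table WHERE col1 = l_ids(i);

            COMMIT;  -- one commit per batch rather than per row
                     -- (fetching across commits can risk ORA-01555 on a long run)
          END LOOP;
          CLOSE c_doomed;
        END;
        /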

  • Iordan Iotzov
    Iordan Iotzov Member Posts: 750 Silver Badge
    edited May 14, 2015 11:33AM

    I cannot offer dramatic improvements, just some ideas that can help a little bit:

    1. As already mentioned by sol.beach, you can make the chunks larger and thus reduce the number of commits.
    2. When going after the data to be deleted, take into account how it is clustered. If one DB block contains 10 records to be deleted, it is better to delete all 10 of them in one shot instead of deleting each of the 10 records with a separate statement. Reviewing the index clustering factor can be useful in determining the best way forward.
    3. Consider going parallel (DBMS_PARALLEL_EXECUTE or parallel DML); a rough sketch follows this list. Please note that each of the methods (particularly parallel DML) comes with some restrictions/limitations.
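
    For point 3, a rough DBMS_PARALLEL_EXECUTE sketch.  Task and table names are placeholders, and the job requires the CREATE JOB privilege; each rowid chunk is processed by its own job and commits independently.

        -- Placeholder names (big_table / purge_list / PURGE_BIG_TABLE).
        DECLARE
          l_sql CLOB := 'DELETE FROM big_table t
                          WHERE t.rowid BETWEEN :start_id AND :end_id
                            AND t.col2 IN (SELECT p.col2 FROM purge_list p)';
        BEGIN
          DBMS_PARALLEL_EXECUTE.create_task('PURGE_BIG_TABLE');
          DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(
            task_name   => 'PURGE_BIG_TABLE',
            table_owner => USER,
            table_name  => 'BIG_TABLE',
            by_row      => TRUE,
            chunk_size  => 100000);
          DBMS_PARALLEL_EXECUTE.run_task(
            task_name      => 'PURGE_BIG_TABLE',
            sql_stmt       => l_sql,
            language_flag  => DBMS_SQL.NATIVE,
            parallel_level => 8);
          DBMS_PARALLEL_EXECUTE.drop_task('PURGE_BIG_TABLE');
        END;
        /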

    HTH,

    Iordan Iotzov

  • John Stegeman
    John Stegeman Member Posts: 24,269 Blue Diamond
    edited May 14, 2015 11:37AM

    Well, let me throw out the obvious - have you thought about partitioning the table?
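
    Purely as an illustration of why partitioning helps: if the rows to be purged could be isolated by a partition key (a big "if", since the purge criteria live in another table), the monthly delete turns into DDL.  The structure and boundary values below are invented.

        -- Invented structure: range-partition on col2 so that a month's purge
        -- becomes a partition drop instead of 200M single-row deletes.
        CREATE TABLE big_table_part (
          col1 NUMBER NOT NULL,
          col2 NUMBER NOT NULL,
          col3 NUMBER NOT NULL,
          CONSTRAINT big_table_part_pk PRIMARY KEY (col1)
        )
        PARTITION BY RANGE (col2) (
          PARTITION p_old  VALUES LESS THAN (1000000),
          PARTITION p_rest VALUES LESS THAN (MAXVALUE)
        );

        ALTER TABLE big_table_part DROP PARTITION p_old UPDATE GLOBAL INDEXES;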

  • Oyvind Isene
    Oyvind Isene Member Posts: 64 Bronze Badge
    edited May 14, 2015 12:28PM

    Besides suggesting that you partition the table (as the previous poster did), you can look into a simple routine that creates a new table with CREATE TABLE ... AS SELECT (CTAS), as explained by Tom Kyte here.
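
    A bare-bones version of that CTAS idea, with placeholder names (big_table / purge_list); indexes, constraints and grants would have to be recreated on the new table before the rename.

        -- Placeholder names.  Keep only the surviving rows, then swap names.
        CREATE TABLE big_table_new NOLOGGING PARALLEL 8 AS
          SELECT t.*
            FROM big_table t
           WHERE NOT EXISTS (SELECT NULL
                               FROM purge_list p
                              WHERE p.col2 = t.col2);

        -- Recreate indexes/constraints/grants on big_table_new, then:
        ALTER TABLE big_table     RENAME TO big_table_old;
        ALTER TABLE big_table_new RENAME TO big_table;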

    If partitioning is possible, you can use the package dbms_redefinition to do so. This recipe by Dr Tim Hall shows you how.
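    Very schematically, the dbms_redefinition route looks like this (all names are placeholders; BIG_TABLE_INTERIM would be a pre-created, partitioned copy of the table's structure):

        -- Schematic only: online redefinition copies the rows into the interim
        -- (partitioned) table while the original stays available, then swaps
        -- the two definitions in the data dictionary.
        DECLARE
          l_errors PLS_INTEGER;
        BEGIN
          DBMS_REDEFINITION.can_redef_table(USER, 'BIG_TABLE',
                                            DBMS_REDEFINITION.cons_use_pk);
          DBMS_REDEFINITION.start_redef_table(USER, 'BIG_TABLE',
                                              'BIG_TABLE_INTERIM');
          DBMS_REDEFINITION.copy_table_dependents(
            uname      => USER,
            orig_table => 'BIG_TABLE',
            int_table  => 'BIG_TABLE_INTERIM',
            num_errors => l_errors);
          DBMS_REDEFINITION.finish_redef_table(USER, 'BIG_TABLE',
                                               'BIG_TABLE_INTERIM');
        END;
        /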

  • John Stegeman
    John Stegeman Member Posts: 24,269 Blue Diamond
    edited May 14, 2015 12:36PM

    I wouldn't ordinarily suggest CTAS when faced with deleting 200 million out of 8 billion records. Copying the remaining 7.8 billion records is a lot of I/O.

  • Oyvind Isene
    Oyvind Isene Member Posts: 64 Bronze Badge
    edited May 14, 2015 12:53PM

    If I could afford the space, I would certainly have tested it. Deleting 200 million rows may generate a lot of extra I/O from redo generation, index maintenance, and space management.

  • John Stegeman
    John Stegeman Member Posts: 24,269 Blue Diamond
    edited May 14, 2015 12:58PM

    deleting rows = no space management

  • Oyvind Isene
    Oyvind Isene Member Posts: 64 Bronze Badge
    edited May 14, 2015 1:09PM

    True. But the next inserts, unless the append hint is used, will try to fill those empty blocks, right? Looks like I should create an experiment and blog it.
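
    To illustrate the distinction (staging_table is a made-up source of new rows, big_table the placeholder name used earlier):

        -- Direct-path insert: the APPEND hint loads above the high water mark
        -- instead of re-filling blocks freed by earlier deletes.
        INSERT /*+ APPEND */ INTO big_table (col1, col2, col3)
        SELECT s.col1, s.col2, s.col3
          FROM staging_table s;

        COMMIT;  -- required before this session can query big_table again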

This discussion has been closed.