Bulk Deletes from a Large Table

DB: 11.2.0.3
OS: Linux, kernel 2.6.18
I have the following three-column table:
Col1 not null number
Col2 not null number
Col3 not null number
Col1 is the primary key. There is a unique index on Col2/Col3 combined. There is a non-unique index on Col3.
Table has around 8 billion records and it is not partitioned.
Once a month, I have to delete approximately 200,000,000 records. I know this is a generic question, but I would appreciate any suggestions on the best way to accomplish this. I have been deleting in chunks of about 5,000 records at a time, but that is taking longer than I would like. Am I missing some obvious way to solve this problem?
For background, when this table/application was designed about eight years ago, we were never going to delete data from it. Now there are some new "policies" concerning some of the data that resides here, which has created this new delete requirement.
Answers
-
Paul V. wrote: Once a month, I have to delete approximately 200,000,000 records. ... I have been deleting in chunks of about 5,000 records at a time but that is taking longer than I would like. Am I missing some obvious way to solve this problem?
How is the decision made as to which rows get removed?
> I have been deleting in chunks of about 5,000 records at a time
Multiple COMMITs increase elapsed time.
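If you stick with chunking, make the chunks much larger so you commit far less often. A rough sketch (the big_table and purge_list names and the 100,000-row chunk size are placeholders, not from your post):

```sql
-- Hypothetical sketch: delete in larger chunks to reduce commit overhead.
-- Table names and chunk size are assumptions for illustration only.
BEGIN
  LOOP
    DELETE FROM big_table t
     WHERE t.col2 IN (SELECT p.col2 FROM purge_list p)
       AND ROWNUM <= 100000;          -- much larger chunk than 5,000
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;                           -- one commit per chunk, not per row
  END LOOP;
  COMMIT;
END;
/
```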
-
Currently, deletes are based on Col2 values found in a different table, but I could write some PL/SQL to utilize the primary key instead.
-
At present, are you running a straightforward DELETE, or PL/SQL that context-switches row by row? See this article from Oracle Magazine on bulk processing, which may be of some help: PLSQL 101
Quite tricky with that many rows though!
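As a rough illustration of the bulk-processing idea from that article (the purge_list driving table and the 50,000-row fetch limit are assumptions, not from the original post):

```sql
-- Hypothetical sketch of BULK COLLECT / FORALL deletes by primary key.
-- Driving table name and array size are placeholders.
DECLARE
  TYPE t_id_tab IS TABLE OF big_table.col1%TYPE;
  l_ids t_id_tab;
  CURSOR c_doomed IS
    SELECT t.col1
      FROM big_table t
     WHERE t.col2 IN (SELECT p.col2 FROM purge_list p);
BEGIN
  OPEN c_doomed;
  LOOP
    FETCH c_doomed BULK COLLECT INTO l_ids LIMIT 50000;
    EXIT WHEN l_ids.COUNT = 0;
    FORALL i IN 1 .. l_ids.COUNT
      DELETE FROM big_table WHERE col1 = l_ids(i);  -- one batched delete per fetch
    COMMIT;  -- note: fetching across commits risks ORA-01555 on a busy system
  END LOOP;
  CLOSE c_doomed;
END;
/
```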
-
I cannot offer dramatic improvements, just some ideas that can help a little bit:
- 1. As already mentioned by sol.beach, you can make the chunks larger and thus reduce the number of commits.
- 2. When going after the data to be deleted, take into account how it is clustered. If one DB block contains 10 records to be deleted, it is better to delete all 10 of them in one shot rather than with 10 separate statements. Reviewing the index clustering factor can be useful in determining the best way forward.
- 3. Consider going parallel (DBMS_PARALLEL_EXECUTE or parallel DML); a rough sketch follows this list. Please note that each of the methods (particularly parallel DML) comes with some restrictions/limitations.
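Since you are on 11.2, here is a minimal DBMS_PARALLEL_EXECUTE sketch; the task name, chunk size, parallel level and the purge_list driving table are all assumptions for illustration:

```sql
-- Hypothetical sketch of DBMS_PARALLEL_EXECUTE chunking the table by ROWID.
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task(task_name => 'purge_big_table');

  DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(
    task_name   => 'purge_big_table',
    table_owner => USER,
    table_name  => 'BIG_TABLE',
    by_row      => TRUE,
    chunk_size  => 100000);

  DBMS_PARALLEL_EXECUTE.run_task(
    task_name      => 'purge_big_table',
    sql_stmt       => 'DELETE FROM big_table t
                        WHERE t.rowid BETWEEN :start_id AND :end_id
                          AND t.col2 IN (SELECT p.col2 FROM purge_list p)',
    language_flag  => DBMS_SQL.native,
    parallel_level => 8);

  -- check USER_PARALLEL_EXECUTE_CHUNKS / task_status before dropping the task
  DBMS_PARALLEL_EXECUTE.drop_task('purge_big_table');
END;
/
```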
HTH,
Iordan Iotzov
-
Well, let me throw out the obvious - have you thought about partitioning the table?
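If the retention policy happens to line up with ranges of Col2 (that is an assumption on my part, since you said deletes are driven by Col2 values in another table), something along these lines would turn the monthly purge into a partition drop; the boundary values are placeholders:

```sql
-- Hypothetical sketch: range-partitioned copy of the table, purged by DROP PARTITION.
CREATE TABLE big_table_part (
  col1 NUMBER NOT NULL,
  col2 NUMBER NOT NULL,
  col3 NUMBER NOT NULL
)
PARTITION BY RANGE (col2) (
  PARTITION p_old  VALUES LESS THAN (1000000),
  PARTITION p_curr VALUES LESS THAN (MAXVALUE)
);

-- Dropping a partition replaces 200M single-row deletes with a dictionary operation;
-- any global indexes (e.g. the PK on col1) need UPDATE GLOBAL INDEXES or a rebuild.
ALTER TABLE big_table_part DROP PARTITION p_old UPDATE GLOBAL INDEXES;
```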
-
Besides suggesting that you partition the table (as the previous poster did), you can look into a simple routine that creates a new table with CREATE TABLE AS SELECT (CTAS), as explained by Tom Kyte here.
If partitioning is possible, you can use the DBMS_REDEFINITION package to do so online. This recipe by Dr Tim Hall shows you how.
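For completeness, a bare-bones sketch of the CTAS route (the parallel degree, NOLOGGING and the purge_list driving table are assumptions; indexes, constraints and grants still have to be recreated before the swap):

```sql
-- Hypothetical sketch: keep the rows you want, then swap table names.
CREATE TABLE big_table_new
  PARALLEL 8 NOLOGGING
AS
  SELECT t.col1, t.col2, t.col3
    FROM big_table t
   WHERE t.col2 NOT IN (SELECT p.col2 FROM purge_list p);

-- recreate the PK on col1, the unique (col2, col3) index and the col3 index
-- on big_table_new here, then swap:
ALTER TABLE big_table     RENAME TO big_table_old;
ALTER TABLE big_table_new RENAME TO big_table;
-- verify, re-grant privileges, and drop big_table_old once satisfied
```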
-
I wouldn't ordinarily suggest CTAS when faced with deleting 200 million out of 8 billion records; copying the remaining 7.8 billion records is a lot of I/O.
-
If I could afford the space, I would certainly have tested it. Deleting 200 million rows may generate a lot of extra I/O as well: writing redo, maintaining indexes, and managing space.
-
deleting rows = no space management
-
True. But the next inserts, unless the APPEND hint is used, will try to fill those empty blocks, right? Looks like I should create an experiment and blog it.