
    DML Performance

    933257

      Hi, we are using Oracle version 11.2.0.4. We are planning to DELETE 25 million+ records from a table. Below are three options we are considering for the DELETE. Which way of deleting would be the best in terms of performance? Will option-2 and option-3 give similar response times and consume similar DB resources?

       

      An index exists on column C1 of table TAB1. TAB1 contains ~1 billion records.

       

      Option-1

      DELETE FROM TAB1
      WHERE C1 in (select c1 from tab2 where status='F' and c_id>=100);

       

       

      option-2

       

      DECLARE
           type rowidArray is table of rowid index by binary_integer;
           type emRec is record
           (rowid       rowidArray);
           l_record     emRec;
           l_array_size number := 10000;
           l_done       boolean;
           l_cnt        number := 1;
           l_processed  number := 0;

           cursor c is select rowid from TAB1
            WHERE C1 in (select c1 from tab2 where status='F' and c_id>=100);
      BEGIN
           open c;
           loop
               dbms_application_info.set_action('Processing '||l_cnt||' thru '||(l_cnt+l_array_size-1)||' ('||l_processed||')');

               -- fetch the next batch of rowids
               fetch c bulk collect into l_record.rowid LIMIT l_array_size;
               l_done := c%notfound;

               -- delete the whole batch with one array-bound statement
               forall i in 1 .. l_record.rowid.count
                   delete from tab1
                    where rowid = l_record.rowid(i);

               l_processed := l_processed + SQL%ROWCOUNT;

               commit;
               exit when (l_done);

               l_cnt := l_cnt + l_array_size;
           end loop;
           close c;   -- release the cursor
      end;
      /

       

       

       

      option-3

       

      BEGIN
          LOOP
              -- delete in batches of 10,000 rows, committing after each batch
              DELETE FROM TAB1
               WHERE C1 in (select c1 from tab2 where status='F' and c_id>=100)
                 AND ROWNUM <= 10000;
              EXIT WHEN SQL%ROWCOUNT = 0;
              COMMIT;
          END LOOP;
      END;
      /

        • 1. Re: DML Performance
          John Thorton

          933257 wrote:

           

          Hi, we are using Oracle version 11.2.0.4. We are planning to DELETE 25 million+ records from a table. Below are three options we are considering for the DELETE. Which way of deleting would be the best in terms of performance? Will option-2 and option-3 give similar response times and consume similar DB resources?

           

          An index exists on column C1 of table TAB1. TAB1 contains ~1 billion records.

           

          Which metric measures best performance?

          Is the goal to minimize or maximize this metric?

           

          Never do in PL/SQL that which can be done in plain SQL.

           

          COMMIT inside a LOOP increases the probability that ORA-01555 "Snapshot Too Old" will occur.

          1 person found this helpful
          • 2. Re: DML Performance
            Lisandro Fernigrini

            Usually, pure SQL solutions perform better than PL/SQL processes. If you can avoid PL/SQL and do it in SQL, then go that way.

             

            Another thing to consider is how many rows are supposed to be kept in the table. If the table has 30 million rows and you plan to delete 25 million, then creating a new table with the remaining rows, dropping the original table, renaming the new one and regenerating missing objects (like FKs from other tables) will probably be quicker than deleting the 25 million.
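
            A minimal sketch of that approach, assuming the keep set is just the inverse of the delete predicate from the question (the name tab1_keep is made up):

            CREATE TABLE tab1_keep NOLOGGING AS
            SELECT t1.*
              FROM tab1 t1
             WHERE NOT EXISTS (SELECT NULL
                                 FROM tab2 t2
                                WHERE t2.c1     = t1.c1
                                  AND t2.status = 'F'
                                  AND t2.c_id  >= 100);

            -- recreate indexes, constraints and grants on tab1_keep, then swap:
            DROP TABLE tab1;
            ALTER TABLE tab1_keep RENAME TO tab1;

            NOT EXISTS is used rather than NOT IN so that a NULL in tab2.c1 cannot silently empty the result.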

            1 person found this helpful
            • 3. Re: DML Performance
              Andrew Sayer

              Impossible to say without knowing what execution plans will be used. However, option 2 will make you do tons of reading against undo.

               

              If you're going to do it in batches, do it with dbms_parallel_execute and batch it by rowid. You can still run this in serial, or you can scale it up to a few parallel jobs if you have the spare resources.
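
              A minimal, untested sketch of that approach (the task name delete_tab1_task is made up; RUN_TASK commits each rowid chunk as it completes, and parallel_level controls how many jobs run at once):

              DECLARE
                  l_task  VARCHAR2(30) := 'delete_tab1_task';  -- hypothetical task name
              BEGIN
                  DBMS_PARALLEL_EXECUTE.CREATE_TASK(l_task);

                  -- carve TAB1 into rowid ranges of roughly 10,000 rows each
                  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(
                      task_name   => l_task,
                      table_owner => USER,
                      table_name  => 'TAB1',
                      by_row      => TRUE,
                      chunk_size  => 10000);

                  -- run the delete once per chunk; parallel_level => 1 runs the chunks in series
                  DBMS_PARALLEL_EXECUTE.RUN_TASK(
                      task_name      => l_task,
                      sql_stmt       => 'DELETE FROM TAB1
                                          WHERE C1 in (select c1 from tab2
                                                        where status=''F'' and c_id>=100)
                                            AND rowid BETWEEN :start_id AND :end_id',
                      language_flag  => DBMS_SQL.NATIVE,
                      parallel_level => 1);

                  DBMS_PARALLEL_EXECUTE.DROP_TASK(l_task);
              END;
              /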

              1 person found this helpful
              • 4. Re: DML Performance
                933257

                Sorry, I am not able to see why option-2 using "forall" will read more UNDO than option-3; can you please explain? Won't both option-2 and option-3 delete 10k records in one shot and then commit, so that they visit the same amount of UNDO?

                I was under the assumption that bulk processing is faster than row-by-row processing, so I was trying to convert to a bulk collect. Is my understanding wrong here? The first option will hold UNDO for all 25 million records and try to commit in one shot, which can fail due to UNDO space or other issues, so we were trying the other options.

                • 5. Re: DML Performance
                  jgarry

                  Agree with Lisandro (except, after I wrote the following I noticed the 1 billion rows, so you probably want to just delete with SQL, but I'll leave it anyways for you to consider).  Oracle favors adding over deleting.

                   

                  In addition, consider how long until the space is reused.  Oracle is pretty good about reusing space, including indices, but this is one exception that may favor recreating indices.

                   

                  Also consider what else is going on.  If you need others to have concurrent access to anything in this table, there are issues to think about, including needing undo to reconstruct the data the others need.  Undo can also be an issue with what John pointed out.

                   

                  Undo and redo issues should be considered.  A lot of redo can be generated with some scenarios, which can propagate to longer and larger backups.

                   

                  Truncation may result in very long checkpointing, too.

                   

                  CTAS what you want to keep, drop the old table, rename the new table, and fix any constraint issues is often the fastest approach with the fewest leftover issues, but you need to test for your particular situation.

                   

                  Another possibility is to backup, use transportable tablespaces to move the data elsewhere, do the mods there in noarchivelog mode, then move it back and backup again.  If you can afford the downtime - using noarchivelog can sometimes pay for itself with less i/o, meaning shorter downtime.

                   

                  If you need the table to remain usable while deleting, and you have times of lower usage, you may also consider dribbling out the change over time, using the delete statement with various discrete c_id values.  You may later want to alter table ... shrink space cascade if you can.
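
                  For reference, a minimal sketch of that shrink (the segment must be in an ASSM tablespace, and row movement has to be enabled first):

                  ALTER TABLE tab1 ENABLE ROW MOVEMENT;
                  ALTER TABLE tab1 SHRINK SPACE CASCADE;  -- CASCADE shrinks the dependent indexes too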

                   

                  Do you have advanced compression or partitioning?

                  1 person found this helpful
                  • 6. Re: DML Performance
                    jgarry

                    (Sorry, I didn't see yours or Andrew's reply until after I posted)

                     

                    Sometimes mining asktom for things like this can be enlightening.  Keep in mind answers may change over decades.  https://asktom.oracle.com/pls/asktom/f?p=100:11:::::P11_QUESTION_ID:5918938803188,

                    • 7. Re: DML Performance
                      Andrew Sayer

                      933257 wrote:

                       

                      Sorry, I am not able to see why option-2 using "forall" will read more UNDO than option-3; can you please explain? Won't both option-2 and option-3 delete 10k records in one shot and then commit, so that they visit the same amount of UNDO?

                      I was under the assumption that bulk processing is faster than row-by-row processing, so I was trying to convert to a bulk collect. Is my understanding wrong here? The first option will hold UNDO for all 25 million records and try to commit in one shot, which can fail due to UNDO space or other issues, so we were trying the other options.

                       

                      This line:

                      fetch c bulk collect into l_record.rowid LIMIT l_array_size;

                       

                      is reading data from your table as of the point in time you opened the cursor - each time you fetch more results from it, Oracle has to apply the undo you have just written in order to rebuild that read-consistent view. By the time you get to the end, you will be trying to read the table as it was at the beginning of the process.

                       

                       

                      I'm not suggesting you do row-by-row processing, I'm suggesting you do rowid ranges. This will be about the same as (probably faster than) array binding a list of rowids as you are doing in option 2.

                      e.g.

                      DELETE FROM TAB1
                       WHERE C1 in (select c1 from tab2 where status='F' and c_id>=100)
                         AND rowid between :start_id and :end_id;

                       

                      The duration of the non-DML part of this will be about the time of a full tablescan plus the time it takes to check whether the C1 value appears in your tab2 query - we don't know what selectivity that is achieving or how well you can evaluate the filter from either direction.

                       

                      Doing the bulk collect just means you are running the query part of that delete to get the rowids that fit the filter against C1, then going back to the table on those rowids using array binds.
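
                      For illustration, one way the :start_id/:end_id pairs for that delete might be generated - a sketch only, building one range per extent; the dbms_parallel_execute chunking mentioned above does this job for you:

                      -- one rowid range per extent of TAB1
                      SELECT dbms_rowid.rowid_create(1, o.data_object_id, e.relative_fno,
                                                     e.block_id, 0)                    AS start_id,
                             dbms_rowid.rowid_create(1, o.data_object_id, e.relative_fno,
                                                     e.block_id + e.blocks - 1, 32767) AS end_id
                        FROM dba_extents e
                        JOIN dba_objects o
                          ON o.owner = e.owner
                         AND o.object_name = e.segment_name
                       WHERE o.object_type  = 'TABLE'
                         AND e.owner        = USER
                         AND e.segment_name = 'TAB1';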

                       

                      1 person found this helpful
                      • 8. Re: DML Performance
                        John Thorton

                        933257 wrote:

                        Below are three options we are considering for the DELETE. Which way of deleting would be the best in terms of performance?

                         

                        Which metric measures "best"?

                        Please quantify.

                        • 9. Re: DML Performance
                          933257

                          Thank you so much for the clarification. I found the discussion below comparing the performance of all the methods. I will test these in a lower environment and see the results. It seems that, after the simple DELETE statement or the one with PARALLEL hints, option-3 (batched delete) here will give the optimal performance.

                           

                          https://community.toadworld.com/platforms/oracle/b/weblog/archive/2011/02/22/the-not-so-simple-delete

                          • 10. Re: DML Performance
                            Joerg.Sobottka

                            From a performance point of view, the BEST solution in your scenario is to NOT delete from the original table.

                            With DBMS_REDEFINITION, create a mirror table that only has the records you want to keep, and just drop the original table after you have used DBMS_REDEFINITION.FINISH_REDEF_TABLE.

                            https://www.bobbydurrettdba.com/2014/05/16/12c-online-table-redefinition-examples/

                             

                            If you delete so many records, you produce a big amount of REDO (which slows down the database and, if you ever need to restore/recover, also lengthens the time needed to get the full database back), you waste a lot of space (due to the high water mark), and you may need to maintain your indexes, ...
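
                            For reference, the post-delete cleanup alluded to here might look like the following sketch (idx1 is a hypothetical index name; MOVE takes the table offline and leaves its indexes UNUSABLE until rebuilt):

                            ALTER TABLE tab1 MOVE;        -- rewrites the segment and resets the high water mark
                            ALTER INDEX idx1 REBUILD;     -- every index on the table must be rebuilt after the move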

                            • 11. Re: DML Performance
                              Jonathan Lewis

                              Joerg.Sobottka wrote:

                               

                              From a performance point of view, the BEST solution in your scenario is to NOT delete from the original table.

                              With DBMS_REDEFINITION, create a mirror table that only has the records you want to keep, and just drop the original table after you have used DBMS_REDEFINITION.FINISH_REDEF_TABLE.

                              https://www.bobbydurrettdba.com/2014/05/16/12c-online-table-redefinition-examples/

                               

                              If you delete so many records, you produce a big amount of REDO (which slows down the database and, if you ever need to restore/recover, also lengthens the time needed to get the full database back), you waste a lot of space (due to the high water mark), and you may need to maintain your indexes, ...

                               

                              The required delete is from a table with 1 billion rows - and we don't know how large those rows are or how many indexes exist on the table, or whether the table is partitioned and could be cleared one partition at a time; and we don't know whether or not the system has to be kept online while it's happening or whether there's some replication activity going on. Saying that dbms_redefinition is the best solution is jumping to conclusions from too little data.

                               

                              Regards

                              Jonathan Lewis

                              • 12. Re: DML Performance
                                Andrew Sayer

                                Joerg.Sobottka wrote:

                                 

                                From a performance point of view, the BEST solution in your scenario is to NOT delete from the original table.

                                With DBMS_REDEFINITION, create a mirror table that only has the records you want to keep, and just drop the original table after you have used DBMS_REDEFINITION.FINISH_REDEF_TABLE.

                                https://www.bobbydurrettdba.com/2014/05/16/12c-online-table-redefinition-examples/

                                 

                                If you delete so many records, you produce a big amount of REDO (which slows down the database and, if you ever need to restore/recover, also lengthens the time needed to get the full database back), you waste a lot of space (due to the high water mark), and you may need to maintain your indexes, ...

                                On top of what Jonathan rightly pointed out.

                                 

                                How would you actually do this with dbms_redefinition while keeping the "new" table in sync? There's no filter clause. If you meant using the execute_update procedure that is new in 12.2 (which the OP is not on), then you would lose all other changes to the table while you are doing the delete.

                                • 13. Re: DML Performance
                                  Jonathan Lewis

                                  933257 wrote:

                                   

                                  Thank you so much for the clarification. I found one discussion as below , comparing performance of all the methods. I will test these in lower environment and see the results. It seems after simple DELETE statement or with the PARALLEL Hints, the option-3(Batched Delete) here will give the optimal performance.

                                   

                                  https://community.toadworld.com/platforms/oracle/b/weblog/archive/2011/02/22/the-not-so-simple-delete

                                   

                                  That's not really a good article to follow - it's only a guideline on a few strategies.

                                   

                                  It's a small table, it's deleting 20% of the data (not 2.5% of a large table), there's only one index (which happens to be one that might have allowed Oracle to take advantage of a particular optimisation that would be very helpful in some of the tests), and there's no information about the relative sizes of the data objects, the buffer cache, and the batch sizes in terms of blocks, nor about where the time went.

                                   

                                  I've just published an index to a short series of articles I wrote in 2016 about massive deletes. You might want to read it to get some idea of what you can do and where the threats may be.

                                  https://jonathanlewis.wordpress.com/2018/06/08/massive-delete/

                                   

                                   

                                  After you've read through the articles you'll have a better idea of what information we would need to be able to make relevant suggestions.

                                   

                                   

                                  Regards

                                  Jonathan Lewis

                                  • 14. Re: DML Performance
                                    933257

                                    The table is ~36GB in size, holding ~1.4 billion records. It is not partitioned. It has two indexes on it: one (IDX1) on column C1, and the other (IDX2_PK) is the composite primary key index on columns (c2_id, dt_cr). The size of index IDX1 is ~24GB, and of index IDX2_PK ~38GB. The current idea is to keep the table online for querying by others. This is the primary database, and Data Guard is configured for DR replication. Column C1 in TAB1 has ~56 distinct values and no NULLs.
