Fastest way to batch delete data from a table with 1 billion rows

OraC
OraC Member Posts: 70 Blue Ribbon
edited January 30 in SQL & PL/SQL

Hi,

I need some help deleting batches from a really large online transactions table (up to 1 billion records). I hope to delete around 9 million records daily; 9 million more are being added daily. I have an off-peak time window when customer usage is limited so I can try to run this optimally, but I'm also conscious of not impacting any potential customers too much by specifying too high a batch size below (10,000). It's Oracle 12.2 Standard Edition so unfortunately partitioning is not an option. I've come up with the following but it's just not deleting fast enough. The initial select seems to be OK; it's more about my loop. Is there a more efficient way of batching this?

DECLARE
  cursor cur is
    select /*+ index_ffs(a,P_ITD) parallel_index(a,P_ITD,4) */ C_ID
    from ITD a
    WHERE CREATED < '27-NOV-20 12.00.00.000000 AM';

  TYPE CII_TYPE IS TABLE OF NUMBER;
  CII_TYPE_TBL CII_TYPE;
BEGIN
  OPEN CUR;
  LOOP
    FETCH CUR BULK COLLECT INTO CII_TYPE_TBL LIMIT 10000;

    FORALL i IN 1..CII_TYPE_TBL.COUNT
      DELETE FROM ITD WHERE C_ID = CII_TYPE_TBL(i);

    COMMIT;
    EXIT WHEN CUR%NOTFOUND;
  END LOOP;
  CLOSE CUR;
END;
/

P_ITD is the primary key constraint on the ITD table on C_ID

CREATED_ON is also indexed separately.

Thanks

S567

Answers

  • RogerT
    RogerT Member Posts: 1,852 Gold Trophy

    Why are you reading 10000 rows into a collection just to delete those rows afterwards from the table you have read from? If you want to delete in chunks, this could also be done with a delete statement only (looping as long as the delete statement deletes rows).

    Why are you comparing CREATED (hopefully a DATE column) with a string value?

    hth

  • Billy Verreynne
    Billy Verreynne Member Posts: 28,259 Red Diamond

    The SQL runs as multiple parallel processes.

    The PL/SQL process runs as a SINGLE serialised process. Bulk collect does not make it automagically run in parallel.

    So it is obvious that the PL/SQL process is the bottleneck. Bulk collect IS SLOW row-by-row processing.

    Eliminate PL/SQL serialisation. Maximise SQL parallelisation. Use a single SQL parallel DML to find and delete the relevant data.
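
    A sketch of what that single parallel SQL delete might look like (the degree of 4 and the cut-off literal are illustrative assumptions only, and whether PDML is actually available for this table is discussed further down the thread):

    -- Illustrative sketch: enable PDML for the session, then one parallel delete.
    ALTER SESSION ENABLE PARALLEL DML;

    DELETE /*+ parallel(a 4) */
    FROM   ITD a
    WHERE  a.created < TIMESTAMP '2020-11-27 00:00:00';

    COMMIT;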

  • OraC
    OraC Member Posts: 70 Blue Ribbon

    Thanks RogerT and Billy! It's much appreciated.

    I have SECUREFILES in my table so I don't think it supports parallel delete, right?

    From some tests I've seen that once the list of C_IDs is compiled up front, the delete runs much faster because it can use the P_ITD primary key in the execution plan, which seems to be much quicker than a scan of the CREATED timestamp index.

    Taking these 2 things into account, what would you recommend?

    Any sql or plsql examples would help greatly. I'm not at all experienced in writing these things and I'm a bit stuck to fix this.

    Again, your help is very much appreciated.

    Thanks,

    O.

  • Billy Verreynne
    Billy Verreynne Member Posts: 28,259 Red Diamond

    See support Master Note: Overview of Oracle Large Objects (BasicFiles LOBs and SecureFiles LOBs) (Doc ID 1490228.1)

    SecureFiles LOBs 12c Enhancements

    Enable PDML Operations on SecureFiles

    Limitations have been removed in this release with regard to Parallel DML (PDML) support for SecureFiles LOBs. This feature allows SecureFiles to leverage the performance and scalability benefits of the PDML features of Oracle Database.

  • cormaco
    cormaco Member Posts: 1,556 Bronze Crown

    What about this:

    delete /*+ parallel */ from ITD
    where created < timestamp '2020-11-27 00:00:00'
    
  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,602 Gold Crown

    @Billy Verreynne


    "Limitations have been removed ... " but it doesn't say which ones. I've just done a simple test with parallel DML enabled and this is what the plan says:

    Note
    -----
      - Degree of Parallelism is 2 because of table property
      - PDML disabled because single fragment or non partitioned table used
    

    And it was telling the truth about run-time. The pass to identify rows ran parallel 2, then the rowids were passed to the co-ordinator for serial deletion.


    Regards

    Jonathan Lewis

  • Billy Verreynne
    Billy Verreynne Member Posts: 28,259 Red Diamond

    Unfortunately not the first and only misleading support note it seems... <sigh>

  • OraC
    OraC Member Posts: 70 Blue Ribbon

    Thanks all!

    Yeah, unfortunately this link https://docs.oracle.com/en/database/oracle/oracle-database/12.2/vldbg/types-parallelism.html states:

    "For non-partitioned tables with LOB columns, parallel INSERT operations are supported provided that the LOB columns are declared as SecureFiles LOBs. Parallel UPDATE, DELETE, and MERGE operations on such tables are not supported."

    Is the method I have shown above, looping over the C_IDs, really that inefficient then? When I try to delete in batches using CREATED, I find the optimizer often chooses an alternative execution plan that looks inefficient compared to the one using the primary key.

    @RogerT Thanks for your suggestion, is this what you mean?

    var deleteolderthan TIMESTAMP(6);
    var batch_size number;

    exec :deleteolderthan := '27-NOV-20 12.00.00.000000 AM';
    exec :batch_size := 10000;

    begin
      LOOP
        delete from ITD where CREATED < :deleteolderthan and rownum <= :batch_size;
        EXIT WHEN SQL%ROWCOUNT = 0;
        COMMIT;
      END LOOP;
    end;
    /

    1) Will this be more efficient and are there any additional improvements I can make to this to make it run even faster?

    2) If the delete takes up to 4-5 hours to run, the table will have shrunk by 9 million records, potentially affecting other query performance. Will I achieve any benefit from updating statistics on the table or key indexes mid-execution?

    Any other suggestions very welcome.

    Thanks again,

    O

  • RogerT
    RogerT Member Posts: 1,852 Gold Trophy

    You should not compare

    a) delete using CREATED

    with

    b) delete using C_ID

    what you should compare is

    a) delete using CREATED

    with

    b) fill a collection using CREATED PLUS delete using C_ID


    so?

  • OraC
    OraC Member Posts: 70 Blue Ribbon

    Thanks @RogerT

    What you've suggested is what I am trying to do now but I'm still trying to find out the optimal way to do both for comparison purposes. Is a cursor or a loop or some other method more efficient typically for something like this?

  • Billy Verreynne
    Billy Verreynne Member Posts: 28,259 Red Diamond

    Introducing PL/SQL code to process the output of a SQL cursor introduces an overhead. A lot of cursor looping just moves data from the cursor to another cursor - and this is mostly stupid.

    You can control parallel execution via the DBMS_PARALLEL_EXECUTE interface - see https://community.oracle.com/tech/developers/discussion/comment/13727146#Comment_13727146 for an example.
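
    For example, a rough sketch of that interface (the task name, chunk size, cut-off literal and parallel level are illustrative assumptions):

    BEGIN
      DBMS_PARALLEL_EXECUTE.create_task(task_name => 'PURGE_ITD');

      -- Split the table into rowid ranges of roughly 10,000 rows each.
      DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(
        task_name   => 'PURGE_ITD',
        table_owner => USER,
        table_name  => 'ITD',
        by_row      => TRUE,
        chunk_size  => 10000);

      -- Each chunk is processed by its own scheduler job;
      -- :start_id and :end_id are supplied by the package.
      DBMS_PARALLEL_EXECUTE.run_task(
        task_name      => 'PURGE_ITD',
        sql_stmt       => 'delete from ITD
                            where rowid between :start_id and :end_id
                              and created < timestamp ''2020-11-27 00:00:00''',
        language_flag  => DBMS_SQL.NATIVE,
        parallel_level => 4);

      DBMS_PARALLEL_EXECUTE.drop_task('PURGE_ITD');
    END;
    /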

    That said, deleting LOB segments in parallel could introduce wait event latency and slow this parallel approach down.

  • OraC
    OraC Member Posts: 70 Blue Ribbon

    Thanks @Billy Verreynne

    What about this:

    https://docs.oracle.com/en/database/oracle/oracle-database/12.2/vldbg/types-parallelism.html states:

    "For non-partitioned tables with LOB columns, parallel INSERT operations are supported provided that the LOB columns are declared as SecureFiles LOBs. Parallel UPDATE, DELETE, and MERGE operations on such tables are not supported."

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,602 Gold Crown

    I'm still trying to find out the optimal way to do both for comparison purposes.

    There is no generic answer to the question of "how to delete a lot of data"; but there are factors that point you towards the right tests.

    Key features in your case

    • you're on standard edition so no question of on-line restructuring or partitioning
    • you want to delete about 1% of the data, every day
    • you have a (securefile) LOB column
    • your target data is time-dependent (created more than ca. 3 months ago).

    On the plus side (and assuming no disruption because of historical activity), the 9M rows you want to delete will be very well clustered in a set of consecutive blocks in the table; they won't be scattered all over the table. (That's implied by the "created date" column that you're using for the delete.)

    One of the problems you might have seen with your first code sample - parallel select of rowids - is that the batch of rows you deleted might have been fed in smaller batches from the 4 parallel server processes, which could mean that the query co-ordinator was having to jump around the section of the table for that day a little randomly - leading to extra work on buffer latches, consistent gets and current gets. The side effects of using the dbms_parallel_execute package (which is not the same as executing a single statement with a set of parallel server processes) might make the side effects of parallelism even worse.

    Key point to investigate: where does the time go when you try each method? To determine this in Standard Edition you could try taking a statspack snapshot just before and just after running a smaller delete cycle when there is little other activity; alternatively just take a snapshot of either the session or system level activity and wait events before and after the run (v$sesstat for the session, v$sysstat for the system, and v$session_event / v$system_event for the wait events).
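
    For example, a before/after snapshot taken from the deleting session itself could be as simple as the following (standard dynamic performance views; run once before and once after the delete cycle and diff the results):

    -- Session-level activity (v$mystat is the current session's slice of v$sesstat).
    select sn.name, st.value
    from   v$mystat st
    join   v$statname sn on sn.statistic# = st.statistic#
    where  st.value != 0
    order  by sn.name;

    -- Session-level wait events.
    select event, total_waits, time_waited_micro
    from   v$session_event
    where  sid = sys_context('userenv', 'sid')
    order  by time_waited_micro desc;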

    A few possibilities are:

    • CPU
    • Waiting for log file writes to complete
    • db file sequential reads - possibly of randomly scattered index leaf blocks if you have several indexes
    • db file sequential reads - of undo blocks if you're having read-consistency problems
    • enqueue waits - if there are collisions between processes doing updates and your delete

    The CPU time is something you can get from the session/system activity; the others from the session/system events. The underlying cause of time spent in the events may be something that can be inferred from the session/system activity (e.g. from the stats named like '%undo records applied'). If necessary you could get extra detail from other dynamic performance views, or enable extended SQL trace for a run.
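
    For instance, extended SQL trace for the current session can be switched on and off around a test run with something like this (a sketch; waits => TRUE gives the 10046 level 8 detail):

    exec DBMS_MONITOR.session_trace_enable(waits => TRUE, binds => FALSE);
    -- ... run one delete cycle ...
    exec DBMS_MONITOR.session_trace_disable;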


    Regards

    Jonathan Lewis

  • OraC
    OraC Member Posts: 70 Blue Ribbon

    Thank you @Jonathan Lewis

    I'm running on AWS Oracle RDS and using Performance Insights I've already been able to determine that during the deletion the time is spent in CPU and db file sequential read, mostly the latter.

    I have 8 indexes on this table:

    6 single column indexes (2 of which are there for foreign key constraint links)

    1 two column index

    1 three column unique index (unique constraint with using index clause)

    In Oracle RDS a log switch happens automatically every 5 minutes (this cannot be changed because of AWS recovery restrictions) and I was finding that logs were switching a lot more often than that. I have now sized my 12 redo log files to be 4GB each in size and the log switch happens roughly every 5 minutes now, with the occasional 5 minute period where there is more than 1 switch. With this amount of log switching, what should my UNDO tablespace size be?

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,602 Gold Crown
    edited January 31


    Good move with the 4GB redo logs (assuming it cost very little cash - you might want to check the archived log file size if space is a consideration).

    One option to bear in mind is that if you drive through an index in ascending order to delete rows Oracle can optimize the index maintenance - it will track the rows (and values) deleted then do a sort and then do array updates into each index in turn to delete the index entries. This can make a big difference to the volume of random I/O, undo and redo (because each leaf block updated is visited just once and your data may mean many rows are deleted from each leaf block as a single undo/redo entry).

    If your delete is driven by a tablescan, Oracle will update each index as it deletes each row.

    There are some exclusions to the array update mechanism - your two indexes supporting the foreign key constraints will be updated row by row.

    Since you have a "created" index you may find a process that does something like the following is as efficient as you can get:

    delete /*+ index_rs_asc(a (created)) */  
    from itd a 
    where created < '27-Nov-2020 12:00:00' 
    and rownum < 50000
    

    You'd need to embed this in a PL/SQL loop that cycles until the number of rows deleted is zero. (Obviously you might need to experiment to find an optimum number of rows to delete in one pass, remembering that you'll be sorting 6N keys in memory if you set the number to N.)
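
    A sketch of that wrapper loop (the cut-off literal and the 50,000 batch size are just placeholders to make it concrete):

    declare
      l_cutoff timestamp := timestamp '2020-11-27 00:00:00';  -- placeholder cut-off
    begin
      loop
        delete /*+ index_rs_asc(a (created)) */
        from   itd a
        where  created < l_cutoff
        and    rownum <= 50000;

        exit when sql%rowcount = 0;
        commit;
      end loop;
    end;
    /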

    You'd have to check that the execution path was an index range scan; and you might find that the cost of walking through the index you'd deleted on all the previous passes might add undesirable overheads (which you might avoid by coding the PL/SQL to start where it had last left off, or one second before).


    One thought: if either of your multi-column indexes starts with the same column as either of the foreign key indexes, then the foreign key index is redundant and could be dropped.


    Regards

    Jonathan Lewis


    Footnote: https://jonathanlewis.wordpress.com/2006/11/22/tuning-updates/ is a note I wrote a long time ago about updating through indexes, with a footnote that this applies also to deletes.

  • Billy Verreynne
    Billy Verreynne Member Posts: 28,259 Red Diamond

    DBMS_PARALLEL_EXECUTE is a method allowing you to "multi-thread" your SQL and/or PL/SQL code, with "threads" executed as DBMS_SCHEDULER background processes.

    Oracle Parallel Query (PQ) is not used and is not relevant in this case.

    However, since this approach simulates (as non-PQ, non-single-transaction processes) what PQ does - deleting SecureFile LOB rows from the same table at the same time - it could run into SecureFile-related cross-process contention.

  • Billy Verreynne
    Billy Verreynne Member Posts: 28,259 Red Diamond

    Since you have a "created" index you may find a process that does something like the following is as efficient as you can get:

    delete /*+ index_rs_asc(a (created)) */  
    from itd a 
    where created < '27-Nov-2020 12:00:00' 
    and rownum < 50000
    

    You'd need to embed this in a PL/SQL loop that cycles until the number of rows deleted is zero. (Obviously you might need to experiment to find an optimum number of rows to delete in one pass, remembering that you'll be sorting 6N keys in memory if you set the number to N.)

    This code also needs to support instrumentation, support restarting failed loop processing, etc.

    Rather run this as a single "threaded" process via DBMS_PARALLEL_EXECUTE, making use of its instrumentation, transaction, and recovery features.
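
    For instance (purely illustrative, reusing a task created as in the earlier sketch), the package's own views and restart call can be used to monitor progress and resume a failed run:

    -- Chunk-level progress and per-chunk error instrumentation for the task.
    select status, count(*)
    from   user_parallel_execute_chunks
    where  task_name = 'PURGE_ITD'
    group  by status;

    -- Re-process only the chunks that did not complete successfully.
    exec DBMS_PARALLEL_EXECUTE.resume_task('PURGE_ITD');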

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,602 Gold Crown


    For a bit of background reading on "massive deletes", there's a short series I wrote for Redgate, and a few articles on my blog that cover quite a lot of ground. Catalogued here: https://jonathanlewis.wordpress.com/2018/06/08/massive-delete/


    One very specific point in your case, that I overlooked in my first response: you have a LOB column, but the resources needed to deal with that column will vary greatly depending on whether the actual value is (frequently) stored in-row or out of row. If you have allowed in-row and a lot of your LOB values are less than the (approx) 4,000 byte limit then deleting them will generate a lot of undo and redo because they are simply part of a standard row delete. Moreover if the LOB values are in-row and subject to update you may find that a lot of the rows have ended up being migrated, leading to a lot more random I/O to take you from the original location of the row to the migrated row. (Keep an eye on the session activity statistic "table fetch continued rows" when testing.)
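
    As a quick illustration, that statistic can be watched from the deleting session with standard views (the LIKE allows for singular/plural variations of the statistic name):

    select sn.name, st.value
    from   v$mystat st
    join   v$statname sn on sn.statistic# = st.statistic#
    where  sn.name like 'table fetch continued row%';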


    Regards

    Jonathan Lewis

  • OraC
    OraC Member Posts: 70 Blue Ribbon
    edited February 8

    Thank you all so much for all your suggestions... It's really appreciated!

    So there have been a few changes since then, and we're now testing the delete from created_dtm directly.

    We now have some additional search criteria. The delete from ITD now has to include specific values from another table - SPECS - so the delete looks like this now:


    delete from ITD
    where DATA_ID in (select DATA_ID from SPECS where SERVICE_ID = 9)
    and CREATED < '27-Nov-2020 12:00:00'
    and rownum <= 10000;


    The SPECS table only has 78 records of which 24 match the SERVICE_ID of 9

    DATA_ID is NUMBER(20)

    SERVICE_ID is NUMBER(20)


    Even though there are potentially 24 different DATA_IDs for SERVICE_ID 9, only 3 are currently used in this installation - DATA_IDs (56, 76, 77) 


    ITD_CIDS_7 is an index on ITD - SERVICE_ID and CREATED

    A_SPECS_U1 is a unique key constraint index on SPECS - SERVICE_ID and another column. 

    SERVICE_ID is indexed separately but it's not being used. 


    @Jonathan Lewis Thank you for posting your Redgate article. There is a lot of useful information in there! Just to let you know the tricky BLOB situation we have:

    There is a single securefile BLOB_VALUE column in the ITD table which unfortunately holds the data from a number of different data elements, each one identified by the values returned from the (select DATA_ID from SPECS where SERVICE_ID = 9) subquery.


    The 3 DATA_IDs and corresponding length(BLOB_VALUE) sizes are as follows:

    DATA_ID - 56: typically either 784 or 880 bytes, number of rows around 270m

    DATA_ID - 76: typically either 3440 or 4384 bytes, number of rows around 270m

    DATA_ID - 77: typically either 3440 or 4384 bytes, number of rows around 270m

    so there's a real mix of BLOBs being stored in-line and out of line.

    We are introducing compression in the application that will halve the length of these elements, so that will definitely help longer term.


    ITD table now has 835 million rows.

    Average row length from this table is 2232.

    BLOB_VALUE is defined as: 

    BLOB_VALUE BLOB

    lob (BLOB_VALUE) store as securefile BLOBSEG_ITD_BLOBVAL

    Each transaction inserts one row for each of the 3 DATA_ID elements above, so we have approximately 270 million transactions in the table at the moment: 270 million x 3 rows = 810 million rows.

    The remaining BLOB_VALUEs in the table are a mixture of other entries which don't ever get deleted and they will be inserted as other transactions are happening in the system. 

    Any discussions around breaking this table up will have to come later as that would require a big re-engineering project.


    The execution plan now looks like:

    -------------------------------------------------------------------------------------------------------
    | Id  | Operation                              | Name       | Rows | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------------------------------
    |   0 | DELETE STATEMENT                       |            |      |       | 1152K (100)|          |
    |   1 |  DELETE                                | ITD        |      |       |            |          |
    |*  2 |   COUNT STOPKEY                        |            |      |       |            |          |
    |*  3 |    HASH JOIN                           |            | 188M |   12G | 1152K   (1)| 00:00:46 |
    |   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| SPECS      |   24 |   192 |     2   (0)| 00:00:01 |
    |*  5 |      INDEX RANGE SCAN                  | A_SPECS_U1 |   24 |       |     1   (0)| 00:00:01 |
    |*  6 |     INDEX FAST FULL SCAN               | ITD_CIDS_7 | 188M |   10G | 1151K   (1)| 00:00:45 |
    -------------------------------------------------------------------------------------------------------


    Predicate Information (identified by operation id):

      2 - filter(ROWNUM<=:3)

      3 - access("DATA_ID"="DATA_ID")

      5 - access("SERVICE_ID"=:1)

      6 - filter("CREATED"<:2)


    Is there much else that can be done here?

    Also, after running a daily delete of 3 million transactions (9 million ITD rows), how do we run statistics efficiently without impacting normal user transactions when we are outside our maintenance window? Should they be gathered across the entire table with cascade => true, or should we only gather stats manually for the ITD_CIDS_7 index?


    Thanks again!

    OraC

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,602 Gold Crown

    we're now testing the delete from created_dtm directly.

    You seem to have referred to this column as CREATED elsewhere - are there two different columns, one with and one without a time component? You've also mentioned "created_on", and you've compared "created" with a literal that looks like a timestamp. Could you please give us a definitive statement of what you actually have.

    ITD_CIDS_7 is an index on ITD - SERVICE_ID and CREATED

    A_SPECS_U1 is a unique key constraint index on SPECS - SERVICE_ID and another column. 

    SERVICE_ID is indexed separately but it's not being used.

    Another problem with your descriptions - ITD_CIDS_7 is on the ITD table, but SERVICE_ID is a column on the SPECS table. Did you mean (data_id, created) for ITD_CIDS_7? Since nearly 100% of the data is evenly spread between 3 values of data_id, that's a bad choice of index - especially if a process inserts three rows at the same time to cover all three data_ids. However, you may have reduced the buffer busy waits on inserts by creating that index. Then again, you've also said that (created) is separately indexed anyway, which means you need to think carefully about your whole indexing strategy.

    Regards

    Jonathan Lewis


    PS I would say more but the forum software is behaving so badly at the moment that it's a major disincentive to continue.

    If you want further strategy suggestions please supply a list of your index definitions, and clarify what "created" really is.

  • OraC
    OraC Member Posts: 70 Blue Ribbon

    Apologies for the confusion @Jonathan Lewis and thank you for drawing my attention to the discrepancies. See the tables below.

    create table ITD
    (
      CIDES_INSTANCE_ID NUMBER(20,0)                     not null,
      CUST_ID           NUMBER(20,0)                     not null,
      DATA_ID           NUMBER(20,0)                     not null,
      D_TYPE            VARCHAR2(255)  default 'NOVALUE' not null,
      EK_ID             NUMBER(20,0),
      MK_ID             NUMBER(20,0),
      CREATED           TIMESTAMP                        not null,
      UUN               VARCHAR2(255),
      UAN               VARCHAR2(255),
      MC                BLOB,
      CHR_VALUE         VARCHAR2(1024),
      BLOB_VALUE        BLOB,
      constraint P_ITD primary key (CIDES_INSTANCE_ID) using index tablespace DU_INDEXES,
      constraint A_ITD_U1 unique (CUST_ID, DATA_ID, D_TYPE) using index tablespace DU_INDEXES
    )
    tablespace DU_DATA
    lob (BLOB_VALUE) store as securefile BLOBSEG_ITD_BLOBVAL
    lob (MC) store as BLOBSEG_ITD_MC;

    create index ITD_CIDS_1 on ITD (CUST_ID ASC) tablespace DU_INDEXES;
    create index ITD_CIDS_2 on ITD (DATA_ID ASC) tablespace DU_INDEXES;
    create index ITD_CIDS_3 on ITD (MK_ID ASC) tablespace DU_INDEXES;
    create index ITD_CIDS_4 on ITD (EK_ID ASC) tablespace DU_INDEXES;
    create index ITD_CIDS_5 on ITD (D_TYPE ASC) tablespace DU_INDEXES;
    create index ITD_CIDS_6 on ITD (CREATED ASC) tablespace DU_INDEXES;
    create index ITD_CIDS_7 on ITD (DATA_ID ASC, CREATED ASC) tablespace DU_INDEXES;

    create table SPECS
    (
      DATA_ID    NUMBER(20,0)                not null,
      SERVICE_ID NUMBER(20,0)                not null,
      SP_NAME    VARCHAR2(255)               not null,
      SP_TYPE    VARCHAR2(40)                not null
        constraint C_DATA_ELE_SP_TYPE check (SP_TYPE in ('CHAR','BLOB')),
      E_I        VARCHAR2(1)     default 'N' not null
        constraint C_DATA_ELE_E_I check (E_I in ('Y','N') and E_I = upper(E_I)),
      constraint P_SPECS primary key (DATA_ID) using index tablespace DC_INDEXES,
      constraint A_SPECS_U1 unique (SERVICE_ID, SP_NAME) using index tablespace DC_INDEXES
    )
    tablespace DC_DATA;

    create index SPECS_IDX_1 on SPECS (SERVICE_ID ASC) tablespace DC_INDEXES;


    On the ITD table we have FK constraints to parent tables for the following columns: CUST_ID, EK_ID, MK_ID, hence the indexes for those columns. DATA_ID also has a FK constraint to the SPECS parent table.

    On the SPECS table we have FK constraints from SERVICE_ID to a parent table.

    MC blob is tiny and has a length of 32 for all rows

    The skew of data in DATA_ID as mentioned above is 95-100% for the 3 values (56, 76, 77)

    Other counts for other DATA_IDs in the ITD table associated with other SERVICE_IDs are as follows:

    DATA_ID: 43 - 6,485,780

    DATA_ID:  15 - 3,261,283

    DATA_ID:  54 - 152,108

    after that the numbers reduce to about 200 records for 15 other DATA_IDs associated with other SERVICE_IDs.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,602 Gold Crown


    You've got more indexes than you need on the big table.

    Deciding which to keep, which to drop, and which to change depends entirely on your strategic requirements.

    The (created) index might be causing buffer busy waits - all the inserts in the same second go into the same leaf block - so if you have multiple sessions inserting data they will keep colliding on that leaf block; this means (data_id, created) might be the one to keep provided you can "delete where data_id = constant and created <= xxxx".

    As it stands, then,

    • you don't need the index (cust_id) as a "foreign key" index because you've got (cust_id, data_id, d_type).
    • you don't need (data_id) as a foreign key index because you've got (data_id, created)
    • you may not need (created) if you can delete with the predicate above

    There are various ways you could try to code your delete with "in list", to emulate the effect of the delete statement I've suggested but possibly the simple thing to do is (pseudo-code):

    select data_id into pl/sql array from specs
    for i in 1..data_id_array.count loop
      loop
        delete /*+ index(itd (data_id created)) */ from itd where data_id = data_id_array(i) and created <=  ...;
        commit;
      until sql%rowcount = 0;
    end loop
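
    A more complete, runnable version of that pseudo-code might look like this (the cut-off literal, the SERVICE_ID = 9 filter and the 10,000 batch size are assumptions carried over from earlier in the thread):

    declare
      type t_id_tab is table of specs.data_id%type;
      l_data_ids t_id_tab;
      l_cutoff   timestamp := timestamp '2020-11-27 00:00:00';   -- assumed cut-off
    begin
      select data_id bulk collect into l_data_ids
      from   specs
      where  service_id = 9;

      for i in 1 .. l_data_ids.count loop
        loop
          delete /*+ index(itd (data_id created)) */
          from   itd
          where  data_id = l_data_ids(i)
          and    created < l_cutoff
          and    rownum <= 10000;                                -- assumed batch size

          exit when sql%rowcount = 0;
          commit;
        end loop;
      end loop;
    end;
    /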
    


    Regards

    Jonathan Lewis

  • OraC
    OraC Member Posts: 70 Blue Ribbon
    edited February 8

    Thank you @Jonathan Lewis

    I am investigating with the development team to determine if we can drop some of these indexes.

    A new issue is hindering my testing progress now. After creating a copy of the database I'm working on for testing these delete options I will test a number of deletions and then restore my backup. Immediately before I took my backup the delete batches were completing in a number of seconds and I gathered stats on the table using:

    exec DBMS_STATS.GATHER_TABLE_STATS('SCHEMA', 'ITD', cascade => TRUE);

    After a reboot, the delete batch is taking minutes to run, rendering my tests almost useless. I expect this is due to a lack of index or table caching, or both. I've read that a full table scan is one of the ways to load up this data; however, for a table this size this is not going to be feasible. Are there any other options within Standard Edition to alleviate those awful initial batch executions after a database restore or even a reboot?

    Thanks,

    OraC

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,602 Gold Crown

    After a reboot, the delete batch is taking minutes to run...rendering my tests almost useless. I expect this is due to a lack of index or table caching or both

    By reboot do you mean the machine, or do you mean restarting the instance? In either case (but at different levels) the need to reload caches after restarting can make a big difference between clean testing and preliminary experiments against a system that's been running for a long time and has a "live" cache.

    That's one of the reasons you need to know what the data looks like and what the real patterns of usage are so that you can make reasonable guesses about what fraction of the I/O you see in testing would have been cached on a production system.

    Running big tablescans (apart from the potential for not doing what you expect because of serial direct reads) will probably not be a realistic way to warm the instance - possibly you could review a few Statspack reports to get an idea of the most used, high-volume SQL statements and create a startup model that runs some of those statements (with different, randomised, bind values) a few thousand times to warm things up.
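
    As a very rough illustration of that kind of warm-up (the table, column and key range here are entirely hypothetical placeholders; the point is simply to replay representative, randomised lookups found in the Statspack top SQL):

    declare
      l_dummy number;
    begin
      for i in 1 .. 10000 loop
        -- Hypothetical hot query: counting a non-indexed column forces table visits
        -- as well as index probes for the randomised key.
        select count(chr_value)
        into   l_dummy
        from   itd
        where  cust_id = trunc(dbms_random.value(1, 1000000));
      end loop;
    end;
    /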

    Regards

    Jonathan Lewis

  • OraC
    OraC Member Posts: 70 Blue Ribbon

    Thanks for that suggestion again @Jonathan Lewis

    We have set up a loading utility to run through a whole set of application queries, and it has definitely improved performance of the deletion script after a restart, be it an instance restart or a server reboot.

    Regarding the suggestion above of using an array loop:

    1) Is array processing more efficient than using a cursor?

    2) Would one option be more suitable than the other based on the requirement to specify 9 million DATA_ID number(20) entries in a single execution?

    3) What kind of session memory problems could we encounter and could it have a knock on impact of paging other valuable data out of memory?

    4) Should we be resizing any memory structures or modifying any initialisation parameters to reduce the impact while running other online transactions?

    Regards,

    O
