12 Replies Latest reply: Aug 23, 2013 4:05 PM by Thomas Morgan

Why do these inserts vary so much in performance?

Thomas Morgan Newbie

I have a table and a package similar to those shown in the DDL below.

 

Table TABLE1 is populated in chunks of 10000 records from a remote database, through TABLE1_PKG,
which receives arrays of data for three of its columns and a scalar set-identifier value
for the column named NUMBER_COLUMN_3.

 

I have two databases with the following record counts in the table:

 

     DATABASE1: 55862629

 

     DATABASE2: 64225247

 

When I executed the procedure to move 50000 records to each of the two databases, it took 20 seconds to
populate DATABASE1 and 150 seconds to populate DATABASE2.  The network was ruled out as a cause because
I recorded in the package how long each of the five 10000-record chunks took to insert in each of the two databases:

 

Records Being Inserted  Time it took in DATABASE1     Time it took in DATABASE2
----------------------  --------------------------   -----------------------------
First  10000             3 seconds                    27 seconds
Second 10000             4 seconds                    26 seconds
Third  10000             6 seconds                    40 seconds
Fourth 10000             4 seconds                    31 seconds
Fifth  10000             4 seconds                    26 seconds
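
For reference, the timings above came from instrumentation inside the package that is not shown here. A minimal sketch of how one chunk call can be timed, assuming the TABLE1_PKG shown below, would be:

  DECLARE
    l_ids    TABLE1_PKG.VARCHAR2_ARRAY := TABLE1_PKG.VARCHAR2_ARRAY();
    l_nums1  TABLE1_PKG.NUMBER_ARRAY   := TABLE1_PKG.NUMBER_ARRAY();
    l_nums2  TABLE1_PKG.NUMBER_ARRAY   := TABLE1_PKG.NUMBER_ARRAY();
    l_start  PLS_INTEGER;
  BEGIN
    -- ... populate the three arrays with one 10000-row chunk ...
    l_start := DBMS_UTILITY.GET_TIME;  -- ticks of 1/100th of a second
    TABLE1_PKG.Insert_Table1(l_ids, l_nums1, l_nums2, 42);  -- 42 = sample set id
    DBMS_OUTPUT.PUT_LINE('Chunk took '
      || (DBMS_UTILITY.GET_TIME - l_start) / 100 || ' seconds');
  END;
  /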

 

When I look at the explain plan in both databases I see the following:

-------------------------------------------------
| Id  | Operation                | Name | Cost  |
-------------------------------------------------
|   0 | INSERT STATEMENT         |      |     1 |
|   1 |  LOAD TABLE CONVENTIONAL |      |       |
-------------------------------------------------

 

My questions:

 

     1) Does the explain plan indicate that Direct Load was not used?

 

     2) If the answer to 1 is Yes, is it possible to use Direct Load or a faster insert method in this case?

 

     3) Any ideas what could be causing the 7.5-to-1 difference between the two databases?

 

Please note that these two databases are non-production, so load is negligible.


CREATE TABLE TABLE1
(
  TABLE1_ID                VARCHAR2(255)   NOT NULL,
  NUMBER_COLUMN_1          NUMBER,
  NUMBER_COLUMN_2          NUMBER,
  NUMBER_COLUMN_3          NUMBER
);

 

ALTER TABLE TABLE1 ADD CONSTRAINT TABLE1_PK PRIMARY KEY (TABLE1_ID);

 

CREATE INDEX NUMBER_COLUMN_3_IX ON TABLE1(NUMBER_COLUMN_3);

 

CREATE OR REPLACE PACKAGE TABLE1_PKG IS
  --
  ----------------------------------------------------------------------------------------------------------------------------------
  --
  TYPE VARCHAR2_ARRAY      IS TABLE OF VARCHAR2(4000);
  TYPE NUMBER_ARRAY        IS TABLE OF NUMBER;
  TYPE DATE_ARRAY          IS TABLE OF DATE;
  --
  ----------------------------------------------------------------------------------------------------------------------------------
  --
  PROCEDURE Insert_Table1
  (
    Table1_Id_Array_In         TABLE1_PKG.VARCHAR2_ARRAY,
    Number_Column1_Array_In    TABLE1_PKG.NUMBER_ARRAY,
    Number_Column2_In          TABLE1_PKG.NUMBER_ARRAY,
    NUMBER_COLUMN_3_In         NUMBER
  );
  --
  ----------------------------------------------------------------------------------------------------------------------------------
  --
END;
/

CREATE OR REPLACE PACKAGE BODY TABLE1_PKG IS
  --
  ----------------------------------------------------------------------------------------------------------------------------------
  --
  PROCEDURE Insert_Table1
  (
    Table1_Id_Array_In         TABLE1_PKG.VARCHAR2_ARRAY,
    Number_Column1_Array_In    TABLE1_PKG.NUMBER_ARRAY,
    Number_Column2_In          TABLE1_PKG.NUMBER_ARRAY,
    NUMBER_COLUMN_3_In         NUMBER
  )
  IS
  BEGIN
    FORALL I IN 1..Table1_Id_Array_In.Count
      INSERT /*+ APPEND */ INTO TABLE1 (TABLE1_ID            , NUMBER_COLUMN_1           , NUMBER_COLUMN_2     , NUMBER_COLUMN_3   )
      VALUES                           (Table1_Id_Array_In(I), Number_Column1_Array_In(I), Number_Column2_In(I), NUMBER_COLUMN_3_In);
  END Insert_Table1;
  --
  ----------------------------------------------------------------------------------------------------------------------------------
  --
END;
/

 

 

Thanks,

 

Thomas

  • 1. Re: Why do these inserts vary so much in performance?
    davidp 2 Pro

    1. Yes, direct path insert was not used. You need the APPEND_VALUES hint, not the APPEND hint. See the "Comments" section of the INSERT documentation:

    The APPEND hint is only supported with the subquery syntax of the INSERT statement, not the VALUES clause. If you specify the APPEND hint with the VALUES clause, it is ignored and conventional insert will be used. To use direct-path INSERT with the VALUES clause, refer to "APPEND_VALUES Hint".


    2. APPEND_VALUES will use direct path. The fastest is to find a way to do

    insert /*+ APPEND */ into table1 (...) select ...

    This is fastest even across a network. By breaking the load into chunks and processing it through PL/SQL you have slowed it down a lot. If possible, at least test a single statement in pure SQL. It will need enough undo and temporary space to run as one statement.
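
    For this table a single-statement pull might look like the following sketch - SOURCE_DB_LINK and the set id 42 are placeholders, not details from the thread:

    INSERT /*+ APPEND */ INTO table1
      (table1_id, number_column_1, number_column_2, number_column_3)
    SELECT table1_id, number_column_1, number_column_2, number_column_3
    FROM   table1@source_db_link
    WHERE  number_column_3 = 42;

    COMMIT;  -- direct path requires a commit before the table can be queried again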


    3. Disk I/O is likely to be the main time. Different memory and disk configurations for the DBs, different numbers of copies of redo logs, different speeds of disk - there's a big difference between the fastest disks and the cheapest, especially with caching in disk controllers.

    With conventional path, if one database's table has grown and then had lots of rows deleted, while the other table had no allocated blocks at the start, there will be a significant difference, because the first one needs to read existing blocks from disk. If both started with allocated, previously used blocks, one system may have had those blocks cached, either in the DB buffer cache or in the operating system cache.

    Different indexes on the table in the different databases could also make a big difference, but I presume you've excluded that possibility.


     

    Message was edited by davidp2 in response to Padders: Good point by Padders. Use insert /*+ APPEND */ for the subquery form or insert /*+ APPEND_VALUES */ for the VALUES form. But you still need APPEND_VALUES if you are doing the PL/SQL FORALL array insert.

  • 2. Re: Why do these inserts vary so much in performance?
    padders Pro

    > insert /* APPEND */ into table1 (...) select ...


    The APPEND hint is missing the '+'.

  • 3. Re: Why do these inserts vary so much in performance?
    Thomas Morgan Newbie

    Thanks David for that answer.

     

    Indeed, the indexes are exactly the same between the two environments.

     

    I'll try APPEND_VALUES today and report back.

     

    Unfortunately, I cannot do a straight-across insert, which would be the fastest, because the number of records could be large and have an effect on our production environment, so a decision was made to process in chunks.

     

    If it is due to configuration, do you (or anybody) have a recommendation as to what can be looked at to see if that is the case?

     

    To make things more interesting, yesterday I made the first test of inserts into our production environment: we moved 10000 records and it took 164 seconds.  I spoke with our DBAs, and one said that it was probably due to load on the system or CPU contention.  Is this possible? A production environment takes 2 minutes and 44 seconds to insert 10000 rows with this method?

     

    Thanks,

     

    Thomas

  • 4. Re: Why do these inserts vary so much in performance?
    Thomas Morgan Newbie

    It continued to use conventional load even with the APPEND_VALUES hint.  Here is the new procedure:

     

  PROCEDURE Insert_Table1
  (
    Table1_Id_Array_In    TABLE1_PKG.VARCHAR2_ARRAY,
    Number_Column1_In     NUMBER,
    Number_Column2_In     NUMBER,
    Number_Column3_In     NUMBER
  )
  IS
  BEGIN
    FORALL I IN 1..Table1_Id_Array_In.Count
      INSERT /*+ APPEND_VALUES */ INTO TABLE1
        (TABLE1_ID, NUMBER_COLUMN_1, NUMBER_COLUMN_2, NUMBER_COLUMN_3)
      VALUES
        (Table1_Id_Array_In(I), Number_Column1_In, Number_Column2_In, Number_Column3_In);
  END;

     

    Notice that I also made an attempt to improve performance by not using arrays for Number_Column1 and Number_Column2, because they are constant for a given batch of records.  These changes resulted in no improvement in timing, which indicates to me that the majority of the time is being spent in the insert processing.

     

    The other interesting thing is that if I delete the records from the target after an initial insert, any subsequent execution is immediate.  Any explanation for this behavior?

     

    Thanks,

     

    Thomas

  • 5. Re: Why do these inserts vary so much in performance?
    Thomas Morgan Newbie

    Not sure if the following adds any useful information to this conversation:

     

    I have a batch that has 50000 records.

     

    If I do following:

     

      insert /*+ APPEND */ into TABLE1@TARGET_DB_LINK select * from table1 where number_column3 = SOME_NUMBER and rownum < 30000;

     

    This completes in a couple of seconds.  If I go to 40000 it completes in about 10 seconds.  If I go to 50001, it takes approximately 60 seconds.

  • 6. Re: Why do these inserts vary so much in performance?
    davidp 2 Pro

    "If I delete the records from the target after an initial insert, any subsequent execution is immediate."

    That suggests the relevant data blocks are being cached. That could be the index blocks (which need to be read as well as written). It looks like reading data blocks is a major part of the time, with various amounts being cached before your various queries.
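
    One rough way to confirm that, assuming you can query the V$ views, is to compare physical and logical I/O for the insert after a run:

    -- High disk_reads relative to buffer_gets suggests physical reads dominate.
    SELECT sql_id, executions, disk_reads, buffer_gets,
           ROUND(elapsed_time / 1e6, 1) AS elapsed_seconds
    FROM   v$sql
    WHERE  sql_text LIKE 'INSERT%TABLE1%';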

    I'd also look for any triggers on the table - they can slow down inserts a lot and prevent direct path insert (but the work they do can be necessary, although I usually prefer the logic to be in a stored procedure).


    If the statement fires off one single-block read I/O per row, 10,000 rows would be 10,000 I/Os, and 100 I/Os per second is a reasonable rate for row-by-row processing - another reason for trying to get insert /*+ APPEND */ working: it can batch up the index updates and merge them in, but the merge can still require lots of updates.


    Easy way to check for direct-path insert: after a true direct-path insert you cannot select from the table until you commit.
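
    For example (the database link name here is hypothetical):

    INSERT /*+ APPEND */ INTO table1
    SELECT * FROM table1@source_db_link WHERE rownum <= 10;

    -- Raises ORA-12838 only if the insert really went direct path.
    SELECT COUNT(*) FROM table1;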


    Consider whether you can disable an index before the load and rebuild it afterwards. Test the load with no indexes and no primary key or unique key constraints, to see how much of the time is being spent on index maintenance.
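
    A minimal sketch of that test, using the secondary index from your DDL (the primary key index cannot be skipped this way without disabling the constraint itself):

    ALTER INDEX number_column_3_ix UNUSABLE;
    ALTER SESSION SET skip_unusable_indexes = TRUE;

    -- ... run the load here ...

    ALTER INDEX number_column_3_ix REBUILD;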

    Good luck.

  • 7. Re: Why do these inserts vary so much in performance?
    rp0428 Guru
    Unfortunately, I cannot do a straight-across insert, which would be the fastest, because the number of records could be large and have an effect on our production environment, so a decision was made to process in chunks.

    Please explain that statement.

     

    How can 'number of records could be large' if you are limiting each batch to 10000?

     

    Why can't you do a straight across insert of 10000 for each batch?

     

    Start over and tell us what the REAL PROBLEM is that you are trying to solve.

     

    There is something you aren't telling us. You only post info about 50000 rows but then allude to 'the number of records could be large'.

     

    1. How much data (rows and MB) are you trying to pull over?

    2. How often do you need to pull the data over?

    3. Is it always a truncate-and-load operation? Or do you need to update rows based on changes?

    4. What is the window for pulling the data? Is the target table offline for some period of time while you reload data?

     

    Your thread title only asks about a variation between two DBs for a small number of rows. I suspect your actual business case is something else entirely.

  • 8. Re: Why do these inserts vary so much in performance?
    BillyVerreynne Oracle ACE

    Thomas Morgan wrote:

     

    Unfortunately, I cannot do a straight-across insert, which would be the fastest, because the number of records could be large and have an effect on our production environment, so a decision was made to process in chunks.

    The decision is not the problem. The asinine implementation is.

     

    Using PL/SQL to pull, push and shunt rows between tables is slow. Period.

     

    Simple as that. So using this slow method to work in chunks will be slow. Nothing amazing about it. The amazing bit is that some think this approach is fast... or can scale.

     

    The CORRECT method is to use INSERT .. SELECT, and in this case, use direct path inserts.

     

    The chunking part, enabling you to run INSERT .. SELECT on a chunk at a time, is done using DBMS_PARALLEL_EXECUTE. There is an example in the thread 'Re: PLSQL Error while using collections' (database 10g); a minimal sketch follows.
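
    A minimal sketch of that approach, assuming it runs on the source database and pushes each chunk to the target through a hypothetical TARGET_DB_LINK:

    BEGIN
      DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => 'COPY_TABLE1');

      -- Split the source table into rowid ranges of roughly 10000 rows each.
      DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(
        task_name   => 'COPY_TABLE1',
        table_owner => USER,
        table_name  => 'TABLE1',
        by_row      => TRUE,
        chunk_size  => 10000);

      -- Each chunk runs one INSERT .. SELECT; :start_id and :end_id are
      -- bound by DBMS_PARALLEL_EXECUTE to the chunk's rowid range.
      DBMS_PARALLEL_EXECUTE.RUN_TASK(
        task_name      => 'COPY_TABLE1',
        sql_stmt       => 'INSERT INTO table1@target_db_link
                           SELECT table1_id, number_column_1, number_column_2, number_column_3
                           FROM   table1
                           WHERE  rowid BETWEEN :start_id AND :end_id',
        language_flag  => DBMS_SQL.NATIVE,
        parallel_level => 1);  -- serial; raise to run chunks concurrently

      DBMS_PARALLEL_EXECUTE.DROP_TASK(task_name => 'COPY_TABLE1');
    END;
    /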

  • 9. Re: Why do these inserts vary so much in performance?
    Thomas Morgan Newbie

    Thanks for the comments, and trust me when I say that, for now, for this problem, this is the best solution we can apply (that is, of course, without giving you any details as to the nature of our app/db, etc.).  This new approach will work for us for years, and it will be at least a factor of 10 better than what we have now.

     

    The questions still remain:

     

      Any ideas on what can be checked to explain why it takes 3 to 6 seconds to insert 10000 rows in our development environment and 164 seconds in our production environment?

     

      Why is Direct Path not being used, even with the APPEND_VALUES hint?  What can I check?  I read the documentation and I do not seem to be doing anything that would disqualify the insert from direct load.

     

    Thanks,

     

    Thomas

  • 10. Re: Why do these inserts vary so much in performance?
    Thomas Morgan Newbie

    Why can't you do a straight across insert of 10000 for each batch? Batches are bigger than 10000 in general.  I am considering rewriting the process to do straight-across inserts, but I am still searching for an explanation of why it takes so long to insert 10000 rows.  It would seem that in a production-quality environment it should not take 2 minutes and 44 seconds for such a small amount of data, for such a small table (one field of 20 characters and three number fields).

     

    1. How much data (rows and MB) are you trying to pull over? 10000 rows take approximately 0.2 MB for this data structure.

     

    2. How often do you need to pull the data over? Not very often, three to five times a week.

     

    3. Is it always a truncate-and-load operation? Or do you need to update rows based on changes? No truncates.  No updates.  Always inserts.

     

    4. What is the window for pulling the data? Is the target table offline for some period of time while you reload data? No windows.  Our OLTP database is always online and this table can be selected from at any moment.

     

    Typically we create batches of 50000 rows, but from time to time (maybe once a year) we need to create batches with several million rows (3 to 5 million).

     

    Start over and tell us what the REAL PROBLEM is that you are trying to solve: I am trying to move batches that typically have around 50000 rows from one database to another.  Instead of the current one-row-at-a-time approach, I am moving the records in collections, knowing that at times the number of records could be large and possibly cause a drag on our production environment, so we prefer to chunk the inserts.

     

    Maybe the concern I have about a large set disrupting our production environment is incorrect; what do you think?

  • 11. Re: Why do these inserts vary so much in performance?
    rp0428 Guru
    Batches are bigger than 10000 in general.

    We can only go by the information you post in the forum.

    I am trying to move batches that typically have around 50000 rows from one database to another.  Instead of the current one-row-at-a-time approach, I am moving the records in collections, knowing that at times the number of records could be large and possibly cause a drag on our production environment, so we prefer to chunk the inserts.

    No one is objecting to 'chunking' the data. Go ahead and chunk it. But chunk it using INSERTs, not by using PL/SQL and collections.

    Maybe the concern I have about a large set disrupting our production environment is incorrect; what do you think?

    You haven't posted ANY info that supports your assertion. Based only on what you have posted, your statement above is just a hypothesis that you need to test. Why would SELECTing data disrupt things?

     

    And you haven't explained this statement at all:

    The network was ruled out as a cause because I recorded in the package how long each of the five 10000-record chunks took to insert in each of the two databases.

    How does that justify excluding the network as a possible cause?

     

    In short you need to go back to the basics. If, for the same operation, Oracle acts differently when working with DB1 and DB2 then there is SOMETHING that is different between DB1 and DB2:

     

    1. the network, network topology, distance, bridges, routers, firewalls

    2. architecture - one is a VM, the other isn't

    3. configuration - total memory, PGA memory (which is where collections are stored), many other config items (a quick comparison query is sketched after this list)

    4. other concurrent activity on the server the DB is on
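
    For item 3, a quick first pass (assuming you can query V$PARAMETER on both databases) is to diff the obvious memory settings:

    -- Run on each database and compare the output.
    SELECT name, value
    FROM   v$parameter
    WHERE  name IN ('memory_target', 'sga_target',
                    'pga_aggregate_target', 'db_cache_size');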

  • 12. Re: Why do these inserts vary so much in performance?
    Thomas Morgan Newbie

    I found the answer to why Direct Path is not used when I do an INSERT INTO TABLE1@SOMEDATABASE SELECT ...:

     

      http://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_9014.htm#i2163698

     

     

     

    Direct-path INSERT is subject to a number of restrictions. If any of these restrictions is violated, then Oracle Database executes conventional INSERT serially without returning any message, unless otherwise noted:

    • You can have multiple direct-path INSERT statements in a single transaction, with or without other DML statements. However, after one DML statement alters a particular table, partition, or index, no other DML statement in the transaction can access that table, partition, or index.
    • Queries that access the same table, partition, or index are allowed before the direct-path INSERT statement, but not after it.
    • If any serial or parallel statement attempts to access a table that has already been modified by a direct-path INSERT in the same transaction, then the database returns an error and rejects the statement.
    • The target table cannot be of a cluster.
    • The target table cannot contain object type columns.
    • Direct-path INSERT is not supported for an index-organized table (IOT) if it has a mapping table, or if it is referenced by a materialized view.
    • Direct-path INSERT into a single partition of an index-organized table (IOT), or into a partitioned IOT with only one partition, will be done serially, even if the IOT was created in parallel mode or you specify the APPEND or APPEND_VALUES hint.
    • The target table cannot have any triggers or referential integrity constraints defined on it.
    • The target table cannot be replicated.
    • A transaction containing a direct-path INSERT statement cannot be or become distributed.

     

    My table is being replicated, and I am trying it via a distributed transaction.

     

    I am still puzzled as to why it took 2 minutes and 44 seconds to insert 10000 rows in our production database, but that's something I'll investigate if time permits.  For now I've rewritten the process to use INSERT ... SELECT if the number of records in the batch is less than or equal to a configured number (currently set at 400000); otherwise it will move the data in chunks, for now using BULK COLLECT in the source to pass arrays of data and FORALL inserts in the target (a sketch of that path follows).  If time allows in the future I will try to rewrite it to use chunking combined with INSERT ... SELECT.
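
    A sketch of that chunked path as described - TARGET_DB_LINK, the set id, and the chunk size are illustrative, and the remote call assumes TABLE1_PKG with matching collection types exists on both databases:

    DECLARE
      c_set_id CONSTANT NUMBER := 42;        -- sample set identifier
      l_ids    TABLE1_PKG.VARCHAR2_ARRAY;
      l_nums1  TABLE1_PKG.NUMBER_ARRAY;
      l_nums2  TABLE1_PKG.NUMBER_ARRAY;
      CURSOR c_rows IS
        SELECT table1_id, number_column_1, number_column_2
        FROM   table1
        WHERE  number_column_3 = c_set_id;
    BEGIN
      OPEN c_rows;
      LOOP
        FETCH c_rows BULK COLLECT INTO l_ids, l_nums1, l_nums2 LIMIT 10000;
        EXIT WHEN l_ids.COUNT = 0;
        -- The FORALL insert runs inside the package on the target database.
        TABLE1_PKG.Insert_Table1@target_db_link(l_ids, l_nums1, l_nums2, c_set_id);
        COMMIT;  -- commit per chunk
      END LOOP;
      CLOSE c_rows;
    END;
    /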

     

    Thanks to all for your help,

     

    Thomas
