Incremental (or any other way) large INSERT

rade.todorovich
rade.todorovich Member Posts: 1,306
edited Mar 10, 2016 1:31PM in SQL & PL/SQL

Oracle DB 12c

All:

I have two tables. Destination table with around 160M records. Source table with 7M records. I have to insert all records from source to destination table, something like:

INSERT INTO DEST_TABLE (attr1, attr2,....attrn)

SELECT attr1, attr2,....attrn from SRC_TABLE;

I attempted to do this from SQL Developer client, but the process never ended (have opened SR with Oracle).

So I am looking for pl/sql script that would insert SRC table into DEST table, perhaps inserting 100K or so records at the time and doing intermittent commits?

The entire issue with the straight-out INSERT, which used to work a year ago on the same hardware, is kind of a long story and may not be relevant for this thread.


Best Answer

  • John Stegeman
    John Stegeman Member Posts: 24,269 Blue Diamond
    edited Mar 8, 2016 1:18PM Accepted Answer

    Rade,

    So I am looking for pl/sql script that would insert SRC table into DEST table, perhaps inserting 100K or so records at the time and doing intermittent commits?
    


    Good way to make things slower and increase the possibility of ORA-01555 errors. Did you do any tracing on your simple bulk insert to see what it was waiting on? Have you looked into a direct path insert (insert /*+ APPEND */) to see if it is appropriate for you?
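For reference, a minimal sketch of the direct path variant John mentions, using the thread's placeholder table and column names:

```sql
-- The APPEND hint requests a direct path load: rows are written into new
-- blocks above the table's high water mark, bypassing the buffer cache.
INSERT /*+ APPEND */ INTO DEST_TABLE (attr1, attr2)
SELECT attr1, attr2 FROM SRC_TABLE;

-- After a direct path insert, the same transaction cannot read or modify
-- the table until it commits (ORA-12838 otherwise).
COMMIT;
```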


    John


Answers

  • Paulzip
    Paulzip Member Posts: 8,370 Blue Diamond
    edited Mar 8, 2016 1:25PM

    Here's an approach, but there is no substitute for pure SQL.  There's always a risk that committing during a cursor loop may lead to snapshot-too-old errors if other transactions have changed DEST_TABLE.  Locking DEST_TABLE exclusively may minimize that risk.  Personally, I'd do an insert /*+ APPEND */ select and leave it running.

    declare
      type TTableRows is table of DEST_TABLE%ROWTYPE;
      vTableRows TTableRows;
      cursor cur is
        select * from SRC_TABLE;  -- must match DEST_TABLE%ROWTYPE column for column
      vRows integer := 0;
    begin
      open cur;
      loop
        fetch cur bulk collect into vTableRows limit 500;
        forall i in 1..vTableRows.Count
          insert into DEST_TABLE values vTableRows(i);
        exit when vTableRows.Count = 0;
        vRows := vRows + vTableRows.Count;
        if vRows > 100000 then
          commit;
          vRows := 0;
        end if;
      end loop;
      close cur;
      commit;
    end;
    /

  • Unknown
    edited Mar 8, 2016 1:20PM

    COMMIT inside a LOOP is a great way to make elapsed time longer and to generate ORA-01555 (Snapshot Too Old) errors.

    NEVER do in PL/SQL that which can be done in plain SQL.

    Post the EXPLAIN PLAN for your INSERT.

    https://community.oracle.com/message/9362003#9362003

  • Tubby
    Tubby Member Posts: 6,987 Bronze Crown
    edited Mar 8, 2016 1:19PM
    rade.todorovich wrote:
    
    Oracle DB 12c
    
    All:
    
    I have two tables. Destination table with around 160M records. Source table with 7M records. I have to insert all records from source to destination table, something like:
    
    INSERT INTO DEST_TABLE (attr1, attr2,....attrn)
    SELECT attr1, attr2,....attrn from SRC_TABLE;
    
    I attempted to do this from SQL Developer client, but the process never ended (have opened SR with Oracle).
    
    So I am looking for pl/sql script that would insert SRC table into DEST table, perhaps inserting 100K or so records at the time and doing intermittent commits?
    
    The entire issue with straight out INSERT, that used to work a year ago on the same hardware is kind of a long story and may not be relevant for this thread.
    

    What is the definition for the 2 tables? Are they partitioned? Indexes? Does this have to happen as an online operation?

    If you're moving 7m rows, what does that mean in terms of bytes, are we talking 100 megabytes or 100 gigabytes?

    I very much doubt you would want to do this via PLSQL. a single well tuned SQL statement to move the data would be your best bet (likely using a direct path operation if possible).

    Cheers,

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Mar 8, 2016 1:51PM

    A straight insert is almost definitely the best option.

    What is the execution plan when you run the statement?

    Have you got access to active session history to see where the wait events are?

    Are there foreign keys on the target table?

    Are there unique indexes on the target table?

    Are there any other indexes on the target table?

    Are there enabled constraints on the target table ?

    Does a session hold a lock against the table?

    Why have you resorted to opening an SR straight away? How long did you wait for the statement to complete? Did you collect any useful diagnostic data then?

    rade.todorovich wrote:
    
    Oracle DB 12c
    
    All:
    
    I have two tables. Destination table with around 160M records. Source table with 7M records. I have to insert all records from source to destination table, something like:
    
    INSERT INTO DEST_TABLE (attr1, attr2,....attrn)
    SELECT attr1, attr2,....attrn from SRC_TABLE;
    
    I attempted to do this from SQL Developer client, but the process never ended (have opened SR with Oracle).
    
    So I am looking for pl/sql script that would insert SRC table into DEST table, perhaps inserting 100K or so records at the time and doing intermittent commits?
    
    The entire issue with straight out INSERT, that used to work a year ago on the same hardware is kind of a long story and may not be relevant for this thread.
    
  • rade.todorovich
    rade.todorovich Member Posts: 1,306
    edited Mar 8, 2016 3:09PM

    I am trying to respond to some of you but the page keeps saying to contact administrator, looks like I cannot upload formatted SQL and/or output

    now it says that my content is blocked due to spam!

  • rade.todorovich
    rade.todorovich Member Posts: 1,306
    edited Mar 8, 2016 3:13PM

    sol beach

    | Id  | Operation                | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
    --------------------------------------------------------------------------------------------------
    |   0 | INSERT STATEMENT         |                       |  7046K|   967M| 74960   (1)| 00:00:03 |
    |   1 |  LOAD TABLE CONVENTIONAL | DRIVER_LICENSE_MASTER |       |       |            |          |
    |   2 |   TABLE ACCESS FULL      | DL15_BASIC            |  7046K|   967M| 74960   (1)| 00:00:03 |
    
  • rade.todorovich
    rade.todorovich Member Posts: 1,306
    edited Mar 8, 2016 3:14PM

    John,

    I am far from a DBA, so I have to read up on direct path insert first. I will update this thread with findings.

  • rade.todorovich
    rade.todorovich Member Posts: 1,306
    edited Mar 8, 2016 3:24PM

    Andrew,

    I posted above explain plan if this is what you meant?

    - no foreign keys on the target table

    - there are a total of 6 indexes on the target table, all of them NONUNIQUE

    - there are no constraints on the target table

    I do not know about session history, waits, or locks, nor how to check for any of these.

    The reason I opened the SR is that this particular insert worked well for years on the same hardware. The only change, I think, is the upgrade from 11g to 12c. I did collect diagnostic data as suggested by the Oracle engineer.

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Mar 8, 2016 3:24PM
    rade.todorovich wrote:
    
    John,
    
    I am far from DBA so I have to direct path insert. I will update this thread with findings
    

    You aren't using direct path insert in the plan you posted; that is a conventional load insert.

    Direct path may speed up the insert, but only if the conventional path insert is what's causing the slowness. It could be any number of things, and unless you provide the full picture we can only guess.
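One quick way to check, for what it's worth: when the APPEND hint is honored, the plan operation changes from LOAD TABLE CONVENTIONAL to LOAD AS SELECT. A sketch against the thread's placeholder tables:

```sql
EXPLAIN PLAN FOR
INSERT /*+ APPEND */ INTO DEST_TABLE (attr1, attr2)
SELECT attr1, attr2 FROM SRC_TABLE;

-- A "LOAD AS SELECT" operation in the output confirms direct path;
-- "LOAD TABLE CONVENTIONAL" means the hint was not honored.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```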

  • rade.todorovich
    rade.todorovich Member Posts: 1,306
    edited Mar 8, 2016 3:36PM

    The strangest thing is this: I ran this script while the INSERT was running

    select s.sid, s.serial#, p.spid, s.username, s.program,
           t.xidusn, t.used_ublk, t.used_urec, sa.sql_text
    from v$process p, v$session s, v$sqlarea sa, v$transaction t
    where s.paddr = p.addr
      and s.taddr = t.addr
      and s.sql_address = sa.address(+)
      and s.sql_hash_value = sa.hash_value(+)
    order by s.sid;

    and observed that USED_UREC, which I guess is the number of undo records in the transaction, kept incrementing. I thought it would stop at 7M records, which is the size of the source table, but it continued to 22M, at which point I had to kill the session and initiate a rollback.

  • rade.todorovich
    rade.todorovich Member Posts: 1,306
    edited Mar 8, 2016 3:38PM

    Andrew,

    I understand, I answered some of your questions, others I have to learn/read about first

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Mar 8, 2016 3:51PM

    used_urec may not be a simple one per row loaded; it also has to cover the updates to all the indexes, and probably other things.

    Run your query again while observing the insert, but this time include the event column from v$session; this will tell you the wait event currently being waited on. If you take a few snapshots over enough time, you should see a pattern emerge. Active Session History does this job for you essentially, but it is a paid-for option, so it's not worth looking at unless you know you've licensed it.
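A sketch of that modified monitoring query — the earlier query cut down and with the wait columns from v$session added (run it a few times while the insert is active):

```sql
-- event shows what the session is currently waiting on;
-- used_urec shows the undo records the transaction has generated so far
select s.sid, s.event, s.seconds_in_wait, t.used_ublk, t.used_urec
from v$session s, v$transaction t
where s.taddr = t.addr
order by s.sid;
```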

  • rade.todorovich
    rade.todorovich Member Posts: 1,306
    edited Mar 8, 2016 3:56PM

    So looks like you are implying I should drop indexes and re-create them after INSERT?

  • Unknown
    edited Mar 8, 2016 3:57PM

    We could learn a lot if you ran something like the following:

    alter session set sql_trace=true;

    INSERT INTO DEST_TABLE (attr1, attr2,....attrn)
    SELECT attr1, attr2,....attrn from SRC_TABLE;


    The content of the trace file would reveal what was happening (or not happening) while the INSERT was active.
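If a trace file is produced, tkprof is the usual way to format it; a typical invocation (the trace file name here is made up):

```shell
# sort statements by elapsed execution time; suppress recursive SYS statements
tkprof orcl_ora_12345.trc insert_trace.txt sort=exeela sys=no
```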


    What is OS name & version?

    What is Oracle version to 4 decimal places?

  • rade.todorovich
    rade.todorovich Member Posts: 1,306
    edited Mar 8, 2016 3:59PM

    Tubby, looks like I answered most of your questions.

    The good thing is that this INSERT does not need to be part of any online (live) operation. The big table is used somewhat by an online app, but just for searches, hence the indexes I created for the most common search fields.

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Mar 8, 2016 4:00PM
    rade.todorovich wrote:
    
    So looks like you are implying I should drop indexes and re-create them after INSERT?
    

    I am not implying that. I am explaining that it's not just the data blocks in the table that you are affecting with the insert.

  • rade.todorovich
    rade.todorovich Member Posts: 1,306
    edited Mar 8, 2016 4:29PM

    sol beach:

    Linux  2.6.18-402.el5 #1 SMP Thu Jan 8 06:22:34 EST 2015 x86_64 x86_64 x86_64 GNU/Linux

    This is redhat enterprise linux

    Oracle Database 12c Release 12.1.0.1.0 - 64bit Production

    as for the sql trace, I will run it after the rollback completes (tomorrow, likely)

  • rade.todorovich
    rade.todorovich Member Posts: 1,306
    edited Mar 8, 2016 4:45PM

    Thanks Andrew,

    Just read that I should drop or invalidate indexes if I want to use direct path insert

  • Tubby
    Tubby Member Posts: 6,987 Bronze Crown
    edited Mar 8, 2016 5:06PM
    rade.todorovich wrote:
    
    Tubby, looks like I answered most of your questions.
    The good thing is that this INSERT does not need to be part of any online (live) op. The big table is used somewhat by an online app, but just for searches, hence I have some indexes created for most common search fields
    
    
    

    That being the case I would go with the following myself.

    1) drop the indexes on the table

    2) enable parallel DML

    ALTER SESSION ENABLE PARALLEL DML; 

    3) add a APPEND hint to your insert (for direct path, writing over the high watermark of the table)

    4) add a parallel hint to the insert and the select (depends on your machine, maybe give 4 a go and see how that works for you)

    5) create indexes (use PARALLEL to recreate them), here I'd maybe use a higher degree of parallelism than in step #4

    6) alter index <index_name> noparallel; --this is because creating an index in parallel sets the degree of parallelism and you probably don't want that (or probably don't currently have that).

    So for the actual statement you'd have something like

    INSERT --+ append parallel (dest, 4)
    INTO DEST_TABLE dest (attr1, attr2,....attrn)
    SELECT --+ parallel (src, 4)
    attr1, attr2,....attrn from SRC_TABLE src;

    This assumes the DEST table has no triggers on it, no foreign keys, etc... (all the things you need for direct path operations to work).

    You'd want to benchmark this in a non-production environment to get an idea of how long it will run. You say the target table is used by an online app for some searches; if you drop all the indexes, a full table scan is the only option, and you don't want to be frequently full-scanning a hundred-million-row table.

    Cheers,
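Putting steps 1-6 together, the whole session might look something like this (the index name and the degrees of parallelism are made-up placeholders):

```sql
-- 1) drop the secondary indexes (repeat for each of the six)
DROP INDEX dest_search_idx1;

-- 2) allow this session to run DML in parallel
ALTER SESSION ENABLE PARALLEL DML;

-- 3+4) direct path (APPEND) plus parallel hints on both sides
INSERT /*+ APPEND PARALLEL(dest, 4) */ INTO DEST_TABLE dest (attr1, attr2)
SELECT /*+ PARALLEL(src, 4) */ attr1, attr2 FROM SRC_TABLE src;

-- a direct path insert must be committed before the table can be queried again
COMMIT;

-- 5) recreate the indexes, with a higher degree of parallelism than the DML
CREATE INDEX dest_search_idx1 ON DEST_TABLE (attr1) PARALLEL 8;

-- 6) reset the parallel attribute that CREATE ... PARALLEL left behind
ALTER INDEX dest_search_idx1 NOPARALLEL;
```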

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Mar 8, 2016 5:11PM
    Tubby wrote:
    
    
    rade.todorovich wrote:
    
    Tubby, looks like I answered most of your questions.
    The good thing is that this INSERT does not need to be part of any online (live) op. The big table is used somewhat by an online app, but just for searches, hence I have some indexes created for most common search fields
    
    
    That being the case I would go with the following myself.
    
    1) drop the indexes on the table
    2) enable parallel DML
    ALTER SESSION ENABLE PARALLEL DML; 

    3) add a APPEND hint to your insert (for direct path, writing over the high watermark of the table)

    4) add a parallel hint to the insert and the select (depends on your machine, maybe give 4 a go and see how that works for you)

    5) create indexes (use PARALLEL to recreate them), here I'd maybe use a higher degree of parallelism than in step #4

    6) alter index <index_name> noparallel; --this is because creating an index in parallel sets the degree of parallelism and you probably don't want that (or probably don't currently have that).

    So for the actual statement you'd have something like

    INSERT --+ append parallel (dest, 4)
    INTO DEST_TABLE dest (attr1, attr2,....attrn)
    SELECT --+ parallel (src, 4)
    attr1, attr2,....attrn from SRC_TABLE src;

    This assumes the DEST table has no triggers on it, no foreign keys, etc... (all the things you need for direct path operations to work).

    Cheers,

    The OP should only drop the indexes where the additional resources consumed in recreating them do not outweigh the savings in time during the DML. If the DML is parallelized, then Oracle will maintain the indexes in parallel too, and this could be more efficient than rebuilding them.

    We haven't seen any evidence that the serial updates to the indexes are causing any performance issues at the moment, so it seems like a leap to say drop them and recreate them after.

  • Tubby
    Tubby Member Posts: 6,987 Bronze Crown
    edited Mar 8, 2016 5:24PM
    Andrew Sayer wrote:
    OP should only drop the indexes where the additional resources consumed in recreating them do not outweigh the savings in time during the dml. If the DML is parallelized then Oracle will maintain the indexes in parallel too, this could be more efficient that rebuilding them.
    We haven't seen any evidence that the serial updates to the indexes at the moment are causing any performance issues so it seems like a leap to say drop them and recreate after.
    

    It will, but it will also use a lot more resources in doing so.

    http://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_modes.htm#SUTIL1329

    "

    By avoiding index maintenance, you minimize the amount of space required during a direct path load, in the following ways:

    • You can build indexes one at a time, reducing the amount of sort (temporary) segment space that would otherwise be needed for each index.
    • Only one index segment exists when an index is built, instead of the three segments that temporarily exist when the new keys are merged into the old index to make the new index.

    Avoiding index maintenance is quite reasonable when the number of rows to be loaded is large compared to the size of the table

    "

    To me, dropping the indexes is the safer and less worrisome proposition.

    Cheers,

  • rade.todorovich
    rade.todorovich Member Posts: 1,306
    edited Mar 8, 2016 5:29PM

    I do not think that creating the indexes again will take a lot of time; at least I do not remember that being an issue before. Having said that, I also do not remember that adding 10M rows to a 140M-row table a year ago was a problem (via conventional insert).

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Mar 8, 2016 5:36PM
    rade.todorovich wrote:
    
    I do not think that creating indexes again will take a lot of time, at least I do not remember that being an issue before. Having said that, I also do not remember that adding 10M to 140M table a year ago was a problem (via conventional insert)
    

    Recreating the indexes will mean scanning the table and sorting it once per index. It could also generate a lot of redo at the same time. You will either lock the table for all DML during the rebuild, or use the online clause of create index, which could increase the redo generation substantially.

    Remember also that not all indexes will have the same impact on your insert; some will require very little maintenance. Imagine all your rows have a column with values that sort to the far right-hand side of an index, and then imagine the values span all possible values: which would take more time to merge into the table's index?

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Mar 8, 2016 6:07PM

    To illustrate my point, I've done a sample that you should feel free to copy and verify for yourself.

    -- Run 1: both indexes in place
    drop table as_parallel_dml_trg;
    drop table as_parallel_dml_src;
    set timing on
    CREATE TABLE as_parallel_dml_trg
    AS
    SELECT rownum as_id, ao.* FROM all_objects ao cross join (select 1 from dual connect by rownum <=100) many_rows;
    create index index_object_name on as_parallel_dml_trg (object_name);
    create index index_object_id   on as_parallel_dml_trg (as_id);
    select max(as_id) from as_parallel_dml_trg;

    MAX(AS_ID)
    ----------
       8561000

    create table as_parallel_dml_src
    as
    select 8561000+rownum as_id, ao.* from all_objects ao;
    alter session force parallel dml;
    insert into as_parallel_dml_trg
    select * from as_parallel_dml_src;

    85613 rows created.
    Elapsed: 00:00:07.57

    commit;
    alter session disable parallel dml;

    -- Run 2: same setup, but index_object_id dropped before the insert
    drop table as_parallel_dml_trg;
    drop table as_parallel_dml_src;
    set timing on
    CREATE TABLE as_parallel_dml_trg
    AS
    SELECT rownum as_id, ao.* FROM all_objects ao cross join (select 1 from dual connect by rownum <=100) many_rows;
    create index index_object_name on as_parallel_dml_trg (object_name);
    create index index_object_id   on as_parallel_dml_trg (as_id);
    select max(as_id) from as_parallel_dml_trg;

    MAX(AS_ID)
    ----------
       8561000

    create table as_parallel_dml_src
    as
    select 8561000+rownum as_id, ao.* from all_objects ao;
    drop index index_object_id;
    alter session force parallel dml;
    insert into as_parallel_dml_trg
    select * from as_parallel_dml_src;

    85613 rows created.
    Elapsed: 00:00:03.38

    commit;
    create index index_object_id on as_parallel_dml_trg (as_id) PARALLEL;

    Index created.
    Elapsed: 00:00:11.02

    alter index index_object_id noparallel;
    alter session disable parallel dml;

    -- Run 3: same setup, but index_object_name dropped before the insert
    drop table as_parallel_dml_trg;
    drop table as_parallel_dml_src;
    CREATE TABLE as_parallel_dml_trg
    AS
    SELECT rownum as_id, ao.* FROM all_objects ao cross join (select 1 from dual connect by rownum <=100) many_rows;
    create index index_object_name on as_parallel_dml_trg (object_name);
    create index index_object_id   on as_parallel_dml_trg (as_id);
    select max(as_id) from as_parallel_dml_trg;

    MAX(AS_ID)
    ----------
       8561000

    create table as_parallel_dml_src
    as
    select 8561000+rownum as_id, ao.* from all_objects ao;
    drop index index_object_name;
    alter session force parallel dml;
    insert into as_parallel_dml_trg
    select * from as_parallel_dml_src;

    85613 rows created.
    Elapsed: 00:00:01.72

    commit;
    create index index_object_name on as_parallel_dml_trg (object_name) PARALLEL;

    Index created.
    Elapsed: 00:00:14.96

    alter index index_object_name noparallel;
    alter session disable parallel dml;

    My parallel dml insert with all indexes took 7.57 seconds to complete.

    When I removed one index, the insert took 3.38 seconds to complete but rebuilding the index took 11.02 seconds. Nothing gained, at the cost of not having an index which will probably be wanted by other sessions.

    When I removed the other index, the insert took 1.72 seconds, but this time the index rebuilding took 14.96 seconds. Again nothing gained but a lot risked.

    Obviously your results will differ massively depending on your system, indexes and data distribution. But the point is that you must consider it before you go ahead and drop indexes.

This discussion has been closed.