Incremental (or any other way) large INSERT

13 Answers

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Mar 8, 2016 5:11PM
    Tubby wrote:
    
    
    rade.todorovich wrote:
    
    Tubby, looks like I answered most of your questions.
    The good thing is that this INSERT does not need to be part of any online (live) op. The big table is used somewhat by an online app, but just for searches, hence I have some indexes created for most common search fields
    
    
    That being the case I would go with the following myself.
    
    1) drop the indexes on the table
    2) enable parallel DML
    ALTER SESSION ENABLE PARALLEL DML; 

    3) add an APPEND hint to your insert (for direct path, writing above the high water mark of the table)

    4) add a parallel hint to the insert and the select (depends on your machine, maybe give 4 a go and see how that works for you)

    5) create indexes (use PARALLEL to recreate them), here I'd maybe use a higher degree of parallelism than in step #4

    6) alter index <index_name> noparallel; --this is because creating an index in parallel sets the degree of parallelism and you probably don't want that (or probably don't currently have that).

    So for the actual statement you'd have something like

    INSERT --+ append parallel (dest, 4)
    INTO DEST_TABLE dest (attr1, attr2,....attrn)
    SELECT --+ parallel (src, 4)
    attr1, attr2,....attrn FROM SRC_TABLE src;

    This assumes the DEST table has no triggers on it, no foreign keys, etc... (all the things you need for direct path operations to work).

    Cheers,
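
    Taken together, the six steps above combine into a session along these lines. This is only a sketch: the table, column and index names are placeholders, and the degrees of parallelism (4 for the DML, 8 for the rebuild) are starting points to test, not recommendations.

    -- 1) drop the indexes on the table (one DROP per index)
    DROP INDEX dest_idx1;

    -- 2) enable parallel DML for this session
    ALTER SESSION ENABLE PARALLEL DML;

    -- 3) and 4) direct path (APPEND) plus parallel hints on insert and select
    INSERT /*+ append parallel(dest, 4) */
    INTO dest_table dest (attr1, attr2)
    SELECT /*+ parallel(src, 4) */ attr1, attr2
    FROM src_table src;

    COMMIT;

    -- 5) recreate the indexes, building each in parallel
    CREATE INDEX dest_idx1 ON dest_table (attr1) PARALLEL 8;

    -- 6) reset the degree so the index isn't left with a parallel default
    ALTER INDEX dest_idx1 NOPARALLEL;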

    OP should only drop the indexes where the additional resources consumed in recreating them do not outweigh the savings in time during the DML. If the DML is parallelized then Oracle will maintain the indexes in parallel too; this could be more efficient than rebuilding them.

    We haven't seen any evidence that the serial updates to the indexes at the moment are causing any performance issues so it seems like a leap to say drop them and recreate after.

  • Tubby
    Tubby Member Posts: 6,987 Bronze Crown
    edited Mar 8, 2016 5:24PM
    Andrew Sayer wrote:
    OP should only drop the indexes where the additional resources consumed in recreating them do not outweigh the savings in time during the DML. If the DML is parallelized then Oracle will maintain the indexes in parallel too; this could be more efficient than rebuilding them.
    We haven't seen any evidence that the serial updates to the indexes at the moment are causing any performance issues so it seems like a leap to say drop them and recreate after.
    

    It will, but it will also use a lot more resources in doing so.

    http://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_modes.htm#SUTIL1329

    "

    By avoiding index maintenance, you minimize the amount of space required during a direct path load, in the following ways:

    • You can build indexes one at a time, reducing the amount of sort (temporary) segment space that would otherwise be needed for each index.
    • Only one index segment exists when an index is built, instead of the three segments that temporarily exist when the new keys are merged into the old index to make the new index.

    Avoiding index maintenance is quite reasonable when the number of rows to be loaded is large compared to the size of the table

    "

    To me, dropping the indexes is the safer and less worrisome proposition.

    Cheers,

  • rade.todorovich
    rade.todorovich Member Posts: 1,306
    edited Mar 8, 2016 5:29PM

    I do not think that creating the indexes again will take a lot of time; at least I do not remember that being an issue before. Having said that, I also do not remember that adding 10M rows to a 140M row table a year ago was a problem (via conventional insert).

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Mar 8, 2016 5:36PM
    rade.todorovich wrote:
    
    I do not think that creating the indexes again will take a lot of time; at least I do not remember that being an issue before. Having said that, I also do not remember that adding 10M rows to a 140M row table a year ago was a problem (via conventional insert).
    

    Recreating the indexes will mean scanning the table and sorting it once per index. It could also generate a lot of redo at the same time. You will either lock the table for all DML during the rebuild, or use the ONLINE clause of CREATE INDEX, which could increase the redo generation substantially.
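
    For illustration, the two options look like this (table and index names are hypothetical):

    -- offline build: blocks DML against the table until the index is built
    CREATE INDEX t_col_idx ON big_table (some_col);

    -- online build: concurrent DML is allowed, but changes made during the
    -- build are journalled and merged in afterwards, which can add a lot of redo
    CREATE INDEX t_col_idx ON big_table (some_col) ONLINE;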

    Remember also that not all indexes will have the same impact on your insert; some will require very little maintenance. Imagine if all your rows have a column with values that fit on the far right-hand side of an index, and then imagine instead that the values span all possible values: which would take more time to merge into the table's index?

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Mar 8, 2016 6:07PM

    To illustrate my point, I've done a sample that you should feel free to copy and verify for yourself.

    drop table as_parallel_dml_trg;
    drop table as_parallel_dml_src;
    set timing on
    CREATE TABLE as_parallel_dml_trg
    AS
    SELECT rownum as_id, ao.* FROM all_objects ao cross join (select 1 from dual connect by rownum <=100) many_rows;
    create index index_object_name on as_parallel_dml_trg (object_name);
    create index index_object_id   on as_parallel_dml_trg (as_id);
    select max(as_id) from as_parallel_dml_trg;

    MAX(AS_ID)
    ----------
       8561000

    create table as_parallel_dml_src
    as
    select 8561000+rownum as_id, ao.* from all_objects ao;

    alter session force parallel dml;
    insert into as_parallel_dml_trg
    select * from as_parallel_dml_src;

    85613 rows created.
    Elapsed: 00:00:07.57

    commit;
    alter session disable parallel dml;

    drop table as_parallel_dml_trg;
    drop table as_parallel_dml_src;
    set timing on
    CREATE TABLE as_parallel_dml_trg
    AS
    SELECT rownum as_id, ao.* FROM all_objects ao cross join (select 1 from dual connect by rownum <=100) many_rows;
    create index index_object_name on as_parallel_dml_trg (object_name);
    create index index_object_id   on as_parallel_dml_trg (as_id);
    select max(as_id) from as_parallel_dml_trg;

    MAX(AS_ID)
    ----------
       8561000

    create table as_parallel_dml_src
    as
    select 8561000+rownum as_id, ao.* from all_objects ao;

    drop index index_object_id;
    alter session force parallel dml;
    insert into as_parallel_dml_trg
    select * from as_parallel_dml_src;

    85613 rows created.
    Elapsed: 00:00:03.38

    commit;
    create index index_object_id on as_parallel_dml_trg (as_id) PARALLEL;

    Index created.
    Elapsed: 00:00:11.02

    alter index index_object_id noparallel;
    alter session disable parallel dml;

    drop table as_parallel_dml_trg;
    drop table as_parallel_dml_src;
    CREATE TABLE as_parallel_dml_trg
    AS
    SELECT rownum as_id, ao.* FROM all_objects ao cross join (select 1 from dual connect by rownum <=100) many_rows;
    create index index_object_name on as_parallel_dml_trg (object_name);
    create index index_object_id   on as_parallel_dml_trg (as_id);
    select max(as_id) from as_parallel_dml_trg;

    MAX(AS_ID)
    ----------
       8561000

    create table as_parallel_dml_src
    as
    select 8561000+rownum as_id, ao.* from all_objects ao;

    drop index index_object_name;
    alter session force parallel dml;
    insert into as_parallel_dml_trg
    select * from as_parallel_dml_src;

    85613 rows created.
    Elapsed: 00:00:01.72

    commit;
    create index index_object_name on as_parallel_dml_trg (object_name) PARALLEL;

    Index created.
    Elapsed: 00:00:14.96

    alter index index_object_name noparallel;
    alter session disable parallel dml;

    My parallel dml insert with all indexes took 7.57 seconds to complete.

    When I removed one index, the insert took 3.38 seconds to complete but rebuilding the index took 11.02 seconds. Nothing gained, at the cost of not having an index which will probably be wanted by other sessions.

    When I removed the other index, the insert took 1.72 seconds, but this time the index rebuilding took 14.96 seconds. Again nothing gained but a lot risked.

    Obviously your results will differ massively depending on your system, indexes and data distribution. But the point is that you must consider it before you go ahead and drop indexes.

  • rp0428
    edited Mar 8, 2016 6:15PM
    It will, but it will also use a lot more resources in doing so.
    
    http://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_modes.htm#SUTIL1329
    "
    By avoiding index maintenance, you minimize the amount of space required during a direct path load, in the following ways:
    
    You can build indexes one at a time, reducing the amount of sort (temporary) segment space that would otherwise be needed for each index.
    Only one index segment exists when an index is built, instead of the three segments that temporarily exist when the new keys are merged into the old index to make the new index.
    
    Avoiding index maintenance is quite reasonable when the number of rows to be loaded is large compared to the size of the table
    "
    To me, dropping the indexes is the safer and less worrisome proposition.
    

    There is nothing 'unsafe' or 'worrisome' in doing the index maintenance rather than dropping them.

    That doc quote ONLY refers to the extra space that is needed.

    Data added by direct path is NOT accessible by existing users or even the user adding the data until it is committed.

    Indexes are NOT rebuilt until AFTER all data has been loaded to the table.

    Oracle can then use one new segment to create mini-indexes for the new data. Then the merge of the old index segment/data and the new is done to that third segment in a way that allows users to continue to access the indexes while Oracle works with them.

    And for an offline operation it is often MUCH faster to do the maintenance rather than drop and recreate multiple indexes. That is especially true for large tables. The maintenance only needs to sort the small amount of new data and then do a merge, whereas the drop/rebuild has to deal with ALL of the data for ALL of the indexes.

  • Tubby
    Tubby Member Posts: 6,987 Bronze Crown
    edited Mar 8, 2016 6:54PM
    rp0428 wrote:
    There is nothing 'unsafe' or 'worrisome' in doing the index maintenance rather than dropping them. 
    

    Very much not true. We aren't there to see what's happening, we don't know very much about the OP's system. Based on what we don't know VS what we do know I'd feel a lot warmer inside recommending the OP drop the indexes and recreate them.

    I never said it was unsafe or worrisome from a technical standpoint IF you know everything you would need to know in order to make that assessment. If you're comfortable that you know everything you would need to know based on the contents of this thread then you're certainly welcome to make your own recommendations to the OP.

    Cheers,

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Mar 9, 2016 3:01AM
    Tubby wrote:
    
    
    rp0428 wrote:
    There is nothing 'unsafe' or 'worrisome' in doing the index maintenance rather than dropping them. 
    
    
    Very much not true. We aren't there to see what's happening, we don't know very much about the OP's system. Based on what we don't know VS what we do know I'd feel a lot warmer inside recommending the OP drop the indexes and recreate them.
    
    I never said it was unsafe or worrisome from a technical standpoint IF you know everything you would need to know in order to make that assessment. If you're comfortable that you know everything you would need to know based on the contents of this thread then you're certainly welcome to make your own recommendations to the OP.
    
    Cheers, 
    

    Exactly, we don't know what's going on; we don't know whether index maintenance is actually a big factor now, or whether it would be if the DML were parallelized.

    Dropping an index is not safe, and dropping all indexes on a table seems reckless. What happens when another session tries to query the table? It'll have to do a full table scan. What if many sessions try to query the table? A lot of full table scans, and this while the parallel insert is already taking up a lot of resources. What if one of the indexes is unique? You now have the possibility of bad data being inserted, which would take manual work to fix.
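
    The unique index point in particular is easy to demonstrate; a sketch (table, column and index names are hypothetical):

    -- with the unique index dropped, duplicate keys load without error
    INSERT INTO dest_table (pk_col) VALUES (1);
    INSERT INTO dest_table (pk_col) VALUES (1);
    COMMIT;

    -- the recreate then fails until the duplicates are cleaned up by hand
    CREATE UNIQUE INDEX dest_pk_uq ON dest_table (pk_col);
    -- ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found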

    What if the time taken to rebuild an index is longer than the time it takes to insert the data while keeping the index? My demo above confirms it's possible.

  • Chris Hunt
    Chris Hunt Member Posts: 2,066 Gold Trophy
    edited Mar 9, 2016 6:11AM
    Avoiding index maintenance is quite reasonable when the number of rows to be loaded is large compared to the size of the table
    

    In this case we're adding 7M rows to a 160M row table, so the number of rows to be added is less than 5% of the size of the table. Dropping and recreating the indexes is unlikely to help here.

    I also wouldn't entirely dismiss the idea of doing the insert in a number of stages. It's true that the overall time taken will be longer if you do it that way, but if your resources in terms of redo/rollback space are limited, and/or you only have a certain time window in which to do the transfer, it can still be a valid approach to take. That is assuming that the data in the source table is not being changed between iterations, and that there's a good indexed/partitioned column to use to identify the records to copy each time around.
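
    A sketch of that staged approach, assuming the source has an indexed numeric key (the id column and batch size of 1M here are hypothetical) and isn't changing between batches:

    -- batch 1
    INSERT /*+ append */ INTO dest_table
    SELECT * FROM src_table WHERE id >= 1 AND id < 1000001;
    COMMIT;  -- direct path requires a commit before the next insert anyway

    -- batch 2
    INSERT /*+ append */ INTO dest_table
    SELECT * FROM src_table WHERE id >= 1000001 AND id < 2000001;
    COMMIT;
    -- ...and so on, keeping each batch within the available redo/undo window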

    I think I'd try an insert append first, though.

This discussion has been closed.