
Partitioning

187124
187124 Member Posts: 12
edited Sep 10, 2008 2:53PM in General Database Discussions
Hi, I need to know if I can speed up a massive insert statement like "insert into t1 select * from tmp" by partitioning the target table in the following way:
- The target table has more than 100 million records
- The partitioning criterion will be load_date (monthly)
- Every insert statement will affect only one partition because it will reference a single load_date.


Thanks!

Answers

  • 591186
    591186 Member Posts: 3,668 Silver Trophy
    Please let us know the Oracle Database version so we can provide more information.
  • JustinCave
    JustinCave Member Posts: 30,293 Gold Crown
    There would probably be benefits if

    - You use local indexes (not strictly necessary, but ideal from a performance standpoint)
    - You load data into a staging table with the same structure as the partitioned table
    - You build the indexes on the staging table
    - You do a partition exchange to swap the staging table into the partitioned table (a sketch of this flow follows)
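    A minimal sketch of that flow, using the t1/tmp names from the original post plus made-up partition and index names (p_sep_2008, t1_stage, t1_stage_ix1), might look like this:

        -- 1. Staging table with the same structure as the partitioned target
        CREATE TABLE t1_stage AS SELECT * FROM t1 WHERE 1 = 0;

        -- 2. Load the staging table (direct path; see the options later in the thread)
        INSERT /*+ APPEND */ INTO t1_stage SELECT * FROM tmp;
        COMMIT;

        -- 3. Build the same indexes on the staging table as the local indexes on t1
        CREATE INDEX t1_stage_ix1 ON t1_stage (load_date);

        -- 4. Swap the loaded staging table into the target partition
        ALTER TABLE t1 EXCHANGE PARTITION p_sep_2008 WITH TABLE t1_stage
          INCLUDING INDEXES WITHOUT VALIDATION;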

    Justin
  • Robert Geier
    Robert Geier Member Posts: 2,989
    edited Sep 9, 2008 10:07AM
    Sorry, are you planning to partition the source table (select), or the destination table (insert)?

    Partitioning the source table should speed up the select by making it easier to use parallel processes, or specifically access data from a particular month.
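    For example (the partition name is made up for illustration), a month-aligned source partition can be read directly or pruned via the partition key, optionally with parallel query:

        -- Read just one month's partition of the source
        SELECT * FROM tmp PARTITION (p_sep_2008);

        -- Or let the optimizer prune by the partition key, with parallel query
        SELECT /*+ PARALLEL(tmp, 4) */ *
          FROM tmp
         WHERE load_date >= DATE '2008-09-01'
           AND load_date <  DATE '2008-10-01';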

    Edited by: Robert Geier on Sep 9, 2008 3:06 PM
  • 187124
    187124 Member Posts: 12
    I want to partition the destination table; all the records in the source table will be loaded.

    The version is 10G

    Thanks
  • Robert Geier
    Robert Geier Member Posts: 2,989
    If you are partitioning the destination table, then Justin's suggestion is probably the best. Instead of an INSERT INTO ... SELECT ... FROM, just use an exchange partition.
  • Randolf Geist
    Randolf Geist Member Posts: 2,214 Silver Trophy
    edited Sep 9, 2008 11:43AM
    Is your question whether you can speed up the insert by partitioning the target table, rather than inserting the same amount of data into an unpartitioned table?

    Are you talking about a single insert or multiple inserts? I assume multiple inserts, based on your "Every insert statement..." wording.

    * Does your target table already contain data?
    * Has this data been manipulated by updates/deletes?
    * Do you have existing indexes on the target table?

    I don't think you'll be able to save a significant amount of time on the insert itself, but of course this amount of data can be maintained in a better way if you use a reasonable partitioning scheme.

    As already mentioned, you could load a partitioned table partition-wise using the exchange partition feature, but if the table is not in use yet, you can just insert directly into the target partition and create/rebuild indexes afterwards as required.
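    For example (partition name made up for illustration), a direct load into just the target partition could look like:

        INSERT /*+ APPEND */ INTO t1 PARTITION (p_sep_2008)
        SELECT * FROM tmp;
        COMMIT;

        -- then create/rebuild the affected indexes as required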

    Apart from the partitioning question, you can always speed up inserts using the following options (a combined sketch follows the list):

    * Use direct-path inserts (the /*+ APPEND */ hint) to avoid undo generation on the target table, but consider the restrictions of direct-path inserts, e.g. no enabled triggers, no foreign key constraints etc. Note, however, that direct-path inserts always use blocks above the current high-water mark and therefore don't reuse existing blocks, i.e. if you have a lot of used but rather empty blocks in your table, it will be larger than necessary after the direct-path insert because the empty blocks have not been re-used.
    * Use parallel query (mark source object as parallel or use /*+ PARALLEL (<table>) */ hint) on the source object
    * Use parallel dml inserts (alter session enable parallel dml, mark target object as parallel or use /*+ PARALLEL (<table>) */ hint in insert) if you have suitable hardware and license, in combination with parallel query on source object
    * Set the target object to NOLOGGING to avoid redo generation, but consider the implications regarding recovery or standby databases
    * Set indexes to unusable prior to loading or drop and recreate/rebuild them accordingly
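    Put together, a rough sketch of those options (index and partition names are examples only; check the restrictions mentioned above in your environment) might be:

        ALTER SESSION ENABLE PARALLEL DML;

        ALTER TABLE t1 MODIFY PARTITION p_sep_2008 NOLOGGING;        -- skip redo for the direct-path load
        ALTER INDEX t1_ix1 MODIFY PARTITION p_sep_2008 UNUSABLE;     -- local index partition
        ALTER SESSION SET skip_unusable_indexes = TRUE;

        INSERT /*+ APPEND PARALLEL(t1, 4) */ INTO t1
        SELECT /*+ PARALLEL(tmp, 4) */ * FROM tmp;
        COMMIT;

        ALTER INDEX t1_ix1 REBUILD PARTITION p_sep_2008 NOLOGGING;   -- rebuild after the load
        ALTER TABLE t1 MODIFY PARTITION p_sep_2008 LOGGING;          -- and take a backup, since redo was skipped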

    Note that some of these options mentioned are kind of "offline" options as you don't want to apply them to a table that is currently being used due to locking and/or index usage issues.

    The "exchange partition" option together with "local" indexes is quite useful if you want to manipulate partitions that are currently being queried by other (read-only) processes, because it allows "seamless" transition from old data to new data from a reader's perspective (processes will continue to read the old data after exchanging the partition, new processes will read the new data).

    Regards,
    Randolf

    Oracle related stuff blog:
    http://oracle-randolf.blogspot.com/

    SQLTools++ for Oracle (Open source Oracle GUI for Windows):
    http://www.sqltools-plusplus.org:7676/
    http://sourceforge.net/projects/sqlt-pp/
  • 635471
    635471 Member Posts: 1,937
    Justin Cave wrote:
    There would probably be benefits if

    - You use local indexes (not strictly necessary, but ideal from a performance standpoint)
    - You load data into a staging table with the same structure as the partitioned table
    - You build the indexes on the staging table
    - You do a partition exchange to swap the staging table into the partitioned table

    Justin
    I would consider benchmarking two cases here: direct path loading into an unindexed table and then rebuilding/creating the indexes, versus direct path inserting into the table with its indexes enabled. You may find that Oracle's mechanism for optimising index maintenance during a direct path load (logging the required key data into a separate temporary segment) is beneficial.
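    In other words (illustrative statements only, assuming a non-partitioned staging table called stage with a single index stage_ix1), the two cases to time would be roughly:

        -- Case 1: direct path load with no usable index, then rebuild it
        ALTER INDEX stage_ix1 UNUSABLE;
        INSERT /*+ APPEND */ INTO stage SELECT * FROM tmp;
        COMMIT;
        ALTER INDEX stage_ix1 REBUILD;

        -- Case 2: direct path load with the index left enabled, letting Oracle
        -- maintain it from the temporary sort segment at the end of the load
        INSERT /*+ APPEND */ INTO stage SELECT * FROM tmp;
        COMMIT;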
  • JustinCave
    JustinCave Member Posts: 30,293 Gold Crown
    You would consider a direct path load into a staging table with indexes enabled before doing the partition exchange? Or doing a direct path load into the partitioned table itself without doing the partition exchange?

    It sounds like you are suggesting the first (direct path load into a staging table with indexes enabled before doing a partition exchange). Have you done benchmarks on this sort of processing? I'd be hard-pressed to understand why a direct-path load would be better at building an index on a previously empty table than just building the index after the load. But I'd be very interested if you're aware of some mechanism I'm not...

    Justin
  • 635471
    635471 Member Posts: 1,937
    edited Sep 10, 2008 2:53PM
    Justin Cave wrote:
    You would consider a direct path load into a staging table with indexes enabled before doing the partition exchange? Or doing a direct path load into the partitioned table itself without doing the partition exchange?
    Well, either one I guess. I'm more of a partition-exchange person because it's a heck of a lot easier to back out the load in case of a problem. But with local indexes the two approaches are very similar.
    Justin Cave wrote:
    It sounds like you are suggesting the first (direct path load into a staging table with indexes enabled before doing a partition exchange). Have you done benchmarks on this sort of processing? I'd be hard-pressed to understand why a direct-path load would be better at building an index on a previously empty table than just building the index after the load. But I'd be very interested if you're aware of some mechanism I'm not...

    Justin
    There's a documented mechanism (in the Utilities Guide under SQL*Loader, I think) that I ran some informal tests on, and Jonathan Lewis has also mentioned it, but I don't have anything clean or definitive enough to publish. The mechanism is that when a serial direct path load inserts into a table with indexes that need to be maintained, it also logs the information required for building the indexes in a temporary segment.

    So to take an extreme case, let's say that you are loading an empty table of 30 columns with a total of 1,000,000 rows, such that the average row length will be 100 bytes. That gives you a data segment in the order of 100 MB. If you want to build two indexes on that table, each on a column having an average column length of 4 bytes, then you need to full scan the table twice, reading 200MB to do so.

    However, if Oracle can store the required index columns plus the rowid for every row in a separate temporary segment, then that segment contains all the information required to build the indexes. The temporary segment has to store 2 x 4 + 16 bytes per row, so that's around 24 MB to hold the data, and the indexes can be built by scanning around 50 MB instead of 200 MB.

    This doesn't happen with a parallel direct path load.

    The tests I did seemed to support this in terms of timings, but there are obviously a great many issues that can affect it: compression in the data segment, the number of bytes being indexed, the pctfree of the table being loaded, etc., and it's entirely possible for this feature to slow down the index build. Hence, I'd just benchmark the two approaches and use common sense -- if you're loading into a compressed table and indexing nearly every column then it's unlikely to be helpful.


    Edit: Here's the documentation reference: http://download.oracle.com/docs/cd/B19306_01/server.102/b14215/ldr_modes.htm#sthref1475

    +"You can improve performance of direct path loads by using temporary storage. After each block is formatted, the new index keys are put in a sort (temporary) segment. The old index and the new keys are merged at load finish time to create the new index. The old index, sort (temporary) segment, and new index segment all require storage until the merge is complete. Then the old index and temporary segment are removed."+

    Edited by: David_Aldridge on Sep 10, 2008 11:51 AM
This discussion has been closed.