12 Replies Latest reply on Jan 27, 2016 10:01 AM by sinan_gg

    Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO

    sinan_gg

      Hi All,

       

      I have a question about the CTAS and "Insert /*+ Append */ Into" statements.

       

      Following case, i have a query which i didnt understand difference in running times in EXADATA.

       

      Both of select tables (g02_f01 and g02_f02) dont have any partition. But i could partition tables with hash partition method by "ip_id" column and i tried to execute same query with partition tables. Nothing change in execution durations.

       

      I have executed gather plan statistics for all tables. Both of tables have 13.176.888 records. Both of tables have same "ip_id" unique columns. I want to combine these tables in one table.

       

       

      First Query :

      insert /*+ append parallel(a,16) */ into    dg.tiz_irdm_g02_cc  a

      (ip_id, process_date, ....)

      select /*+ parallel (a,16) parallel (b,16) */ *

      from    tgarstg.tst_irdm_g02_f01 a, 

                tgarstg.tst_irdm_g02_f02 b

      where a.ip_id = b.ip_id


      Elapsed => 45:00 minutes


      Second Query :

      create table dg.tiz_irdm_g02_cc nologging parallel 16 compress for query high as

      select /*+ parallel (a,16) parallel (b,16) */ *

      from    tgarstg.tst_irdm_g02_f01 a , 

                tgarstg.tst_irdm_g02_f02 b 

      where a.ip_id = b.ip_id

       

      Elapsed => 04:00 minutes


      Here is the execution plans:


      1 - Insert Statement Execution Plan:

       

      Plan hash value: 3814019933

       

      ------------------------------------------------------------------------------------------------------------------------------------------

      | Id  | Operation                        | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |

      ------------------------------------------------------------------------------------------------------------------------------------------

      |   0 | INSERT STATEMENT                 |                  |    13M|    36G|       |   127K  (1)| 00:00:05 |        |      |            |

      |   1 |  LOAD AS SELECT                  | TIZ_IRDM_G02_CC  |       |       |       |            |          |        |      |            |

      |   2 |   PX COORDINATOR                 |                  |       |       |       |            |          |        |      |            |

      |   3 |    PX SEND QC (RANDOM)           | :TQ10002         |    13M|    36G|       |   127K  (1)| 00:00:05 |  Q1,02 | P->S | QC (RAND)  |

      |*  4 |     HASH JOIN BUFFERED           |                  |    13M|    36G|   921M|   127K  (1)| 00:00:05 |  Q1,02 | PCWP |            |

      |   5 |      PX RECEIVE                  |                  |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,02 | PCWP |            |

      |   6 |       PX SEND HASH               | :TQ10000         |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | P->P | HASH       |

      |   7 |        PX BLOCK ITERATOR         |                  |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | PCWC |            |

      |   8 |         TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F02 |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | PCWP |            |

      |   9 |      PX RECEIVE                  |                  |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,02 | PCWP |            |

      |  10 |       PX SEND HASH               | :TQ10001         |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | P->P | HASH       |

      |  11 |        PX BLOCK ITERATOR         |                  |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | PCWC |            |

      |  12 |         TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F01 |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | PCWP |            |

      ------------------------------------------------------------------------------------------------------------------------------------------

       

      Predicate Information (identified by operation id):

      ---------------------------------------------------

       

         4 - access("AIRDM_G02_F01"."IP_ID"="AIRDM_G02_F02"."IP_ID")

       

      2 - CTAS Execution Plan:

       

      Plan hash value: 3613570869

       

      ------------------------------------------------------------------------------------------------------------------------------------------

      | Id  | Operation                        | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |

      ------------------------------------------------------------------------------------------------------------------------------------------

      |   0 | CREATE TABLE STATEMENT           |                  |    13M|    36G|       |   397K  (1)| 00:00:14 |        |      |            |

      |   1 |  PX COORDINATOR                  |                  |       |       |       |            |          |        |      |            |

      |   2 |   PX SEND QC (RANDOM)            | :TQ10002         |    13M|    36G|       |   255K  (1)| 00:00:09 |  Q1,02 | P->S | QC (RAND)  |

      |   3 |    LOAD AS SELECT                | TIZ_IRDM_G02_CC  |       |       |       |            |          |  Q1,02 | PCWP |            |

      |*  4 |     HASH JOIN                    |                  |    13M|    36G|  1842M|   255K  (1)| 00:00:09 |  Q1,02 | PCWP |            |

      |   5 |      PX RECEIVE                  |                  |    13M|    14G|       | 11465   (5)| 00:00:01 |  Q1,02 | PCWP |            |

      |   6 |       PX SEND HASH               | :TQ10000         |    13M|    14G|       | 11465   (5)| 00:00:01 |  Q1,00 | P->P | HASH       |

      |   7 |        PX BLOCK ITERATOR         |                  |    13M|    14G|       | 11465   (5)| 00:00:01 |  Q1,00 | PCWC |            |

      |   8 |         TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F02 |    13M|    14G|       | 11465   (5)| 00:00:01 |  Q1,00 | PCWP |            |

      |   9 |      PX RECEIVE                  |                  |    13M|    21G|       | 36706   (3)| 00:00:02 |  Q1,02 | PCWP |            |

      |  10 |       PX SEND HASH               | :TQ10001         |    13M|    21G|       | 36706   (3)| 00:00:02 |  Q1,01 | P->P | HASH       |

      |  11 |        PX BLOCK ITERATOR         |                  |    13M|    21G|       | 36706   (3)| 00:00:02 |  Q1,01 | PCWC |            |

      |  12 |         TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F01 |    13M|    21G|       | 36706   (3)| 00:00:02 |  Q1,01 | PCWP |            |

      ------------------------------------------------------------------------------------------------------------------------------------------

       

      Predicate Information (identified by operation id):

      ---------------------------------------------------

       

         4 - access("AIRDM_G02_F01"."IP_ID"="AIRDM_G02_F02"."IP_ID")

       

       

      Oracle Version :

       

      Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

      PL/SQL Release 11.2.0.4.0 - Production

      CORE 11.2.0.4.0 Production

      TNS for Linux: Version 11.2.0.4.0 - Production

      NLSRTL Version 11.2.0.4.0 - Production

        • 1. Re: Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO
          Jonathan Lewis

          CTAS is DDL, which is parallel-enabled by default

          Insert /*+ append */ is DML which is not parallel-enabled by default you need to: alter session enable parallel dml.

           

          So the CTAS creates multiple data segments in parallel and then tells the co-ordinator to link them together as a single segment.

          The Insert selects all the data in parallel then sends it all to the co-ordinator which creates a single segment from the entire data set all by itself.

           

          Regards

          Jonathan Lewis

           

          PS You can see the difference in the plans - the CTAS does "load as select at operation 4" then informs the query coordinator at operation 2; the insert plan sends the data to the coordinator (operations 3->2) which does the load as select at operation 1.

          • 2. Re: Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO
            Nimish Garg

            Here is a very good article on how append hint affect performance

            https://oracle-base.com/articles/misc/append-hint

             

            I my experience I have noticed CTAS perform better than insert with append index on blank table

            • 3. Re: Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO
              sinan_gg

              Ok i get it, but is there any different advice how can i providing better performance by using insert /*+ append */ into statement.

               

              Of course, for parallel session query, i executed "alter session enable parallel dml"

               

              In my opinion, 45 min is not good performance in exadata. 

              • 4. Re: Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO
                JuanM

                Hi,

                In your INSERT scenario,

                Did you create the table with NOLOGGING option?

                Because, in the INSERT statement you have not specified the ... APPEND PARALLEL(a,16) NOLOGGING

                Maybe here is the time difference.

                 

                Regards,

                Juan M

                • 5. Re: Re: Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO
                  Etbin

                  Did you try


                  insert /*+ append */ into dg.tiz_irdm_g02_cc

                  select *

                    from tgarstg.tst_irdm_g02_f01 a,

                         tgarstg.tst_irdm_g02_f02 b

                  where a.ip_id = b.ip_id


                  Regards


                  Etbin

                  • 6. Re: Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO
                    Jonathan Lewis

                    Why do you say "of course" ?

                     

                    According to the execution plan you supplied you did not enable parallel DML.

                     

                    This has two effects - the serialisation of the insert, and the buffering on the hash join.  On an exadata this much data and parallel 16 the buffering probably didn't flood down to disc and back again, but maybe it did, and maybe that had some effect. Check that you really had enabled parallel DML, and pull the execution plan from memory to check.

                     

                    If that's really not the problem then the next step would be to use SQL Monitoring to find out where the time went.  (There are a few comments about the procedure and reading the output in this blog note https://jonathanlewis.wordpress.com/2015/12/21/parallel-plans-2/ if you can't run up the OEM screens).

                     

                    Regards

                    Jonathan Lewis

                    • 7. Re: Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO
                      sinan_gg

                      @Juan, i created table with nologging option.

                       

                      @Etbin, i executed query without parallel hint, but nothing change by duration

                       

                      @Jonathan, you are right, after i did enable parallel dml, execution plan has changed, but execution time didnot change.

                       

                      PLAN_TABLE_OUTPUT

                       

                       

                      Plan hash value: 434397297

                       

                      --------------------------------------------------------------------------------------------------------------------------------------------

                      | Id  | Operation                          | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |

                      --------------------------------------------------------------------------------------------------------------------------------------------

                      |   0 | INSERT STATEMENT                   |                  |    13M|    36G|       |   127K  (1)| 00:00:05 |        |      |            |

                      |   1 |  PX COORDINATOR                    |                  |       |       |       |            |          |        |      |            |

                      |   2 |   PX SEND QC (RANDOM)              | :TQ10003         |    13M|    36G|       |   127K  (1)| 00:00:05 |  Q1,03 | P->S | QC (RAND)  |

                      |   3 |    LOAD AS SELECT                  | TIZ_IRDM_G02_CC  |       |       |       |            |          |  Q1,03 | PCWP |            |

                      |   4 |     PX RECEIVE                     |                  |    13M|    36G|       |   127K  (1)| 00:00:05 |  Q1,03 | PCWP |            |

                      |   5 |      PX SEND RANDOM LOCAL          | :TQ10002         |    13M|    36G|       |   127K  (1)| 00:00:05 |  Q1,02 | P->P | RANDOM LOCA|

                      |*  6 |       HASH JOIN BUFFERED           |                  |    13M|    36G|   921M|   127K  (1)| 00:00:05 |  Q1,02 | PCWP |            |

                      |   7 |        PX RECEIVE                  |                  |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,02 | PCWP |            |

                      |   8 |         PX SEND HASH               | :TQ10000         |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | P->P | HASH       |

                      |   9 |          PX BLOCK ITERATOR         |                  |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | PCWC |            |

                      |  10 |           TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F02 |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | PCWP |            |

                      |  11 |        PX RECEIVE                  |                  |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,02 | PCWP |            |

                      |  12 |         PX SEND HASH               | :TQ10001         |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | P->P | HASH       |

                      |  13 |          PX BLOCK ITERATOR         |                  |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | PCWC |            |

                      |  14 |           TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F01 |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | PCWP |            |

                      --------------------------------------------------------------------------------------------------------------------------------------------

                       

                      Predicate Information (identified by operation id):

                      ---------------------------------------------------

                       

                         6 - access("AIRDM_G02_F01"."IP_ID"="AIRDM_G02_F02"."IP_ID")

                      • 8. Re: Re: Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO
                        Jonathan Lewis

                        The question now (apart from checking where the time goes with SQL Monitoring) is why you have what appears to be a completely redundant distribution before the load:

                         

                        |  4 |    PX RECEIVE                      |                 |    13M|    36G|      |  127K  (1)| 00:00:05 |  Q1,03 | PCWP |            |

                        |  5 |      PX SEND RANDOM LOCAL          | :TQ10002        |    13M|    36G|      |  127K  (1)| 00:00:05 |  Q1,02 | P->P | RANDOM LOCA|

                         

                        These push the hash join into buffering - so perhaps there is an element of time lost to that as well as the messaging that's going on.

                        Are these still simple heap tables or have you partitioned the target table - if so, how ?

                        Is there anything in your select list other than a simple list of columns ? (e.g. a call to a pl/sql function).

                         

                        Regards

                        Jonathan Lewis

                         

                        Update: someone said something about nologging - are there any differences between the CTAS and the pre-defined table ? e.g. nologging, compression level, indexing ?

                        • 9. Re: Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO
                          Billy~Verreynne

                          sinan_gg wrote:

                           

                          In my opinion, 45 min is not good performance in exadata.

                          In my opinion, driving a supercar poorly reflects on the driver - not the supercar.

                          • 10. Re: Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO
                            sinan_gg

                            Hi Jonathan,

                             

                            My target table have interval partitioning by using process_date column (daily partition) as you can see following. And target table have no index, i think, index is not good idea for exadata tables.  And my select list has only simple list of columns.

                             

                            PARTITION BY RANGE (PROCESS_DATE)

                            INTERVAL( NUMTODSINTERVAL(1,'DAY'))

                              PARTITION VALUES LESS THAN (TO_DATE(' 2015-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))

                                NOLOGGING

                                COMPRESS FOR QUERY HIGH

                            )   

                             

                            CTAS table is not partitioned, and it has only "create table dg.tiz_irdm_g02_cc nologging parallel 16 compress for query high as" these properties.

                             

                            Now, i tried to insert non-partitioned table with same query. it was great execution time : 07:00 min

                             

                            This is the execution plan for non-partition table insert.

                             

                            PLAN_TABLE_OUTPUT

                            Plan hash value: 3613570869

                            ------------------------------------------------------------------------------------------------------------------------------------------

                            | Id  | Operation                        | Name             | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |

                            ------------------------------------------------------------------------------------------------------------------------------------------

                             

                            |   0 | INSERT STATEMENT                 |                  |    13M|    36G|       |   127K  (1)| 00:00:05 |        |      |            |

                            |   1 |  PX COORDINATOR                  |                  |       |       |       |            |          |        |      |            |

                            |   2 |   PX SEND QC (RANDOM)            | :TQ10002         |    13M|    36G|       |   127K  (1)| 00:00:05 |  Q1,02 | P->S | QC (RAND)  |

                            |   3 |    LOAD AS SELECT                | SS_IRDM_G02      |       |       |       |            |          |  Q1,02 | PCWP |            |

                            |*  4 |     HASH JOIN                    |                  |    13M|    36G|   921M|   127K  (1)| 00:00:05 |  Q1,02 | PCWP |            |

                            |   5 |      PX RECEIVE                  |                  |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,02 | PCWP |            |

                            |   6 |       PX SEND HASH               | :TQ10000         |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | P->P | HASH       |

                            |   7 |        PX BLOCK ITERATOR         |                  |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | PCWC |            |

                            |   8 |         TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F02 |    13M|    14G|       |  5732   (5)| 00:00:01 |  Q1,00 | PCWP |            |

                            |   9 |      PX RECEIVE                  |                  |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,02 | PCWP |            |

                            |  10 |       PX SEND HASH               | :TQ10001         |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | P->P | HASH       |

                            |  11 |        PX BLOCK ITERATOR         |                  |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | PCWC |            |

                            |  12 |         TABLE ACCESS STORAGE FULL| TST_IRDM_G02_F01 |    13M|    21G|       | 18353   (3)| 00:00:01 |  Q1,01 | PCWP |            |

                            ------------------------------------------------------------------------------------------------------------------------------------------

                             

                            Predicate Information (identified by operation id):

                            ---------------------------------------------------

                             

                               4 - access("AIRDM_G02_F01"."IP_ID"="AIRDM_G02_F02"."IP_ID")

                            • 11. Re: Re: Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO
                              Jonathan Lewis

                              Notice how that extra distribution has disappeared for the non-partitioned table.

                               

                              I think that with the partitioned table Oracle was trying to balance the number of slaves against the number of partitions it was expecting to use and decided to distribute the data to get a "fair share" workload, but hadn't allowed for the side effects of the buffered hash join that had to appear and the extra messaging for the distribution.

                               

                              You could try the pq_distribute() hint for the insert to tell Oracle that it should not disrtibute like this. e.g based on your original code:

                               

                              insert /*+ append parallel(a,16) pq_distribute(a none) */ into    dg.tiz_irdm_g02_cc a ...

                               

                              This may give you the performance you want with the partitioned table, but do check what it does to the space allocation as it may introduce a large number (16) of extents per segment that aren't fully populated and therefore be rather wasteful of space.

                               

                              Regards

                              Jonathan Lewis

                              • 12. Re: Performance Difference Between CTAS and INSERT /*+APPEND*/ INTO
                                sinan_gg

                                You are completely right.

                                 

                                i tried pq_distribute(a none) hint with partitioned_table, execution plan has changed according to your opinion. New exection time : 07:00 min

                                 

                                I learned too much thing from you. Thanks very much,

                                 

                                |   0 | INSERT STATEMENT                 |                  |    13M|    36G|       |   127K  (1)| 00:00:05 |        |      |            |

                                |   1 |  PX COORDINATOR                  |                  |       |       |       |            |          |        |      |            |

                                |   2 |   PX SEND QC (RANDOM)            | :TQ10002         |    13M|    36G|       |   127K  (1)| 00:00:05 |  Q1,02 | P->S | QC (RAND)  |

                                |   3 |    LOAD AS SELECT                | TIZ_IRDM_G02_CC  |       |       |       |            |          |  Q1,02 | PCWP |            |

                                |*  4 |     HASH JOIN                    |                  |    13M|    36G|   921M|   127K  (1)| 00:00:05 |  Q1,02 | PCWP |            |