8 Replies Latest reply: Feb 18, 2013 10:32 PM by Billy~Verreynne RSS

    Parallel DML inserting duplicate rows

    malik_Rahul
      Hi Folks,


      I recently came to know about the parallel DML to boost the batch process. I have use the below query


      Below is code I have written to copy the data of t2 into t1.
      alter table t1 parallel 10;
      
      alter table t1 nologging;
      
      alter session enable parallel dml;
      
      insert into t1
      select /*+full(x) parallel(x,20) */  * from t2 x;
      There is unique index on a column of t2.

      But above query duplicating the rows which are inserted in t2 after the start of query .

      say when the query is running I executed the below code. There were no record with value 12. as there is unique index
        insert into  t2 values (12,'test');
      afte the query complete
      select * from t1 where row_id=12;
      
      12,test
      12,test
      I am using 10 g


      Regards

      Rahul Malik

      Edited by: malik_Rahul on Feb 17, 2013 5:40 AM
        • 1. Re: Parallel DML inserting duplicate rows
          Etbin
          The text and the code in your post are contradicting each other.
          If you must use parallel dml (use it as the last resort is the usual advice) let Oracle choose the degree of parallelism
          alter table t1 parallel;
          
          alter table t2 parallel;
          
          alter session enable parallel dml;
          
          insert /*+ append */
            into t1
          select *
            from t2;
          seems to be all you need.
          I have no Database at hand to try to reproduce your findings and you didn't even specify your Database version.
          You're not trying to add a single row in parallel, don't you ?

          Regards

          Etbin
          • 2. Re: Parallel DML inserting duplicate rows
            user503635
            Based on your description, first you run below DML
            insert into t1
            select /*+full(x) parallel(x,20) */  * from t2 x;
            while the above is running, you run below from another session
            insert into  t2 values (12,'test');
            Then you get duplicate rows of rowid = 12 in t1, is my understanding as above is correct ? If yes, I will suggest you check all your steps, any mistakes occurred or missing steps?

            To my understanding you will never get duplicate records in t1 for the above scenario in Oracle database because of data read consistency. When you run the first SQL, it gets all records of t2 at that point of time. The row ID 12 in t2 inserted later will not be included even 2nd SQL is finished before the 1st SQL. Further, if you never commit the second insert SQL. Even if you run the 1st insert SQL twice, row ID 12 will still not be available.

            Edited by: user503635 on Feb 17, 2013 5:51 AM
            • 3. Re: Parallel DML inserting duplicate rows
              Billy~Verreynne
              It is not possible for Oracle's parallel processing to create a duplicate of the same physical row.

              The parallel processes each get a distinct and unique and non-overlapping range of physical rows to process. One such physical range is only processed by one parallel process. It is not possible for another process, with a different range of physical rows, to process the same physical row.

              Iif you want to dispute this, then you need to show 2 copies of the same physical source row, in the destination table.
              • 4. Re: Parallel DML inserting duplicate rows
                malik_Rahul
                Billy  Verreynne  wrote:
                It is not possible for Oracle's parallel processing to create a duplicate of the same physical row.

                The parallel processes each get a distinct and unique and non-overlapping range of physical rows to process. One such physical range is only processed by one parallel process. It is not possible for another process, with a different range of physical rows, to process the same physical row.

                Iif you want to dispute this, then you need to show 2 copies of the same physical source row, in the destination table.
                I am using the partition table. Is it make any sense? Using parallel on with partition tables, Is there a possibility that committed records picked by two process


                Below is the exact code I have written to copy. I will again simulate this and will publish the result.
                alter session enable parallel dml;
                
                alter table siebel.NEW_S_LOY_RDM_ITM nologging;
                
                alter table siebel.NEW_S_LOY_RDM_ITM parallel 16;
                
                INSERT                     
                      INTO  SIEBEL.NEW_S_LOY_RDM_ITM 
                   SELECT                                       /*+full(T1) parallel(T1,32) */
                         T1.*
                     FROM SIEBEL.S_LOY_RDM_ITM PARTITION (S_LOY_RDM_ITM_OCT_2010) T1;
                
                COMMIT;
                
                
                INSERT                    
                      INTO  SIEBEL.NEW_S_LOY_RDM_ITM 
                   SELECT                                       /*+full(T1) parallel(T1,32) */
                         T1.*
                     FROM SIEBEL.S_LOY_RDM_ITM T1
                    WHERE T1.ATTRB_DEFN_ID IN
                          ('1-AWPUO7Y',
                '1-D1ACDS2',
                '1-D1ACDS3',
                '1-CI3XZ0')
                          AND T1.PROCESS_DT BETWEEN TO_DATE ('01-Nov-2010 00:00:00',
                                                             'DD-Mon-YYYY HH24:MI:SS')
                                                AND TO_DATE ('31-Jul-2012 23:59:59',
                                                             'DD-Mon-YYYY HH24:MI:SS');
                
                COMMIT;
                
                
                
                
                INSERT                  
                      INTO  SIEBEL.NEW_S_LOY_RDM_ITM 
                   SELECT                                       /*+full(T1) parallel(T1,32) */
                         T1.*
                     FROM SIEBEL.S_LOY_RDM_ITM T1
                    WHERE T1.PROCESS_DT >=
                             TO_DATE ('1-Aug-2012 0:0:0', 'DD-Mon-YYYY HH24:MI:SS');
                
                SELECT TO_CHAR (SYSDATE, 'YYYY MM DD HH24:MI:SS') FROM DUAL;
                
                commit;
                
                
                alter session disable parallel dml;
                
                alter table siebel.NEW_S_LOY_RDM_ITM logging;
                
                alter table siebel.NEW_S_LOY_RDM_ITM noparallel;
                
                 PROMPT 'TABLE NEW_S_LOY_RDM_ITM POPULATED SUCCESFULLY'
                There were some real time transaction inserted in S_LOY_TXN during my run. S_LOY_Txn have the unique index on the row_id.


                For all those transaction processed during the execution of above query there were duplicate in new_s_loy_txn. It has the process_dt of the 16-Feb

                Edited by: malik_Rahul on Feb 17, 2013 1:14 PM
                • 5. Re: Parallel DML inserting duplicate rows
                  Billy~Verreynne
                  malik_Rahul wrote:

                  I am using the partition table. Is it make any sense? Using parallel on with partition tables, Is there a possibility that committed records picked by two process
                  Partition table make little difference in this case (ito parallel processing by rowid ranges).

                  Oracle adds the predicate "+rowid between :1 and :2+" to the SQL to run in parallel. In then "chunk" the table to process into distinct rowid ranges. Each parallel process gets its own unique physical range of data blocks and rows to read from disk.

                  To see the manual version of this (for building custom parallel processing), refer to the DBMS_PARALLEL_EXECUTE interface.

                  So it is simply not possible for the same SQL, running in parallel, to hit the same physical row by more than a single parallel process. Rowid is the physical address. Parallel processes are like delivery trucks each getting a different street and addresses to visit. In such a case, two trucks will not visit the same house address.

                  What you have shown though is multiple SQLs inserting into the same table. It is very likely that these different insert SQLs are hitting the same rows as one another, and inserting duplicates.

                  I do not see how parallel query can be causing this.
                  • 6. Re: Parallel DML inserting duplicate rows
                    Hemant K Chitale
                    S_LOY_Txn have the unique index on the row_id.
                    Interesting. Why ? Is that a Siebel design.

                    You have three different INSERTs. In theory, they seem to be sourcing from three different partitions. Is the table correctly range partitioned on PROCESS_DT ? Is PROCESS_DT really a DATE datatype ?


                    Hemant K Chitale
                    • 7. Re: Parallel DML inserting duplicate rows
                      user11973220
                      Hi Billly,

                      Thanks for your inputs. I work with Rahul. As regards your statement:

                      "What you have shown though is multiple SQLs inserting into the same table. It is very likely that these different insert SQLs are hitting the same rows as one another, and inserting duplicates."

                      These SQl's are embedded in script and run sequentially. I have run this insert many times and have never seen them run parallely at the same time. Moreover, all these SQL's address different time ranges and hence different partitions and there is no overlap between them.
                      r
                      It is very confusing; any way to determine the cause of this issue.

                      Niza
                      • 8. Re: Parallel DML inserting duplicate rows
                        Billy~Verreynne
                        user11973220 wrote:

                        These SQl's are embedded in script and run sequentially. I have run this insert many times and have never seen them run parallely at the same time. Moreover, all these SQL's address different time ranges and hence different partitions and there is no overlap between them.
                        I understand that each SQL run in serial - but that Oracle can decide to run that specific SQL in parallel.

                        As for the SQLs each addressing different data sets. That may be the intention, but from what is described in this thread, that does not seem to be the case. Do not look at the SQL running in parallel, as the cause of the duplication.

                        Oracle documentation (see Oracle® Database Concepts guide) is quite clear that each parallel slave process gets a unique set of rows to process:
                        >
                        The table is divided dynamically (dynamic partitioning) into load units called granules. Each granule is a range of data blocks of the table read by a single slave process, called a parallel execution server, which uses Pnnn as a name format.

                        The database maps granules to execution servers at execution time. When an execution server finishes reading the rows corresponding to a granule, and when granules remain, it obtains another granule from the coordinator. This operation continues until the table has been read. The execution servers send results back to the coordinator, which assembles the pieces into the desired full table scan.
                        >

                        No evidence have been provided that parallel processing is the cause of your row duplication. The manual specifically states that this is not possible as the same data block (granule), containing rows to process, cannot be issued more than once.

                        So in the absence of evidence that parallel processing is to blame, the only logical conclusion is that your multiple SQL insert statements are causing the duplicate row in the target table.