17 Replies Latest reply: Aug 22, 2012 3:55 PM by 957054

    Updating 10 million records from a billion records.

    957054
      Hello people,

      I have a specific issue for which I have been contemplating a solution; it would be great if you could help me out with it.

      I have a business need to update 10 million rows in a table which has 1 billion records. The table is a fact table. Although I have read a lot about cursors, I am unable to decide at this moment if using cursors is a good approach in this situation:

      1) I have 1 million rows in an emp_temp_table.

      2) I have to select the 1 million rows from emp_temp_table and update the matching records in the EMP table.

      I have looked into "CREATE TABLE new_table AS SELECT <do the update "here"> FROM old_table;", but space is a concern with this approach (I cannot allocate a temporary table that would occupy 1 TB).

      Here is the procedure; I wanted to know if this approach is good. It's acceptable in our environment to execute this code for a block of 6-10 hrs:

      SET lines 80 pages 100
      SET serverout on size 1000000
      SET echo on verify on timing on time on trimspool on wrap on

      DECLARE

        v_record_updt_cnt NUMBER := 0;
        v_commit_cnt      NUMBER := 0;
        v_rec_cnt         NUMBER := 0;

        CURSOR get_temp_records IS
          SELECT ett.emp_name, ett.emp_id, ett.emp_phone
          FROM emp_admn.emp_temp_table ett;

      BEGIN

        /* Count of records with which to update the EMP table. */
        SELECT COUNT(*) INTO v_rec_cnt
        FROM emp_admn.emp_temp_table ett;

        /* Update the EMP table from the records in the cursor. */
        FOR emp_info IN get_temp_records
        LOOP
          BEGIN

            UPDATE emp_admn.emp
            SET emp_name = emp_info.emp_name
            WHERE emp_id = emp_info.emp_id
            AND emp_phone = emp_info.emp_phone;

            v_record_updt_cnt := v_record_updt_cnt + SQL%ROWCOUNT;
            v_commit_cnt      := v_commit_cnt + SQL%ROWCOUNT;

            /* >= rather than =, since one update may touch more than one row. */
            IF v_commit_cnt >= 100000
            THEN
              COMMIT;
              dbms_output.put_line('100000 rows committed.');
              v_commit_cnt := 0;
            END IF;

          EXCEPTION
            WHEN OTHERS
            THEN
              dbms_output.put_line('Error: ' || SQLCODE || ' ' || SQLERRM);
              dbms_output.put_line('Error record: ID: ' || emp_info.emp_id || ' Name: ' || emp_info.emp_name);
          END;

        END LOOP;

        COMMIT;

        /* Verify that all the records updated the EMP table. */
        IF v_record_updt_cnt = v_rec_cnt
        THEN
          dbms_output.put_line('All the records are updated.');
        ELSE
          dbms_output.put_line('# records in EMP_TEMP_TABLE: ' || v_rec_cnt);
          dbms_output.put_line('# records updated in EMP table: ' || v_record_updt_cnt);
        END IF;

      EXCEPTION
        WHEN OTHERS
        THEN
          dbms_output.put_line('Error: ' || SQLCODE);
          dbms_output.put_line('Error: ' || SQLERRM);
      END;
      /

      exit

        • 1. Re: Updating 10 million records from a billion records.
          sb92075
          954051 wrote:
          I have a business need to update 10 million rows in a table which has 1 billion records. The table is a fact table. [...]
          can & should be done as a single UPDATE without ANY PL/SQL cursor or LOOP
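          For example, something along these lines (a sketch using the posted column names; it assumes (emp_id, emp_phone) matches at most one temp row per EMP row, otherwise the subquery raises ORA-01427):

          UPDATE emp_admn.emp e
          SET e.emp_name = (SELECT ett.emp_name
                            FROM emp_admn.emp_temp_table ett
                            WHERE ett.emp_id = e.emp_id
                            AND ett.emp_phone = e.emp_phone)
          WHERE EXISTS (SELECT 1
                        FROM emp_admn.emp_temp_table ett
                        WHERE ett.emp_id = e.emp_id
                        AND ett.emp_phone = e.emp_phone);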
          • 2. Re: Updating 10 million records from a billion records.
            rp0428
            Welcome to the forum! Whenever you post, provide your 4-digit Oracle version (the result of SELECT * FROM V$VERSION).

            Why would you use a CURSOR loop to do slow-by-slow (row by row) processing when all that is needed is a simple UPDATE statement?
            >
            I have a business need to update 10 million rows in a table which has 1 billion records. The table is a fact table. Although I have read a lot about cursors, I am unable to decide at this moment if using cursors is a good approach in this situation:

            1) I have 1 million rows in an emp_temp_table.
            >
            And how did you go from 10 million rows to only 1 million?
            • 3. Re: Updating 10 million records from a billion records.
              957054
              The idea was to implement this as a test bed and gather information. I could execute a simple update statement, but I wanted to have control over the number of records being updated. I initially wanted to just execute a simple update statement, but I was not sure how to benchmark it and thought a procedure would give me a certain sense of time that I could improve on.

              10 million is the goal, but I wanted to do it in increments of 1 million. Also, this is just a test procedure; it definitely needs work. But it looks like none of it is required and a simple update should work. Thanks.

              • 4. Re: Updating 10 million records from a billion records.
                sb92075
                954051 wrote:
                The idea was to implement this as a test bed and gather information. I could execute a simple update statement, but I wanted to have control over the number of records being updated.
                ROLLBACK;
                • 5. Re: Updating 10 million records from a billion records.
                  rp0428
                  >
                  I wanted to have control over the number of records being updated
                  >
                  ALWAYS use SQL whenever possible. If you have to use PL/SQL, use BULK processing, not slow-by-slow. For that you would fetch from a cursor with
                  FETCH myCursor BULK COLLECT INTO myCollection LIMIT 10000;
                  That processes 10000 rows at a time, and you would use a FORALL to update ALL 10000 with one statement. That causes only ONE context switch between the PL/SQL and SQL engines for each 10000 records processed. Your slow-by-slow has a context switch for every row.

                  See Limiting Rows for a Bulk FETCH Operation (LIMIT Clause) and Using FORALL and BULK COLLECT Together in the PL/SQL Language doc
                  http://docs.oracle.com/cd/B28359_01/appdev.111/b28370/tuning.htm#BCGFCBGH
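                  A rough sketch of that pattern, using the table and column names from your code (separate collections per column, since FORALL DML cannot reference individual fields of a collection of records):

                  DECLARE
                    TYPE t_ids    IS TABLE OF emp_admn.emp.emp_id%TYPE;
                    TYPE t_phones IS TABLE OF emp_admn.emp.emp_phone%TYPE;
                    TYPE t_names  IS TABLE OF emp_admn.emp.emp_name%TYPE;
                    l_ids    t_ids;
                    l_phones t_phones;
                    l_names  t_names;

                    CURSOR c IS
                      SELECT ett.emp_id, ett.emp_phone, ett.emp_name
                      FROM emp_admn.emp_temp_table ett;
                  BEGIN
                    OPEN c;
                    LOOP
                      FETCH c BULK COLLECT INTO l_ids, l_phones, l_names LIMIT 10000;
                      EXIT WHEN l_ids.COUNT = 0;

                      /* One context switch updates the whole batch. */
                      FORALL i IN 1 .. l_ids.COUNT
                        UPDATE emp_admn.emp
                        SET emp_name = l_names(i)
                        WHERE emp_id = l_ids(i)
                        AND emp_phone = l_phones(i);

                      COMMIT;  /* intermediate commit per batch, as in the original */
                    END LOOP;
                    CLOSE c;
                  END;
                  /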
                  • 6. Re: Updating 10 million records from a billion records.
                    957054
                    Thank you for the approach. I will try the simple update first and figure out the time taken, then do the PL/SQL with BULK mode if need be.
                    • 7. Re: Updating 10 million records from a billion records.
                      sb92075
                      954051 wrote:
                      Thank you for the approach. I will try the simple update first and figure out the time taken, then do the PL/SQL with BULK mode if need be.
                      PL/SQL can never be faster than plain SQL at processing the same result set.
                      • 8. Re: Updating 10 million records from a billion records.
                        Nikolay Savvinov
                        Hi,

                        1) When updating one table from another, you have a choice between an updatable join view and a MERGE. MERGE is much simpler because you don't have to worry about key preservation etc., plus it's much more readable; see the sketch below.
                        2) I agree with what others said about row-by-row processing. An illusion of control does not justify voluntarily slowing down your code by an order of magnitude or even more.
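
                        A minimal sketch of the MERGE, using the column names from the posted code (it assumes (emp_id, emp_phone) matches at most one temp row per EMP row; otherwise MERGE raises ORA-30926):

                        MERGE INTO emp_admn.emp e
                        USING emp_admn.emp_temp_table ett
                        ON (e.emp_id = ett.emp_id AND e.emp_phone = ett.emp_phone)
                        WHEN MATCHED THEN
                          UPDATE SET e.emp_name = ett.emp_name;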

                        Best regards,
                        Nikolay
                        • 9. Re: Updating 10 million records from a billion records.
                          Billy~Verreynne
                          954051 wrote:

                          I have a business need to update 10 million rows in a table which has 1 billion records. The table is a fact table. Although I have read a lot about cursors, I am unable to decide at this moment if using cursors is a good approach in this situation:
                          You have not read enough. Or read about one side of the coin only.

                          You use SQL, right? So you are using cursors.

                          All SQL statements are parsed, stored and executed as cursors. There is no such thing as using SQL and not using a cursor.

                          Do not confuse row-by-row processing (using a cursor), with a single SQL update/merge statement (also a cursor).
                          • 10. Re: Updating 10 million records from a billion records.
                            Billy~Verreynne
                            954051 wrote:

                            I have a business need to update 10 million rows in a table which has 1 billion records.
                            The total row count is irrelevant - unless your SQL cursor does a full table or full index range scan.
                            1) I have 1 million rows in a emp_temp_table.

                            2) I have to select the 1 million rows in emp_temp_table and update these records in EMP table.
                            For this type of update, where rows are updated using other rows, the rows to update need to be uniquely identified. Thus it is reasonable to assume that the rows to update will be identified via their primary keys.

                            A million such updates thus means a million unique index reads (finding the address of a row via its primary key), each followed by a read of the row to update.

                            Due to the number of rows to update, this means a fair amount of unavoidable I/O. I/O is already optimised - a unique index read is used to identify the row to update. So there is no means of doing less I/O and increasing performance that way.

                            This then only leaves parallel processing to increase performance. Instead of doing the I/O for processing a million rows sequentially, use for example 10 parallel processes, with each process doing the I/O for 100,000 rows.

                            The applicable interface to use for this approach is DBMS_PARALLEL_EXECUTE.
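
                            A rough sketch of that approach (task name, chunk size and parallel level are illustrative; :start_id and :end_id are bind variables the package supplies per chunk, and running a task requires the CREATE JOB privilege):

                            BEGIN
                              DBMS_PARALLEL_EXECUTE.CREATE_TASK(task_name => 'emp_name_updt');

                              -- split the driving table into rowid ranges of ~100,000 rows each
                              DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(
                                  task_name   => 'emp_name_updt',
                                  table_owner => 'EMP_ADMN',
                                  table_name  => 'EMP_TEMP_TABLE',
                                  by_row      => TRUE,
                                  chunk_size  => 100000);

                              -- each chunk runs this MERGE for its own rowid range, 10 chunks at a time
                              DBMS_PARALLEL_EXECUTE.RUN_TASK(
                                  task_name      => 'emp_name_updt',
                                  sql_stmt       => 'MERGE INTO emp_admn.emp e
                                                     USING (SELECT emp_id, emp_phone, emp_name
                                                            FROM emp_admn.emp_temp_table
                                                            WHERE rowid BETWEEN :start_id AND :end_id) ett
                                                     ON (e.emp_id = ett.emp_id AND e.emp_phone = ett.emp_phone)
                                                     WHEN MATCHED THEN UPDATE SET e.emp_name = ett.emp_name',
                                  language_flag  => DBMS_SQL.NATIVE,
                                  parallel_level => 10);

                              DBMS_PARALLEL_EXECUTE.DROP_TASK(task_name => 'emp_name_updt');
                            END;
                            /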
                            • 11. Re: Updating 10 million records from a billion records.
                              737979
                              Like the others, I'd do the update with a SQL statement and not a PL/SQL block.

                              The other glaring issue I see with your PL/SQL code is the exception section.
                              Why do you have it?

                              Exceptions are not Pokemon - you don't have to "catch them all"

                              Your code traps ALL exceptions and hides them in DBMS_OUTPUT statements.
                              You should be very careful when using an EXCEPTION WHEN OTHERS construct.
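
                              If you must keep a handler for logging, at least re-raise, so the failure is not silently swallowed. A minimal sketch:

                              BEGIN
                                UPDATE emp_admn.emp
                                SET emp_name = emp_name
                                WHERE 1 = 0;  -- stand-in for the real DML
                              EXCEPTION
                                WHEN OTHERS
                                THEN
                                  dbms_output.put_line('Error: ' || SQLCODE || ' ' || SQLERRM);
                                  RAISE;      -- the caller still sees the original error
                              END;
                              /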
                              • 12. Re: Updating 10 million records from a billion records.
                                Billy~Verreynne
                                user12286392 wrote:

                                Exceptions are not Pokemon - you don't have to "catch them all"
                                LOL. Nice... :D
                                • 13. Re: Updating 10 million records from a billion records.
                                  jgarry
                                  Billy Verreynne wrote:
                                  user12286392 wrote:

                                  Exceptions are not Pokemon - you don't have to "catch them all"
                                  LOL. Nice... :D
                                  If you are using merge, everything will get rolled back if you don't. Even pokemon bugs had bugs (http://pokemondb.net/type).
                                  • 14. Re: Updating 10 million records from a billion records.
                                    Mark Malakanov (user11181920)
                                    954051 wrote:
                                    I could execute a simple update statement, but I wanted to have control over the number of records being updated. I initially wanted to just execute a simple update statement, but I was not sure how to benchmark it and thought a procedure would give me a certain sense of time that I could improve on.
                                    -- this will give you "a certain sense of time"
                                    set timing on

                                    -- this will give you a control over the number of records being updated.
                                    update TBL set col='new val' where col='old val' and ROWNUM<=1000000;