1 2 3 Previous Next 42 Replies Latest reply: Jan 24, 2014 2:22 AM by Adme12 RSS

    Row Generating in faster way

    Adme12

       

       

      Hello Experts

       

      I need a script, which fills my table with test data up.

       

      What are your opinion, could this script be tuned?

       

      CREATE TABLE TEST_TAB

      (  id_test   NUMBER(10)

      , x_domain  VARCHAR2(1)

      , id_object NUMBER(10)

      )

      TABLESPACE ALL_DATA

      /

       

      id_test  : This is the primary key. It should goes from 1 to 10 000 000.

      x_domain : It has 3 values and the distribution should be so:

                 M : 94 %    (M = Main)

                 S : 4  %    (S = Sub)

                 O : 2  %    (O = Other)

      id_object : Random number between 0 and 10 000 000

       

      SET TERMOUT ON SERVEROUTPUT ON

       

      DECLARE

         v_start       TIMESTAMP(6);

         v_end         TIMESTAMP(6);

      BEGIN

       

         EXECUTE IMMEDIATE 'TRUNCATE TABLE TEST_TAB';

       

         v_start := CURRENT_TIMESTAMP;

       

         INSERT /*+ APPEND PARALLEL (16) */

           INTO TEST_TAB (id_test, x_domain, id_object)

         SELECT LEVEL AS ID_TEST

              , CASE WHEN level <= 10000000/100*94                              THEN 'M'

                     WHEN level  > 10000000/100*94 AND level <= 10000000/100*98 THEN 'S'

                   ELSE 'O'

                END AS X_DOMAIN

              , TRUNC(dbms_random.value(0, 10000000)) AS ID_OBJECT

           FROM DUAL

         CONNECT BY LEVEL <= 10000000

         ;

       

         COMMIT;

       

         v_end := CURRENT_TIMESTAMP;

       

         DBMS_OUTPUT.PUT_LINE('Insert took : ' || TO_CHAR(v_end-v_start));

      END;

      /

       

      SQL*Plus: Release 11.2.0.1.0 Production on Thu Jan 16 11:21:43 2014

       

      Copyright (c) 1982, 2010, Oracle.  All rights reserved.

       

      Connected to:

      Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

      With the Partitioning, OLAP, Data Mining and Real Application Testing options

       

      Insert took : +000000000 00:02:12.020934000

       

      PL/SQL procedure successfully completed.

       

      SQL>

       

      I have read from rp0428 that firstly we have to use SQL and not PL/SQL. But it seems for me that this sql took too long time.. Can it be tuned?

        • 1. Re: Row Generating in faster way
          David Berger

          Hello Adme12

           

          In this case do not use the connect by -> Use COLLECTION + BULK INSERT! It should be faster!

          SET TERMOUT ON SERVEROUTPUT ON

           

          DECLARE

             id_test       PLS_INTEGER;

             v_domain      VARCHAR2(1);

             v_object      PLS_INTEGER;

             TYPE ty_table IS TABLE OF TEST_TAB%ROWTYPE;

             c_col_table   ty_table := ty_table();

           

             v_start       TIMESTAMP(6);

             v_end         TIMESTAMP(6);

          BEGIN

             EXECUTE IMMEDIATE 'TRUNCATE TABLE TEST_TAB';

             v_start := CURRENT_TIMESTAMP;

           

             FOR l_counter IN 1..10000000

             LOOP

                CASE WHEN l_counter <= 10000000/100*94                                  THEN v_domain := 'M';

                     WHEN l_counter  > 10000000/100*94 AND l_counter <= 10000000/100*98 THEN v_domain := 'S';

                  ELSE v_domain := 'O';

                END CASE;

           

                v_object := TRUNC(dbms_random.value(0, 10000000));

           

                c_col_table.EXTEND(1);

                c_col_table(l_counter).id_test   := l_counter;

                c_col_table(l_counter).x_domain  := v_domain;

                c_col_table(l_counter).id_object := v_object;

              END LOOP;

           

              FORALL l_counter in c_col_table.FIRST..c_col_table.LAST

              INSERT INTO TEST_TAB VALUES c_col_table(l_counter);

           

              COMMIT;

           

              v_end := CURRENT_TIMESTAMP;

           

              DBMS_OUTPUT.PUT_LINE('Insert took : ' || TO_CHAR(v_end-v_start));

          END;

          /

           

          Please provide me how long the execution of this select took!

           

          An other issue:

          Your Hint: PARALLEL(16) .. I think it can not work for CONNECT BY -> So, maybe for your Insert could use this hint but because of the serial execution of the Select you can not benefit from it.

           

          I hope it helps you!

          • 2. Re: Row Generating in faster way
            Adme12

            Hello David

             

            I have tested it:

             

            SQL*Plus: Release 11.2.0.1.0 Production on Thu Jan 16 12:45:11 2014

             

            Copyright (c) 1982, 2010, Oracle.  All rights reserved.

             

            Connected to:

            Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

            With the Partitioning, OLAP, Data Mining and Real Application Testing options

             

            Insert took : +000000000 00:01:41.247130000

             

            PL/SQL procedure successfully completed.

             

            SQL>

             

            Thanks, you helped me a lot!

             

            • 3. Re: Row Generating in faster way
              Martin Preiss

              only for the sake of completeness: the connect by method uses a lot of UGA memory when you try to generate millions of rows. Tanel Poder described the problem in his blog - http://blog.tanelpoder.com/2008/06/08/generating-lots-of-rows-using-connect-by-safely/ - and also mentioned the workaround of using multiple small generator queries and doing a cartesian join with the results. His article also contains a link to Adrain Billingtons article http://www.oracle-developer.net/display.php?id=408 that shows additional options (pipelined functions, model clause).

               

              To tell something about the performance I would not only look at the elapsed time but also check the session statistics.

              • 4. Re: Row Generating in faster way
                _jum

                Instead of using .EXTEND million times should be faster:

                 

                SET TERMOUT ON SERVEROUTPUT ON

                 

                DECLARE
                
                   id_test       PLS_INTEGER;
                   v_domain      VARCHAR2(1);
                   v_object      PLS_INTEGER;
                   TYPE ty_table IS TABLE OF TEST_TAB%ROWTYPE;
                   c_col_table   ty_table := ty_table();
                   v_start       TIMESTAMP(6);
                   v_end         TIMESTAMP(6);
                
                BEGIN
                
                   EXECUTE IMMEDIATE 'TRUNCATE TABLE TEST_TAB';
                
                   v_start := CURRENT_TIMESTAMP;
                   
                   c_col_table.EXTEND(10000000);
                
                   FOR l_counter IN 1..10000000
                
                   LOOP
                
                      CASE WHEN l_counter <= 10000000/100*94                                  THEN v_domain := 'M';
                           WHEN l_counter  > 10000000/100*94 AND l_counter <= 10000000/100*98 THEN v_domain := 'S';
                        ELSE v_domain := 'O';
                      END CASE;
                
                      v_object := TRUNC(dbms_random.value(0, 10000000));
                
                      --c_col_table.EXTEND(1);
                      c_col_table(l_counter).id_test   := l_counter;
                      c_col_table(l_counter).x_domain  := v_domain;
                      c_col_table(l_counter).id_object := v_object;
                
                    END LOOP;
                
                    FORALL l_counter in c_col_table.FIRST..c_col_table.LAST
                    INSERT INTO TEST_TAB VALUES c_col_table(l_counter);
                
                    COMMIT;
                
                    v_end := CURRENT_TIMESTAMP;
                
                    DBMS_OUTPUT.PUT_LINE('Insert took : ' || TO_CHAR(v_end-v_start));
                
                END;
                
                • 5. Re: Row Generating in faster way
                  Adme12

                  Hi _jum

                   

                  I tried your recommendation:

                   

                  SQL*Plus: Release 11.2.0.1.0 Production on Thu Jan 16 16:33:58 2014

                   

                  Copyright (c) 1982, 2010, Oracle.  All rights reserved.

                   

                  Connected to:

                  Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

                  With the Partitioning, OLAP, Data Mining and Real Application Testing options

                   

                  Insert took : +000000000 00:01:33.863080000

                   

                  PL/SQL procedure successfully completed.

                   

                  SQL>

                   

                  Not a million times but it is something! Thank you!

                  • 6. Re: Row Generating in faster way
                    Martin Preiss

                    tested with multiple generator queries on my PC (with 11.2.0.1):

                    DECLARE

                       v_start       TIMESTAMP(6);

                       v_end         TIMESTAMP(6);

                    BEGIN

                     

                       EXECUTE IMMEDIATE 'TRUNCATE TABLE TEST_TAB';

                     

                       v_start := CURRENT_TIMESTAMP;

                     

                       INSERT /*+ APPEND */

                         INTO TEST_TAB (id_test, x_domain, id_object)

                       SELECT rownum AS ID_TEST

                            , CASE WHEN rownum <= 10000000/100*94                              THEN 'M'

                                   WHEN rownum  > 10000000/100*94 AND rownum <= 10000000/100*98 THEN 'S'

                                 ELSE 'O'

                              END AS X_DOMAIN

                            , TRUNC(dbms_random.value(0, 10000000)) AS ID_OBJECT

                         FROM (select rownum r from dual connect by rownum <= 100) a

                            , (select rownum r from dual connect by rownum <= 100) b

                            , (select rownum r from dual connect by rownum <= 1000) c

                       ;

                     

                       COMMIT;

                     

                       v_end := CURRENT_TIMESTAMP;

                     

                       DBMS_OUTPUT.PUT_LINE('Insert took : ' || TO_CHAR(v_end-v_start));

                    END;

                    /

                     

                    Insert took : +000000000 00:00:36.915000000

                    But I guess my workstation is amazingly fast...

                     

                    By the way: I would exclude the truncate operation from the test.

                     

                    Message was edited by: Martin Preiss

                    • 7. Re: Row Generating in faster way
                      Adme12

                      Hi Martin

                       

                      I have tried your solution in the same way as I did with the previous sqls.

                       

                      The result:

                       

                      SQL*Plus: Release 11.2.0.1.0 Production on Thu Jan 16 17:03:46 2014

                       

                      Copyright (c) 1982, 2010, Oracle.  All rights reserved.

                       

                      Connected to:

                      Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

                      With the Partitioning, OLAP, Data Mining and Real Application Testing options

                       

                      Insert took : +000000000 00:02:07.753245000

                       

                      PL/SQL procedure successfully completed.

                       

                      SQL>

                       

                      Sorry, but your solution is not faster. Thank you anyway!

                       

                      • 8. Re: Row Generating in faster way
                        _jum

                        my comment should be read:

                        Instead of using .EXTEND million times, this version should be (a little bit) faster:

                         

                        BTW: with (don't miss it)

                        ALTER SESSION ENABLE PARALLEL DML;
                        

                        both scenarios need nearly the same time at my instance (ORACLE 11.2.0.3.)

                        • 9. Re: Row Generating in faster way
                          Jonathan Lewis

                          I am a little surprised that Martin's suggestion of using a join of two or three smaller "connect by" subqueries is so much slower than the PL/SQL. I personally write them up using "subquery factoring" with a materialize hint just in case Oracle does something bizarre with them as inline view.  The PL/SQL solution requires a very large memory to hold the entire generated data set - which does make it a bit of a threat - while the pure SQL version inserts data as it is generated and keeps the memory demand to a minimum.

                           

                          The key performance feature in your data set is the call to dbms_random() - try, temporarily, replacing the call with just rownum and see how much quicker the call is (using pl/sql and SQL if you like) - in my case 85% of the run time was spent in calls to dbms_random(), so the trick is to parallelise that across as many CPUs as you can. This is not entirely trivial as you also have to serialise on ROWNUM, so you need to write some SQL that will generate the random numbers in parallel and then serialise the result set to generate the rownum before inserting.  Here's a solution (which, with a little luck, might be readable):

                           

                           

                          commit;

                          alter session enable parallel dml;

                           

                           

                          declare

                            v_start timestamp(6);

                            v_end timestamp(6);

                          begin

                           

                            execute immediate 'truncate table test_tab';

                           

                            v_start := current_timestamp;

                           

                            insert /*+ append */ into test_tab (

                            id_test, x_domain, id_object

                            )

                            with v1 as (

                            select /*+ materialize */

                            rownum id from dual

                            connect by level <= 1e3

                            ),

                            v2 as (

                            select /*+ materialize */

                            rownum id from dual

                            connect by level <= 1e4

                            )

                            select

                            rownum as id_test,

                            case

                            when rownum <= 9400000

                            then 'M'

                            when rownum <= 9800000

                            then 'S'

                                        else 'O'

                                    end as x_domain,

                            id_object

                            from (

                            select

                            /*+ parallel(v1) parallel(v2) no_merge */

                            trunc(dbms_random.value(0, 10000000)) as id_object

                                from

                                v1, v2

                              )

                            ;

                           

                           

                            commit;

                           

                            v_end := current_timestamp;

                           

                            dbms_output.put_line('Insert took : ' || to_char(v_end - v_start));

                          end;

                          /

                           

                           

                           

                           

                          On my machine, running 11.2.0.4, a non-parallel version of this code took 1 minute 29 seconds using 100% of one CPU; running parallel 2 it took 49 seconds to complete (using 100% of 2 CPUs).

                           

                          Regards

                          Jonathan Lewis

                          • 10. Re: Row Generating in faster way
                            jihuyao

                            SELECT LEVEL AS ID_TEST

                                    , CASE WHEN level <= 10000000/100*94                              THEN 'M'

                                           WHEN level  > 10000000/100*94 AND level <= 10000000/100*98 THEN 'S'

                                         ELSE 'O'

                                      END AS X_DOMAIN

                                    , TRUNC(dbms_random.value(0, 10000000)) AS ID_OBJECT

                                 FROM DUAL

                               CONNECT BY LEVEL <= 10000000

                             

                            After all, if not care about how M/S/O is set in sequence, why use CASE?  One IF costs nothing but 10 millions of IF or more?  Anything can be broken.

                             

                            So it is straight,

                            for i in 1..9.4*100*100, insert M

                            for i in ...., insert S

                            for i in ...,, insert O

                             

                            It might be interesting to see how the connect by is performing in the execution plan and related statistics.  At least it does not have to be recursive one by one but N by N in recusive sql.

                            • 11. Re: Row Generating in faster way
                              Adme12

                              Hello Johnatan

                               

                              I tested your SQL.

                               

                              The Results are:

                               

                              With Parallelizing:

                              ==============

                              SQL*Plus: Release 11.2.0.1.0 Production on Fri Jan 17 09:41:42 2014

                               

                              Copyright (c) 1982, 2010, Oracle.  All rights reserved.

                               

                              Connected to:

                              Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

                              With the Partitioning, OLAP, Data Mining and Real Application Testing options

                               

                              Commit complete.

                               

                              Session altered.

                               

                              Insert took : +000000000 00:00:50.622744000

                               

                              PL/SQL procedure successfully completed.

                               

                              SQL>

                               

                              Without Parallelizing (Disabled Parallel DML and NO Parallel Hint):

                              =====================================================

                              SQL*Plus: Release 11.2.0.1.0 Production on Fri Jan 17 09:52:16 2014

                               

                              Copyright (c) 1982, 2010, Oracle.  All rights reserved.

                               

                              Connected to:

                              Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

                              With the Partitioning, OLAP, Data Mining and Real Application Testing options

                               

                              Commit complete.

                               

                              Session altered.

                               

                              Insert took : +000000000 00:02:10.210821000

                               

                              PL/SQL procedure successfully completed.

                               

                              SQL>

                               

                              So it is very good and really the only thing which could help if the SQL executes parallell...

                               

                              But, in this case I would be interested how long would a parallelized version of a Collection + Bulk-Insert take!

                               

                              Could you help me that you rewrite the code from David and you parallelize it?

                               

                              But.. If it is not parallelized than the collect by takes much longer as we see...

                               

                              Best Regards for your help!

                              • 12. Re: Row Generating in faster way
                                Jonathan Lewis

                                First a couple of questions - what degree of parallelism did your system run at, how many CPUs does it have, and were any other jobs using noticeable amounts of CPU at the time of the test.  (Added question - did you test with a simple rownum in your original code instead of dbms_random() ? It's important to KNOW where you can save time, and how much).

                                 

                                Since most of the time is spent on calls to dbms_random, the simplest way to parallelize the PL/SQL is to turn it into a parameter-driven procedure and run multiple copies to generate the data, making sure that they serialise on the insert.

                                 

                                Change the PL/SQL to have two parameters - a start and end id.

                                Add a "Lock table test_tab in exclusive mode" just before the "forall insert" command.

                                 

                                Start up as many sessions as you have CPUs, giving each a different range.

                                They will populate their arrays in parallel - but the first one to get to the insert will lock the table, blocking the rest; when it commits the second one will get its lock and insert, blocking the rest, and so on.

                                 

                                Reminder: the PL/SQL solution is NOT a scalable solution - it depends on having enough memory to create the entire data set in memory before writing it to disk.

                                 

                                 

                                Regards

                                Jonathan Lewis

                                • 13. Re: Row Generating in faster way
                                  David Berger

                                  Hello Jonathan

                                   

                                  I read your post and I see what I could make better too.

                                   

                                  I have questions:

                                  1.)

                                  ... the simplest way to parallelize the PL/SQL is to turn it into a parameter-driven procedure and run multiple copies to generate the data, making sure that they serialise on the insert.

                                  What is the benefit if we have multiple copies to generate data BUT we serialize the insert?

                                   

                                  2.)

                                  Add a "Lock table test_tab in exclusive mode" just before the "forall insert" command.

                                   

                                  Start up as many sessions as you have CPUs, giving each a different range.

                                  They will populate their arrays in parallel - but the first one to get to the insert will lock the table, blocking the rest; when it commits the second one will get its lock and insert, blocking the rest, and so on.

                                  Why do we need the "Lock table test_tab in exclusive mode" before the "forall insert"?

                                   

                                  I do not see why we have to execute the bulk-insert step by step after each other.

                                   

                                  Thank You in advance.

                                  • 14. Re: Row Generating in faster way
                                    Jonathan Lewis

                                    David,

                                     

                                    As I said in my earlier post, most of the time spent on my system was in the call to dbms_random() (something like 85% of the time), so the most significant improvement could be gained by making that component parallel, while the time to insert is relatively unimportant; moreover, I was anticipating a problem with contention on the inserts - either due to buffer busy waits, or log writer waits on massively parallel inserts.

                                     

                                    The table locking was simply a convenient way to implement the serialisation. In fact I've done a couple of quick tests on a small machine, and saw virtually no time lost on the inserts - but I only had 2 CPUs, so minimum risk of contention.  An alternative to locking, anyway, would be to add the append_values hint to the insert to use direct path array inserts which (I think) would mean an implicit table lock.  This MIGHT speed up the processing (it would certainly reduce the undo and redo somewhat, especially if this was a test database that was not running in archivelog mode) - on the other hand, it might slow things down as each session would have to wait for its own data block writes to complete.

                                     

                                    A simple of speeding up the pl/sql, by the way, would be to set an arraysize of 100,000 and loop around it 100 times (adjust to suit) - committing after each loop. This should be noticeable faster than growing an array of 10M entries. In this case the append_values hint may be more appropriate since any waits for locks would be waiting for much smaller writes to complete.

                                     

                                    On my little machine the best times I got were (parallel 2):  50 seconds for the parallel SQL solution and 32 seconds for the parameterised pl/sql solution using an arraysize of 100000 and normal inserts with no locking.

                                     

                                    Regards

                                    Jonathan Lewis

                                    1 2 3 Previous Next