1 2 3 Previous Next 42 Replies Latest reply: Jan 24, 2014 12:22 AM by Adme12 RSS

Row Generating in faster way

Adme12 Newbie
Currently Being Moderated

 

 

Hello Experts

 

I need a script, which fills my table with test data up.

 

What are your opinion, could this script be tuned?

 

CREATE TABLE TEST_TAB

(  id_test   NUMBER(10)

, x_domain  VARCHAR2(1)

, id_object NUMBER(10)

)

TABLESPACE ALL_DATA

/

 

id_test  : This is the primary key. It should goes from 1 to 10 000 000.

x_domain : It has 3 values and the distribution should be so:

           M : 94 %    (M = Main)

           S : 4  %    (S = Sub)

           O : 2  %    (O = Other)

id_object : Random number between 0 and 10 000 000

 

SET TERMOUT ON SERVEROUTPUT ON

 

DECLARE

   v_start       TIMESTAMP(6);

   v_end         TIMESTAMP(6);

BEGIN

 

   EXECUTE IMMEDIATE 'TRUNCATE TABLE TEST_TAB';

 

   v_start := CURRENT_TIMESTAMP;

 

   INSERT /*+ APPEND PARALLEL (16) */

     INTO TEST_TAB (id_test, x_domain, id_object)

   SELECT LEVEL AS ID_TEST

        , CASE WHEN level <= 10000000/100*94                              THEN 'M'

               WHEN level  > 10000000/100*94 AND level <= 10000000/100*98 THEN 'S'

             ELSE 'O'

          END AS X_DOMAIN

        , TRUNC(dbms_random.value(0, 10000000)) AS ID_OBJECT

     FROM DUAL

   CONNECT BY LEVEL <= 10000000

   ;

 

   COMMIT;

 

   v_end := CURRENT_TIMESTAMP;

 

   DBMS_OUTPUT.PUT_LINE('Insert took : ' || TO_CHAR(v_end-v_start));

END;

/

 

SQL*Plus: Release 11.2.0.1.0 Production on Thu Jan 16 11:21:43 2014

 

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

 

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

With the Partitioning, OLAP, Data Mining and Real Application Testing options

 

Insert took : +000000000 00:02:12.020934000

 

PL/SQL procedure successfully completed.

 

SQL>

 

I have read from rp0428 that firstly we have to use SQL and not PL/SQL. But it seems for me that this sql took too long time.. Can it be tuned?

  • 1. Re: Row Generating in faster way
    David Berger Pro
    Currently Being Moderated

    Hello Adme12

     

    In this case do not use the connect by -> Use COLLECTION + BULK INSERT! It should be faster!

    SET TERMOUT ON SERVEROUTPUT ON

     

    DECLARE

       id_test       PLS_INTEGER;

       v_domain      VARCHAR2(1);

       v_object      PLS_INTEGER;

       TYPE ty_table IS TABLE OF TEST_TAB%ROWTYPE;

       c_col_table   ty_table := ty_table();

     

       v_start       TIMESTAMP(6);

       v_end         TIMESTAMP(6);

    BEGIN

       EXECUTE IMMEDIATE 'TRUNCATE TABLE TEST_TAB';

       v_start := CURRENT_TIMESTAMP;

     

       FOR l_counter IN 1..10000000

       LOOP

          CASE WHEN l_counter <= 10000000/100*94                                  THEN v_domain := 'M';

               WHEN l_counter  > 10000000/100*94 AND l_counter <= 10000000/100*98 THEN v_domain := 'S';

            ELSE v_domain := 'O';

          END CASE;

     

          v_object := TRUNC(dbms_random.value(0, 10000000));

     

          c_col_table.EXTEND(1);

          c_col_table(l_counter).id_test   := l_counter;

          c_col_table(l_counter).x_domain  := v_domain;

          c_col_table(l_counter).id_object := v_object;

        END LOOP;

     

        FORALL l_counter in c_col_table.FIRST..c_col_table.LAST

        INSERT INTO TEST_TAB VALUES c_col_table(l_counter);

     

        COMMIT;

     

        v_end := CURRENT_TIMESTAMP;

     

        DBMS_OUTPUT.PUT_LINE('Insert took : ' || TO_CHAR(v_end-v_start));

    END;

    /

     

    Please provide me how long the execution of this select took!

     

    An other issue:

    Your Hint: PARALLEL(16) .. I think it can not work for CONNECT BY -> So, maybe for your Insert could use this hint but because of the serial execution of the Select you can not benefit from it.

     

    I hope it helps you!

  • 2. Re: Row Generating in faster way
    Adme12 Newbie
    Currently Being Moderated

    Hello David

     

    I have tested it:

     

    SQL*Plus: Release 11.2.0.1.0 Production on Thu Jan 16 12:45:11 2014

     

    Copyright (c) 1982, 2010, Oracle.  All rights reserved.

     

    Connected to:

    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

    With the Partitioning, OLAP, Data Mining and Real Application Testing options

     

    Insert took : +000000000 00:01:41.247130000

     

    PL/SQL procedure successfully completed.

     

    SQL>

     

    Thanks, you helped me a lot!

     

  • 3. Re: Row Generating in faster way
    Martin Preiss Expert
    Currently Being Moderated

    only for the sake of completeness: the connect by method uses a lot of UGA memory when you try to generate millions of rows. Tanel Poder described the problem in his blog - http://blog.tanelpoder.com/2008/06/08/generating-lots-of-rows-using-connect-by-safely/ - and also mentioned the workaround of using multiple small generator queries and doing a cartesian join with the results. His article also contains a link to Adrain Billingtons article http://www.oracle-developer.net/display.php?id=408 that shows additional options (pipelined functions, model clause).

     

    To tell something about the performance I would not only look at the elapsed time but also check the session statistics.

  • 4. Re: Row Generating in faster way
    _jum Journeyer
    Currently Being Moderated

    Instead of using .EXTEND million times should be faster:

     

    SET TERMOUT ON SERVEROUTPUT ON

     

    DECLARE
       id_test       PLS_INTEGER;
       v_domain      VARCHAR2(1);
       v_object      PLS_INTEGER;
       TYPE ty_table IS TABLE OF TEST_TAB%ROWTYPE;
       c_col_table   ty_table := ty_table();
       v_start       TIMESTAMP(6);
       v_end         TIMESTAMP(6);
    BEGIN
       EXECUTE IMMEDIATE 'TRUNCATE TABLE TEST_TAB';
       v_start := CURRENT_TIMESTAMP;
       
       c_col_table.EXTEND(10000000);
       FOR l_counter IN 1..10000000
       LOOP
          CASE WHEN l_counter <= 10000000/100*94                                  THEN v_domain := 'M';
               WHEN l_counter  > 10000000/100*94 AND l_counter <= 10000000/100*98 THEN v_domain := 'S';
            ELSE v_domain := 'O';
          END CASE;
          v_object := TRUNC(dbms_random.value(0, 10000000));
          --c_col_table.EXTEND(1);
          c_col_table(l_counter).id_test   := l_counter;
          c_col_table(l_counter).x_domain  := v_domain;
          c_col_table(l_counter).id_object := v_object;
        END LOOP;
        FORALL l_counter in c_col_table.FIRST..c_col_table.LAST
        INSERT INTO TEST_TAB VALUES c_col_table(l_counter);
        COMMIT;
        v_end := CURRENT_TIMESTAMP;
        DBMS_OUTPUT.PUT_LINE('Insert took : ' || TO_CHAR(v_end-v_start));
    END;
  • 5. Re: Row Generating in faster way
    Adme12 Newbie
    Currently Being Moderated

    Hi _jum

     

    I tried your recommendation:

     

    SQL*Plus: Release 11.2.0.1.0 Production on Thu Jan 16 16:33:58 2014

     

    Copyright (c) 1982, 2010, Oracle.  All rights reserved.

     

    Connected to:

    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

    With the Partitioning, OLAP, Data Mining and Real Application Testing options

     

    Insert took : +000000000 00:01:33.863080000

     

    PL/SQL procedure successfully completed.

     

    SQL>

     

    Not a million times but it is something! Thank you!

  • 6. Re: Row Generating in faster way
    Martin Preiss Expert
    Currently Being Moderated

    tested with multiple generator queries on my PC (with 11.2.0.1):

    DECLARE

       v_start       TIMESTAMP(6);

       v_end         TIMESTAMP(6);

    BEGIN

     

       EXECUTE IMMEDIATE 'TRUNCATE TABLE TEST_TAB';

     

       v_start := CURRENT_TIMESTAMP;

     

       INSERT /*+ APPEND */

         INTO TEST_TAB (id_test, x_domain, id_object)

       SELECT rownum AS ID_TEST

            , CASE WHEN rownum <= 10000000/100*94                              THEN 'M'

                   WHEN rownum  > 10000000/100*94 AND rownum <= 10000000/100*98 THEN 'S'

                 ELSE 'O'

              END AS X_DOMAIN

            , TRUNC(dbms_random.value(0, 10000000)) AS ID_OBJECT

         FROM (select rownum r from dual connect by rownum <= 100) a

            , (select rownum r from dual connect by rownum <= 100) b

            , (select rownum r from dual connect by rownum <= 1000) c

       ;

     

       COMMIT;

     

       v_end := CURRENT_TIMESTAMP;

     

       DBMS_OUTPUT.PUT_LINE('Insert took : ' || TO_CHAR(v_end-v_start));

    END;

    /

     

    Insert took : +000000000 00:00:36.915000000

    But I guess my workstation is amazingly fast...

     

    By the way: I would exclude the truncate operation from the test.

     

    Message was edited by: Martin Preiss

  • 7. Re: Row Generating in faster way
    Adme12 Newbie
    Currently Being Moderated

    Hi Martin

     

    I have tried your solution in the same way as I did with the previous sqls.

     

    The result:

     

    SQL*Plus: Release 11.2.0.1.0 Production on Thu Jan 16 17:03:46 2014

     

    Copyright (c) 1982, 2010, Oracle.  All rights reserved.

     

    Connected to:

    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

    With the Partitioning, OLAP, Data Mining and Real Application Testing options

     

    Insert took : +000000000 00:02:07.753245000

     

    PL/SQL procedure successfully completed.

     

    SQL>

     

    Sorry, but your solution is not faster. Thank you anyway!

     

  • 8. Re: Row Generating in faster way
    _jum Journeyer
    Currently Being Moderated

    my comment should be read:

    Instead of using .EXTEND million times, this version should be (a little bit) faster:

     

    BTW: with (don't miss it)

    ALTER SESSION ENABLE PARALLEL DML;

    both scenarios need nearly the same time at my instance (ORACLE 11.2.0.3.)

  • 9. Re: Row Generating in faster way
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated

    I am a little surprised that Martin's suggestion of using a join of two or three smaller "connect by" subqueries is so much slower than the PL/SQL. I personally write them up using "subquery factoring" with a materialize hint just in case Oracle does something bizarre with them as inline view.  The PL/SQL solution requires a very large memory to hold the entire generated data set - which does make it a bit of a threat - while the pure SQL version inserts data as it is generated and keeps the memory demand to a minimum.

     

    The key performance feature in your data set is the call to dbms_random() - try, temporarily, replacing the call with just rownum and see how much quicker the call is (using pl/sql and SQL if you like) - in my case 85% of the run time was spent in calls to dbms_random(), so the trick is to parallelise that across as many CPUs as you can. This is not entirely trivial as you also have to serialise on ROWNUM, so you need to write some SQL that will generate the random numbers in parallel and then serialise the result set to generate the rownum before inserting.  Here's a solution (which, with a little luck, might be readable):

     

     

    commit;

    alter session enable parallel dml;

     

     

    declare

      v_start timestamp(6);

      v_end timestamp(6);

    begin

     

      execute immediate 'truncate table test_tab';

     

      v_start := current_timestamp;

     

      insert /*+ append */ into test_tab (

      id_test, x_domain, id_object

      )

      with v1 as (

      select /*+ materialize */

      rownum id from dual

      connect by level <= 1e3

      ),

      v2 as (

      select /*+ materialize */

      rownum id from dual

      connect by level <= 1e4

      )

      select

      rownum as id_test,

      case

      when rownum <= 9400000

      then 'M'

      when rownum <= 9800000

      then 'S'

                  else 'O'

              end as x_domain,

      id_object

      from (

      select

      /*+ parallel(v1) parallel(v2) no_merge */

      trunc(dbms_random.value(0, 10000000)) as id_object

          from

          v1, v2

        )

      ;

     

     

      commit;

     

      v_end := current_timestamp;

     

      dbms_output.put_line('Insert took : ' || to_char(v_end - v_start));

    end;

    /

     

     

     

     

    On my machine, running 11.2.0.4, a non-parallel version of this code took 1 minute 29 seconds using 100% of one CPU; running parallel 2 it took 49 seconds to complete (using 100% of 2 CPUs).

     

    Regards

    Jonathan Lewis

  • 10. Re: Row Generating in faster way
    jihuyao Journeyer
    Currently Being Moderated

    SELECT LEVEL AS ID_TEST

            , CASE WHEN level <= 10000000/100*94                              THEN 'M'

                   WHEN level  > 10000000/100*94 AND level <= 10000000/100*98 THEN 'S'

                 ELSE 'O'

              END AS X_DOMAIN

            , TRUNC(dbms_random.value(0, 10000000)) AS ID_OBJECT

         FROM DUAL

       CONNECT BY LEVEL <= 10000000

     

    After all, if not care about how M/S/O is set in sequence, why use CASE?  One IF costs nothing but 10 millions of IF or more?  Anything can be broken.

     

    So it is straight,

    for i in 1..9.4*100*100, insert M

    for i in ...., insert S

    for i in ...,, insert O

     

    It might be interesting to see how the connect by is performing in the execution plan and related statistics.  At least it does not have to be recursive one by one but N by N in recusive sql.

  • 11. Re: Row Generating in faster way
    Adme12 Newbie
    Currently Being Moderated

    Hello Johnatan

     

    I tested your SQL.

     

    The Results are:

     

    With Parallelizing:

    ==============

    SQL*Plus: Release 11.2.0.1.0 Production on Fri Jan 17 09:41:42 2014

     

    Copyright (c) 1982, 2010, Oracle.  All rights reserved.

     

    Connected to:

    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

    With the Partitioning, OLAP, Data Mining and Real Application Testing options

     

    Commit complete.

     

    Session altered.

     

    Insert took : +000000000 00:00:50.622744000

     

    PL/SQL procedure successfully completed.

     

    SQL>

     

    Without Parallelizing (Disabled Parallel DML and NO Parallel Hint):

    =====================================================

    SQL*Plus: Release 11.2.0.1.0 Production on Fri Jan 17 09:52:16 2014

     

    Copyright (c) 1982, 2010, Oracle.  All rights reserved.

     

    Connected to:

    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

    With the Partitioning, OLAP, Data Mining and Real Application Testing options

     

    Commit complete.

     

    Session altered.

     

    Insert took : +000000000 00:02:10.210821000

     

    PL/SQL procedure successfully completed.

     

    SQL>

     

    So it is very good and really the only thing which could help if the SQL executes parallell...

     

    But, in this case I would be interested how long would a parallelized version of a Collection + Bulk-Insert take!

     

    Could you help me that you rewrite the code from David and you parallelize it?

     

    But.. If it is not parallelized than the collect by takes much longer as we see...

     

    Best Regards for your help!

  • 12. Re: Row Generating in faster way
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated

    First a couple of questions - what degree of parallelism did your system run at, how many CPUs does it have, and were any other jobs using noticeable amounts of CPU at the time of the test.  (Added question - did you test with a simple rownum in your original code instead of dbms_random() ? It's important to KNOW where you can save time, and how much).

     

    Since most of the time is spent on calls to dbms_random, the simplest way to parallelize the PL/SQL is to turn it into a parameter-driven procedure and run multiple copies to generate the data, making sure that they serialise on the insert.

     

    Change the PL/SQL to have two parameters - a start and end id.

    Add a "Lock table test_tab in exclusive mode" just before the "forall insert" command.

     

    Start up as many sessions as you have CPUs, giving each a different range.

    They will populate their arrays in parallel - but the first one to get to the insert will lock the table, blocking the rest; when it commits the second one will get its lock and insert, blocking the rest, and so on.

     

    Reminder: the PL/SQL solution is NOT a scalable solution - it depends on having enough memory to create the entire data set in memory before writing it to disk.

     

     

    Regards

    Jonathan Lewis

  • 13. Re: Row Generating in faster way
    David Berger Pro
    Currently Being Moderated

    Hello Jonathan

     

    I read your post and I see what I could make better too.

     

    I have questions:

    1.)

    ... the simplest way to parallelize the PL/SQL is to turn it into a parameter-driven procedure and run multiple copies to generate the data, making sure that they serialise on the insert.

    What is the benefit if we have multiple copies to generate data BUT we serialize the insert?

     

    2.)

    Add a "Lock table test_tab in exclusive mode" just before the "forall insert" command.

     

    Start up as many sessions as you have CPUs, giving each a different range.

    They will populate their arrays in parallel - but the first one to get to the insert will lock the table, blocking the rest; when it commits the second one will get its lock and insert, blocking the rest, and so on.

    Why do we need the "Lock table test_tab in exclusive mode" before the "forall insert"?

     

    I do not see why we have to execute the bulk-insert step by step after each other.

     

    Thank You in advance.

  • 14. Re: Row Generating in faster way
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated

    David,

     

    As I said in my earlier post, most of the time spent on my system was in the call to dbms_random() (something like 85% of the time), so the most significant improvement could be gained by making that component parallel, while the time to insert is relatively unimportant; moreover, I was anticipating a problem with contention on the inserts - either due to buffer busy waits, or log writer waits on massively parallel inserts.

     

    The table locking was simply a convenient way to implement the serialisation. In fact I've done a couple of quick tests on a small machine, and saw virtually no time lost on the inserts - but I only had 2 CPUs, so minimum risk of contention.  An alternative to locking, anyway, would be to add the append_values hint to the insert to use direct path array inserts which (I think) would mean an implicit table lock.  This MIGHT speed up the processing (it would certainly reduce the undo and redo somewhat, especially if this was a test database that was not running in archivelog mode) - on the other hand, it might slow things down as each session would have to wait for its own data block writes to complete.

     

    A simple of speeding up the pl/sql, by the way, would be to set an arraysize of 100,000 and loop around it 100 times (adjust to suit) - committing after each loop. This should be noticeable faster than growing an array of 10M entries. In this case the append_values hint may be more appropriate since any waits for locks would be waiting for much smaller writes to complete.

     

    On my little machine the best times I got were (parallel 2):  50 seconds for the parallel SQL solution and 32 seconds for the parameterised pl/sql solution using an arraysize of 100000 and normal inserts with no locking.

     

    Regards

    Jonathan Lewis

1 2 3 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points