
ORA-30009: Not enough memory for CONNECT BY operation

Quanwen Zhao Member Posts: 573 Blue Ribbon
edited Jun 18, 2018 1:38AM in SQL & PL/SQL

Hi, experts

I want to create a test table TEST1 for some further testing on my Oracle database server, 11.2.0.4.0 for Linux x86_64.

I'm also not sure whether the failure to create the table is related to the value (default 15) of the SQL*Plus arraysize parameter.

Here are my steps:

(1) on SQL*Plus,

[email protected]> show arraysize
arraysize
[email protected]> set arraysize
SP2-0267: arraysize option 0 out of range (1 through 5000)

(2) on SQLcl 18.1.1,

SQL> show arraysize
arraysize 15
SQL> set arraysize
SQLPLUS command failed - not enough arguments

SQL> set arraysize 5000
SQL> show arraysize
arraysize 5000
SQL> set arraysize 5001
SP2-0267: arraysize option 5001 out of range (1 through 5000)
SQL> set arraysize 5000
SQL> show arraysize
arraysize 5000

SQL> set timing on
SQL> show timing
timing ON
SQL> create table test1
  2  segment creation immediate
  3  nologging
  4  as
  5  select rownum as id
  6         , to_char(sysdate + rownum/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
  7         , trunc(dbms_random.value(0, 100)) as random_id
  8         , dbms_random.string('x', 20) random_string
  9  from dual
 10  connect by level <= 100000000
 11  ;

Error starting at line : 1 in command -
create table test1
segment creation immediate
nologging
as
select rownum as id
       , to_char(sysdate + rownum/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
       , trunc(dbms_random.value(0, 100)) as random_id
       , dbms_random.string('x', 20) random_string
from dual
connect by level <= 100000000
Error report -
ORA-30009: Not enough memory for CONNECT BY operation

Elapsed: 00:37:11.019

As you can see, it finally fails with error ORA-30009. Could you help me troubleshoot it?

Thanks in advance.

Best Regards

Quanwen Zhao


Best Answer

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,673 Gold Crown
    edited Jun 8, 2018 9:49AM Accepted Answer

    You've already got the answer to the question about the 30009 - recursive descent just takes memory.

    The Gaz in Oz approach is the most common - and very efficient - workaround, but a couple of extra comments on your code:

    a) always start with a dbms_random.seed(0) - or some other seed point - so that you can reproduce the same random values on repeat tests.

    b) I tend to use exponent notation to avoid miscounting zeros on big numbers, as in: 1e8 for 100,000,000

    c) Try to be a bit clever about use of dbms_random - it's CPU intensive:

    e.g.

    create table t1
    nologging
    as
    with generator as (
            select
                    rownum id
            from dual
            connect by
                    level <= 1e4
    )
    select
    --      dbms_random.string('U',20)      padding
            rpad(dbms_random.string('U',6),20)      padding
    from
            generator       v1,
            generator       v2
    where
            rownum <= 1e6
    ;

    Note the rpad() of a 6 character random string - this gets me 20 character strings which are guaranteed nearly unique, but does a lot less work than generating all the characters randomly. On my 1M row table the padded version took 41 seconds to run, the pure dbms_random version took 115 seconds. That matters a lot when you're trying to build 1e8 rows. (When you've decided on your data patterns, test the time to generate a small data set before you create the big one.)
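
    As a minimal sketch of point (a), seed the generator immediately before running the statement (the execute call is SQL*Plus syntax) so that repeat runs produce identical data:

    SQL> execute dbms_random.seed(0)
    SQL> select dbms_random.string('U',6) s from dual connect by level <= 3;

    -- re-running both statements together returns exactly the same three strings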

    I tend to stick with 1e4 rows for my WITH subquery - that's good for 1e8 rows in the result set, and I don't often bother adding a 3rd generator unless I want to use v1.id and v2.id (and potentially v3.id) to produce particular patterns - an example of which happens to appear in the following posting: https://www.red-gate.com/simple-talk/sql/oracle/massive-deletes-part-2/

    Regards

    Jonathan Lewis


Answers

  • John Thorton
    John Thorton Member Posts: 14,493 Silver Crown
    edited Jun 7, 2018 11:01PM

    unwilling or incapable to use GOOGLE yourself?

    [[email protected] trace]$ oerr ora 30009
    30009, 0000, "Not enough memory for %s operation"
    // *Cause: The memory size was not sufficient to process all the levels of the
    //         hierarchy specified by the query.
    // *Action: In WORKAREA_SIZE_POLICY=AUTO mode, set PGA_AGGREGATE_TARGET to
    //          a reasonably larger value.
    //          Or, in WORKAREA_SIZE_POLICY=MANUAL mode, set SORT_AREA_SIZE to a
    //          reasonably larger value.
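
    For what it's worth, a minimal sketch of checking and (with suitable privileges) raising the target in AUTO mode - the 8G figure is only an example, not a recommendation:

    -- SQL*Plus
    show parameter workarea_size_policy
    show parameter pga_aggregate_target

    alter system set pga_aggregate_target = 8G scope = both;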

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Jun 8, 2018 2:00AM

    What has arraysize got to do with this at all? That is just a setting that determines how many rows are fetched from a cursor on each fetch call.

    Either reduce the number of times you are connecting the row in dual to itself - do you really need 100 million rows just to try something?

    Or rewrite your statement to use cross joins or something else to prevent needing to store so much in memory.

  • Quanwen Zhao
    Quanwen Zhao Member Posts: 573 Blue Ribbon
    edited Jun 8, 2018 2:45AM
    Either reduce the number of times you are connecting the row in dual to itself - do you really need 100 million rows just to try something? Or rewrite your statement to use cross joins or something else to prevent needing to store so much in memory.

    Hi, Andrew

    Yes, I want to use this test table to simulate a production table's size (approximately 6 gigabytes) for some testing.

    Last time I used the figure "10000000" with connect by, but the resulting table's size was only about 600 megabytes.

    SQL> set timing on
    SQL> create table test1
      2  segment creation immediate
      3  nologging
      4  as
      5  select rownum as id
      6         , to_char(sysdate + rownum/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
      7         , trunc(dbms_random.value(0, 100)) as random_id
      8         , dbms_random.string('x', 20) random_string
      9  from dual
     10  connect by level <= 10000000  <<==
     11  ;

    Table TEST1 created.

    Elapsed: 00:15:58.255

    SQL> select sum(blocks),sum(bytes)/1024/1024 size_mb from user_segments
      2  where segment_name = 'TEST1'
      3  and segment_type = 'TABLE';

    SUM(BLOCKS)    SIZE_MB
    ----------- ----------
          76800        600

    That's still far from 6 gigabytes, so I changed the figure to "100000000" and retried.
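
    As a rough sanity check of that figure: 600 MB for 10,000,000 rows is about 63 bytes per row, so reaching roughly 6 GB needs on the order of 100,000,000 rows.

    select round(6 * 1024 * 1024 * 1024 / (600 * 1024 * 1024 / 10000000)) as estimated_rows
    from   dual;

    -- ESTIMATED_ROWS
    -- --------------
    --      102400000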

    Best Regards

    Quanwen Zhao

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Jun 8, 2018 2:50AM
    Quanwen Zhao wrote:

    Either reduce the number of times you are connecting the row in dual to itself - do you really need 100 million rows just to try something? Or rewrite your statement to use cross joins or something else to prevent needing to store so much in memory.

    Hi, Andrew

    Yes, I want to use this test table to simulate a production table's size (approximately 6 gigabytes) for some testing. Last time I used the figure "10000000" with connect by, but the resulting table's size was only about 600 megabytes.

    SQL> create table test1
      2  segment creation immediate
      3  nologging
      4  as
      5  select rownum as id
      6         , to_char(sysdate + rownum/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
      7         , trunc(dbms_random.value(0, 100)) as random_id
      8         , dbms_random.string('x', 20) random_string
      9  from dual
     10  connect by level <= 10000000  <<==
     11  ;

    Table TEST1 created.

    Elapsed: 00:15:58.255

    SUM(BLOCKS)    SIZE_MB
    ----------- ----------
          76800        600

    That's still far from 6 gigabytes, so I changed the figure to "100000000" and retried.

    It's likely you need the table to have the same row distribution, the same indexes etc. if you want your tests to be meaningful.

    Can't you spin up a duplicate DB from a backup of production? You would need to obfuscate data that doesn't belong outside of production before you use it, though (and if this affects your selectivities then you'll need to change your queries accordingly).

    To directly answer this, as I said before you can use a cross join to bump this up: if level <= 100,000 is fine, then cross join that to a generated row source of 100 rows (I can't count your 0s properly on my phone screen, so you'll have to do the maths).
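
    A minimal sketch of that shape (the row counts below are illustrative only):

    select rownum as id
    from   (select level from dual connect by level <= 100000) big,
           (select level from dual connect by level <= 1000)   small
    where  rownum <= 100000000;

    -- 100,000 x 1,000 candidate rows, cut off at 1e8 by the rownum predicate,
    -- so no single CONNECT BY has to recurse 100 million levels deep.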

  • Quanwen Zhao
    Quanwen Zhao Member Posts: 573 Blue Ribbon
    edited Jun 8, 2018 2:57AM
    John Thorton wrote:

    unwilling or incapable to use GOOGLE yourself?

    [[email protected] trace]$ oerr ora 30009
    30009, 0000, "Not enough memory for %s operation"
    // *Cause: The memory size was not sufficient to process all the levels of the
    //         hierarchy specified by the query.
    // *Action: In WORKAREA_SIZE_POLICY=AUTO mode, set PGA_AGGREGATE_TARGET to
    //          a reasonably larger value.
    //          Or, in WORKAREA_SIZE_POLICY=MANUAL mode, set SORT_AREA_SIZE to a
    //          reasonably larger value.

    Hello, John Thorton

    As you may know, due to my country's specific situation I have never been able to log in to Google from China. BTW, I've searched other posts but found no clues or answers, so I came here.

    Best Regards

    Quanwen Zhao

  • BEDE
    BEDE Oracle Developer Member Posts: 2,266 Gold Trophy
    edited Jun 8, 2018 3:00AM

    Wouldn't export and import provide better conditions to test whatever you wish to test?

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Jun 8, 2018 3:07AM
    BEDE wrote:
    Wouldn't export and import provide better conditions to test whatever you wish to test?

    Any logical duplication (like export and import) will leave the rows in any order it likes. Your indexes will then all be built from scratch - your statistics will be different and the performance of using your indexes will be different.

    If you want to test performance then you absolutely must use physical duplication

  • Gaz in Oz
    Gaz in Oz Member Posts: 3,777 Bronze Crown
    edited Jun 8, 2018 3:14AM

    Try using less memory with "connect by":

    Generating lots of rows using connect by – safely! | Tanel Poder's Performance & Troubleshooting blog

    For example:

    create table test1
    segment creation immediate
    nologging
    select r as id
         , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
         , trunc(dbms_random.value(0, 100)) as random_id
         , dbms_random.string('x', 20) random_string
    from  (SELECT rownum r
           FROM  (SELECT level r FROM dual CONNECT BY level <= 1000) d1,
                 (SELECT level r FROM dual CONNECT BY level <= 1000) d2,
                 (SELECT level r FROM dual CONNECT BY level <= 1000) d3
           WHERE  rownum <= 100000000);

    ...generating 100,000,000 rows will take time.

  • BEDE
    BEDE Oracle Developer Member Posts: 2,266 Gold Trophy
    edited Jun 8, 2018 3:22AM

    Physical duplication would mean copying the database files as they are - a cold backup. But that may not be feasible. So, for some tests, export and import of some data may be used, although, as you have pointed out, it will not be the same for performance. Moreover, for exact duplication, not only would a cold backup be needed, but also exactly the same kind of system (processors, memory and disks), which I very much doubt many organizations have. Development and test environments often run on less powerful machines than production, and if some batch process performs satisfactorily on a less powerful test machine, then in production it should do no worse...

    I meant export and import in order to have the same data.

  • John_K
    John_K Member Posts: 2,498 Gold Trophy
    edited Jun 8, 2018 3:25AM

    So instead of using a very narrow table where each row is going to consume a very small amount of space, why not base your query on something a bit more hefty like dba_objects?
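
    For example, a minimal sketch of that idea (the table name and row limit are just placeholders):

    create table test_wide nologging as
    select o1.owner, o1.object_name, o1.object_type, o1.created, o1.status
    from   dba_objects o1,
           dba_objects o2          -- self cross join to multiply the row count
    where  rownum <= 1e6;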

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Jun 8, 2018 3:43AM
    BEDE wrote:
    Physical duplication would mean copying the database files as they are - a cold backup. But that may not be feasible. So, for some tests, export and import of some data may be used, although, as you have pointed out, it will not be the same for performance. Moreover, for exact duplication, not only would a cold backup be needed, but also exactly the same kind of system (processors, memory and disks), which I very much doubt many organizations have. Development and test environments often run on less powerful machines than production, and if some batch process performs satisfactorily on a less powerful test machine, then in production it should do no worse... I meant export and import in order to have the same data.

    If export and import are good enough, then there is absolutely no way it's necessary to create such a big random data set from dual.

    You don't need a cold backup for physical duplication. Hot backups work fine and should already be available. I imagine PDB cloning can achieve the same.

    Sure, different hardware will perform differently. But as long as you use default system stats (which is the current recommendation) your queries will use the same plans (for the same bind peeking profile), and any system-related time can be scaled up or down depending on your known hardware differences. Sure, it's not quick to determine this, but if you're going to test on different hardware you are going to have to do it, otherwise what's the point of testing?

    If your company says they can't afford to test properly, then they'll have to deal with the consequences.
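
    For illustration, a minimal sketch of an active-database duplicate with RMAN - the connect strings and database name are placeholders, and this is an outline rather than a tested script:

    $ rman target sys@prod auxiliary sys@testdb

    RMAN> duplicate target database to testdb
          from active database
          nofilenamecheck;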

  • Quanwen Zhao
    Quanwen Zhao Member Posts: 573 Blue Ribbon
    edited Jun 9, 2018 10:55PM
    create table test1
    segment creation immediate
    nologging
    select r as id
         , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
         , trunc(dbms_random.value(0, 100)) as random_id
         , dbms_random.string('x', 20) random_string
    from  (SELECT rownum r
           FROM  (SELECT level r FROM dual CONNECT BY level <= 1000) d1,
                 (SELECT level r FROM dual CONNECT BY level <= 1000) d2,
                 (SELECT level r FROM dual CONNECT BY level <= 1000) d3
           WHERE  rownum <= 100000000);

    Hi, Gaz in Oz

    Using your SQL code, I tried to create the test table test1, and it raised the error ORA-00922 shown below,

    SQL> create table test1
      2  segment creation immediate
      3  nologging
      4  select r as id
      5       , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
      6       , trunc(dbms_random.value(0, 100)) as random_id
      7       , dbms_random.string('x', 20) random_string
      8  from  (SELECT rownum r
      9         FROM  (SELECT level r FROM dual CONNECT BY level <= 1000) d1,
     10               (SELECT level r FROM dual CONNECT BY level <= 1000) d2,
     11               (SELECT level r FROM dual CONNECT BY level <= 1000) d3
     12         WHERE  rownum <= 100000000
     13  );

    Error starting at line : 1 in command -
    create table test1
    segment creation immediate
    nologging
    select r as id
         , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
         , trunc(dbms_random.value(0, 100)) as random_id
         , dbms_random.string('x', 20) random_string
    from  (SELECT rownum r
           FROM  (SELECT level r FROM dual CONNECT BY level <= 1000) d1,
                 (SELECT level r FROM dual CONNECT BY level <= 1000) d2,
                 (SELECT level r FROM dual CONNECT BY level <= 1000) d3
           WHERE  rownum <= 100000000)
    Error report -
    ORA-00922: missing or invalid option

    Best Regards

    Quanwen Zhao

  • Gaz in Oz
    Gaz in Oz Member Posts: 3,777 Bronze Crown
    edited Jun 8, 2018 5:36AM

    Missing "as" keyword.

    create table test1
    segment creation immediate
    nologging as
    select r as id
         , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
         , trunc(dbms_random.value(0, 100)) as random_id
         , dbms_random.string('x', 20) random_string
    from  (SELECT rownum r
           FROM  (SELECT level r FROM dual CONNECT BY level <= 1000) d1,
                 (SELECT level r FROM dual CONNECT BY level <= 1000) d2,
                 (SELECT level r FROM dual CONNECT BY level <= 1000) d3
           WHERE  rownum <= 100000000);
  • BEDE
    BEDE Oracle Developer Member Posts: 2,266 Gold Trophy
    edited Jun 8, 2018 5:38AM

    The select itself runs in Oracle 11.2.0.4, and so does the create table as select. All you need is an "AS" after nologging (if that is exactly what you posted).

    Of course, the figure I used was only 10000, for I wasn't insane enough to produce as many rows as you need.

  • Etbin
    Etbin Member Posts: 8,968 Gold Crown
    edited Jun 8, 2018 8:36AM

    Do it in steps (what follows is just for testing)

    create table etbin as
      select level id,
             to_char(trunc(sysdate) + level/24/3600,'yyyy-mm-dd hh24:mi:ss') as inc_datetime,
             trunc(dbms_random.value(0,100)) as random_id,
             dbms_random.string('x',20) random_string
        from dual
      connect by level <= 10;

    followed by

    begin
      for i in 1 .. 3
      loop
        insert into etbin
        select 10 * i + level id,
               to_char(trunc(sysdate) + (10 * i + level)/24/3600,'yyyy-mm-dd hh24:mi:ss') as inc_datetime,
               trunc(dbms_random.value(0,100)) as random_id,
               dbms_random.string('x',20) random_string
          from dual
        connect by level <= 10;
        commit;
      end loop;
    end;
    /

    Regards

    Etbin

  • James Su
    James Su Member Posts: 1,112 Gold Trophy
    edited Jun 8, 2018 9:35AM

    Use a pipelined function instead of connect by: http://jonathanlewis.wordpress.com/2011/12/08/test-data/#comments
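
    A minimal sketch of that approach - the type and function names below are made up for illustration:

    create or replace type number_tab as table of number;
    /

    create or replace function row_generator(p_rows in number)
      return number_tab
      pipelined
    is
    begin
      for i in 1 .. p_rows loop
        pipe row (i);   -- emit one row at a time instead of building the whole set in memory
      end loop;
      return;
    end row_generator;
    /

    -- usage:
    -- select column_value as id from table(row_generator(1e6));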
  • Quanwen Zhao
    Quanwen Zhao Member Posts: 573 Blue Ribbon
    edited Jun 8, 2018 11:16PM
    Missing "as" keyword.

    Hi, Gaz

    Very nice, many thanks for the reminder and guidance.

    Nevertheless, the advice Jonathan gave afterwards works even better for me. Would you mind if I mark his suggestion as the accepted answer?

    Best Regards

    Quanwen Zhao

  • Quanwen Zhao
    Quanwen Zhao Member Posts: 573 Blue Ribbon
    edited Jun 8, 2018 11:20PM
    James Su wrote:
    Use a pipelined function instead of connect by: http://jonathanlewis.wordpress.com/2011/12/08/test-data/#comments

    Hello, James

    Thank you. Following your advice, I'll read that post from Jonathan.

    Best Regards

    Quanwen Zhao

  • padders
    padders Member Posts: 1,053 Silver Trophy
    edited Jun 9, 2018 3:16AM

    > Try to be a bit clever about use of dbms_random - it's CPU intensive:

    Agreed and dbms_random.string appears increasingly inefficient for longer strings (which is hardly surprising given that it is essentially a PL/SQL loop).

    I personally find that if you are not too precious about actual string content then ora_hash/standard_hash/sys_guid can be combined to very efficiently generate quite a range of things without needing PL/SQL function calls.
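
    For instance, a minimal sketch along those lines (purely illustrative - the string content is hex rather than alphanumeric):

    select ora_hash(rownum, 99)                          as random_id,
           rpad(substr(rawtohex(sys_guid()), 1, 20), 20) as random_string
    from   dual
    connect by level <= 10;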

  • Gaz in Oz
    Gaz in Oz Member Posts: 3,777 Bronze Crown
    edited Jun 9, 2018 6:10AM
    Nevertheless, the advice Jonathan gave afterwards works even better for me. Would you mind if I mark his suggestion as the accepted answer?

    Sure, all good.

  • Quanwen Zhao
    Quanwen Zhao Member Posts: 573 Blue Ribbon
    edited Jun 10, 2018 10:58PM
    create table t1
    nologging
    as
    with generator as (
            select rownum id
            from dual
            connect by level <= 1e4
    )
    select
    --      dbms_random.string('U',20)      padding
            rpad(dbms_random.string('U',6),20)      padding
    from
            generator v1,
            generator v2
    where
            rownum <= 1e6
    ;

    Hi, Jonathan

    I've tested the table-creation SQL you supplied, and it completed successfully in only 32 seconds.

    Using Gaz in Oz's SQL, I also created table test1 successfully, but it took about 2 hours and 35 minutes. Why is it so slow?

    SQL> create table test1
      2  segment creation immediate
      3  nologging
      4  as
      5  select r as id
      6       , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
      7       , trunc(dbms_random.value(0, 100)) as random_id
      8       , dbms_random.string('x', 20) random_string
      9  from  (SELECT rownum r
     10         FROM  (SELECT level r FROM dual CONNECT BY level <= 1000) d1,
     11               (SELECT level r FROM dual CONNECT BY level <= 1000) d2,
     12               (SELECT level r FROM dual CONNECT BY level <= 1000) d3
     13         WHERE  rownum <= 100000000
     14  );

    Table TEST3 created.

    Elapsed: 02:34:55.198

    BTW, the value of the PGA_AGGREGATE_TARGET parameter is 20G.

    SQL> show parameter pga

    NAME                 TYPE        VALUE
    -------------------- ----------- -----
    pga_aggregate_target big integer 20G
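
    As a side note, a small sketch for checking overall PGA usage while such a statement runs (v$pgastat is a standard dynamic view; the statistic names below are the usual ones):

    select name, round(value/1024/1024) as mb
    from   v$pgastat
    where  name in ('aggregate PGA target parameter',
                    'total PGA allocated',
                    'maximum PGA allocated');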

    Based on your SQL code, I've written my own version as follows,

    SQL> drop table t1;

    Table dropped.

    SQL> execute dbms_random.seed(0);

    PL/SQL procedure successfully completed.

    SQL> set timing on
    SQL> create table t1
      2  segment creation immediate
      3  nologging
      4  as
      5  with generator as (
      6  select rownum id
      7  from dual
      8  connect by level <= 1e4
      9  )
     10  select rownum as id
     11         , to_char(sysdate + rownum/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
     12         , trunc(dbms_random.value(0, 100)) as random_id
     13         , dbms_random.string('x', 20) random_string
     14  from generator v1
     15       , generator v2
     16  where rownum <= 1e8
     17  ;

    Table T1 created.

    Elapsed: 02:32:54.566

    SQL> select sum(blocks),sum(bytes)/1024/1024 size_mb from user_segments
      2  where segment_name ='T1'
      3  and segment_type = 'TABLE';

    SUM(BLOCKS)    SIZE_MB
    ----------- ----------
         770176       6017

    SQL> select count(distinct random_string) from t1;

    COUNT(DISTINCTRANDOM_STRING)
    ----------------------------
                       100000000

    Elapsed: 00:01:35.487

    SQL> show parameter cpu

    NAME                            TYPE    VALUE
    ------------------------------- ------- -----
    cpu_count                       integer 32
    parallel_threads_per_cpu        integer 2
    resource_manager_cpu_allocation integer 32

    SQL> select /*+ parallel */ count(distinct random_string) from t1;

    COUNT(DISTINCTRANDOM_STRING)
    ----------------------------
                       100000000

    Elapsed: 00:00:05.437

    It is still very slow as well, taking approximately 2 hours and 33 minutes.

    If I understand your suggestion correctly:

    • first, produce a small result set of 1e4 (10,000) rows via CONNECT BY and ROWNUM in the "with ... as (...)" clause;
    • second, build a cross join of two or three copies of that same in-line "generator" view;
    • since 1e4 * 1e4 = 1e8, this easily produces 1e8 rows of data.

    Best Regards

    Quanwen Zhao

  • John_K
    John_K Member Posts: 2,498 Gold Trophy
    edited Jun 11, 2018 3:36AM

    Is it any quicker if you take advantage of materialization?

    with x as ((SELECT level r FROM dual CONNECT BY level <= 1000))
    select r as id
           , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
           , trunc(dbms_random.value(0, 100)) as random_id
           , dbms_random.string('x', 20) random_string
      from  (SELECT rownum r
             FROM  x d1,
                   x d2,
                   x d3
             WHERE  rownum <= 100000000
      );

    Also, whilst it doesn't seem to affect the plan in this case, I always find it useful to use the cardinality hint on a row generator so the optimizer has a bit more information on the number of rows it's going to process. This is particularly important with dual, because the connect by cardinality is calculated based on dual, which is of course, one row.

    with x as ((SELECT /*+cardinality(1000)*/ level r FROM dual CONNECT BY level <= 1000))
    select r as id
           , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
           , trunc(dbms_random.value(0, 100)) as random_id
           , dbms_random.string('x', 20) random_string
      from  (SELECT /*+all_rows*/ rownum r
             FROM  x d1,
                   x d2,
                   x d3
             WHERE  rownum <= 100000000
      );

    Same plan on my system - but the cardinalities are correct.

    ---------------------------------------------------------------------------------------------------------------
    | Id  | Operation                      | Name                        | E-Rows |E-Bytes| Cost (%CPU)| E-Time   |
    ---------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT               |                             |    100M|  1239M|  2137K  (7)| 00:01:24 |
    |   1 |  TEMP TABLE TRANSFORMATION     |                             |        |       |            |          |
    |   2 |   LOAD AS SELECT               | SYS_TEMP_0FD9D6A1A_7657F032 |        |       |            |          |
    |*  3 |    CONNECT BY WITHOUT FILTERING|                             |        |       |            |          |
    |   4 |     FAST DUAL                  |                             |      1 |       |     2   (0)| 00:00:01 |
    |   5 |   VIEW                         |                             |    100M|  1239M|  2137K  (7)| 00:01:24 |
    |*  6 |    COUNT STOPKEY               |                             |        |       |            |          |
    |   7 |     MERGE JOIN CARTESIAN       |                             |   1000M|       |  2137K  (7)| 00:01:24 |
    |   8 |      MERGE JOIN CARTESIAN      |                             |   1000K|       |  2137   (7)| 00:00:01 |
    |   9 |       VIEW                     |                             |   1000 |       |     2   (0)| 00:00:01 |
    |  10 |        TABLE ACCESS FULL       | SYS_TEMP_0FD9D6A1A_7657F032 |   1000 | 13000 |     2   (0)| 00:00:01 |
    |  11 |       BUFFER SORT              |                             |   1000 |       |  2137   (7)| 00:00:01 |
    |  12 |        VIEW                    |                             |   1000 |       |     2   (0)| 00:00:01 |
    |  13 |         TABLE ACCESS FULL      | SYS_TEMP_0FD9D6A1A_7657F032 |   1000 | 13000 |     2   (0)| 00:00:01 |
    |  14 |      BUFFER SORT               |                             |   1000 |       |  2137K  (7)| 00:01:24 |
    |  15 |       VIEW                     |                             |   1000 |       |     2   (0)| 00:00:01 |
    |  16 |        TABLE ACCESS FULL       | SYS_TEMP_0FD9D6A1A_7657F032 |   1000 | 13000 |     2   (0)| 00:00:01 |
    ---------------------------------------------------------------------------------------------------------------

    This is more applicable though when you're then joining that with other tables. For example, you might generate a row for every day of the year over the next 10 years, and join that to another table - without cardinality hinting the CBO may opt for a nested loops approach because it calculates only one row will be returned from the "generator", when in reality it's more like 3.6k. I've seen problems compound like this to generate loops that get executed several billion times!
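
    A minimal sketch of that sort of date generator (the 10-year span and hint value are illustrative):

    with days as (
      select /*+ cardinality(3650) */
             trunc(sysdate) + level - 1 as dt
      from   dual
      connect by level <= 3650
    )
    select dt
    from   days;
    -- join "days" to other tables; the hint tells the CBO to expect ~3650 rows
    -- instead of the single row it would otherwise assume from dual.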

  • Quanwen Zhao
    Quanwen Zhao Member Posts: 573 Blue Ribbon
    edited Jun 12, 2018 2:40AM
    with x as ((SELECT level r FROM dual CONNECT BY level <= 1000))
    select r as id
           , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
           , trunc(dbms_random.value(0, 100)) as random_id
           , dbms_random.string('x', 20) random_string
      from  (SELECT rownum r
             FROM  x d1,
                   x d2,
                   x d3
             WHERE  rownum <= 100000000
      );

    Hello, John_K

    I've also tested your code on my machine, and it still took about 2 hours and 38 minutes to generate the test table t1.

    SQL> set timing on
    SQL> create table t1
      2  segment creation immediate
      3  nologging
      4  as
      5  with x as ((SELECT /*+cardinality(1000)*/ level r FROM dual CONNECT BY level <= 1e3))
      6  select r as id
      7       , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
      8       , trunc(dbms_random.value(0, 100)) as random_id
      9       , dbms_random.string('x', 20) random_string
     10  from  (SELECT /*+all_rows*/ rownum r
     11         FROM  x d1,
     12               x d2,
     13               x d3
     14         WHERE  rownum <= 1e8
     15  );

    Table T1 created.

    Elapsed: 02:37:24.056

    How long does it take to create the 1e8 rows of data in table t1 on your system?

    On the other hand, I re-read Jonathan's earlier reply to me. He pointed out that it's better to use the rpad function to pad the generated random strings, reducing CPU consumption, so I modified my code accordingly.

    SQL> drop table t1 purge;

    Table dropped.

    SQL> set timing on
    SQL> create table t1
      2  segment creation immediate
      3  nologging
      4  as
      5  with x as ((SELECT level r FROM dual CONNECT BY level <= 1e4))
      6  select r as id
      7         , to_char(sysdate + r/24/3600, 'yyyy-mm-dd hh24:mi:ss') as inc_datetime
      8         , trunc(dbms_random.value(0, 100)) as random_id
      9         , rpad(dbms_random.string('x', 6), 20) random_string
     10  from  (SELECT /*+ no_merge */
     11         rownum r
     12         FROM  x d1,
     13               x d2
     14         WHERE  rownum <= 1e8
     15  );

    Table T1 created.

    Elapsed: 01:09:30.462

    SQL> set timing off
    SQL> select sum(bytes)/1024/1024 size_mb from user_segments
      2  where segment_name = 'T1';

       SIZE_MB
    ----------
          5990

    Do you have a faster, more efficient method? Spending almost 1 hour and 10 minutes is still very slow this time.

    Best Regards

    Quanwen Zhao

  • John_K
    John_K Member Posts: 2,498 Gold Trophy
    edited Jun 12, 2018 3:25AM

    It depends on what you want for your data I guess. For example, at the moment you're padding a random string out to 20 characters - is there any reason why you can't pad that out to say 4000 characters? Then you don't need as many rows to reach your desired end table size. i.e.

    create table xxt1 as
      select rpad(dbms_random.string('X',5),20) padding from dual
      connect by level <= 100000;

    create table xxt2 as
      select rpad(dbms_random.string('X',5),4000) padding from dual
      connect by level <= 100000;

    select sum(bytes)/1024/1024 size_mb from user_segments
    where segment_name = 'XXT1';

    select sum(bytes)/1024/1024 size_mb from user_segments
    where segment_name = 'XXT2';

    SQL> set timing on
    SQL> @a.sql

    Table created.

    Elapsed: 00:00:04.10

    Table created.

    Elapsed: 00:00:09.49

       SIZE_MB
    ----------
             3

    Elapsed: 00:00:00.17

       SIZE_MB
    ----------
        785.25

    Elapsed: 00:00:00.13

    SQL>

    As you can see, the table is significantly larger simply because I've padded it with a load of spaces. So rather than generating 100M rows, you might be able to get away with fewer than 1M, given that your only concern seems to be the final size of the table. Similarly, you can also add dummy columns that are just padded spaces if you're not too bothered about the content.

This discussion has been closed.