Forum Stats

  • 3,734,268 Users
  • 2,246,935 Discussions
  • 7,857,216 Comments

Discussions

Best Syntax for Bulk Collect

Migs_Isip
Migs_Isip Member Posts: 340
edited Oct 30, 2016 5:45AM in SQL & PL/SQL

Hi All, i was wondering if there's a better or preferred way to write Bulk Collects.

I've seen different approaches and some are in the below sample Code:

    set serveroutput on;    declare        cursor  ext_data_file is         select  *        from    PER_ALL_PEOPLE_F;                TYPE type_emp_rec_ext IS TABLE OF ext_data_file%ROWTYPE index by pls_integer;                l_emp_tab_ext_raw         type_emp_rec_ext;        l_emp_tab_ext_clean       type_emp_rec_ext;           l_start                   number;        l_count                   number := 0;          l_count2                  number := 0;      begin        /* Regular Cursor Fetch */                l_start := DBMS_UTILITY.get_time;        FOR cur_rec IN ext_data_file LOOP            l_count2 := l_count2 + 1;        END LOOP;                DBMS_OUTPUT.put_line('Regular    (' || l_count2 || ' rows): ' || (DBMS_UTILITY.get_time - l_start) ||' ms');               /* Bulk Collect 1 */                l_start := DBMS_UTILITY.get_time;               open   ext_data_file;        fetch  ext_data_file        bulk collect        into   l_emp_tab_ext_raw;        close  ext_data_file;                DBMS_OUTPUT.put_line('Bulk 1  (' || l_emp_tab_ext_raw.count || ' rows): ' || (DBMS_UTILITY.get_time - l_start) ||' ms');                /* Bulk Collect 2 */                l_start := DBMS_UTILITY.get_time;                SELECT *        BULK COLLECT INTO         l_emp_tab_ext_clean        FROM   PER_ALL_PEOPLE_F;                 DBMS_OUTPUT.put_line('Bulk 2  (' || l_emp_tab_ext_clean.count || ' rows): ' || (DBMS_UTILITY.get_time - l_start) ||' ms');                                     /* Bulk Collect 3 */                l_start := DBMS_UTILITY.get_time;                OPEN ext_data_file;        LOOP            FETCH ext_data_file            BULK COLLECT INTO l_emp_tab_ext_raw LIMIT 1000;                l_count := l_emp_tab_ext_raw.count + l_count;            EXIT WHEN l_emp_tab_ext_raw.count = 0;                END LOOP;        CLOSE ext_data_file;              DBMS_OUTPUT.put_line('Bulk 3  (' || l_count || ' rows): ' || (DBMS_UTILITY.get_time - l_start) ||' ms');                                   EXCEPTION        WHEN OTHERS THEN            dbms_output.put_line(sqlerrm);    end;

In Terms of Performance, i was surprised to see a difference. the Regular Fetch was faster than the Bulk Collects!

    Regular (1202666 rows): 4174 ms    Bulk 1  (1202666 rows): 6369 ms    Bulk 2  (1202666 rows): 7204 ms    Bulk 3  (1202666 rows): 4380 ms

Here are my Observations:

    Bulk Collect 1 : Commonly used when using an Explicit Cursor and has no LIMIT Clause. Also when the cursor Involves numerous Joined Tables.    Bulk Collect 2 : Commonly used when using a single table or an Implicit Cursor and has no LIMIT Clause.    Bulk Collect 3 : Commonly used when using an Explicit Cursor and has a LIMIT Clause.

Kindly confirm if above observations are correct.

And my Questions are:

    1. Why is the Regular Fetch Faster?    2. Why of these Bulk Collects are the Best Syntax to Use in Terms of Overall Performance and Memory Consumption)

I have always been using Syntax 1, but i reckon the best is Syntax 3.

Database Details:

    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production    PL/SQL Release 11.2.0.4.0 - Production    "CORE 11.2.0.4.0 Production"    TNS for Solaris: Version 11.2.0.4.0 - Production    NLSRTL Version 11.2.0.4.0 - Production

Thanks!

Message was edited by: Migs_Isip

Migs_IsipAndrewSayerWilliam Robertson

Best Answer

  • sdstuber
    sdstuber Member Posts: 215 Bronze Badge
    edited Oct 29, 2016 7:52PM Accepted Answer
    So even if i don't specify an explicit BULK COLLECT, the Database will automatically convert it to a BULK COLLECT with a LIMIT of 100. correct?

    Yes, essentially.  it's not quite the same thing though because you don't have access to the internal collection.  So, while you can iterate within the cursor loop,  you can't do a FORALL bulk processing with the results or any other collection actions.

    If however, iteration is all you need, then a cursor for loop will often be not only the simplest, but also the fastest.

    There are, of course, exceptions and if the row is large, an internal 100-row collection might be consuming more memory than you want.

    Conversely, if you have lots of memory then it can (not always) be faster to do bulk collect manually with larger limits.

    As mentioned above though, there is no "one way" that will always be best for all solutions.  So, keep them all in your tool box.  Pull out the one you need, when you need it.

    Migs_Isip

Answers

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Oct 29, 2016 2:38PM

    Best use case of anything depends on the entire situation, in most scenarios I have no found a requirement for bulk collect or anything that requires going through a result set row by row (slow by slow).

    If you use bulk collect without a limit clause then you are loading the entire result set into (expensive) PGA memory in one go. Plsql will automatically handle an implicit for loop (for record_type in (query) loop) using good (in most cases) fetch sizes to reduce memory footprint.

    Btw

    EXCEPTION
            WHEN OTHERS THEN
                dbms_output.put_line(sqlerrm);
        end;

    will take any decent code and make it absolutely worthless.

  • John Thorton
    John Thorton Member Posts: 14,493 Silver Crown
    edited Oct 29, 2016 2:52PM

    do NOT do in PL/SQL that which can be done in plain SQL; which eliminates BULK in the vast majority of the cases.

  • Unknown
    edited Oct 29, 2016 5:07PM
    TYPE type_emp_rec_ext IS TABLE OF ext_data_file%ROWTYPE index by pls_integer;

    Why are you using an associative array. Just use a nested table - that is what they are for.

           SELECT *        BULK COLLECT INTO         l_emp_tab_ext_clean        FROM   PER_ALL_PEOPLE_F;                 DBMS_OUTPUT.put_line('Bulk 2  (' || l_emp_tab_ext_raw.count || ' rows): ' || (DBMS_UTILITY.get_time - l_start) ||' ms');

    Why are you collecting into 'l_emp_tab_ext_clean' but displaying the count of 'l_emp_tab_ext_raw'?

    Kindly confirm if above observations are correct.

    None of the options has ANYTHING to do with what type of cursor you use or how many tables the query uses. The %ROWTYPE does just that - takes ALL columns no matter where they came from.

    1. Why is the Regular Fetch Faster?

    No way to know. Your system could have had all kinds of other things going on.

    The tests are flawed anyway since the first tests may have loaded a lot of blocks into the buffer cache that the other tests then used. That could make the numbers even farther apart than what you show.

        2. Why of these Bulk Collects are the Best Syntax to Use in Terms of Overall Performance and Memory Consumption)

    None of them - based on what you posted.

    Collections and bulk processing are part of a 'solution' to a 'problem'. You don't have any problem so you don't need a solution.

    The 'best' method depends on what the problem is. In particular it depends on what you plan to do with the data you fetch from the cursor.

    If you are doing a lot of analysis/cleansing as you might for complex ETL processes the time spent for each row on the analysis/cleansing will be longer than the io time and will be the determining factor for performance.

    A typical ETL process involves AT LEAST three components:

    1. sourcing the data

    2. processing/analyzing/cleansing the data

    3. dumping the data back to a table or passing it to another process

    You haven't told us ANYTHING about what you plan to do with the data once you get it. So it just isn't possible to know what 'solution' should be used to get the best performance.

  • sdstuber
    sdstuber Member Posts: 215 Bronze Badge
    edited Oct 29, 2016 5:33PM

    The reason your cursor for loop is fastest is because you're still doing a bulk collect, but without the extra overhead of user defined pl/sql.

         A cursor for loop has a built in "bulk collect limit 100"

    This optimization was made with the introduction of 10g.  I've not been able to find any mention of it in "normal" documentation but both askTom and Steven Feuerstein have confirmed it over the years.  Here's one source:

    PLSQL 101: Working with Cursors

    Migs_Isip
  • Migs_Isip
    Migs_Isip Member Posts: 340
    edited Oct 29, 2016 5:39PM
    Btw EXCEPTION
    WHEN OTHERS THEN
    dbms_output.put_line(sqlerrm);
    end;will take any decent code and make it absolutely worthless.

    Don't worry, It was just for Sample Purposes in an Anonymous Block. I don't use that in Production or in any Development Work I Do.

  • Migs_Isip
    Migs_Isip Member Posts: 340
    edited Oct 29, 2016 5:41PM

    Yes Generally that's the rule of thumb. But the question in context was for PL/SQL Processing. The samples were created just to check for differences in performance and in Syntax.

  • Migs_Isip
    Migs_Isip Member Posts: 340
    edited Oct 29, 2016 5:54PM

    Oh hello there @rp0428, i see you're in your usual tone.

    Why are you collecting into 'l_emp_tab_ext_clean' but displaying the count of 'l_emp_tab_ext_raw'?

    Corrected the Typo in the Original Post.

    Collections and bulk processing are part of a 'solution' to a 'problem'. You don't have any problem so you don't need a solution.

    Currently I'm just studying up on what Bulk Collect Syntax are best for some 'problems' so i can be ready with a 'solution'.

  • Migs_Isip
    Migs_Isip Member Posts: 340
    edited Oct 29, 2016 5:58PM
    sdstuber wrote:The reason your cursor for loop is fastest is because you're still doing a bulk collect, but without the extra overhead of user defined pl/sql. A cursor for loop has a built in "bulk collect limit 100"This optimization was made with the introduction of 10g. I've not been able to find any mention of it in "normal" documentation but both askTom and Steven Feuerstein have confirmed it over the years. Here's one source:PLSQL 101: Working with Cursors 

    Hi, Yes i read about that too, just some hours ago in https://oracle-base.com/articles/9i/bulk-binds-and-record-processing-9i 

    "From Oracle 10g onward, the optimizing PL/SQL compiler converts cursor FOR LOOPs into BULK COLLECTs with an array size of 100."

    So even if i don't specify an explicit BULK COLLECT, the Database will automatically convert it to a BULK COLLECT with a LIMIT of 100. correct?

  • Unknown
    edited Oct 29, 2016 6:23PM
    Oh hello there rp0428, i see you're in your usual tone.

    Any 'tone' you perceive is of your own making - perhaps coming from within?

    There are no 'tones' on the forum - at least none of my making.

    Currently I'm just studying up on what Bulk Collect Syntax are best for some 'problems' so i can be ready with a 'solution'.

    Ok - but I fail to see how an improper and totally useless test of the times for mere counts provides any useful info at all that would help one be 'ready' when a problem occurs.

    Wouldn't it be much more useful to ask questions like::

    1. what use cases are appropriate for pl/sql?

    2. what use cases are appropriate for collections?

    3. what use cases are appropriate for bulk processing

    4. what factors can affect the performance of bulk processing

    5. what are the common mistakes people make when implementing collections and/or bulk processing.

    6. I've seen this 'abc' use case before - is it appropriate for bulk processing? Why or why not?

    A question like 'why is useless count #1 faster than useless count #2' doesn't really add to one's knowledge base IMHO.

    AndrewSayer
  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Oct 29, 2016 6:59PM
    Migs_Isip wrote:Btw EXCEPTION
    WHEN OTHERS THEN
    dbms_output.put_line(sqlerrm);
    end;will take any decent code and make it absolutely worthless.Don't worry, It was just for Sample Purposes in an Anonymous Block. I don't use that in Production or in any Development Work I Do.

    So why do it for a sample purpose? IMO it is one of the easiest ways to draw the wrong conclusion from anything. If it's not good enough for development, it's not good enough for test cases.

    William Robertson
  • sdstuber
    sdstuber Member Posts: 215 Bronze Badge
    edited Oct 29, 2016 7:52PM Accepted Answer
    So even if i don't specify an explicit BULK COLLECT, the Database will automatically convert it to a BULK COLLECT with a LIMIT of 100. correct?

    Yes, essentially.  it's not quite the same thing though because you don't have access to the internal collection.  So, while you can iterate within the cursor loop,  you can't do a FORALL bulk processing with the results or any other collection actions.

    If however, iteration is all you need, then a cursor for loop will often be not only the simplest, but also the fastest.

    There are, of course, exceptions and if the row is large, an internal 100-row collection might be consuming more memory than you want.

    Conversely, if you have lots of memory then it can (not always) be faster to do bulk collect manually with larger limits.

    As mentioned above though, there is no "one way" that will always be best for all solutions.  So, keep them all in your tool box.  Pull out the one you need, when you need it.

    Migs_Isip
This discussion has been closed.