
SQL & PL/SQL


PL/SQL Bulk Processing Checklist

Steven Feuerstein-Oracle Mar 15 2016 — edited Mar 16 2016

Comments

BluShadow Mar 15 2016

Looks like a decent checklist to me.

I'm glad the first thing mentioned was "can you do it in pure SQL?" as that certainly should be the first consideration.

Also glad that the limit was mentioned, though perhaps it needs a mention that recent versions of the database already cache for row-by-row processing, in many cases doing 100 rows at a time anyway, so there has to be some consideration as to whether bulk collect is going to add enough benefit.


Thanks, @"BluShadow"!

As for "already cache for row-by-row processing", I assume you mean that a cursor FOR loop is automatically optimized to return 100 rows with each fetch?

It is worth mentioning, but I think I need to add a paragraph like this at the top:

Scenario: you have a loop in your code that executes one or more non-query DML statements inside of it. It is running too smoothly and you need to make it go faster.

In other words, I want to focus on loops that are not "read only" but contain non-query DML. This means that the automatic optimization of the CFL will not be enough. Agreed?
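To picture that scenario, here is a minimal sketch (the employees table and the 10% raise are hypothetical, not from the checklist): the SELECT side of a cursor FOR loop is already array-fetched by the compiler, but the UPDATE inside it is still sent to the SQL engine once per row, and that is the part bulk processing targets.

BEGIN
   -- The cursor FOR loop itself is prefetched roughly 100 rows at a time...
   FOR r IN (SELECT employee_id, salary
               FROM employees
              WHERE department_id = 50)
   LOOP
      -- ...but this UPDATE still runs once per fetched row:
      -- one PL/SQL-to-SQL context switch per employee.
      UPDATE employees
         SET salary = r.salary * 1.1
       WHERE employee_id = r.employee_id;
   END LOOP;
END;
/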

BluShadow Mar 15 2016


Yep, that's what I was referring to.

It is running too smoothly

Do you mean it "isn't" running too smoothly?

Aside from that... agreed.  :-)


I like code reviews. No one should code (or write "doc") by themselves.

It should have said "running too slowly". Fixed!

padders Mar 15 2016

Step 4 phases are numbered 1,2,2 rather than 1,2,3.

Step 4 might also be a good place for a NOCOPY reminder.

Step 6 the word 'SELECT' is misspelled.

I think you need a point about exit conditions in BULK COLLECT LIMIT loops; it comes up a lot.
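Since that last point trips people up so often, here is a minimal sketch of the exit condition in question (table and DML are hypothetical): test the collection's COUNT after the fetch rather than the cursor's %NOTFOUND attribute, because %NOTFOUND becomes TRUE as soon as a fetch returns fewer than LIMIT rows, and checking it before processing would silently skip the final, partially filled batch.

DECLARE
   CURSOR emp_cur IS
      SELECT employee_id FROM employees;

   TYPE id_tab_t IS TABLE OF employees.employee_id%TYPE;
   l_ids   id_tab_t;
BEGIN
   OPEN emp_cur;
   LOOP
      FETCH emp_cur BULK COLLECT INTO l_ids LIMIT 100;

      -- Exit on an empty fetch, not on emp_cur%NOTFOUND here.
      EXIT WHEN l_ids.COUNT = 0;

      FORALL indx IN 1 .. l_ids.COUNT
         UPDATE employees
            SET salary = salary * 1.1
          WHERE employee_id = l_ids (indx);
   END LOOP;
   CLOSE emp_cur;
END;
/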


Thanks, padders! I incorporated all your suggestions and re-posted the doc.

unknown-7404 Mar 15 2016

The order of the steps in that checklist is confusing to me.

The subject says 'Converting ... to Bulk Processing'. Is that REALLY the subject? Or are you primarily focused only on the FORALL aspect of bulk processing?

If the focus is really bulk processing then I suggest you put the technical steps in the same order you show in your step #4: load, modify, push

So your step 2 (analyze and document) should really be step 1, since you shouldn't really do ANYTHING before you analyze and document what is really needed.

The result of that analysis should then be used as a guide as to whether you need to use Bulk Processing. Then FORALL is just one possible part of that bulk processing.

Then your step 3 (build regression test) can be step 2. As you already suggest, test cases should come as early as possible, and for a conversion that can be done in parallel with the actual development and even by another team.

Then the new step 3 could be a modified version of your step 1 except it would read:

Make sure you need to use Bulk Processing

I replaced the word FORALL in that first item with Bulk Processing.

Generally, you need FORALL if you need to perform complex procedural steps.

Hmmm - actually that better describes the need for bulk processing since FORALL is ONLY used to put data INTO the database. It really has NOTHING to do with getting data out of the database or processing data that is in collections.

Also the 'make sure you need' step should actually include the 3 items in your step 4 (load - modify - push). You should really decide WHICH of those three you should/might use. Using 'load collections' might not be feasible depending on the data source.

Your step 6 (bulk collect) would go before step 5 (forall), since creating the collection(s) comes before using them.

I think it is important to mention COMMIT in your list of steps even though I know you will have it in the actual text and detail. Far too many people embed a commit in the innermost layers and that really disrupts modular architecture.

Just as your step 6 mentions LIMIT and 'soft-code the limit' for batch processing, I almost always soft-code the commit interval. Sometimes that involves specifying two values: whether to commit at all and, if so, when to commit.

Those 'three basic phases' you mention in your step 4 cry out for a 'controller' phase that manages the entire process and handles the general flow of processing. The 'controller' would perform the initialization: specifying the logging method and level and initializing global values (number of retries, bulk limits, commit intervals).
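To make the load / modify / push shape, the soft-coded limit, and the soft-coded commit interval concrete, here is a minimal sketch under those suggestions (the procedure name, table, and 10% raise are hypothetical; passing NULL for the commit interval leaves the commit to the caller):

CREATE OR REPLACE PROCEDURE raise_salaries (
   p_limit         IN PLS_INTEGER DEFAULT 100,    -- soft-coded bulk fetch size
   p_commit_every  IN PLS_INTEGER DEFAULT NULL)   -- NULL = do not commit here
IS
   CURSOR emp_cur IS
      SELECT employee_id, salary FROM employees;

   TYPE id_tab_t  IS TABLE OF employees.employee_id%TYPE;
   TYPE sal_tab_t IS TABLE OF employees.salary%TYPE;

   l_ids           id_tab_t;
   l_sals          sal_tab_t;
   l_since_commit  PLS_INTEGER := 0;
BEGIN
   OPEN emp_cur;
   LOOP
      -- Load: fetch the next batch into collections.
      FETCH emp_cur BULK COLLECT INTO l_ids, l_sals LIMIT p_limit;
      EXIT WHEN l_ids.COUNT = 0;

      -- Modify: apply the procedural logic to the collections.
      FOR indx IN 1 .. l_ids.COUNT LOOP
         l_sals (indx) := l_sals (indx) * 1.1;
      END LOOP;

      -- Push: write the batch back with a single FORALL.
      FORALL indx IN 1 .. l_ids.COUNT
         UPDATE employees
            SET salary = l_sals (indx)
          WHERE employee_id = l_ids (indx);

      -- Optional, soft-coded commit interval.
      l_since_commit := l_since_commit + l_ids.COUNT;
      IF p_commit_every IS NOT NULL AND l_since_commit >= p_commit_every THEN
         COMMIT;
         l_since_commit := 0;
      END IF;
   END LOOP;
   CLOSE emp_cur;
END raise_salaries;
/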

Vysakh Suresh - 3035408 Mar 15 2016 — edited on Mar 15 2016

A very decent checklist, Steven. ;)


Thanks, @"rp0428", for the detailed feedback.

Some responses:

The subject says 'Converting ... to Bulk Processing'. Is that REALLY the subject? Or are you primarily focused only on the FORALL aspect of bulk processing?

Yes, that is my primary focus. This checklist is supposed to be used as follows:

You found a loop that contains DML. It is running too slowly. You want to convert it over to bulk processing. What do you need to do, to get that done correctly?

I agree with some of your restructuring suggestions. I will incorporate and load up a new version.

I will mention COMMIT under the Analyze step. If there are commits in your current process, you need to make sure the commit behavior before and after bulkification is consistent or acceptable. I am not going to get into incremental commit processing in this checklist. I want to keep it focused on how to correctly apply FORALL (and secondarily BULK COLLECT). And I don't want to go into too much detail (it makes the checklist too long).

"Cry out for a controller phase" - um, yes, I guess so. I don't see that as within the scope of this checklist; it is more of a general recommendation for how to write your programs. For example: specify logging method and level - I will assume (and make that explicit in the checklist) that the user is logging within the current version of the code. If not, use a utility like Logger. That's about all I want to get into in this checklist.

Strong differences of opinion out there? Again, I need to keep this short (1 page, I feel) and focused (FORALL to improve your DML-saturated loops) to be most useful.
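One concrete example of what "correctly apply FORALL" has to cover, as a sketch with hypothetical names rather than text from the checklist: if the original loop handled DML errors row by row, the bulk version needs SAVE EXCEPTIONS and the SQL%BULK_EXCEPTIONS pseudo-collection to preserve that behaviour.

DECLARE
   TYPE id_tab_t IS TABLE OF employees.employee_id%TYPE;
   l_ids         id_tab_t;

   bulk_errors   EXCEPTION;
   PRAGMA EXCEPTION_INIT (bulk_errors, -24381);   -- ORA-24381: error(s) in array DML
BEGIN
   SELECT employee_id BULK COLLECT INTO l_ids FROM employees;

   FORALL indx IN 1 .. l_ids.COUNT SAVE EXCEPTIONS
      UPDATE employees
         SET salary = salary * 1.1
       WHERE employee_id = l_ids (indx);
EXCEPTION
   WHEN bulk_errors THEN
      -- Failed statements are collected here instead of stopping the FORALL.
      FOR indx IN 1 .. SQL%BULK_EXCEPTIONS.COUNT LOOP
         DBMS_OUTPUT.put_line (
               'Element '
            || SQL%BULK_EXCEPTIONS (indx).ERROR_INDEX
            || ' failed with '
            || SQLERRM (-SQL%BULK_EXCEPTIONS (indx).ERROR_CODE));
      END LOOP;
END;
/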

Patch72 Mar 16 2016

...non-query DML statements...

Technically every DML statement is a query. Maybe this should be: non-select DML statements


Please clarify. I am not smart enough at SQL to understand.

BluShadow Mar 16 2016


Patch72 wrote:

...non-query DML statements...

Technically every DML statement is a query. Maybe this should be: non-select DML statements

Really? Most people don't consider INSERT, UPDATE or DELETE to be queries.

However, DML encompasses all three of those plus SELECT, which, although it's not really manipulating the data in the database, is still classed as DML.

I think Steve's "non-query DML" is perfectly ok, and it made sense to me.

Billy Verreynne Mar 16 2016 — edited on Mar 16 2016

My biggest problem with bulk fetching and binding (forall) is that it simply does not scale. Which makes it only a worthwhile solution for processing small data sets at best - and even that can be a problem in today's environment where a million row table is considered smallish.

Thus a question that needs to be asked is whether "bulkification" of cursor loops with nested DMLs provides any real and tangible benefit. Or is it a case of putting a small band-aid plaster on a badly bleeding shotgun wound? Especially given the effort to refactor the existing code for bulk processing without breaking the existing processing logic.

And how much effort will a proper scalable approach take instead?

The core issue in my view is not a technical one of how to turn cursor-loops-with-nested-DMLs into bulk processing - it is one of educating the developer in (modern!) basic data processing concepts. Of just what the impact is of (ancient) serialised row-by-row processing. What the power is of the SQL language as a data processing language (as opposed to PL/SQL, Java, C#, etc.). And how to scale SQL processing using PQ and DBMS_PARALLEL_EXECUTE - and, if processing is too complex for the SQL language to deal with, how to multi-thread (parallelise processing) using pipelined table functions.

In other words, give the developer the knowledge to realise just how silly cursor-fetch-loops-with-nested-DMLs are, and the knowledge on how to use 21st century data processing concepts, instead of techniques we used in the 70's and 80's in COBOL for processing tapes and ISAM files.
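For readers who have not met it, this is the general shape of the DBMS_PARALLEL_EXECUTE approach Billy mentions (task name, table, and statement are hypothetical): the table is carved into rowid chunks and the same single-statement UPDATE is run against each chunk by parallel jobs.

BEGIN
   DBMS_PARALLEL_EXECUTE.create_task ('raise_salaries');

   -- Split the table into rowid ranges of roughly 10,000 rows each.
   DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid (
      task_name   => 'raise_salaries',
      table_owner => USER,
      table_name  => 'EMPLOYEES',
      by_row      => TRUE,
      chunk_size  => 10000);

   -- Run the same set-based UPDATE against every chunk, four jobs at a time.
   DBMS_PARALLEL_EXECUTE.run_task (
      task_name      => 'raise_salaries',
      sql_stmt       => 'UPDATE employees SET salary = salary * 1.1
                          WHERE ROWID BETWEEN :start_id AND :end_id',
      language_flag  => DBMS_SQL.native,
      parallel_level => 4);

   DBMS_PARALLEL_EXECUTE.drop_task ('raise_salaries');
END;
/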


Very good points, @"Billy~Verreynne", but I will judge them out of scope for this effort. I encourage you to offer up a document that addresses the issues you raise. But I think there are still many scenarios involving a substantial amount of data (maybe not in the millions or billions) and a lot of pieces of code in which FORALL could be a solution that a developer can implement in a reasonable amount of time, on an existing piece of production code that needs to be made more efficient, and for which there is no time or resources to go back and re-architect that program and perhaps big chunks of other programs.

BluShadow Mar 16 2016


And Steven did include the "Can you do it in Pure SQL?" as the first thing in the document, so it does kind of highlight that Pure SQL would be the better option in most cases.

Certainly I don't use bulk processing in anything I write, because I know the benefits of using pure SQL, but it's necessary to know the features you don't use, why you don't use them, and how you would use them if you really had a need for it (or just wanted a quick fudge for a "one off" thing).
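As a minimal illustration of that "pure SQL first" point (table and logic hypothetical): when the per-row work can be expressed in SQL, the whole fetch / modify / write-back pattern collapses into one set-based statement, and no PL/SQL loop, bulk or otherwise, is needed.

UPDATE employees
   SET salary = salary * 1.1
 WHERE department_id = 50;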

Billy Verreynne Mar 17 2016


The only place I use bulk fetching is for building XML data - creating a data set for generating graphs on web browsers (aka dynamic PL/SQL regions that use the HTP interface to output XML data in hidden textarea tags for JavaScript to use as a graphing data set).

I am honestly hard pressed to think of any valid example of using a bulk fetch cursor loop in PL/SQL, and chucking nested DMLs into that loop. This can be done better and faster using only SQL, or DBMS_PARALLEL_EXECUTE, or a PL/SQL parallel pipeline (which should be bulk fetching internally).

A bulk cursor-loop-with-nested-DMLs is in my view a distant 4th option - and one that should be the exception. Instead it is touted as the norm for optimised data processing by many a blog. I feel that there needs to be a hard push against that concept, in order to clearly make developers grasp just how inflexible the concept is when it comes to performance and scalability.

BluShadow Mar 17 2016


I know what you mean Billy and I agree with it being an exception to the rule.

However, in terms of this PL/SQL community and Steven being the advocate of using PL/SQL, the checklist for moving things from slow-by-slow... erm... row-by-row processing to something more performant is certainly good to highlight to newbie developers. Perhaps it just needs a little more emphasis on the "Most things can be done with pure SQL, so if that's the case, that should be your preference; however, if for any valid reason you can't, then here are the steps to consider when converting your slow row-by-row processing to bulk processing."

unknown-7404 Mar 17 2016


I think the rewrite/reorg (as of the current timestamp of my reply) does a nice job of summarizing, in order, the issues to be considered for the use case you presented.

It is also easy to understand and self-contained as it doesn't refer to info outside the summary itself.

ashley.hong Aug 27 2018

Thanks,

This paper briefly introduces the bulk PL/SQL processing concepts and enhancements in Oracle 8i. Bulk or array processing has been available in Oracle for some time (using PRO*C or DBMS_SQL), but has only become available in native PL/SQL with the release of Oracle 8i.

