
    Performance Issue Bulk Insert PL/SQL table type

    nipuna86
      Hi All,

      I am implementing a batch process to populate a table with a large number of records (>3,000,000). In order to reduce the execution time, I have used PL/SQL tables to temporarily store the data that is to be written to the destination table. Once all the records are accumulated in the PL/SQL table, I use a FORALL statement to bulk insert the records into the physical table.

      Currently I have followed two approaches to implement the above process (please see the code segments below). I need to choose the better-performing of these two approaches, and I would really appreciate any expert comments regarding their execution times.

      (I cannot see much difference in time consumption in my testing environment, which has a limited set of data. Once deployed in the production environment, this process is supposed to build a complex set of large product structures.)


      Approach I:

      DECLARE
         TYPE test_type IS TABLE OF test_tab%ROWTYPE INDEX BY BINARY_INTEGER;
         test_type_ test_type;
         ins_idx_   NUMBER;
      BEGIN
         ins_idx_ := 1;
         LOOP  -- placeholder for the nested loop structure that derives value1..value3
            test_type_(ins_idx_).column1 := value1;
            test_type_(ins_idx_).column2 := value2;
            test_type_(ins_idx_).column3 := value3;
            ins_idx_ := ins_idx_ + 1;
         END LOOP;

         FORALL i_ IN 1 .. test_type_.COUNT
            INSERT INTO test_tab VALUES test_type_(i_);
      END;
      /


      Approach II:

      DECLARE
         TYPE column1 IS TABLE OF test_tab.column1%TYPE INDEX BY BINARY_INTEGER;
         TYPE column2 IS TABLE OF test_tab.column2%TYPE INDEX BY BINARY_INTEGER;
         TYPE column3 IS TABLE OF test_tab.column3%TYPE INDEX BY BINARY_INTEGER;
         column1_ column1;
         column2_ column2;
         column3_ column3;
         ins_idx_ NUMBER;
      BEGIN
         ins_idx_ := 1;
         LOOP  -- placeholder for the nested loop structure that derives value1..value3
            column1_(ins_idx_) := value1;
            column2_(ins_idx_) := value2;
            column3_(ins_idx_) := value3;
            ins_idx_ := ins_idx_ + 1;
         END LOOP;

         FORALL idx_ IN 1 .. column1_.COUNT
            INSERT INTO test_tab (column1, column2, column3)
            VALUES (column1_(idx_), column2_(idx_), column3_(idx_));
      END;
      /

      Best Regards,
      Nipuna

        • 1. Re: Performance Issue Bulk Insert PL/SQL table type
          Purvesh K
          My View:
          Scrap the entire code (both approaches) and replace it with:
          INSERT INTO test_tab (column1, column2, column3)
          SELECT column1, column2, column3
            FROM source_tab  -- placeholder for whatever source tables feed your logic
          --WHERE some_condition = some_value  --> use this if you only want rows matching some condition
          Moreover, you are talking about 3 million records being stored in your expensive PGA, without limiting the number of records in your collection, before being written to another table. That would be one of the worst ways to achieve the simple requirement of inserting into a table from another table. And, most certainly, you are experiencing performance issues because of this approach itself.
          • 2. Re: Performance Issue Bulk Insert PL/SQL table type
            nipuna86
            Thanks for the reply,

            I agree with your suggestion, but unfortunately this cannot be done with a simple SELECT statement as you have suggested. Please note that I had to use a nested loop structure to feed the data set into the PL/SQL table. The logic is too complex to be written in a plain SQL SELECT query.
            • 3. Re: Performance Issue Bulk Insert PL/SQL table type
              Purvesh K
              nipuna86 wrote:
              Thanks for the reply,

              I agree with your suggestion, but unfortunately this cannot be done with a simple SELECT statement as you have suggested. Please note that I had to use a nested loop structure to feed the data set into the PL/SQL table. The logic is too complex to be written in a plain SQL SELECT query.
              Can you post the complex logic? Experts around might suggest some innovative way to handle such a requirement.

              As an alternative, I would suggest that you stop loading the entire 3 million records at once and instead process the records iteratively: use BULK COLLECT with a LIMIT clause, followed by a FORALL.
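
              A minimal sketch of that pattern (purely illustrative - the driving cursor c_src, the source table source_tab, and the batch size are placeholders, and it assumes test_tab has exactly the three columns selected):

              DECLARE
                 CURSOR c_src IS
                    SELECT column1, column2, column3
                      FROM source_tab;  -- placeholder driving query
                 TYPE row_tab IS TABLE OF c_src%ROWTYPE INDEX BY PLS_INTEGER;
                 buf_ row_tab;
              BEGIN
                 OPEN c_src;
                 LOOP
                    -- fetch at most 1000 rows at a time, capping PGA usage per batch
                    FETCH c_src BULK COLLECT INTO buf_ LIMIT 1000;
                    EXIT WHEN buf_.COUNT = 0;
                    FORALL i_ IN 1 .. buf_.COUNT
                       INSERT INTO test_tab VALUES buf_(i_);
                 END LOOP;
                 CLOSE c_src;
                 COMMIT;
              END;
              /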
              • 4. Re: Performance Issue Bulk Insert PL/SQL table type
                Billy~Verreynne
                nipuna86 wrote:

                 I am implementing a batch process to populate a table with a large number of records (>3,000,000). In order to reduce the execution time, I have used PL/SQL tables to temporarily store the data that is to be written to the destination table. Once all the records are accumulated in the PL/SQL table, I use a FORALL statement to bulk insert the records into the physical table.
                Performance is more than simply reducing execution time.

                Just like braking a car is more than just stopping the car in the quickest possible time.

                If it were (braking a car by simply stopping it), then a brick wall built from reinforced concrete would have been the ideal way to stop all kinds of motor vehicles at all kinds of speeds.

                The only problem (well, more than one actually) is that stopping a vehicle that way is bad for the car, the engine, the driver, the passengers, and any other contents.

                And pushing 3 million records into a PL/SQL "table" (which, btw, is WRONG terminology - there are no table structures in PL/SQL) in order to run a SQL INSERT cursor 3 million times, all to reduce execution time, is no different from using a brick wall to stop a car.

                Both approaches are pretty much flawed. Both place an unreasonable demand on PGA memory. Both are still row-by-row (a.k.a. slow-by-slow) processing.
                • 5. Re: Performance Issue Bulk Insert PL/SQL table type
                  nipuna86
                  Thank you very much, it's a very nice way of explaining with an example :)
                  Could you please suggest a workaround for this?
                  • 6. Re: Performance Issue Bulk Insert PL/SQL table type
                    Billy~Verreynne
                    As already suggested - use SQL. The basic mantra for well designed Oracle systems is: Maximise SQL. Minimise PL/SQL.

                    Do not use PL/SQL for what SQL is capable of doing. SQL is the fastest, most powerful, and most flexible language for processing database data.

                    Unfortunately, some still see it as a mere I/O call layer - something to move the data from the database into an application language, process the data in that language, and then use SQL to move the resulting data back to the database.

                    The correct approach is to use the SQL language for most, if not all, of the data processing required, and have it return the results of that processing to the application layer.

                    The fastest way to insert data from existing data in the database is an INSERT..SELECT SQL statement. It can do direct path inserts. It can do it in parallel. The data read by the SQL engine is written by the SQL engine. The data is not sent via a detour through some application language, only to wind up back where it originated - in the database.
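
                    For illustration only (the table and column names are placeholders, not the original poster's schema), a direct-path, parallel form of such a statement could look like this:

                    ALTER SESSION ENABLE PARALLEL DML;

                    INSERT /*+ APPEND PARALLEL(t) */ INTO test_tab t (column1, column2, column3)
                    SELECT /*+ PARALLEL(s) */ s.column1, s.column2, s.column3
                      FROM source_tab s;  -- placeholder source

                    COMMIT;  -- a direct-path insert must be committed before the table is read again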

                    If the SQL language is not able to do the required processing and data transformation in the SELECT part of the INSERT..SELECT SQL statement, only then is PL/SQL considered.

                    The SQL statement however remains the same, as the basic INSERT..SELECT is the optimal method.

                    The SELECT part is changed to use a PL/SQL pipeline - enabling PL/SQL code to be called by the SQL engine (parallel processing supported). This PL/SQL code can then apply the complex transformation of input data, and output the transformed data for the INSERT part of the INSERT..SELECT to write back to the database.
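
                    A bare-bones sketch of that approach (the object types, the transformation, and all names here are illustrative assumptions, not the original poster's code):

                    -- SQL object types, needed so the SQL engine can consume the piped rows
                    CREATE TYPE test_row_t AS OBJECT (column1 NUMBER, column2 NUMBER, column3 NUMBER);
                    /
                    CREATE TYPE test_tab_t AS TABLE OF test_row_t;
                    /

                    CREATE OR REPLACE FUNCTION transform_rows (src_ SYS_REFCURSOR)
                       RETURN test_tab_t PIPELINED
                       PARALLEL_ENABLE (PARTITION src_ BY ANY)
                    IS
                       v1_ NUMBER;
                       v2_ NUMBER;
                       v3_ NUMBER;
                    BEGIN
                       LOOP
                          FETCH src_ INTO v1_, v2_, v3_;
                          EXIT WHEN src_%NOTFOUND;
                          -- the complex transformation of the input row goes here
                          PIPE ROW (test_row_t(v1_, v2_, v3_));
                       END LOOP;
                       CLOSE src_;
                       RETURN;
                    END;
                    /

                    INSERT /*+ APPEND */ INTO test_tab
                    SELECT *
                      FROM TABLE(transform_rows(CURSOR(
                              SELECT column1, column2, column3 FROM source_tab)));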

                    If the nature of the data and processing is beyond the features and functionality provided by a PL/SQL pipeline, then other methods can be considered, like using the DBMS_PARALLEL_EXECUTE interface to drive parallel processing of PL/SQL code.

                    Where does this 3 million rows come from? What business logic needs to be applied? What are the performance constraints?

                    3 million rows is not much - to a modern-day RDBMS it is actually tiny. About an hour ago, in a production database of mine, an automated process (similar to the INSERT..SELECT described above) read 28,900,063 rows, joined them to other tables, processed the results into 4,428,736 rows, and added these results to an existing table - in 154 seconds (less than 3 minutes).

                    Do not underestimate (as so many seem to) the ability and performance of the SQL language in Oracle.
                    • 7. Re: Performance Issue Bulk Insert PL/SQL table type
                      6363
                      nipuna86 wrote:

                      The logic is too complex to be written in a plain SQL SELECT query.
                      What functions or operators are you using in PL/SQL that are not available in SQL for the processing?
                      • 8. Re: Performance Issue Bulk Insert PL/SQL table type
                        nipuna86
                        Hi All,

                        Thank you very much for all the valuable responses, and I am really sorry for the late reply.

                        This procedure has a complex nested loop structure and some cumulative calculations. Therefore it is not possible to get the result data set from single or multiple SQL SELECT statements without altering the functional requirement. I have used INSERT INTO <table> ... SELECT ... FROM several times before, and it executes really fast. But unfortunately it is not a possibility here without altering the requirement.

                        I agree that collecting all the records into a PL/SQL record collection is not a good idea. I am going to LIMIT the number of records. Will that have any impact on performance?

                        I am sorry, but I cannot publish the source code due to some of the conditions in our project agreement.

                        Regards,
                        Nipuna
                        • 9. Re: Performance Issue Bulk Insert PL/SQL table type
                          Billy~Verreynne
                          nipuna86 wrote:

                          This procedure has a complex nested loop structure and some cumulative calculations. Therefore it is not possible to get the result data set from single or multiple SQL SELECT statements without altering the functional requirement.
                          I would dispute that. SQL is EXTREMELY capable, and can solve complex problems more elegantly and with less code than PL/SQL or Java or whatever else - such as solving knapsack-based problems using SQL ({message:id=10308684}), which would take a lot more code to do in PL/SQL, and would run a lot slower there.


                          But as you cannot show us the code, and insist on using PL/SQL, consider changing that PL/SQL into a PL/SQL pipelined table function - and driving that pipeline in parallel.

                          If the current outer loop cannot be changed into a parallel SQL that drives the current inner loop as a pipeline, then don't bother. A pipeline will in that case not provide any real, tangible performance improvement.

                          In that case, the only remaining s/w-based solution (as opposed to getting better and faster h/w) is to do manual parallel processing. The outer loop needs to be "chunked" in order to run multiple copies of it in parallel, with each copy driving an inner loop to process the data for that specific chunk.

                          On 11g, DBMS_PARALLEL_EXECUTE can be used to perform the chunking and processing (see {message:id=10571826} and {message:id=10746644} for examples).
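
                          As a rough sketch of the 11g route (the task name, the chunk size, and the process_chunk procedure are hypothetical placeholders):

                          BEGIN
                             DBMS_PARALLEL_EXECUTE.CREATE_TASK('load_task');

                             -- carve the driving table into rowid ranges of roughly 10000 rows each
                             DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID(
                                task_name   => 'load_task',
                                table_owner => USER,
                                table_name  => 'SOURCE_TAB',
                                by_row      => TRUE,
                                chunk_size  => 10000);

                             -- run the outer-loop logic once per chunk, 4 threads in parallel;
                             -- :start_id and :end_id are bound to each chunk's rowid range
                             DBMS_PARALLEL_EXECUTE.RUN_TASK(
                                task_name      => 'load_task',
                                sql_stmt       => 'BEGIN process_chunk(:start_id, :end_id); END;',
                                language_flag  => DBMS_SQL.NATIVE,
                                parallel_level => 4);

                             DBMS_PARALLEL_EXECUTE.DROP_TASK('load_task');
                          END;
                          /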

                          On 10g, the chunking needs to be done manually (such as using the approach shown in {message:id=1534900}), with DBMS_JOB (or the scheduler) used to drive the parallel processing threads.
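
                          And a hypothetical shape of the 10g approach, assuming a pre-built chunk_driver_tab of id ranges and the same hypothetical process_chunk procedure:

                          DECLARE
                             job_ BINARY_INTEGER;
                          BEGIN
                             -- submit one job per pre-computed chunk of the outer loop
                             FOR c_ IN (SELECT start_id, end_id FROM chunk_driver_tab) LOOP
                                DBMS_JOB.SUBMIT(
                                   job_,
                                   'BEGIN process_chunk(' || c_.start_id || ',' || c_.end_id || '); END;');
                             END LOOP;
                             COMMIT;  -- the submitted jobs only start running after the commit
                          END;
                          /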
                          • 10. Re: Performance Issue Bulk Insert PL/SQL table type
                            Oracle Maniac
                            Do we have any undo tablespace considerations while performing:

                            1) INSERT INTO tablename ... SELECT * FROM tablename

                            2) CTAS (said to be the fastest possible way of creating a table from another table)


                            The main consideration is the commit. If I use INSERT INTO tablename ... SELECT * FROM tablename, what is the right number of records per commit?
                            I have seen cases where a poor commit point generated a runtime exception (unable to extend undo segment by .... bytes).
                            I don't know the behaviour since Automatic Undo Management was introduced.

                            What exactly happens when we use the above statements? As per your statement, "The data read by the SQL engine is written by the SQL engine" - can you elaborate on this a little, so that the remaining analysis can be done?



                            Also, what would be the best way to delete 1 million records from a table holding 50 million? I have used a procedure to bulk collect the rowids, and then used those rowids in a FORALL to issue bulk DELETE statements to the SQL engine.


                            Thanks a lot.

                            • 11. Re: Performance Issue Bulk Insert PL/SQL table type
                              Billy~Verreynne
                              Rahul K wrote:

                              The main consideration is the commit. If I use INSERT INTO tablename ... SELECT * FROM tablename, what is the right number of records per commit?
                              I have seen cases where a poor commit point generated a runtime exception (unable to extend undo segment by .... bytes).
                              Tom has answered this far better than I could, many times, on AskTom (click the link to see one of those discussions).

                              I will weigh in with the issue of resources in principle.

                              If you want to store a million rows, or videos, or images, you are not going to try to do that using a 1.44MB stiffy (floppy disk). It is obvious that the resource used must cater for the demands placed on it.

                              If you want to play an HD video, you are not going to design and write all kinds of workarounds to perform the playback via 4MB of video memory on a single CPU that supports at most 2 processing threads from your application (alongside the rest of the o/s and desktop load).

                              Why would you even want to attempt designing and writing s/w to perform a task on resources that cannot meet the basic requirements of that task? Even if you hack workarounds for all the problems of the inferior resources (at great cost in terms of design and code complexity), the end result will still be inferior s/w. So instead you would rather design robust and well-performing s/w - s/w that does the HD video playback on h/w suited to that job.

                              So what does this have to do with UNDO/REDO space? It is also a resource. It needs to be correctly sized for the database processing that is performed.

                              It does NOT make sense to undersize this resource and then attempt to hack the database processing to work around the resource problem - for the very same reasons you will not even try to store a million rows on a 1.44MB stiffy, or write s/w to use a 4MB video card and an old CPU to play HD videos.

                              The resource MUST meet the requirements of the processing load. And UNDO and REDO are resources that need to be sized to meet the transaction loads of database processes.

                              Bluntly put: IMO not doing that - not correctly sizing resources, and then hacking s/w to cope with insufficient resources - is idiotic. And that is my view too of code that commits frequently.
                              • 12. Re: Performance Issue Bulk Insert PL/SQL table type
                                nipuna86
                                Thanks for all the valued responses.