6 Replies Latest reply: Feb 4, 2013 1:07 PM by rp0428 RSS

    Parallel pipeline function decreases performance

    988837
      I am currently using the BULK COLLECT and FOR ALL clause to fetch around 100,000 records from one table, perform transformations on it and insert it into another table.

      I tried modifying the procedure to include parallel pipeline. But now, it takes a longer time to execute than what it did earlier. I checked the Explain plan to make sure it was actually running parallel. Just for checking, I also tried parallel pipeline in a simple proceedure. Even that ran slower than a normal one that used BULK COLLECT.

      The simple procedure that I ran was..

      *1) Using BULK COLLECT*
      set timing on;
      DECLARE
      feed_pipe2 SYS_REFCURSOR;
      n NUMBER;
      m NUMBER;
      TYPE ret_type IS TABLE OF target_table%ROWTYPE;
      ret_data ret_type;
      BEGIN
      OPEN feed_pipe2 FOR SELECT owner FROM base_test_table;
      LOOP
      FETCH feed_pipe2
      BULK COLLECT INTO ret_data
      LIMIT 100;
      FORALL i IN 1 .. ret_data.COUNT
      INSERT INTO target_table
      VALUES ret_data (i);
      COMMIT;
      EXIT WHEN feed_pipe2%NOTFOUND;
      END LOOP;
      CLOSE feed_pipe2;
      END;
      /
      
      
      anonymous block completed
      *656ms* elapsed
      *2) Using parallel pipeline*
      set timing on;
      DECLARE
      feed_pipe SYS_REFCURSOR;
      TYPE ret_type IS TABLE OF target_table%ROWTYPE;
      ret_data ret_type;
      BEGIN
      OPEN feed_pipe FOR SELECT /*+ parallel */ * FROM base_test_table;
      INSERT INTO target_table b
      SELECT lower_owner FROM TABLE (parallel_pipeline (feed_pipe)); 
      END;
      /
      
      
      anonymous block completed
      *2,093ms* elapsed
        • 1. Re: Parallel pipeline function decreases performance
          rp0428
          Welcome to the forum!

          Whenever you post provide your 4 digit Oracle version.
          >
          I am currently using the BULK COLLECT and FOR ALL clause to fetch around 100,000 records from one table, perform transformations on it and insert it into another table.
          >
          Why? Why are you using PL/SQL instead of SQL?

          You haven't posted anything showing a need for PL/SQL.
          • 2. Re: Parallel pipeline function decreases performance
            Karthick_Arp
            The fastest way to do is to do it in SQL everything else (BULK COLLECT, Pipelined Function etc.) will be slower.

            Just do it in a single INSERT INTO.. SELECT .. FROM statement. And then check for the performance and you will be amazed.
            • 3. Re: Parallel pipeline function decreases performance
              988837
              Sorry guys, new to this forum. The Oracle version i use is 11.2.0.1

              The rows that I fetch from the first table are modified inside the pl/sql procedure. Also, there are calls to a few other procedures for each row that is fetched. That is the reason I am not able to completely use sql alone.
              • 4. Re: Parallel pipeline function decreases performance
                rp0428
                >
                The rows that I fetch from the first table are modified inside the pl/sql procedure. Also, there are calls to a few other procedures for each row that is fetched. That is the reason I am not able to completely use sql alone.
                >
                You haven't provide ANY information about your pipelined function.

                And the code that you did post using a function may use PARALLEL for the SELECT but it is not using PARALLEL for the table insert.

                You also did not post any information that shows that the BULK processing has any performance or other issue that needs to be addressed.

                If it ain't broke, don't fix it.

                1. Identify a problem.
                2. Determine the actual cause of the problem.
                3. Identify possible solutions.
                4. Select a solution for evaluation/testing

                You are asking us about step 4 but you haven't provided anything that indicates that you have done steps 1, 2 and 3. If you did then please provide that supporting information.
                • 5. Re: Parallel pipeline function decreases performance
                  988837
                  The code that I've posted is only for checking whether pipeline runs at a lower time than bulk collect. ( Taken from [http://oracleblogging.wordpress.com/2011/09/06/bulk-collections-and-parallel-processing/] ). It has been mentioned that the time reduces significantly in their case (But it increases in mine). My actual procedure is similar to it but with a lot more transformations and another procedure ( SP_UPDATE in this code) is called for each row fetched.

                  create or replace
                  FUNCTION parallel_pipeline (my_cursor IN SYS_REFCURSOR)
                  RETURN result_type_tab
                  PIPELINED 
                  PARALLEL_ENABLE(PARTITION my_cursor BY ANY) 
                  IS
                  TYPE source_data_type IS TABLE OF base_test_table%ROWTYPE;
                  source_data source_data_type;
                  result_data result_type;
                  BEGIN
                  LOOP
                  FETCH my_cursor
                  BULK COLLECT INTO source_data
                  LIMIT 100; 
                  FOR i IN 1 .. source_data.COUNT
                  LOOP
                  source_data (i).owner := LOWER (source_data (i).owner);
                  source_data (i).object_id := source_data (i).object_id * 2;
                  result_data := result_type (source_data (i).owner,source_data (i).object_id);
                  
                  
                  
                  --This is the procedure called for each row
                  if(source_data (i).owner = 'L' )
                  SP_UPDATE(source_data (i).owner,source_data (i).object_id);
                  
                  
                  PIPE ROW (result_data);
                  END LOOP;
                  EXIT WHEN my_cursor%NOTFOUND;
                  END LOOP;
                  CLOSE my_cursor;
                  RETURN;
                  END parallel_pipeline;
                  • 6. Re: Parallel pipeline function decreases performance
                    rp0428
                    >
                    The code that I've posted is only for checking whether pipeline runs at a lower time than bulk collect. ( Taken from [http://oracleblogging.wordpress.com/2011/09/06/bulk-collections-and-parallel-processing/] ). It has been mentioned that the time reduces significantly in their case (But it increases in mine). My actual procedure is similar to it but with a lot more transformations and another procedure ( SP_UPDATE in this code) is called for each row fetched.
                    >
                    We can only go by what you post in the forum.

                    When you say you have a performance issue doing things the PARALLEL way but then use a custom function we have no way of knowing what that custom function is doing that might be causing or contributing to the problem.

                    And now, based on what you just posted, it looks like your custom 'parallel' function probably IS the problem.

                    You appear to be corrupting a BULK process by including slow-by-slow (row-by-row) processing
                    --This is the procedure called for each row
                    if(source_data (i).owner = 'L' )
                    SP_UPDATE(source_data (i).owner,source_data (i).object_id);
                    But again - there is no way to know just what that SP_UPDATE procedure is doing. But since you are passing it ROW-level information it is NOT bulk processing for that step. That could very well be what is slowing things back down.

                    Your very first example used a FORALL which reduces the context switching for the DML but you slow example does not use FORALL and now worse uses another custom procedure.

                    The article you cited, by the way, talks about "parallel pipes" in the context of gathering the SELECT data quicker. That is seldom the problem. The problem is usually in doing the DML that inserts/updates the data back to the SQL engine.

                    I would be surprised if you even needed to use parallel for your use case. I doubt that your process is delayed in getting started. It is most likely stalling on that 'serialization' (row-by-row) updating that you are doing.