
    pipelined parallel enable table function

    user60022
      Hi All,

      For a 'pipelined parallel_enable' table function, do I have to enable parallel query for the session first?

      I have read some white papers and web pages on map/reduce implemented with Oracle table functions, and found that inside the table function we have to loop over a cursor, fetching one row at a time and piping it out. This turns SQL into PL/SQL.

      Is that a cost that must be paid with table functions?

      Last, how can I confirm that Oracle has parallelized the table function?

      Best regards,

      Leon
        • 1. Re: pipelined parallel enable table function
          802709
          If you create a function as parallel_enable, it can work in parallel. If it is a pipelined function and you use it in a SELECT, the function can produce rows in parallel, but whether your query itself runs in parallel depends on your system: if you enable parallel query and give a parallel hint, then your query will also run in parallel.
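          Something along these lines - a minimal sketch with invented type, function and table names, where the function takes a ref cursor partitioned BY ANY so Oracle can split the input rows across the parallel slaves:
          create or replace type t_num_row as object ( n number );
          /
          create or replace type t_num_tab as table of t_num_row;
          /
          create or replace function double_it( p_cur sys_refcursor )
            return t_num_tab pipelined
            parallel_enable ( partition p_cur by any )
          as
            l_n number;
          begin
            loop
              fetch p_cur into l_n;
              exit when p_cur%notfound;
              -- pipe each transformed row back to the consuming SQL statement
              pipe row( t_num_row( l_n * 2 ) );
            end loop;
            return;
          end;
          /

          -- parallel query must be enabled and the input cursor parallelized
          -- for the function itself to run in parallel:
          select * from table( double_it( cursor( select /*+ parallel(t,4) */ n from t ) ) );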
          • 2. Re: pipelined parallel enable table function
            William Robertson
            Yes, your system and session settings for PQ will apply. It's 'enabled', not 'forced'.
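            To answer the last part of the original question: one standard check (not specific to table functions) is to query V$PQ_SESSTAT in the same session straight after running the query:
            select statistic, last_query, session_total
            from   v$pq_sesstat
            where  statistic = 'Queries Parallelized';
            If LAST_QUERY shows 1, the last statement was parallelized.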
            • 3. Re: pipelined parallel enable table function
              user60022
              Thank you for your reply.

              And what's your opinion on having to use a cursor loop to pipe rows one at a time, instead of one SQL statement that returns all the rows?

              I know that if the data are too large to be held in memory, a stored procedure returning a collection will fail. In that case, we have to use a pipelined table function.

              But for a medium-sized data set, is a pipelined table function not as good as a stored procedure that returns a collection of a user-defined type?

              Best regards,

              Leon
              • 4. Re: pipelined parallel enable table function
                802709
                If you can return your data with plain SQL, then do not use a pipelined function at all... use a subquery or a view.
                • 5. Re: pipelined parallel enable table function
                  Billy~Verreynne
                  user12064076 wrote:

                  But for a medium-sized data set, is a pipelined table function not as good as a stored procedure that returns a collection of a user-defined type?
                  Define "medium size". A few million rows? A few hundred million? Remember that today's larger databases deal with billions of rows. There is no such thing as a "medium-sized" data set, as data volumes differ significantly from database to database.

                  Returning collections? Nope. The correct method for an Oracle database server to return data to a client is via a cursor.

                  That is the default method. The best performing method. The best scalable method. The most flexible method.

                  Thinking that a PL/SQL procedure or function returning a collection is better suited than using a cursor is flawed. Collections are exceptions, not the norm. Likewise, pipelined table functions are exceptions and not the norm.

                  When deviating from the default and the norm, you need to think it through carefully and have valid technical reasons as justification. And from experience, often these exceptions are not really exceptions, but the result of not understanding Oracle concepts and features.

                  Make sure you actually use the right tool in the Oracle toolbox, and not the one you think looks "nifty and shiny". :-)
                  • 6. Re: pipelined parallel enable table function
                    user60022
                    Hi Billy,

                    Thank you for your reply. I agree with you that we should use suitable tools.
                    In my application, I wrote stored procedures that return collections of user-defined object types to Java.
                    With object types, we can remove most of the redundant data.
                    For example,

                    customer books
                    1 B1
                    1 B2
                    ...
                    ...
                    1 B1990

                    we can use a type to hold it:
                    id number
                    books array_string
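                    As a sketch, the corresponding SQL types might look like this (definitions assumed from the description above):
                    create or replace type array_string as table of varchar2(100);
                    /
                    create or replace type t_customer_books as object (
                      id    number,
                      books array_string
                    );
                    /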

                    This OO design reduces the network load and makes it easy for Java to parse.

                    But the problem is that we have to pile up all the data in memory first and push it to the client as a whole. If it's too large, ORA-22813 will occur.
                    That's why I plan to resort to a pipelined table function.

                    If, as you said, I returned a cursor, it would be very difficult for Java to organize the data.

                    Best regards,
                    Leon
                    • 7. Re: pipelined parallel enable table function
                      Billy~Verreynne
                      user12064076 wrote:

                      In my application, I wrote stored procedures that return collections of user-defined object types to Java.
                      A flawed approach - using expensive private server memory (PL/SQL PGA) to cache SQL data and then push that data to the client, when the client can instead use the far more scalable and superior shared buffer cache.

                       
                      With object types, we can remove most of the redundant data.
                      This statement does not make sense, as the same applies to SQL and cursors. And removing redundant data is best done in the SQL engine itself - via partition pruning, predicates, selecting only the applicable columns for the SQL projection, and so on.

                       
                      This OO design reduces the network load and makes it easy for Java to parse.
                      Incorrect. It does not reduce network load - it may in fact increase it. As for Java "parsing" - that is a flawed approach from the outset if the client is required to parse data it receives from the database server. Parsing requires CPU time. Lots of parsing means a hefty CPU overhead that will degrade performance.

                      Perhaps you are using parsing out of context here? I find it difficult to believe that one would design an application and use a database server this way. Parsing means for example receiving XML data (as text) and then parsing it into an object-like structure for use.

                      Data from the database should be retrieved by the client in a binary structured format and not in a free format that requires the client to parse it into a structured format.

                       
                      But the problem is that we have to pile up all the data in memory first and push it to the client as a whole. If it's too large, ORA-22813 will occur.
                      That's why I plan to resort to a pipelined table function.
                      How is that going to solve the problem?

                      As I follow your logic:
                      1) You do not want to use a cursor, for some obscure (and likely completely unwarranted) reason.
                      2) You cannot return large PL/SQL collection variables without significantly denting the PGA (private process memory) on the server and running into errors (this approach is conceptually flawed anyway).
                      3) You now use a pipelined table function that runs PL/SQL code to run SQL code - and that in turn needs to be executed by the client as SQL, using a cursor.

                      So why put all the other moving parts (the pipeline code) in between when the client
                      a) must use SQL for access?
                      b) creates a cursor?

                       
                      If, as you said, I returned a cursor, it would be very difficult for Java to organize the data.
                      A pipeline table needs to be used via a cursor. All DML statements received by Oracle from a client are parsed as cursors.

                      Read the Oracle® Database PL/SQL Language Reference guide section on "Chaining Pipelined Table Functions for Multiple Transformations".

                      The format for using a pipeline is (from this manual):
                      SELECT * FROM TABLE(table_function_name(parameter_list))
                      The "+pipeline+" is created by the SQL engine - it needs SQL to execute the PL/SQL code via the SQL TABLE() function.

                      You stated that the reason for using a pipeline is transforming a relational row data structure into an object structure. You do not need a pipeline for that. Plain vanilla SQL can do the same, without the overheads of using PL/SQL, fetching SQL data, and context switching inside the pipeline between the PL/SQL and SQL engines.

                      You simply call the constructor of the object class in the SQL projection and have the SQL cursor return instantiated objects. E.g.
                      create or replace type TEmployee is object( 
                       .. property definitions ...
                      );
                      
                      create or replace type TEmployeeCollection is table of TEmployee;
                      
                      select
                        TEmployee( col1, col2, .., coln ) as EMP_COLLECTION
                      from emp
                      where dept_id = :0
                      and salary > :1
                      and date_employed >= :2
                      order by 
                        salary, date_employed
                      No need for PL/SQL code. No need for a pipeline. The client will open the cursor and fetch instantiated objects - the very same approach the client would have used when fetching from a cursor on a pipelined table function.

                      Pipelines are best used as data transformation processes where SQL alone cannot perform the transformation. I have never in many years of designing and writing Oracle applications used a PL/SQL pipeline in production on a SQL table. Simply because SQL itself is capable and powerful enough to do the job - and do it faster and better.

                      Where I have used pipelines is for transforming data from external sources into SQL data sets. For example, a pipeline on a web service, where the PL/SQL code constructs the SOAP envelope, does the HTTP call, parses the XML, and returns the contents as rows and columns - allowing one to run a SQL SELECT statement against a web-service-turned-into-a-SQL-table.

                      If you do not mind me saying Leon - it seems to me that your approach is a typical Java approach that has very little understanding of database concepts and Oracle fundamentals. You cannot treat Oracle as a mere persistence layer. You cannot treat SQL and PL/SQL as a mere I/O interface for extracting data from Oracle and crunching that in Java. Not if you expect your Java system to perform and scale.

                      Get the Oracle layer right, use it correctly - and your application will perform and will scale. Guaranteed.

                      However, in my experience many J2EE proponents instead elect to treat Oracle as a black box, no more advanced than some kind of file system tasked with storing structured data, and attempt to do it all in the Java layer. And this fails. I have seen it fail - jaw-dropping, epic failures (hitting all the national papers and media as a result, and impacting ordinary people having to deal with government).

                      And that's a pity.. SQL and PL/SQL are superior to Java in this respect and are far more capable layers for dealing with and crunching data in the database. Real-world example: our busiest database's largest table grows by between 350 and 450 million rows a day, and all our crunching of the data in this table happens inside the database layer - not in a Java layer. Oracle performs and scales magnificently.. when used correctly.
                      • 8. Re: pipelined parallel enable table function
                        user60022
                        Thank you so much, Billy.

                        Although you had some misunderstanding of my method, due to my misleading and confusing description, you taught me a good lesson and reminded me that a ref cursor can return an object type.

                        I got stuck on pipelined table functions and forgot this important feature.

                        Last, I completely agree with you that a SQL-based solution is much more scalable than a Java-based one.

                        Best regards,
                        Leon
                        • 9. Re: pipelined parallel enable table function
                          user60022
                          Hi Billy,

                          I am thinking about a question: another stored procedure uses the result of the first one to do a calculation.

                          If the first one returns a collection, this is very easy to achieve.

                          But if it returns a ref cursor, how should I implement this?

                          Do I have to fetch the rows into a collection first?

                          Other than that, I don't know a way to treat the result as a table that can be selected against.

                          Best regards,
                          Leon
                          • 10. Re: pipelined parallel enable table function
                            user60022
                            Sorry, I posted duplicate content. Please ignore this post.

                            Edited by: user12064076 on Oct 27, 2010 6:49 PM
                            • 11. Re: pipelined parallel enable table function
                              Billy~Verreynne
                              user12064076 wrote:

                              I am thinking about a question: another stored procedure uses the result of the first one to do a calculation.
                              What type of calculation and on what data?

                              If the calculation is done on columns and rows.. then this calculation should be done using SQL and not PL/SQL. It is inherently slow using PL/SQL to process SQL data. It is even slower using Java and other languages.

                              Why?

                              Because the data has to be shipped from the database's buffer cache into PL/SQL memory space and variables. Or into Java memory and variables. When using SQL natively, there is no need to copy the data blocks containing row data to another memory area (which, in the Java case, may reside on an app server several milliseconds away across a TCP connection).

                              Pushing and pulling data between memory areas (PL/SQL PGA to/from the database buffer cache) is an overhead - and overheads add up and impact performance.

                              So no - I would not use PL/SQL code to perform calculations on SQL rows. It is not likely to scale, and performance will be "iffy" and dependent on the number of rows, the number of context switches to be made, and so on.

                              There are other methods that can be considered too. If the calculation derives a value per row, why not pre-calculate it on update/insert? Pay for the calculation up front and spread that cost across the thousands of transactions that happen - as opposed to paying the price all in one go via a single operation that performs the calculation.

                              There are also materialised views that can be used - where rows are grouped and the calculation is done via an aggregation of values. Sketches of both ideas are below.
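                              For illustration only - both suggestions sketched against a hypothetical ORDERS table with DUE_DATE and COMPLETE_DATE columns:
                              -- derive the per-row value declaratively (11g virtual column);
                              -- a plain column maintained at insert/update time works as well
                              alter table orders add (
                                turnaround_days as ( complete_date - due_date )
                              );

                              -- or aggregate once via a materialised view
                              create materialized view mv_order_stats as
                                select trunc(due_date)                 as due_day,
                                       avg( complete_date - due_date ) as avg_turnaround
                                from   orders
                                group by trunc(due_date);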
                              If the first one returns a collection, this is very easy to achieve.
                              But if it returns a ref cursor, how should I implement this?
                              Do I have to fetch the rows into a collection first?
                              A cursor can be processed using bulk processing - and this should be the default approach to cursors: bulk collect from the cursor and process the collection. But only if that processing is too complex to do in SQL and needs PL/SQL instead.
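                              A minimal sketch of that idiom, assuming the same hypothetical ORDERS table (process a batch at a time, never the entire result set in memory):
                              declare
                                type t_date_tab is table of date;
                                l_due  t_date_tab;
                                l_done t_date_tab;
                                c      sys_refcursor;
                              begin
                                open c for select due_date, complete_date from orders;
                                loop
                                  fetch c bulk collect into l_due, l_done limit 1000;
                                  for i in 1 .. l_due.count loop
                                    null;  -- complex PL/SQL-only processing of the batch goes here
                                  end loop;
                                  exit when c%notfound;  -- test after processing: the last fetch may be a partial batch
                                end loop;
                                close c;
                              end;
                              /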

                              The key issue is: Think Data Set Processing, not Row Processing.
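                              For example (hypothetical table again), contrast row-by-row processing with a single set operation:
                              -- row-by-row ("slow-by-slow")
                              begin
                                for r in ( select order_id from orders where status = 'OPEN' ) loop
                                  update orders set status = 'CLOSED' where order_id = r.order_id;
                                end loop;
                              end;
                              /

                              -- data set processing: one SQL statement does it all
                              update orders set status = 'CLOSED' where status = 'OPEN';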
                              • 12. Re: pipelined parallel enable table function
                                user60022
                                Hi Billy,

                                Here is the logic of my application. Let's assume we return a ref cursor instead of a collection.

                                Stored procedure1:

                                open my_cusror for sql_statement

                                Please note the statement is complex, is built with dynamic SQL, and looks like:

                                select due_date,
                                complete_date,
                                nested_tables,
                                nested_tables
                                from tables
                                where joins conditions and filter conditions

                                Stored procedure 2 will calculate statistics based on the due_date and complete_date of the result set of stored procedure 1.

                                To achieve this, I have to fetch the cursor into a collection first, and then use
                                select * from table(collection).

                                But I don't think this is superior to my previous method (returning a collection), since either way I don't avoid the collection.

                                But without it, what should I do? It's not OK for me to redo the select that was already done in stored procedure 1. Correct?

                                What's your opinion?

                                Best regards,
                                Leon
                                • 13. Re: pipelined parallel enable table function
                                  Billy~Verreynne
                                  user12064076 wrote:

                                  Please note the statement is complex, is built with dynamic SQL, and looks like:

                                  select due_date,
                                  complete_date,
                                  nested_tables,
                                  nested_tables
                                  from tables
                                  where joins conditions and filter conditions
                                  I dislike nested tables like this - they add complexity to the SQL projection and can have performance implications.

                                  Why is this not a flat structure instead? (Control-break processing of the flat structure is equivalent to processing it using nested tables.)
                                  Stored procedure 2 will calculate statistics based on the due_date and complete_date of the result set of stored procedure 1.

                                  To achieve this, I have to fetch the cursor into a collection first, and then use
                                  select * from table(collection).
                                  Why go to the SQL engine to process the collection? Can you not apply that SQL processing to the original SQL that creates the collection?

                                  How does that SQL create those nested tables - via multiset and cast? Why not take the SQL predicates and processing of the "select * from table(collection)" and add that to the ref cursor SQL statement?
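                                  (For reference, such a projection is typically built along these lines - reusing the ARRAY_STRING type from earlier in the thread, with invented table and column names:)
                                  select c.id,
                                         cast( multiset( select b.book_name
                                                         from   books b
                                                         where  b.customer_id = c.id
                                                       ) as array_string ) as books
                                  from   customers c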

                                  Note that doing such a "select * from table(collection)" requires the collection (a PL/SQL variable) to be sent to the SQL engine as a bind variable - and this merely pushes the very same data the SQL engine just gave you, via the ref cursor, back to the SQL engine.. which does not make sense. It is a resource and performance overhead best avoided.

                                  If you do not need anything specific from that collection that requires you to step out of PL/SQL and back into SQL, then simply stay in PL/SQL - and use a plain for loop to process the collection.

                                  I would also be hesitant to bulk collect from a ref cursor that contains nested tables/collections - these alone require a chunk of PL/SQL memory to fetch. If you bulk collect from such a ref cursor, you create a collection that itself contains sets of nested collections. This can get ugly for rows that return large nested tables.

                                  Assuming that using nested collections in the SQL projection of the cursor is the best method (I am not convinced - I have only used it in production for generating XML), I would approach it along the following lines:
                                  open refcursor for 
                                    select 
                                            due_date,          
                                            complete_date,
                                            nested_table1,
                                            nested_table2
                                     from  tables
                                     where  joins conditions and filter conditions;
                                  
                                    ..
                                  loop
                                    fetch refcursor into buffer;  -- no bulk collect, due to the nested tables being fetched
                                    exit when refcursor%notfound;
                                  
                                    -- process the 1st nested collection, for example
                                    for i in 1..buffer.nested_table1.Count
                                    loop
                                      Proc2( buffer.due_date, buffer.complete_date, buffer.nested_table1(i) );
                                      ..
                                    end loop;
                                  end loop;
                                  
                                  close refcursor;
                                  What is not clear is what the proc2 output is and what needs to be done with this output. Does it need to go back into some SQL table? Does it need to be passed to an external client? Written to file/XML/etc?

                                  This will have a major impact on the design of the code.
                                  • 14. Re: pipelined parallel enable table function
                                    user60022
                                    Thank you very much, Billy. I will try to answer your questions to make my problem clearer.

                                    <quote>
                                    Why is this not a flat structure instead? (Control-break processing of the flat structure is equivalent to processing it using nested tables.)
                                    </quote>
                                    Because the architect uses the OO features of Oracle. In light of this, Java can get the object type from the stored procedures and access its attributes easily.

                                    <quote>
                                    How does that SQL create those nested tables - via multiset and cast? Why not take the SQL predicates and processing of the "select * from table(collection)" and add that to the ref cursor SQL statement?
                                    </quote>
                                    Exactly. We use cast and multiset. And in stored procedure 1, I have taken your advice and changed it to:

                                    v_sql varchar2(32000) := 'select due_date,
                                                                     complete_date,
                                                                     nested_tables,
                                                                     nested_tables
                                                              from   tables
                                                              where  joins conditions and filter conditions';

                                    open my_cursor for v_sql using
                                      bind_v1,
                                      bind_v2,
                                      ...;


                                    And in this SP, we really don't return a collection of an object type. So far, so good.

                                    <quote>
                                    What is not clear is what the proc2 output is and what needs to be done with this output. Does it need to go back into some SQL table? Does it need to be passed to an external client? Written to file/XML/etc?
                                    </quote>
                                    In short, I want to take the result of stored procedure 1 as a table and do some calculations on it. I can't use a loop, since a loop works part by part, not on the whole set. Right?
                                    That's why I think I have to fetch all the rows into a collection first and then use 'select * from table(collection)'.
                                    This is just my opinion, because I can't find a way to bypass it.

                                    But if I do this, as you said, it is no different from stored procedure 1 returning a collection.
                                    We will encounter the memory problems sooner or later.

                                    OK, I could do the SQL again in stored procedure 2, in which case we wouldn't have to rely on the result of stored procedure 1 at all. But don't you think that's a waste of resources and time?

                                    Best regards,
                                    Leon