8 Replies Latest reply: Nov 30, 2012 5:54 AM by user8704911

    parallel pipelined table functions - pipe row(...) slowed down if high load

    user8704911
      Hi,

      My application uses parallel pipelined table functions to produce result records.
      The main session collects the results from the invoked table functions and stores them into a global temporary table (output GTT).
      Now, if I start a lot of table function instances and generate heavy load (actually overload) on the CPU, the time the table functions spend just executing pipe row(...) increases.
      I am wondering if this is a direct result of the CPU overload, because the consuming main thread cannot keep up with collecting the result records?
      Or is it a matter of missing parallel DML for the insert into the GTT?
      Or is inserting into a GTT a bad idea for result records generated via table functions? (Does a GTT take up more CPU than a regular table?)

      The execution plan for the table function invocation is this:
      Insert Statement                                (output GTT collecting results from the parallel pipelined table functions)
        Load Table Conventional
          PX Coordinator
            PX Send QC (Random)
            VIEW                                    (left-over from a 10g issue: using 'SELECT * FROM (SELECT *' to avoid a delay between the table function worker threads finishing and their results being saved into the output GTT)
                 VIEW
                   COLLECTION ITERATOR PICKLER FETCH
                     PX BLOCK ITERATOR
                       TABLE ACCESS FULL              (input GTT data streamed into the parallel pipelined table functions via cursor)
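      A quick way to test the parallel-DML theory (a sketch, not from the thread; PDML is disabled per session unless enabled explicitly):

```sql
-- Check whether parallel DML/DDL/query is enabled for the current session.
select pdml_status, pddl_status, pq_status
from   v$session
where  sid = sys_context('USERENV', 'SID');
```

      If PDML_STATUS shows DISABLED, the insert part of the statement runs serially in the query coordinator even though the table function itself runs in parallel.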
        
      - many thanks!

      best regards,
      Frank
        • 1. Re: parallel pipelined table functions - pipe row(...) slowed down if high load
          BluShadow
          That looks like your issue...
          COLLECTION ITERATOR PICKLER FETCH
          What version of the database are you using?
          I think the optimiser was improved in 10.2.0.4 (or was it 11g?) to help prevent the pickler fetch.

          Of course the other issue could be to do with your code... but as you've not shown us that... we can't tell.
          • 2. Re: parallel pipelined table functions - pipe row(...) slowed down if high load
            BluShadow
            Just looked it up in my example library... it was changed in 10.2.0.3.

            Example (some XML processing that caused pickler fetches)...

            Based on response from mdrake on this thread:
            Re: XML file processing into oracle

            Reading XML using a schema...

            (User needs the CREATE ANY TYPE privilege)
            declare
              SCHEMAURL VARCHAR2(256) := 'http://xmlns.example.org/xsd/testcase.xsd';
              XMLSCHEMA VARCHAR2(4000) := '<?xml version="1.0" encoding="UTF-8"?>
                 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xdb="http://xmlns.oracle.com/xdb" xdb:storeVarrayAsTable="true">
                    <xs:element name="cust_order" type="cust_orderType" xdb:defaultTable="CUST_ORDER_TBL"/>
                    <xs:complexType name="groupType" xdb:maintainDOM="false">
                            <xs:sequence>
                                    <xs:element name="item" type="itemType" maxOccurs="unbounded"/>
                            </xs:sequence>
                            <xs:attribute name="id" type="xs:byte" use="required"/>
                    </xs:complexType>
                    <xs:complexType name="itemType" xdb:maintainDOM="false">
                            <xs:simpleContent>
                                    <xs:extension base="xs:string">
                                            <xs:attribute name="id" type="xs:short" use="required"/>
                                            <xs:attribute name="name" type="xs:string" use="required"/>
                                    </xs:extension>
                            </xs:simpleContent>
                    </xs:complexType>
                    <xs:complexType name="cust_orderType" xdb:maintainDOM="false">
                            <xs:sequence>
                                    <xs:element name="group" type="groupType" maxOccurs="unbounded"/>
                            </xs:sequence>
                            <xs:attribute name="cust_id" type="xs:short" use="required"/>
                    </xs:complexType>
                 </xs:schema>';
              INSTANCE  CLOB :=
            '<cust_order cust_id="12345">
              <group id="1">
                <item id="1" name="Standard Mouse">100</item>
                <item id="2" name="Keyboard">100</item>
                <item id="3" name="Memory Module 2Gb">200</item>
                <item id="4" name="Processor 3Ghz">25</item>
                <item id="5" name="Processor 2.4Ghz">75</item>
              </group>
              <group id="2">
                <item id="1" name="Graphics Tablet">15</item>
                <item id="2" name="Keyboard">15</item>
                <item id="3" name="Memory Module 4Gb">15</item>
                <item id="4" name="Processor Quad Core 2.8Ghz">15</item>
              </group>
              <group id="3">
                <item id="1" name="Optical Mouse">5</item>
                <item id="2" name="Ergo Keyboard">5</item>
                <item id="3" name="Memory Module 2Gb">10</item>
                <item id="4" name="Processor Dual Core 2.4Ghz">5</item>
                <item id="5" name="Dual Output Graphics Card">5</item>
                <item id="6" name="28inch LED Monitor">10</item>
                <item id="7" name="Webcam">5</item>
                <item id="8" name="A3 1200dpi Laser Printer">2</item>
              </group>
            </cust_order>';                 
            begin
              dbms_xmlschema.registerSchema
                (
                 schemaurl       => SCHEMAURL
                ,schemadoc       => XMLSCHEMA
                ,local           => TRUE
                ,genTypes        => TRUE
                ,genBean         => FALSE
                ,genTables       => TRUE
                ,ENABLEHIERARCHY => DBMS_XMLSCHEMA.ENABLE_HIERARCHY_NONE
                );
              execute immediate 'insert into CUST_ORDER_TBL values (XMLTYPE(:INSTANCE))' using INSTANCE;
            end;
            /
            
            SQL> desc CUST_ORDER_TBL
            Name                                                                                                                                    Null?    Type
            ----------------------------------------------------------------------------------------------------------------------------------------------- -------- ------------------------------------------------------------------------------------------------
            TABLE of SYS.XMLTYPE(XMLSchema "http://xmlns.example.org/xsd/testcase.xsd" Element "cust_order") STORAGE Object-relational TYPE "cust_orderType222_T"
            
            set autotrace on explain
            set pages 60 lines 164 heading on
            
            col cust_id format a8
            
            
            select extract(object_value,'/cust_order/@cust_id') as cust_id
                  ,grp.id as group_id, itm.id as item_id, itm.inm as item_name, itm.qty as item_qty
            from   CUST_ORDER_TBL
                  ,XMLTABLE('/cust_order/group'
                            passing object_value
                            columns id   number       path '@id'
                                   ,item xmltype      path 'item'
                           ) grp
                  ,XMLTABLE('/item'
                            passing grp.item
                            columns id   number       path '@id'
                                   ,inm  varchar2(30) path '@name'
                                   ,qty  number       path '.'
                           ) itm
            /
             You need at least 10.2.0.3 for performance, i.e. to avoid the COLLECTION ITERATOR PICKLER FETCH in the execution plan...

            On 10.2.0.1:
            Execution Plan
            ----------------------------------------------------------
            Plan hash value: 3741473841
            
            -------------------------------------------------------------------------------------------------------------
            | Id  | Operation                          | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
            -------------------------------------------------------------------------------------------------------------
            |   0 | SELECT STATEMENT                   |                        | 24504 |    89M|   873   (1)| 00:00:11 |
            |   1 |  NESTED LOOPS                      |                        | 24504 |    89M|   873   (1)| 00:00:11 |
            |   2 |   NESTED LOOPS                     |                        |     3 | 11460 |   805   (1)| 00:00:10 |
            |   3 |    TABLE ACCESS FULL               | CUST_ORDER_TBL         |     1 |  3777 |     3   (0)| 00:00:01 |
            |*  4 |    INDEX RANGE SCAN                | SYS_IOT_TOP_774117     |     3 |   129 |     1   (0)| 00:00:01 |
            |   5 |   COLLECTION ITERATOR PICKLER FETCH| XMLSEQUENCEFROMXMLTYPE |       |       |            |       |
            -------------------------------------------------------------------------------------------------------------
            
            Predicate Information (identified by operation id):
            ---------------------------------------------------
            
               4 - access("NESTED_TABLE_ID"="CUST_ORDER_TBL"."SYS_NC0000900010$")
                   filter("SYS_NC_TYPEID$" IS NOT NULL)
            
            Note
            -----
               - dynamic sampling used for this statement
            On 10.2.0.3:
            Execution Plan
            ----------------------------------------------------------
            Plan hash value: 1048233240
            
            ---------------------------------------------------------------------------------------------
            | Id  | Operation               | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
            ---------------------------------------------------------------------------------------------
            |   0 | SELECT STATEMENT        |                   |    17 |   132K|   839   (0)| 00:00:11 |
            |   1 |  NESTED LOOPS           |                   |    17 |   132K|   839   (0)| 00:00:11 |
            |   2 |   MERGE JOIN CARTESIAN  |                   |    17 |   131K|   805   (0)| 00:00:10 |
            |   3 |    TABLE ACCESS FULL    | CUST_ORDER_TBL    |     1 |  3781 |     3   (0)| 00:00:01 |
            |   4 |    BUFFER SORT          |                   |    17 | 70839 |   802   (0)| 00:00:10 |
            |*  5 |     INDEX FAST FULL SCAN| SYS_IOT_TOP_56154 |    17 | 70839 |   802   (0)| 00:00:10 |
            |*  6 |   INDEX UNIQUE SCAN     | SYS_IOT_TOP_56152 |     1 |    43 |     2   (0)| 00:00:01 |
            |*  7 |    INDEX RANGE SCAN     | SYS_C006701       |     1 |       |     0   (0)| 00:00:01 |
            ---------------------------------------------------------------------------------------------
            
            Predicate Information (identified by operation id):
            ---------------------------------------------------
            
               5 - filter("SYS_NC_TYPEID$" IS NOT NULL)
               6 - access("SYS_NTpzENS1H/RwSSC7TVzvlqmQ=="."NESTED_TABLE_ID"="SYS_NTnN5b8Q+8Txi9V
                          w5Ysl6x9w=="."SYS_NC0000600007$")
                   filter("SYS_NC_TYPEID$" IS NOT NULL AND
                          "NESTED_TABLE_ID"="CUST_ORDER_TBL"."SYS_NC0000900010$")
               7 - access("SYS_NTpzENS1H/RwSSC7TVzvlqmQ=="."NESTED_TABLE_ID"="SYS_NTnN5b8Q+8Txi9V
                          w5Ysl6x9w=="."SYS_NC0000600007$")
            
            Note
            -----
               - dynamic sampling used for this statement
            CLEAN UP...
            DROP TABLE CUST_ORDER_TBL purge;
            
            exec dbms_xmlschema.deleteschema('http://xmlns.example.org/xsd/testcase.xsd');
            • 3. Re: parallel pipelined table functions - pipe row(...) slowed down if high load
              user8704911
              Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production     
              PL/SQL Release 11.2.0.3.0 - Production                                           
              CORE     11.2.0.3.0     Production                                                         
              TNS for Linux: Version 11.2.0.3.0 - Production                                   
              NLSRTL Version 11.2.0.3.0 - Production                                           
              I cannot post the code - it's too much.
              I hope to find time to create a demo program to post here (I'll try).
              I was just wondering if there is already something well-known about a "strange increase" in pipe row(...) effort (time) due to e.g.:
              a) overload situations
              b) an inadequate implementation for consuming the results

              Thanks for the hint about
              COLLECTION ITERATOR PICKLER FETCH
              I'll check what it means and whether it can be avoided/improved.

              rgds,
              Frank
              • 4. Re: parallel pipelined table functions - pipe row(...) slowed down if high load
                user8704911
                Could it be a problem that my table functions pipe out SQL object types
                which my main thread then inserts into a GTT?
                Would it be faster to pipe out PL/SQL records instead and insert those into the GTT?
                There is something about it here: http://mwidlake.wordpress.com/2011/08/11/pickler-fetch-what-is-it/
                - talking about "cast into a table"
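                For what it's worth, a pipelined function can also be declared against record/collection types from a package spec (Oracle generates hidden SQL types behind the scenes). A minimal sketch with hypothetical names - whether this actually changes the pickler-fetch behaviour would need testing:

```sql
create or replace package demo_pkg as
  type rec_t is record (a1 number, b1 varchar2(240));
  type tab_t is table of rec_t;
  function gen(n in number) return tab_t pipelined;
end demo_pkg;
/
create or replace package body demo_pkg as
  function gen(n in number) return tab_t pipelined is
    r rec_t;
  begin
    for i in 1 .. n loop
      r.a1 := i;
      r.b1 := 'row ' || i;
      pipe row(r);  -- piping a PL/SQL record instead of a SQL object type
    end loop;
    return;
  end gen;
end demo_pkg;
/
select * from table(demo_pkg.gen(3));
```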

                rgds,
                Frank

                Edited by: user8704911 on Nov 28, 2012 3:03 AM
                • 5. Re: parallel pipelined table functions - pipe row(...) slowed down if high load
                  user8704911
                  Here is a test program - please see below.

                  On my system there is a difference between
                  a) insert into GTT from 10 TF instances : Elapsed: 00:00:01.593
                  b) insert into GTT from backup GTT      : Elapsed: 00:00:00.500
                  The difference between the two seems to depend on the structure/size of the objects my table functions pipe out.
                  When I started with a small number of columns in GTT_Result, there was no big discrepancy.
                  After I grew it, the difference increased.
                  Maybe it's a matter of the message buffer size?
                  I checked for this:
                  I checked for this:
                  SQL> show parameter  parallel_execution_message_size;
                  
                  NAME                                 TYPE        VALUE
                  ------------------------------------ ----------- ------------------------------
                  parallel_execution_message_size      integer     16384
                  But I don't know if that size is too small(?!)
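                  In case it matters, the parameter can only be changed at system level; a sketch (assumption: 11.2 with an spfile in use - the change takes effect only after an instance restart):

```sql
-- parallel_execution_message_size is not session-modifiable;
-- raise it in the spfile and restart the instance.
alter system set parallel_execution_message_size = 32768 scope = spfile;
```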
                  --
                  -- Test Program to measure performance of parallel pipelined table functions
                  -- inserting results into GTT
                  -- and compare with pure insert into GTT from a backup image
                  -- 
                  -- Frank Bergemann (29/11/2012)
                  --
                  
                  set timing off
                  
                  commit;
                  
                  -- create some GTT for Results piped out by parallel pipelined table functions
                  -- (primary image)
                  
                  drop table GTT_Result;
                  
                  create global temporary table GTT_Result 
                  (
                    a1 number,
                    b1 varchar2(240),
                    c1 timestamp(3) with time zone,
                    d1 date,
                    a2 number,
                    b2 varchar2(240),
                    c2 timestamp(3) with time zone,
                    d2 date,
                    a3 number,
                    b3 varchar2(240),
                    c3 timestamp(3) with time zone,
                    d3 date,
                    a4 number,
                    b4 varchar2(240),
                    c4 timestamp(3) with time zone,
                    d4 date,
                    a5 number,
                    b5 varchar2(240),
                    c5 timestamp(3) with time zone,
                    d5 date,
                    a6 number,
                    b6 varchar2(240),
                    c6 timestamp(3) with time zone,
                    d6 date
                  ) on commit preserve rows;
                  
                  alter table GTT_Result add constraint GTT_RESULT_UK unique(a1);
                  
                  -- create a secondary image to save a job's results
                  -- in order to test for pure transfer from secondary image to primary image
                  -- (for comparing)
                  drop table GTT_Result_Backup;
                  
                  create global temporary table GTT_Result_Backup on commit preserve rows 
                  as (select * from GTT_Result where 0 =1) ;
                  
                  -- create and populate a driving table just pumping a sequence of number to the TFs
                  drop table GTT_Input;
                  
                  create global temporary table GTT_Input  (
                    ctr number) on commit preserve rows;
                  
                  alter table GTT_Input parallel 10; -- using 10 TF instances in parallel
                  truncate table GTT_Input;
                  insert into GTT_Input select level from dual src connect by level <= 100000; commit;
                  select count(*) from GTT_Input;
                  
                  -- drop what to create next
                  drop function TestFunc;
                  drop type t_tons; -- table of numbers
                  drop type Result_List;
                  drop type Result_Object;
                  
                  -- a SQL data type matching the GTT result table
                  -- (objects piped out by TFs)
                  create or replace type Result_Object as object (
                    a1 number,
                    b1 varchar2(240),
                    c1 timestamp(3) with time zone,
                    d1 date,
                    a2 number,
                    b2 varchar2(240),
                    c2 timestamp(3) with time zone,
                    d2 date,
                    a3 number,
                    b3 varchar2(240),
                    c3 timestamp(3) with time zone,
                    d3 date,
                    a4 number,
                    b4 varchar2(240),
                    c4 timestamp(3) with time zone,
                    d4 date,
                    a5 number,
                    b5 varchar2(240),
                    c5 timestamp(3) with time zone,
                    d5 date,
                    a6 number,
                    b6 varchar2(240),
                    c6 timestamp(3) with time zone,
                    d6 date
                  );
                  /
                  
                  -- table type
                  create or replace type Result_List is table of Result_Object;
                  /
                  
                  -- table for input data feed into the TFs
                  create or replace type t_tons is table of number;
                  /
                  
                  -- create table functions
                  create or replace function TestFunc(
                      cur in sys_refcursor)
                  return result_list pipelined
                  parallel_enable(partition cur by any)
                  is
                    tons t_tons;
                    obj  result_object;
                    sid number;
                  begin
                     select userenv('SID') into sid from dual;
                     obj := result_object(
                      0,'A',current_timestamp,trunc(current_timestamp),
                      sid,'B',current_timestamp, trunc(current_timestamp),
                      2,'C',current_timestamp, trunc(current_timestamp),
                      3,'D',current_timestamp, trunc(current_timestamp),
                      4,'E',current_timestamp, trunc(current_timestamp),
                      5,'F',current_timestamp, trunc(current_timestamp));
                     
                    fetch cur bulk collect into tons;
                    close cur;
                    
                    if tons.count > 0
                    then
                      for i in tons.first..tons.last
                      loop
                        obj.a1 := tons(i);
                        pipe row(obj);
                      end loop;
                    end if;
                    return;
                  end;
                  /
                  
                  -- use parallel pipelined table functions piping out SQL objects
                  truncate table GTT_Result;
                  set timing on
                  insert into GTT_Result dst 
                  select * from table(TestFunc(CURSOR(select /*+ first_rows dynamic_sampling(src) */ ctr from GTT_Input src) )) tf;
                  set timing off
                  commit;
                  -- activate the next lines if you want to verify that multiple TF instances were invoked
                  -- and how many output records were generated by each of them
                  -- select a2, count(*) from GTT_Result group by a2;
                  -- select count(*) from GTT_Result;
                  
                  -- fill data of run into some backup image
                  truncate table GTT_Result_Backup;
                  insert into GTT_Result_Backup select * from GTT_Result; commit;
                  
                  -- check time to just transfer the same amount of records from backup GTT image to base GTT image
                  -- (to compare the performance)
                  truncate table GTT_Result;
                  set timing on
                  insert into GTT_Result select * from GTT_Result_Backup;
                  set timing off
                  commit;
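                  One way to see how the rows were distributed across the PX servers (a sketch, not part of the original script): query v$pq_tqstat in the same session right after the parallel statement, before running any other parallel SQL.

```sql
-- Per-PX-server row counts and waits for the last parallel statement
-- executed in this session (producers vs. the QC consumer).
select dfo_number, tq_id, server_type, process, num_rows, bytes, waits
from   v$pq_tqstat
order  by dfo_number, tq_id, server_type, process;
```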
                  • 6. Re: parallel pipelined table functions - pipe row(...) slowed down if high load
                    user8704911
                    Missed the autotrace output.
                    Autotrace enabled
                    Displays the execution plan and statistics of the statement.
                    100,000 rows inserted.
                    Elapsed: 00:00:01.641
                    Plan hash value: 1494467376
                     
                    ---------------------------------------------------------------------------------------------------------------------------------
                    | Id  | Operation                             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
                    ---------------------------------------------------------------------------------------------------------------------------------
                    |   0 | INSERT STATEMENT                      |            |  8168 |  6971K|    21   (0)| 00:00:01 |        |      |            |
                    |   1 |  LOAD TABLE CONVENTIONAL              | GTT_RESULT |       |       |            |          |        |      |            |
                    |   2 |   PX COORDINATOR                      |            |       |       |            |          |        |      |            |
                    |   3 |    PX SEND QC (RANDOM)                | :TQ10000   |  8168 |  6971K|    21   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
                    |   4 |     VIEW                              |            |  8168 |  6971K|    21   (0)| 00:00:01 |  Q1,00 | PCWP |            |
                    |   5 |      COLLECTION ITERATOR PICKLER FETCH| TESTFUNC   |  8168 |       |    21   (0)| 00:00:01 |  Q1,00 | PCWP |            |
                    |   6 |       PX BLOCK ITERATOR               |            | 98728 |  1253K|     2   (0)| 00:00:01 |  Q1,00 | PCWC |            |
                    |   7 |        TABLE ACCESS FULL              | GTT_INPUT  | 98728 |  1253K|     2   (0)| 00:00:01 |  Q1,00 | PCWP |            |
                    ---------------------------------------------------------------------------------------------------------------------------------
                     
                    Note
                    -----
                       - dynamic sampling used for this statement (level=2)
                    
                       Statistics
                    -----------------------------------------------------------
                                   0  user rollbacks
                                   0  global enqueue releases
                                   0  physical read requests optimized
                                   0  physical write total multi block requests
                                  38  hot buffers moved to head of LRU
                                   0  commit wait performed
                                   0  global undo segment hints helped
                                   0  global undo segment hints were stale
                                   0  IMU commits
                                   1  IMU Flushes
                    Autotrace disabled
                    Committed.
                    And here's the one for the insert from the backup GTT
                    Autotrace enabled
                    Displays the execution plan and statistics of the statement.
                    100,000 rows inserted.
                    Elapsed: 00:00:00.532
                    Plan hash value: 34101902
                     
                    ----------------------------------------------------------------------------------------------
                    | Id  | Operation                | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
                    ----------------------------------------------------------------------------------------------
                    |   0 | INSERT STATEMENT         |                   | 84136 |    76M|   464   (1)| 00:00:06 |
                    |   1 |  LOAD TABLE CONVENTIONAL | GTT_RESULT        |       |       |            |          |
                    |   2 |   TABLE ACCESS FULL      | GTT_RESULT_BACKUP | 84136 |    76M|   464   (1)| 00:00:06 |
                    ----------------------------------------------------------------------------------------------
                     
                    Note
                    -----
                       - dynamic sampling used for this statement (level=2)
                    
                       Statistics
                    -----------------------------------------------------------
                                   0  user rollbacks
                                   0  global enqueue releases
                                   0  physical read requests optimized
                                   0  physical write total multi block requests
                                   6  hot buffers moved to head of LRU
                                   0  commit wait performed
                                   0  global undo segment hints helped
                                   0  global undo segment hints were stale
                                   0  IMU commits
                                   1  IMU Flushes
                    Autotrace disabled
                    Committed.
                    Edited by: user8704911 on Nov 29, 2012 6:23 AM
                    • 7. Re: parallel pipelined table functions - pipe row(...) slowed down if high load
                      user8704911
                      Can somebody confirm my observation? (See demo program and results above.)
                      Maybe someone has a hint about database parameters to tune for this?

                      - many thanks!

                      best regards,
                      Frank
                      • 8. Re: parallel pipelined table functions - pipe row(...) slowed down if high load
                        user8704911
                        I just missed activating parallel DML for the insert into the target (result data) GTT.
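                        For readers finding this later, the missing step looks roughly like this (a sketch based on this reply; the hint and aliases are illustrative):

```sql
-- Parallel DML must be enabled explicitly per session,
-- otherwise the insert part runs serially in the QC.
alter session enable parallel dml;

insert /*+ append parallel(dst) */ into GTT_Result dst
select * from table(TestFunc(cursor(select ctr from GTT_Input src)));

commit;  -- direct-path/parallel DML requires a commit before the table can be re-read
```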
                        rgds,
                        Frank