This discussion is archived.
4 Replies. Latest reply: Jul 24, 2013 9:26 AM by Mike Kutz

DBMS_PARALLEL_EXECUTE chunk size

Gwydion Newbie

Hi,

 

I'd like to process the data of a table in parallel using DBMS_PARALLEL_EXECUTE. I need to keep each chunk rather small (about 100 rows), but this doesn't work.

I tried

dbms_parallel_execute.create_chunks_by_rowid(
      task_name   => g_task_name,
      table_owner => user,
      table_name  => 'QUEUE',
      by_row      => TRUE,
      chunk_size  => 100 );

 

However, the chunks were much larger, 1500 rows or more. I know that the chunk size is just an approximation, but even when I set the chunk size to a single row or block, the smallest chunk I could get was about 500 rows. I need small chunks, since I want to send the data later via HTTP and the requests should stay rather small. I could split each chunk manually, but I would prefer to avoid this extra work. Any ideas?

  • 1. Re: DBMS_PARALLEL_EXECUTE chunk size
    BillyVerreynne Oracle ACE

    Perhaps rows per block play a role here. When chunking, I would think one performance goal would be to read a data block only once, and not multiple times by different parallel processes, each lifting only one or two rows from the block. Ideally a single parallel process reads the data block and does all the rows in it, as that means less I/O in total. And I/O is what is horribly expensive.

     

    In which case I would have designed the chunking interface to use data blocks as boundaries, and not rowids within the same data block as boundaries. (Kind of like using word boundaries in C/Pascal when defining structs, to improve performance.)

     

    Perhaps check if something like this is the case?
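
    An untested query along these lines should show the rows-per-block distribution of your QUEUE table (DBMS_ROWID can extract the file and block number from each rowid):

    select avg(row_count) as avg_rows_per_block
          ,max(row_count) as max_rows_per_block
    from (
      select dbms_rowid.rowid_relative_fno(rowid) as file_no
            ,dbms_rowid.rowid_block_number(rowid) as block_no
            ,count(*)                             as row_count
      from   queue
      group  by dbms_rowid.rowid_relative_fno(rowid)
              ,dbms_rowid.rowid_block_number(rowid)
    );

    If avg_rows_per_block is in the hundreds, a 100-row chunk would be smaller than a single block, which would explain the sizes you are seeing.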

  • 2. Re: DBMS_PARALLEL_EXECUTE chunk size
    Mike Kutz Expert

    I've run into a similar issue myself.

    However, my chunk sizes were either 10x the number of rows I wanted or 0 rows per chunk.

    Without going into too much detail: the rows had moved, which caused a lot of empty space between min(ROWID) and max(ROWID).

     

    Remember, the algorithm used to find :START_ROWID and :END_ROWID for each chunk is very simplistic and is based on the min/max values.

    As such, it isn't going to do what you want 100% of the time.

     

    I'll assume there are major gaps in your PK, which prevents you from using chunk_by_number.

    You will have to go with the "CHUNK_BY_SQL" route.

    The 'easiest to code' method may be writing a pipelined function that is used by the 'chunk_by_sql'.
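
    For the wiring, an untested sketch (the task name 'MY_TASK', the dummy chunk query, and my_proc are placeholders, not your real names):

    declare
      -- dummy chunk list: 10 chunks of 100 IDs each; replace with your own
      -- query, which must return start_id / end_id pairs
      l_chunk_sql clob :=
        'select (level-1)*100 + 1 as start_id, level*100 as end_id
           from dual connect by level <= 10';
    begin
      dbms_parallel_execute.create_task( task_name => 'MY_TASK' );

      dbms_parallel_execute.create_chunks_by_sql(
          task_name => 'MY_TASK',
          sql_stmt  => l_chunk_sql,
          by_rowid  => FALSE );

      -- my_proc(:start_id, :end_id) stands in for your processing procedure
      dbms_parallel_execute.run_task(
          task_name      => 'MY_TASK',
          sql_stmt       => 'begin my_proc( :start_id, :end_id ); end;',
          language_flag  => dbms_sql.native,
          parallel_level => 4 );

      dbms_parallel_execute.drop_task( task_name => 'MY_TASK' );
    end;
    /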

     

    PS - 100 rows per HTTP request sounds small.

    If this is a web service, you want to make 1 call for the 100 rows of each chunk, not 100 calls of 1 row each per chunk.

    This is the same concept as SQL: bulk process, not slow-by-slow.
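
    To illustrate, an untested sketch (send_chunk, build_xml_for_chunk, and the URL are hypothetical placeholders): one POST per chunk, with the whole chunk's payload in a single request.

    create or replace procedure send_chunk( p_start_id number, p_end_id number )
    as
      l_payload varchar2(32767);
      l_req     utl_http.req;
      l_resp    utl_http.resp;
    begin
      -- build ONE document covering every row of the chunk
      -- (build_xml_for_chunk is a hypothetical helper)
      l_payload := build_xml_for_chunk( p_start_id, p_end_id );

      l_req := utl_http.begin_request(
                 url    => 'http://example.com/endpoint',  -- placeholder URL
                 method => 'POST' );
      utl_http.set_header( l_req, 'Content-Type',   'text/xml' );
      utl_http.set_header( l_req, 'Content-Length', lengthb(l_payload) );
      utl_http.write_text( l_req, l_payload );

      l_resp := utl_http.get_response( l_req );
      utl_http.end_response( l_resp );
    end;
    /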

  • 3. Re: DBMS_PARALLEL_EXECUTE chunk size
    Gwydion Newbie

    Exactly, I want to make one HTTP request per 100 rows. Actually, I generate a rather large XML document from each row, so the actual request isn't that small.

    I will try using CHUNK_BY_SQL; for now I simply split each chunk into smaller slices within the task. This isn't really a problem, but it took some extra lines of code.

  • 4. Re: DBMS_PARALLEL_EXECUTE chunk size
    Mike Kutz Expert

    I just thought about the SQL statement for CHUNK_BY_SQL.

    I don't think you need to use a pipelined function.

    A single SQL statement with some Analytics should do just nicely.

     

    row_number() to sequentialize the PKs
    mod() to find the START_ID of each chunk
    lead() to find the next chunk's START_ID (then subtract 1 to get this chunk's END_ID)
    max() for the END_ID of the last chunk, since LEAD() returns NULL for the last row

     

    I haven't tested it, but this is what I came up with.

    with start_id_list as (
      -- find the first row of each chunk
      select PK, max_pk
      from (
        select PK
              ,mod( row_number() over (order by PK)
                  , N ) as mod_n          -- N = # of rows per chunk
              ,max(PK) over () as max_pk  -- basis for the last chunk's END_ID
        from T
      )
      where mod_n = 1  -- row_number() is 1-based, so each chunk's first row has mod_n = 1
    )
    select PK as start_id
          ,lead(PK, 1, max_pk + 1) over (order by PK) - 1 as end_id
    from start_id_list;
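
    If I have it right, once you substitute your chunk size (100) for N, that statement can be passed as the sql_stmt of CREATE_CHUNKS_BY_SQL with by_rowid => FALSE, along the lines of the sketch in my reply above.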
