
Analytic / Model requirement

Jonathan Lewis
Jonathan Lewis Member Posts: 9,756 Gold Crown
edited Aug 15, 2015 12:44PM in SQL & PL/SQL

This is a puzzle I've set myself following a discussion about making choices between SQL and PL/SQL - so it's not urgent, not important, and not serious.

If I query dba_extents for a given table (e.g. sys.source$) the extent information looks like this:

select file_id, block_id, blocks

from dba_extents

where owner = 'SYS'

and segment_name = 'SOURCE$'

order by file_id, block_id

;

   FILE_ID   BLOCK_ID     BLOCKS
---------- ---------- ----------
         1       1504          8
         1       8168          8
         1       8176          8
         1       8192          8
         1       8288          8
         1       8440          8
         1      10072          8

...

         1      77568        128
         1      77696        128
         1      77824        128
         1      78080        128
         1      89984        128

...

         1      90752       1024

80 rows selected.

I have a piece of code which reads the extent list, joins it to a list of numbers to enumerate every block in each extent, sorts the blocks by file_id and block_id, applies an ntile(12) to the result set, then picks the first and last block in each tile to produce an output which is essentially 12 rows of (first_file_id, first_block_id, last_file_id, last_block_id) - which I can convert to a covering set of rowid ranges for the table. (This is essentially what dbms_parallel_execute does when you create rowid chunks - except it uses PL/SQL to do it.)

My SQL does exactly the job needed, but is significantly slower than the PL/SQL equivalent - we're only talking a few seconds across the board for very large objects, so the difference is irrelevant for real production purposes. The slowdown comes, I think, largely because I expand the initial result set from the number of extents to the number of blocks and then shrink it back down again, while the PL/SQL can simply walk through the extent definitions doing simple arithmetic.

I'm sure there's a MODEL clause way of avoiding the explosion, and I'd love to see it if someone has the time, but I keep thinking I'm close to an analytic solution and can't quite get there. If anyone can come up with one, that would be even better than a model solution - failing that, can someone prove it can't be done efficiently in simple analytic SQL?

UPDATE:  I forgot to state explicitly that the point of doing the block explosion and ntile() was that it was a simple strategy for getting the same number (+/-1) of blocks in every rowid range.
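For illustration, the explode-and-ntile strategy can be sketched in Python (my sketch, not the actual SQL or PL/SQL; note that Oracle's NTILE puts its larger tiles first, while this integer arithmetic spreads the remainder differently):

```python
# Sketch of the explode-and-ntile strategy: expand every extent into
# individual blocks, deal them into tiles of equal size (+/- 1), then
# keep only each tile's first and last block address.
def explode_and_tile(extents, n_tiles=12):
    """extents: (file_id, block_id, blocks) tuples.
    Returns one (first_file, first_block, last_file, last_block) per tile."""
    blocks = [(fid, bid + i)
              for fid, bid, nblocks in sorted(extents)
              for i in range(nblocks)]          # one entry per block
    n = len(blocks)
    ranges = []
    for t in range(n_tiles):
        lo = t * n // n_tiles                   # first block of tile t
        hi = (t + 1) * n // n_tiles - 1         # last block of tile t
        ranges.append(blocks[lo] + blocks[hi])
    return ranges
```

The memory cost is obvious: the intermediate list grows to one element per block, which is exactly the explosion the question hopes to avoid.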

Regards

Jonathan Lewis



Best Answer

  • Stew Ashton
    Stew Ashton Member Posts: 2,861 Gold Trophy
    edited Jul 1, 2015 3:07AM Accepted Answer

    The solution:

    [Update: please see https://stewashton.wordpress.com/2015/07/01/splitting-a-table-into-rowid-ranges-of-equal-size/ for an updated, slightly cleaner solution. Thanks to Chris227, I learned what the WIDTH_BUCKET function is for!]

    with data as (
      select (select blocks / 12 from my_segments) blocks_per_chunk,
      object_id
      from my_objects
    )
    , extents as (
      select
      nvl(sum(blocks) over(
        order by file_id, block_id
        rows between unbounded preceding and 1 preceding
      ),0) cumul_start_blocks,
      sum(blocks) over(order by file_id, block_id) - 1 cumul_end_blocks,
      blocks, block_id, file_id,
      data.*
      from my_extents, data
    )
    , extents_with_chunks as (
      select
      trunc(cumul_start_blocks / blocks_per_chunk) first_chunk,
      trunc((cumul_end_blocks) / blocks_per_chunk) last_chunk,
      round(trunc(cumul_start_blocks / blocks_per_chunk)*blocks_per_chunk) first_chunk_blocks,
      round(trunc((cumul_end_blocks+1.0001) / blocks_per_chunk)*blocks_per_chunk)-1 last_chunk_blocks,
      e.* from extents e
    )
    , expanded_extents as (
      select first_chunk + level -1  chunk,
      cumul_start_blocks, file_id, block_id,
      case level when 1 then cumul_start_blocks
          else round((first_chunk + level -1)*blocks_per_chunk)
        end start_blocks,
        case first_chunk + level -1 when last_chunk then cumul_end_blocks
          else round((first_chunk + level)*blocks_per_chunk)-1
        end end_blocks 
      from (
        select * from extents_with_chunks
        where first_chunk_blocks = cumul_start_blocks
          or last_chunk_blocks = cumul_end_blocks
          or first_chunk < last_chunk
      )
      connect by cumul_start_blocks = prior cumul_start_blocks
      and first_chunk + level -1 <= last_chunk
      and prior sys_guid() is not null
    )
    select chunk,
    min(file_id) first_file_id,
    max(file_id) last_file_id,
    min(block_id + start_blocks - cumul_start_blocks)
      keep (dense_rank first order by cumul_start_blocks) first_block_id,
    max(block_id + end_blocks - cumul_start_blocks)
      keep (dense_rank last order by cumul_start_blocks) last_block_id,
    max(end_blocks) + 1 - min(start_blocks) blocks
    from expanded_extents
    group by chunk
    order by chunk;
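    The arithmetic behind this can be paraphrased in Python (a sketch only, using simple integer chunk boundaries rather than the rounded fractional ones in the SQL): each extent knows its cumulative block range, and only an extent that straddles a chunk boundary contributes pieces to more than one chunk - no per-block expansion is needed.

```python
def rowid_chunks(extents, n_chunks=12):
    """extents: (file_id, block_id, blocks), sorted by file_id, block_id.
    Splits the cumulative block range [0, total) into n_chunks pieces of
    equal size (+/- 1 block) and maps each back to block addresses."""
    total = sum(b for _, _, b in extents)
    # chunk k covers cumulative block positions [bound[k], bound[k+1])
    bound = [k * total // n_chunks for k in range(n_chunks + 1)]
    ranges = {}        # chunk -> [first_file, first_block, last_file, last_block]
    cum = 0
    for fid, bid, blocks in extents:
        for k in range(n_chunks):
            lo = max(cum, bound[k])
            hi = min(cum + blocks, bound[k + 1])
            if lo < hi:                       # extent overlaps chunk k
                if k not in ranges:
                    ranges[k] = [fid, bid + lo - cum, fid, bid + hi - 1 - cum]
                else:                         # extend chunk k's end point
                    ranges[k][2:] = [fid, bid + hi - 1 - cum]
        cum += blocks
    return [tuple(ranges[k]) for k in range(n_chunks)]
```

    This naive sketch is O(extents x chunks); the SQL above computes first_chunk and last_chunk directly so each extent only visits the chunks it actually overlaps.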
    
    

Answers

  • Sven W.
    Sven W. Member Posts: 10,533 Gold Crown
    edited Jun 22, 2015 9:13AM

Could you share the PL/SQL code so that we can compare the performance/efficiency on our own systems?

I guess your SQL looks a little like this one. It uses the analytic NTILE function and is probably close to your version.

    select nt, min(file_id), max(file_id), min(block_id), max(block_id)
    from (select file_id, block_id, blocks, ntile(12) over (order by file_id, block_id) nt
          from dba_extents
          where owner = 'SYS'
          and segment_name = 'SOURCE$'
          ) v
    group by nt
    order by nt
    ;
    
    

Maybe there is a way to improve the NTILE mechanism. There could be a different way to calculate it ourselves if we have the whole range available at the start.

If the goal is to distribute the blocks evenly among the 12 tiles, then some additional work is needed, such as a cumulative sum over the blocks. That requires an additional sort, which slows down performance even more.

    select nt, min(file_id), max(file_id), min(block_id), max(block_id), max(cummulative_block) - min(cummulative_block) blocks_in_tile
    from (select file_id, block_id, blocks, cummulative_block, ntile(12) over (order by cummulative_block) nt
          from (
             select file_id, block_id, blocks,  sum(blocks) over (order by file_id, block_id) cummulative_block
             from dba_extents
             where owner = 'SYS'
             and segment_name = 'SOURCE$'
             )
       ) v
    group by nt
    order by nt;
    

However, the tile distribution doesn't look perfectly even. At first glance I'm not sure what the reason for that is.

    NT  MIN(FILE_ID)  MAX(FILE_ID)  MIN(BLOCK_ID)  MAX(BLOCK_ID)  BLOCKS_IN_TILE
     1             1             1           1496           9368              56
     2             1             1           9392           9728              56
     3             1             1           9856          20992             768
     4             1             1          22528          26240             768
     5             1             1          26752          60032             768
     6             1             1          64384          70272             768
     7             1             1          70400          72704             768
     8             1             1          72832          73984             768
     9             1             1          74624          81536             768
    10             1             1          81664          83712             768
    11             1             1          83968          84736             768
    12             1             1          84864         272640            6144
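    The unevenness follows directly from NTILE balancing rows rather than blocks; a tiny illustration (made-up extent sizes, not the thread's data):

```python
# NTILE balances the number of ROWS per tile; an extent of 1024 blocks
# counts the same as an extent of 8 blocks, so the BLOCK totals per tile
# can differ wildly.  (Made-up extent sizes, 4 tiles for brevity.)
extent_blocks = [8, 8, 8, 8, 128, 128, 128, 1024]    # blocks per extent row
n_tiles = 4
rows_per_tile = len(extent_blocks) // n_tiles        # 2 rows in every tile
tile_blocks = [sum(extent_blocks[t * rows_per_tile:(t + 1) * rows_per_tile])
               for t in range(n_tiles)]
print(tile_blocks)                                   # [16, 16, 256, 1152]
```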

    Regards

    Sven

  • Stew Ashton
    Stew Ashton Member Posts: 2,861 Gold Trophy
    edited Jun 22, 2015 9:21AM

    I'll work on this too, but first DBMS_PARALLEL_EXECUTE doesn't do the same thing. It never produces a ROWID range that covers more than one extent.

  • Stew Ashton
    Stew Ashton Member Posts: 2,861 Gold Trophy
    edited Jun 22, 2015 9:32AM

    This is a type of "bin fitting" problem. An approximate version can be solved by the MODEL clause or MATCH_RECOGNIZE.

    Here is the MATCH_RECOGNIZE solution. I don't split extents up to get 12 exactly equal ranges, so it's imperfect. Also, I don't bother to calculate the ROWIDs since you knew how to do that before I knew what WHERE meant.

    select * from (
      select file_id, block_id, blocks, sum(blocks) over()/12 sum_blocks
      from dba_extents
      where owner = 'SYS'
      and segment_name = 'SOURCE$'
      order by file_id, block_id
    )
    match_recognize(
      order by file_id, block_id
      measures first(file_id) first_file_id,
        first(block_id) first_block_id,
        last(file_id) last_file_id,
        last(block_id) last_block_id,
        sum_blocks sum_blocks_wanted,
        sum(blocks) sum_blocks_found
      pattern(a+)
      define a as sum(blocks) <= sum_blocks
    );
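    As I read the DEFINE clause (my interpretation, not a literal translation), the pattern greedily accumulates consecutive extents while the running block total stays within total/12, then starts a new match:

```python
# Rough Python analogue of the MATCH_RECOGNIZE pattern above (my reading,
# not a literal translation): accumulate consecutive extents while the
# running block total stays within target, otherwise start a new group.
# One divergence: a single extent larger than target becomes its own
# group here, whereas DEFINE a AS sum(blocks) <= sum_blocks would never
# match such a row at all.
def greedy_groups(extents, n_chunks=12):
    target = sum(b for _, _, b in extents) / n_chunks
    groups, current, running = [], [], 0
    for fid, bid, blocks in extents:
        if current and running + blocks > target:
            groups.append(current)          # close the current match
            current, running = [], 0
        current.append((fid, bid, blocks))
        running += blocks
    if current:
        groups.append(current)
    return groups
```

    With uniform extents this yields exactly 12 groups; a mix of 8-, 128- and 1024-block extents can easily leave a remainder that becomes a 13th group, as noted later in the thread.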
    
  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,756 Gold Crown
    edited Jun 22, 2015 9:58AM

    Sven,

    Thanks for the offering; unfortunately the requirement is to spread the blocks evenly, while your code spreads the extents by extent count - which is why the second code sample has uneven block counts: you've got extents of 8, 128, and 1024 blocks, with 6 extents per chunk except for the first two chunks, which have 7.

    Regards

    Jonathan Lewis

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,756 Gold Crown
    edited Jun 22, 2015 10:00AM

    Stew,

    Thanks for the correction - maybe it's a memory of some version of parallel execution. I know I've seen Oracle produce rowid ranges that cross extent boundaries (even across file boundaries and spanning the wrong object).  Wherever I've seen it, and even if I haven't, my target for the purposes of the exercise is 12 rowid ranges with identical (+/-1) blocks.

    Regards

    Jonathan Lewis.

  • Sven W.
    Sven W. Member Posts: 10,533 Gold Crown
    edited Jun 22, 2015 10:06AM

    Just noticed that the NTILE function is not so helpful for this type of problem. So I used a total sum calculation, just as Stew did with the MODEL clause, and put it into an analytic/aggregate solution.

    select nt, min(file_id), max(file_id), min(block_id), max(block_id)
          , max(c_block) sum_blocks
          , max(total_blocks) total_blocks
    from (select file_id, block_id, blocks, c_block, total_blocks
              , trunc(c_block / (total_blocks / 12)) nt
          from (
            select file_id, block_id, blocks
                ,sum(blocks) over (order by file_id, block_id) c_block
                ,sum(blocks) over () total_blocks
            from dba_extents
            where owner = 'SYS'
            and segment_name = 'SOURCE$'
      )) v
    group by nt
    order by nt
    
    
    

    The result is a much better distribution than my previous version using ntile.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,756 Gold Crown
    edited Jun 22, 2015 10:29AM

    Stew,

    Another feature for me to learn (but at least it's 12c, so I've got some excuse)! I haven't even got to the MODEL clause yet.

    This gave me 13 chunks rather than 12, and I think it would have got into trouble if I had hit the 1024 block extents in ASSM, but otherwise did a good job.

    I had to change last(block_id) to last(block_id + blocks - 1) - I'm not sure if that was the most appropriate change, but it wasn't covering the data until I did.

    Thanks

    Jonathan Lewis

  • chris227
    chris227 Member Posts: 3,517 Bronze Crown
    edited Jun 22, 2015 10:37AM

    I am ashamed to ask, but

    Would you mind giving some test data and expected output?

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,756 Gold Crown
    edited Jun 22, 2015 12:57PM

    Background:

    I started with a copy of sys.source$ called test_user.t1, then queried dba_extents for test_user.t1 in my code, but I've copied the extent information into a table called extents to give you a starting point. Table t1 is in a tablespace using ASSM, with system extent allocation, and two (otherwise empty) data files. Here's the extent list as a set of inserts:

    create table extents(
        file_id   number(2,0),
        block_id  number(10,0),
        blocks    number(6,0)
    );

    insert into extents values( 7, 128, 8);
    insert into extents values( 7, 136, 8);
    insert into extents values( 7, 144, 8);
    insert into extents values( 7, 152, 8);
    insert into extents values( 7, 160, 8);
    insert into extents values( 7, 168, 8);
    insert into extents values( 7, 176, 8);
    insert into extents values( 7, 184, 8);
    insert into extents values( 7, 192, 8);
    insert into extents values( 7, 200, 8);
    insert into extents values( 7, 208, 8);
    insert into extents values( 7, 216, 8);
    insert into extents values( 7, 224, 8);
    insert into extents values( 7, 232, 8);
    insert into extents values( 7, 240, 8);
    insert into extents values( 7, 248, 8);
    insert into extents values( 7, 256, 128);
    insert into extents values( 7, 384, 128);
    insert into extents values( 7, 512, 128);
    insert into extents values( 7, 640, 128);
    insert into extents values( 7, 768, 128);
    insert into extents values( 7, 896, 128);
    insert into extents values( 7, 1024, 128);
    insert into extents values( 7, 1152, 128);
    insert into extents values( 7, 1280, 128);
    insert into extents values( 7, 1408, 128);
    insert into extents values( 7, 1536, 128);
    insert into extents values( 7, 1664, 128);
    insert into extents values( 7, 1792, 128);
    insert into extents values( 7, 1920, 128);
    insert into extents values( 7, 2048, 128);
    insert into extents values( 7, 2176, 128);
    insert into extents values( 7, 2304, 128);
    insert into extents values( 7, 2432, 128);
    insert into extents values( 7, 2560, 128);
    insert into extents values( 7, 2688, 128);
    insert into extents values( 7, 2816, 128);
    insert into extents values( 7, 2944, 128);
    insert into extents values( 7, 3072, 128);
    insert into extents values( 7, 3200, 128);
    insert into extents values( 7, 3328, 128);
    insert into extents values( 7, 3456, 128);
    insert into extents values( 7, 3584, 128);
    insert into extents values( 7, 3712, 128);
    insert into extents values( 7, 3840, 128);
    insert into extents values( 7, 3968, 128);
    insert into extents values( 7, 4096, 128);
    insert into extents values( 7, 4224, 1024);
    insert into extents values( 7, 5248, 1024);
    insert into extents values( 6, 128, 128);
    insert into extents values( 6, 256, 128);
    insert into extents values( 6, 384, 128);
    insert into extents values( 6, 512, 128);
    insert into extents values( 6, 640, 128);
    insert into extents values( 6, 768, 128);
    insert into extents values( 6, 896, 128);
    insert into extents values( 6, 1024, 128);
    insert into extents values( 6, 1152, 128);
    insert into extents values( 6, 1280, 128);
    insert into extents values( 6, 1408, 128);
    insert into extents values( 6, 1536, 128);
    insert into extents values( 6, 1664, 128);
    insert into extents values( 6, 1792, 128);
    insert into extents values( 6, 1920, 128);
    insert into extents values( 6, 2048, 128);
    insert into extents values( 6, 2176, 128);
    insert into extents values( 6, 2304, 128);
    insert into extents values( 6, 2432, 128);
    insert into extents values( 6, 2560, 128);
    insert into extents values( 6, 2688, 128);
    insert into extents values( 6, 2816, 128);
    insert into extents values( 6, 2944, 128);
    insert into extents values( 6, 3072, 128);
    insert into extents values( 6, 3200, 128);
    insert into extents values( 6, 3328, 128);
    insert into extents values( 6, 3456, 128);
    insert into extents values( 6, 3584, 128);
    insert into extents values( 6, 3712, 128);
    insert into extents values( 6, 3840, 128);
    insert into extents values( 6, 3968, 128);
    insert into extents values( 6, 4096, 128);
    insert into extents values( 6, 4224, 1024);

    commit;

    Here's the SQL statement I have at present:

    set linesize 180
    set trimspool on
    set pagesize 0

    define 1 = TEST_USER
    define 2 = T1
    define 3 = 12

    define m_owner   = '&1'
    define m_segment = '&2'
    define m_tiles   = &3

    spool temp_count.sql

    with extents as (
        select  file_id, block_id, blocks
        from    dba_extents
        where   owner = upper('&m_owner')
        and     segment_name = upper('&m_segment')
    ),
    expansion as (
        select  --+ materialize
                rownum id
        from    dual
        connect by
                level <= (select max(blocks) from extents)
    ),
    expanded_blocks as (
        select
            ext.file_id, ext.block_id, ext.blocks,
            ext.block_id + exp.id - 1   individual_block
        from
            extents     ext,
            expansion   exp
        where
            exp.id <= ext.blocks
    ),
    tiled as (
        select
            file_id, block_id, individual_block,
            ntile(&3) over (order by file_id, individual_block) tile
        from
            expanded_blocks
    ),
    ranges as (
        select
            distinct
            first_value(file_id) over (
                partition by tile order by file_id, individual_block
                rows between unbounded preceding and unbounded following
            ) first_file,
            first_value(individual_block) over (
                partition by tile order by file_id, individual_block
                rows between unbounded preceding and unbounded following
            ) first_block,
            last_value(file_id) over (
                partition by tile order by file_id, individual_block
                rows between unbounded preceding and unbounded following
            ) last_file,
            last_value(individual_block) over (
                partition by tile order by file_id, individual_block
                rows between unbounded preceding and unbounded following
            ) last_block
        from
            tiled
    ),
    rowids as (
        select
            first_file, first_block, last_file, last_block,
            dbms_rowid.rowid_create(
                rowid_type      => 0, -- dbms_rowid.rowid_type_restricted
                object_number   => 0, -- dbms_rowid.rowid_object_undefined
                relative_fno    => first_file,
                block_number    => first_block,
                row_number      => 0
            )   start_rowid,
            dbms_rowid.rowid_create(
                rowid_type      => 0, -- dbms_rowid.rowid_type_restricted
                object_number   => 0, -- dbms_rowid.rowid_object_undefined
                relative_fno    => last_file,
                block_number    => last_block,
                row_number      => 4095
            )   end_rowid
        from
            ranges
    )
    select
        'select /*+ rowid(s) */ count(*) ct from &m_owner..&m_segment s where rowid between chartorowid(''' ||
        dbms_rowid.rowid_to_extended(
            old_rowid       => start_rowid,
            schema_name     => upper('&1'),
            object_name     => upper('&2'),
            conversion_type => 0 -- dbms_rowid.rowid_convert_internal
        ) ||
        ''') and chartorowid(''' ||
        dbms_rowid.rowid_to_extended(
            old_rowid       => end_rowid,
            schema_name     => upper('&1'),
            object_name     => upper('&2'),
            conversion_type => 0 -- dbms_rowid.rowid_convert_internal
        ) ||
        ''') union all'
    from
        rowids
    order by
        first_file, first_block
    ;

    Here's the final result (which I edit to remove the last union all, then run to count the rows in the table):

    select /*+ rowid(s) */ count(*) ct from TEST_USER.T1 s where rowid between chartorowid('AAAv3DAAGAAAACAAAA') and chartorowid('AAAv3DAAGAAAAQqA//') union all
    select /*+ rowid(s) */ count(*) ct from TEST_USER.T1 s where rowid between chartorowid('AAAv3DAAGAAAAQrAAA') and chartorowid('AAAv3DAAGAAAAfVA//') union all
    select /*+ rowid(s) */ count(*) ct from TEST_USER.T1 s where rowid between chartorowid('AAAv3DAAGAAAAfWAAA') and chartorowid('AAAv3DAAGAAAAuAA//') union all
    select /*+ rowid(s) */ count(*) ct from TEST_USER.T1 s where rowid between chartorowid('AAAv3DAAGAAAAuBAAA') and chartorowid('AAAv3DAAGAAAA8rA//') union all
    select /*+ rowid(s) */ count(*) ct from TEST_USER.T1 s where rowid between chartorowid('AAAv3DAAGAAAA8sAAA') and chartorowid('AAAv3DAAGAAABLWA//') union all
    select /*+ rowid(s) */ count(*) ct from TEST_USER.T1 s where rowid between chartorowid('AAAv3DAAGAAABLXAAA') and chartorowid('AAAv3DAAHAAAAKBA//') union all
    select /*+ rowid(s) */ count(*) ct from TEST_USER.T1 s where rowid between chartorowid('AAAv3DAAHAAAAKCAAA') and chartorowid('AAAv3DAAHAAAAYsA//') union all
    select /*+ rowid(s) */ count(*) ct from TEST_USER.T1 s where rowid between chartorowid('AAAv3DAAHAAAAYtAAA') and chartorowid('AAAv3DAAHAAAAnXA//') union all
    select /*+ rowid(s) */ count(*) ct from TEST_USER.T1 s where rowid between chartorowid('AAAv3DAAHAAAAnYAAA') and chartorowid('AAAv3DAAHAAAA2BA//') union all
    select /*+ rowid(s) */ count(*) ct from TEST_USER.T1 s where rowid between chartorowid('AAAv3DAAHAAAA2CAAA') and chartorowid('AAAv3DAAHAAABErA//') union all
    select /*+ rowid(s) */ count(*) ct from TEST_USER.T1 s where rowid between chartorowid('AAAv3DAAHAAABEsAAA') and chartorowid('AAAv3DAAHAAABTVA//') union all
    select /*+ rowid(s) */ count(*) ct from TEST_USER.T1 s where rowid between chartorowid('AAAv3DAAHAAABTWAAA') and chartorowid('AAAv3DAAHAAABh/A//') union all

    The result if I stop at the ROWIDs subquery

         6     128      6   1066 00000080.0000.0006 0000042A.0FFF.0006
         6    1067      6   2005 0000042B.0000.0006 000007D5.0FFF.0006
         6    2006      6   2944 000007D6.0000.0006 00000B80.0FFF.0006
         6    2945      6   3883 00000B81.0000.0006 00000F2B.0FFF.0006
         6    3884      6   4822 00000F2C.0000.0006 000012D6.0FFF.0006
         6    4823      7    641 000012D7.0000.0006 00000281.0FFF.0007
         7     642      7   1580 00000282.0000.0007 0000062C.0FFF.0007
         7    1581      7   2519 0000062D.0000.0007 000009D7.0FFF.0007
         7    2520      7   3457 000009D8.0000.0007 00000D81.0FFF.0007
         7    3458      7   4395 00000D82.0000.0007 0000112B.0FFF.0007
         7    4396      7   5333 0000112C.0000.0007 000014D5.0FFF.0007
         7    5334      7   6271 000014D6.0000.0007 0000187F.0FFF.0007

    The result if I stop at the RANGES subquery - note that one range crosses two files.

    FIRST_FILE FIRST_BLOCK  LAST_FILE LAST_BLOCK
    ---------- ----------- ---------- ----------
             6         128          6       1066
             6        1067          6       2005
             6        2006          6       2944
             6        2945          6       3883
             6        3884          6       4822
             6        4823          7        641
             7         642          7       1580
             7        1581          7       2519
             7        2520          7       3457
             7        3458          7       4395
             7        4396          7       5333
             7        5334          7       6271

    Regards

    Jonathan Lewis

  • Sven W.
    Sven W. Member Posts: 10,533 Gold Crown
    edited Jun 22, 2015 2:07PM

    I think the base issue with this query is the use of the NTILE function. The data (blocks) is not evenly distributed among the rows, but NTILE works only on rows; therefore you need the expanded_blocks to get an even distribution. There is certainly a way to avoid that expansion + ntile.

    My last example used "trunc(c_block / (total_blocks / 12)) nt" to do something similar. However, it will not split an extent or calculate the exact block to split at.

    If we have an extent with 1024 blocks then all 1024 blocks will be put into one bucket. However it should be possible to find out which blocks to split.

    For example using something like min(mod(c_block, total_blocks / 12)).

    example

    with exts as (select file_id, block_id, blocks
                        ,sum(blocks) over (order by file_id, block_id) c_block
                        ,sum(blocks) over () total_blocks
                 from dba_extents
                 where owner = 'SYS'
                 and segment_name = 'SOURCE$'
             )
        ,exts_enhanced as
         (select file_id, block_id, blocks, c_block, total_blocks
               ,trunc(c_block / (total_blocks / 12)) nt
               ,mod(c_block,total_blocks / 12)  rest
               from exts
               )
    select nt, min(file_id), max(file_id), min(block_id), max(block_id), min(rest)
         , sum(ee.blocks) blocks_per_tile
    from exts_enhanced ee
    group by nt
    order by nt;
    

    The "rest" is the overlapping blocks that didn't make it from one tile into the next. This part might need to be added to the previous tile and removed from the current tile's row.

    I doubt that this is the proper logic yet, but the big picture for me is to avoid expanding all the blocks just to be able to work with NTILE.
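    The "rest" arithmetic can be sketched as follows (my reading of the idea, not working code from the post; it ignores extents large enough to span more than two tiles, which would need repeated splitting):

```python
# Sketch of the "rest" idea: for an extent that straddles a tile
# boundary, the blocks up to the boundary stay in the current tile and
# the remainder spills into the next one.
def split_at_boundary(cum_start, blocks, tile_size):
    """cum_start: cumulative blocks before this extent (0-based).
    Returns (keep, spill): blocks kept in the current tile vs. spilled."""
    room = tile_size - (cum_start % tile_size)   # space left in this tile
    if blocks <= room:
        return blocks, 0                         # no boundary crossed
    return room, blocks - room
```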

This discussion has been closed.