This discussion is archived
2 Replies Latest reply: Mar 27, 2012 7:41 PM by 798578 RSS

nested table, performance

798578 Newbie
Currently Being Moderated
I have a need to use a collection as a column in a table, i.e nested table. (this is 10.2.0.3)

The use case being that a user could choose multiple items for a given thing and we want to store those choices.
OF course there is more the data model but this example suffices.

What I would love to be able to do is use the submultiset operator, because it does what I want.
But it performs very very poorly. In the below example it won't even return.

What I am driving at is that I need to be able to query this data in a performant way. Using a table of type is key, it make thing in java very easy as it is an array there and can be passed directly to the table and back out again. Also result sets we want it back with 1 row and then a collection of choices as one of the columns that come back. i.e. we want the results back, but don't want a row for each result.

I'll post this and then reply to it with my findings to aid in reading this.
First here is a script that illustrates all this...

/**** begin script ****/
create or replace type multiselect_type as table of varchar2(100);
/


create table multi_selection (
multi_selection_id number constraint multi_selection_pk primary key,
user_selection multiselect_type
)
nested table user_selection store as multi_selection_user_selection;


create index multi_selection_idx1
on multi_selection_user_selection(column_value);

declare
v_selection multiselect_type;
begin
for i in 1..100000 loop
if i between 1 and 5000 then
v_selection:=multiselect_type('red');
elsif i between 5001 and 50000 then
v_selection:=multiselect_type('red','green');
elsif i between 50001 and 50500 then
v_selection:=multiselect_type('yellow');
elsif i between 50501 and 90000 then
v_selection:=null;
else
v_selection:=multiselect_type('red','green','yellow');
end if;

insert into multi_selection
(multi_selection_id, user_selection)
values
(i,v_selection);
end loop;
commit;
end;
/

begin dbms_stats.gather_table_stats(ownname=>user,tabname=>'multi_selection',cascade=>true);end;
/

set autotrace on
prompt The next 2 sqls will not run, just shown for reference run at your own risk (they never return)

prompt select count(*) from multi_selection ms
prompt where multiselect_type('yellow') submultiset of ms.user_selection;

prompt select count(*) from multi_selection ms
prompt where multiselect_type('yellow') member of ms.user_selection;

select count(*) from multi_selection ms, table(ms.user_selection) user_choice
where 'yellow' in user_choice.column_value;

select count(*) from multi_selection ms
where exists (select 1 from multi_selection ms2, table(ms2.user_selection) user_choice
     where 'yellow' in user_choice.column_value
     and ms2.multi_selection_id = ms.multi_selection_id);
     
select count(*)
from multi_selection ms, (select distinct ms2.multi_selection_id
               from multi_selection ms2, table(ms2.user_selection) user_choice
               where 'yellow' in user_choice.column_value) uc
where ms.multi_selection_id = uc.multi_selection_id;     
/**** end script ****/
  • 1. Re: nested table, performance
    798578 Newbie
    Currently Being Moderated
    so the first query I wanted to use that fit the bill.
    i.e. it doesn't explode the data out but can do my comparison is this.

    select count(*) from multi_selection ms where multiselect_type('yellow') in user_selection;
    or
    select count(*) from multi_selection ms where multiselect_type('yellow') submultiset of ms.user_selection;

    neither will even return they just churn and churn and churn. In my more real world example I can get them to return but they touch way too many data blocks.

    this.
    select count(*) from multi_selection ms, table(ms.user_selection) user_choice
    where 'yellow' in user_choice.column_value;
    is fast, it explodes my data out but only hits ~123 blocks.

    I wish I could do a distinct on this but can not as a distinct on the nested table results causes an oracle error.
    ORA-00932: inconsistent datatypes: expected - got MULTISELECT_TYPE
    nor can I do any grouping on it

    Other options involve either an inline view, or a subquery.

    --30K blocks
    select count(*) from multi_selection ms
    where exists (select 1 from multi_selection ms2, table(ms2.user_selection) user_choice
         where 'yellow' in user_choice.column_value
         and ms2.multi_selection_id = ms.multi_selection_id);

    --11K blocks     
    select count(*)
    from multi_selection ms, (select distinct ms2.multi_selection_id
                   from multi_selection ms2, table(ms2.user_selection) user_choice
                   where 'yellow' in user_choice.column_value) uc
    where ms.multi_selection_id = uc.multi_selection_id;     

    Doing a classic relational example is actually worse performance wise.
    i.e.
    create table multi_selection2 (
    multi_selection_id number constraint multi_selection_rdbms_pk primary key
    );
    create table multi_selection2_choices (
    multi_selection_id number
    constraint multi_selection2_choices_fk1
    references multi_selection2(multi_selection_id),
    user_selection varchar2(100),
    constraint multi_selection2_choices_pk primary key (multi_selection_id, user_selection)
    );
    create unique index multi_selection2_choices_idx1 on multi_selection2_choices(user_selection, multi_selection_id);
    insert /*+APPEND*/ into multi_selection2
    (multi_selection_id)
    select multi_selection_id from multi_selection;
    commit;

    insert /*+APPEND*/ into multi_selection2_choices
    (multi_selection_id, user_selection)
    select ms.multi_selection_id, user_choice.column_value
    from multi_selection ms, table(ms.user_selection) user_choice;
    commit;

    begin dbms_stats.gather_table_stats(ownname=>user, tabname=>'multi_selection2',cascade=>true);end;
    /
    begin dbms_stats.gather_table_stats(ownname=>user, tabname=>'multi_selection2_choices',cascade=>true);end;
    /
    set autotrace on
    select count(*) from multi_selection2 ms
    where exists (select 1 from multi_selection2_choices msc
    where msc.multi_selection_id = ms.multi_selection_id
    and msc.user_selection = 'yellow');

    That results in 100K blocks accessed.

    /******** QUESTION **********/
    My question to the community is this. I know most avoid using any sort of object in the db, but in my case it actually makes the most sense for how the data is to be brought back and stored.

    Is there a better way to get what I want, i.e. use the index on the nested table without exploding out the results into rows (like the inline view / subquery does).
    123 blocks i pretty impressive performance for this, and I would like to tap that level of performance but get the results like the subquery/inline views return.

    Perhaps there is a way to get submultiset to use the index?
    thx
  • 2. Re: nested table, performance
    798578 Newbie
    Currently Being Moderated
    one last note.
    I can do this with my relational tables and get a lot of what I want.

    select MS.*, cursor (select user_selection from MULTI_SELECTION2_CHOICES b where B.MULTI_SELECTION_ID = ms.multi_selection_id)
    from MULTI_SELECTION2 MS, (select distinct MULTI_SELECTION_ID from MULTI_SELECTION2_CHOICES MSC
    where MSC.USER_SELECTION in ('blue','yellow')) c
    where ms.multi_selection_id = c.multi_selection_id;

    as this hits ~400 blocks and does the data level like I want.
    It's just it would be nice if I can keep this all as my collection since in java/jdbc that directly translates to an ARRAY type that can be passed around in java.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points