This discussion is archived
1 2 Previous Next 20 Replies Latest reply: Nov 20, 2012 1:16 PM by Roger Ford Go to original post RSS
  • 15. Re: Tune this "contains" query in Oracle 11g EE 11.2.0.3.0
    932383 Newbie
    Currently Being Moderated
    I had tried this, but it didn't make any difference. Thanks.
  • 16. Re: Tune this "contains" query in Oracle 11g EE 11.2.0.3.0
    932383 Newbie
    Currently Being Moderated
    I came up with the script to reproduce the problem. The problem can be reproduced with only 200,000 rows. Can anyone help?

    drop table items;
    
    create table items (
       "ITEM_NAME"                varchar2(100 byte),
        "ITEM_NUMBER"              varchar2(100 byte),
        "DESCRIPTION"              varchar2(4000 byte),
        "OWNER" number
    );
    /
    
    begin
      FOR Lcntr IN 1..100000
      loop
         insert into items (item_name, item_number, description, owner) values (dbms_random.string('A', 10), dbms_random.string('A', 10), dbms_random.string('L', 8) || ' ' || dbms_random.string('A', 4) || dbms_random.string('A', 5)  || ' ' || dbms_random.string('A', 10), dbms_random.value(1,10) );
      end loop;
    end;
    /
    
    begin
      FOR Lcntr IN 1..100000
      loop
         insert into items (item_name, item_number, description, owner) values (dbms_random.string('A', 10), dbms_random.string('A', 10), dbms_random.string('L', 8) || ' ' || dbms_random.string('A', 4) || '111'  || dbms_random.string('A', 5)  || ' ' || dbms_random.string('A', 10), 1234 );
      end loop;
    end;
    /
    
    commit;
    
    execute ctx_ddl.drop_preference('ENG_WORDLIST');
    execute ctx_ddl.create_preference('ENG_WORDLIST', 'BASIC_WORDLIST');
    execute ctx_ddl.set_attribute('ENG_WORDLIST','PREFIX_INDEX','TRUE');
    execute ctx_ddl.set_attribute('ENG_WORDLIST','PREFIX_MIN_LENGTH',1);
    execute ctx_ddl.set_attribute('ENG_WORDLIST','PREFIX_MAX_LENGTH',10);
    execute ctx_ddl.set_attribute('ENG_WORDLIST','SUBSTRING_INDEX','TRUE');
    execute ctx_ddl.set_attribute('ENG_WORDLIST','WILDCARD_MAXTERMS', 0);
    
    -- Create a lexer based on basic_lexer
    execute ctx_ddl.drop_preference('ENG_LEXER');
    EXECUTE CTX_DDL.CREATE_PREFERENCE ('ENG_LEXER', 'BASIC_LEXER');
    
    -- set special characters "@-_" as part of the words when they get indexed.
    EXECUTE CTX_DDL.SET_ATTRIBUTE ('ENG_LEXER', 'PRINTJOINS', '@-_');
    
    execute ctx_ddl.drop_preference('item_mult_preference');
    
    execute ctx_ddl.create_preference('items_multi_preference', 'MULTI_COLUMN_DATASTORE');
    execute ctx_ddl.set_attribute('items_multi_preference', 'columns', 'item_name, description,item_number');
    
    drop index items_text_index;
    create index items_text_index on items(description) indextype is ctxsys.context filter by owner parameters('LEXER ENG_LEXER WORDLIST ENG_WORDLIST STOPLIST CTXSYS.EMPTY_STOPLIST datastore items_multi_preference MEMORY 1024M') ;
    
    --it takes 600 seconds returning zero count.
    select count(*) from items where contains (description, '%111%') > 0 and owner = 12345;
    
    --it takes less than 1 second
    select count(*) from items where contains (description, '111%') > 0 and owner = 12345;
    Edited by: 929380 on Nov 19, 2012 3:33 PM
  • 17. Re: Tune this "contains" query in Oracle 11g EE 11.2.0.3.0
    Roger Ford Expert
    Currently Being Moderated
    In this testcase, your %111% expands to half a million tokens. Each of these tokens requires its own memory structure, which then needs to be OR'd with the other half miliion tokens, before being ANDed with the SDATA information for the owner. That's a massive amount of internal processing.

    We used to limit WILDCARD_MAXTERMS to 15000, then 50000 to avoid this sort of problem. But people requested that it be unlimited - hence the issue we see here.

    This is obviously an artificial and contrived example. What's the real problem you're trying to solve? Does it require double-truncated searches on numeric strings?
  • 18. Re: Tune this "contains" query in Oracle 11g EE 11.2.0.3.0
    932383 Newbie
    Currently Being Moderated
    First of all, this is not an artificial problem. We do have a real business problem here. The application we have is cloud based and has a lot of tenants which are distinguished by owner. I use the script with artificial data to simplify the reproducible steps. Second of all, using sdata doesn't solve the performance issue as I would have hoped for, even when there is no row having owner as 12345 in my example. There is a definitely filtering order issue here. It seems that the optimizer always does the filtering on description first, then on owner. In this case, it should analyze the data for description and owner and make a smarter decision on which one should go first.
    --it takes about 9 minutes in my environment even though there is no row having owner as 12345.
    select count(*) from items where contains (description, '(sdata (owner = 12345)) and ' || :g_desc) > 0
    Edited by: 929380 on Nov 20, 2012 7:52 AM

    Edited by: 929380 on Nov 20, 2012 7:53 AM
  • 19. Re: Tune this "contains" query in Oracle 11g EE 11.2.0.3.0
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    Thanks for providing the test case; It was helpful in illustrating the problem. Although in theory using sdata to combine the text columns with the owner column into one index should be efficient, as you discovered it does not know which to evaluate first. As a workaround, you can create a separate regular index on the owner column, in which case you might as well remove the filter by from the context index. When I tested the following on my system, it returned the result quickly, using a plan that accessed both indexes with some bitmap conversion and such. Please test it on your system and let us know what results you get.
    -- remove filter by from context index:
    drop index items_text_index
    /
    create index items_text_index 
    on items(description) 
    indextype is ctxsys.context 
    parameters
      ('LEXER      ENG_LEXER 
        WORDLIST   ENG_WORDLIST 
        STOPLIST   CTXSYS.EMPTY_STOPLIST 
        datastore  items_multi_preference 
        MEMORY     1024M')
    /
    -- add regular index on owner:
    create index owner_idx on items (owner)
    /
    -- gather statistics:
    exec dbms_stats.gather_table_stats (user, 'ITEMS')
    -- revised query:
    set timing on
    set autotrace on explain
    select count(*) 
    from   items
    where  owner = 12345
    and    contains (description, '%111%') > 0
    /
  • 20. Re: Tune this "contains" query in Oracle 11g EE 11.2.0.3.0
    Roger Ford Expert
    Currently Being Moderated
    I appreciate you must have a real business case, I'm just wondering how closely it resembles this test. For example, if you're searching real words then it's very unlikely that any "double truncated" search will expand to half a million unique words. On the other hand, if you're searching code values of some sort then it might be possible - but in that case there may be a better solution. Oracle Text is primarily designed for searching text - the further away from the "searching for words within a document" type of search, the less well it works without some clever thinking about the model involved.

    SDATA is designed as a filter, it can't be used as a primary driver within a query to then attempt to do a word match on the resulting rows. A query such as the one Barbara suggests is usually better if the numeric value is highly selective. And if there are relatively few values for "owner" then you should consider range partitioning the table on the owner column, and creating a local index. That will improve things greatly - in fact this query should return instantly (since the index partition for owner = 12345 will be empty).
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points