7 Replies Latest reply: Nov 28, 2012 11:27 AM by Barbara Boehmer RSS

    Contains return all the records when the query string matches the columns

    932383
      I used the multi_column_datastore preference and created an index on three columns (item_name, description,owner_part_number). Now if I do a search:
      select * from items where contains(description, 'description') > 0;
      It returns all the rows in items table, but not all the rows have "description" as a word. I guess Oracle text assumes the query intends to get all the rows as the query string matches one of the column names. My question is whether Oracle Text has any preference settings to alter this behavior?
      execute ctx_ddl.create_preference('item_multi_preference', 'MULTI_COLUMN_DATASTORE');
      execute ctx_ddl.set_attribute('item_multi_preference', 'columns', 'item_name, description,owner_part_number');
      create index item_text_index on items(description) indextype is ctxsys.context filter by owner parameters('LEXER ENG_LEXER WORDLIST ENG_WORDLIST STOPLIST CTXSYS.EMPTY_STOPLIST datastore item_multi_preference MEMORY 1024M');
      Thanks.

      Jun Gao
        • 1. Re: Contains return all the records when the query string matches the columns
          Roger Ford-Oracle
          OT creates "section tags" around the column data.

          Your queries are picking up the section tag, because it hasn't been told to ignore them.

          If you use a section group based on AUTO_SECTION_GROUP it should sort this.
          execute ctx_ddl.create_section_group ('autosec', 'AUTO_SECTION_GROUP');
          
          create index ... parameters( '... section group autosec ... ')
          • 2. Re: Contains return all the records when the query string matches the columns
            Barbara Boehmer
            It looks like a basic_section_group fixes the problem as well, as demonstrated below and I believe a basic_section_group may be more efficient than auto_section_group.
            SCOTT@orcl_11gR2> -- recreation of problem:
            SCOTT@orcl_11gR2> drop table items
              2  /
            
            Table dropped.
            
            SCOTT@orcl_11gR2> create table items (
              2       "ITEM_NAME"             varchar2(100 byte),
              3        "ITEM_NUMBER"              varchar2(100 byte),
              4        "DESCRIPTION"              varchar2(4000 byte),
              5        "OWNER" number
              6  )
              7  /
            
            Table created.
            
            SCOTT@orcl_11gR2> begin
              2    FOR Lcntr IN 1..100
              3    loop
              4         insert into items (item_name, item_number, description, owner)
              5         values (dbms_random.string('A', 10),
              6              dbms_random.string('A', 10),
              7              dbms_random.string('L', 8) || ' '
              8              || dbms_random.string('A', 4)
              9              || dbms_random.string('A', 5)  || ' '
             10              || dbms_random.string('A', 10),
             11              dbms_random.value(1,10) );
             12    end loop;
             13  end;
             14  /
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> begin
              2    FOR Lcntr IN 1..100
              3    loop
              4         insert into items (item_name, item_number, description, owner)
              5         values (dbms_random.string('A', 10),
              6              dbms_random.string('A', 10),
              7              dbms_random.string('L', 8) || ' '
              8              || dbms_random.string('A', 4) || '111'
              9              || dbms_random.string('A', 5)  || ' '
             10              || dbms_random.string('A', 10), 1234 );
             11    end loop;
             12  end;
             13  /
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> exec ctx_ddl.drop_preference('ENG_WORDLIST');
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> begin
              2    ctx_ddl.create_preference('ENG_WORDLIST', 'BASIC_WORDLIST');
              3    ctx_ddl.set_attribute('ENG_WORDLIST','PREFIX_INDEX','TRUE');
              4    ctx_ddl.set_attribute('ENG_WORDLIST','PREFIX_MIN_LENGTH',1);
              5    ctx_ddl.set_attribute('ENG_WORDLIST','PREFIX_MAX_LENGTH',10);
              6    ctx_ddl.set_attribute('ENG_WORDLIST','SUBSTRING_INDEX','TRUE');
              7    ctx_ddl.set_attribute('ENG_WORDLIST','WILDCARD_MAXTERMS', 0);
              8  end;
              9  /
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> execute ctx_ddl.drop_preference('ENG_LEXER');
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> begin
              2    CTX_DDL.CREATE_PREFERENCE ('ENG_LEXER', 'BASIC_LEXER');
              3    CTX_DDL.SET_ATTRIBUTE ('ENG_LEXER', 'PRINTJOINS', '@-_');
              4  end;
              5  /
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> execute ctx_ddl.drop_preference('items_multi_preference');
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> begin
              2    ctx_ddl.create_preference('items_multi_preference', 'MULTI_COLUMN_DATASTORE');
              3    ctx_ddl.set_attribute('items_multi_preference', 'columns', 'item_name, description,item_number');
              4  end;
              5  /
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> create index items_text_index
              2  on items(description)
              3  indextype is ctxsys.context
              4  parameters
              5    ('LEXER         ENG_LEXER
              6        WORDLIST   ENG_WORDLIST
              7        STOPLIST   CTXSYS.EMPTY_STOPLIST
              8        datastore  items_multi_preference
              9        MEMORY     1024M')
             10  /
            
            Index created.
            
            SCOTT@orcl_11gR2> create index owner_idx on items (owner)
              2  /
            
            Index created.
            
            SCOTT@orcl_11gR2> exec dbms_stats.gather_table_stats (user, 'ITEMS')
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> select count(*)
              2  from   items
              3  where  contains (description, 'description') > 0
              4  /
            
              COUNT(*)
            ----------
                   200
            
            1 row selected.
            
            SCOTT@orcl_11gR2> -- correction of problem:
            SCOTT@orcl_11gR2> exec ctx_ddl.drop_section_group ('items_sec')
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> begin
              2    ctx_ddl.create_section_group ('items_sec', 'basic_section_group');
              3    ctx_ddl.add_field_section ('items_sec', 'item_name', 'item_name', true);
              4    ctx_ddl.add_field_section ('items_sec', 'description', 'description', true);
              5  end;
              6  /
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> drop index items_text_index
              2  /
            
            Index dropped.
            
            SCOTT@orcl_11gR2> create index items_text_index
              2  on items(description)
              3  indextype is ctxsys.context
              4  parameters
              5    ('LEXER         ENG_LEXER
              6        WORDLIST   ENG_WORDLIST
              7        STOPLIST   CTXSYS.EMPTY_STOPLIST
              8        datastore  items_multi_preference
              9        MEMORY     1024M
             10        section group items_sec')
             11  /
            
            Index created.
            
            SCOTT@orcl_11gR2> select count(*)
              2  from   items
              3  where  contains (description, 'description') > 0
              4  /
            
              COUNT(*)
            ----------
                     0
            
            1 row selected.
            • 3. Re: Contains return all the records when the query string matches the columns
              Ebalthes-Oracle
              By default MULTI_COLUMN_DATASTORE Attribute "delimiter" is set to COLUMN_NAME_TAG, thus the expected results with all rows.

              To avoid this behavior specify the delimiter as NEWLINE or use section groups.

              execute ctx_ddl.set_attribute('item_multi_preference','delimiter','NEWLINE');
              • 4. Re: Contains return all the records when the query string matches the columns
                Roger Ford-Oracle
                I'd forgotten about that option, thanks Edwin.

                Barbara - I wondered if BASIC_SECTION_GROUP would work as well, but it was late when I replied so I didn't try it. You're quite right that it would be more effiicient - at least in terms of index since we wouldn't need to store the various section tag postings, but there should be no difference in simple query performance.
                • 5. Re: Contains return all the records when the query string matches the columns
                  Barbara Boehmer
                  If you use the newline delimiter with the multi_column_datastore, then you will not be able to search using a section group (of any kind) and WITHIN, as there will be no section tags to search within.
                  • 6. Re: Contains return all the records when the query string matches the columns
                    932383
                    Thanks all for your responses.

                    We don't support the section group and WITHIN, would using the newline delimiter have better performance? or it's no difference from other options?

                    Jun Gao
                    • 7. Re: Contains return all the records when the query string matches the columns
                      Barbara Boehmer
                      I believe it should take less index storage space with the newline delimiter instead of section tags.