7 Replies Latest reply: May 17, 2012 4:49 PM by 876250 RSS

    Wild Card search not working as intended

    876250
      Hi Gurus,

      Wild Card search is not working as intended, Could you please help me figure out where I am going wrong.
      -- creating a table 
      create table test_sh4 (text1 clob,text2 clob,text3 clob);
      
      -- inserting some data into table
      declare
           var1 clob := 'Garmin Approach G5 Golf GPS';
           var2 clob := 'Give your game a boost of confidence with Approach G5, a rugged, waterproof, touchscreen golf GPS packed with thousands of preloaded golf course maps. Approach uses a high-sensitivity GPS receiver to measure individual shot distances and show the exact yardage to fairways, hazards and greens. Approach displays and updates your exact position on stunningly detailed, preloaded course maps throughout the United States and Canada. Approach''s highly sensitive GPS receiver pinpoints your position and removes guesswork from your game. And as you move, Approach automatically updates your position, so you''ll always know your yardage. There are no subscriptions or setup fees, and Approach is compliant with USGA and R&A rules.';     
      begin
           insert into test_sh4 values (var1,var2,'y');
           commit;
      end;     
      
      /
      
      -- creating preferences 
      
      begin
         ctx_ddl.create_preference ('test_mcd', 'multi_column_datastore'); -- utilizing the same index for multiple columns
          ctx_ddl.set_attribute
            ('test_mcd',
             'columns',
             'regexp_replace(text1,''[[:punct:]*|[[:space:]*]'') nd1, -- virtual column to eliminate white spaces and punctuation marks  in text1 column 
              text1 text1,  
              regexp_replace(text2,''[[:punct:]*|[[:space:]*]'') nd2,  -- virtual column to eliminate white spaces and punctuation marks   in text2 column 
              text2 text2');
          ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
          ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&'); --translating special characters as white space using lexer
          ctx_ddl.create_section_group ('test_sg', 'basic_section_group'); -- creating section group to search within sections.
          ctx_ddl.add_field_section ('test_sg', 'text1', 'text1', true);
          ctx_ddl.add_field_section ('test_sg', 'nd1', 'nd1', true);
          ctx_ddl.add_field_section ('test_sg', 'text2', 'text2', true);
          ctx_ddl.add_field_section ('test_sg', 'nd2', 'nd2', true);
           ctx_ddl.create_preference('SUBSTRING_PREF', 'BASIC_WORDLIST');
          ctx_ddl.set_attribute('SUBSTRING_PREF','SUBSTRING_INDEX','TRUE');
           ctx_ddl.set_attribute('SUBSTRING_PREF','PREFIX_INDEX','TRUE');
           ctx_ddl.set_attribute('SUBSTRING_PREF','PREFIX_MIN_LENGTH', '3');
           ctx_ddl.set_attribute('SUBSTRING_PREF','PREFIX_MAX_LENGTH', '8');
           ctx_ddl.create_preference('textstore', 'BASIC_STORAGE');
          ctx_ddl.set_attribute('textstore', 'I_TABLE_CLAUSE','tablespace CAPT_DATA storage (initial 64K next 64K)');
          ctx_ddl.set_attribute('textstore', 'K_TABLE_CLAUSE','tablespace CAPT_DATA storage (initial 64K next 64K)');
           ctx_ddl.set_attribute('textstore', 'R_TABLE_CLAUSE','tablespace CAPT_DATA storage (initial 64K) lob (data) store as (cache)');
          ctx_ddl.set_attribute('textstore', 'N_TABLE_CLAUSE','tablespace CAPT_DATA storage (initial 64K)');
          ctx_ddl.set_attribute('textstore', 'I_INDEX_CLAUSE','tablespace CAPT_INDEX storage (initial 64K) compress 2');
          ctx_ddl.set_attribute('textstore', 'P_TABLE_CLAUSE','tablespace CAPT_DATA storage (initial 64K)');
      end;
      /
      
      
      create index capone.IX_test_sh4  on test_sh4 (text3)
      indextype is ctxsys.context
      parameters
          ('datastore test_mcd
            lexer          test_lex1
            section group  test_sg  
             wordlist SUBSTRING_PREF  
             SYNC ( ON COMMIT)
             storage textstore
             MEMORY 50M '
             );
             
      -- following query does not give any results 
      
      SQL> SELECT SCORE(1) score,t.*
        2  FROM test_sh4 t WHERE CONTAINS (text3,' <query>
        3  <textquery>
        4  <progression>
        5  <seq>(%GOLFCOURSEMAP%) within nd1</seq>
        6  <seq>(%GOLFCOURSEMAP%) within nd2</seq>
        7  </progression>
        8  </textquery>
        9  <score datatype="FLOAT" algorithm="default"/>
       10  </query>',1) >1   ORDER BY score(1) DESC   ;
      
      no rows selected
      
      following query using regular expression function gives results.
      
      SQL> select count(*) from  test_sh4 where upper( regexp_replace(text2,'[[:punct:]*|[[:space:]*]') )  like '%GOLFCOURSEMAP%';
      
        COUNT(*)
      ----------
               1
      Please help me figure out why nd2 preference is not working as intended.

      regexp_replace(text2,''[[:punct:]*|[[:space:]*]'') nd2, -- virtual column to eliminate white spaces in text2 column
        • 1. Re: Wild Card search not working as intended
          Roger Ford-Oracle
          If I understand this correctly, you're removing all the word-spacing characters from the text, turning it into one long word, then expecting a double-truncated wild-card search to work.

          It won't.

          Tokens are limited to 64 characters in length, so you'll only get the first 64 characters of the text indexed.

          Even if there wasn't such a limitation, this would be a really bad way to use Oracle Text. Oracle Text indexes at the word level - that's how it manages to be faster than "like". By removing all the boundaries between words you're removing any chance Oracle Text has of efficiently indexing your text.
          • 2. Re: Wild Card search not working as intended
            876250
            Thanks Roger for the quick reply.

            But one of our requirement is to remove all white spaces and punctuation marks and then do a wild card search like the example which I gave above. Is there anyway I can achieve the same ?
            • 3. Re: Wild Card search not working as intended
              Roger Ford-Oracle
              Not with Oracle Text, I'm afraid. You'll have to use LIKE and do full-table scans. There just isn't any efficient way to index "any random substring" of a long string.
              • 5. Re: Wild Card search not working as intended
                Barbara Boehmer
                873247,

                I have to wonder if you understand how Oracle text works and what it is capable of. Do you understand, for example, that it can search for the individual words without eliminating white spaces or punctuations? Please see the simplified demonstration below. A lot of people mistakenly think that they have to use wildcards to find words within a string, as if they were using LIKE, but this is not so. Using CONTAINS is more like using INSTR, but with an index and scoring and lots of other options.
                SCOTT@orcl_11gR2> -- table:
                SCOTT@orcl_11gR2> create table test_sh4 (text1 clob,text2 clob,text3 clob);
                
                Table created.
                
                SCOTT@orcl_11gR2> -- data:
                SCOTT@orcl_11gR2> set define off
                SCOTT@orcl_11gR2> declare
                  2    var1 clob := 'Garmin Approach G5 Golf GPS';
                  3    var2 clob := 'Give your game a boost of confidence with Approach G5, a rugged,
                  4                 waterproof, touchscreen golf GPS packed with thousands of preloaded
                  5                 golf course maps. Approach uses a high-sensitivity GPS receiver to
                  6                 measure individual shot distances and show the exact yardage to
                  7                 fairways, hazards and greens. Approach displays and updates your
                  8                 exact position on stunningly detailed, preloaded course maps throughout
                  9                 the United States and Canada. Approach''s highly sensitive GPS receiver
                 10                 pinpoints your position and removes guesswork from your game. And as
                 11                 you move, Approach automatically updates your position, so you''ll
                 12                 always know your yardage. There are no subscriptions or setup fees,
                 13                 and Approach is compliant with USGA and R&A rules.';
                 14  begin
                 15    insert into test_sh4 values (var1,var2,'y');
                 16  end;
                 17  /
                
                PL/SQL procedure successfully completed.
                
                SCOTT@orcl_11gR2> -- datastore and section group:
                SCOTT@orcl_11gR2> begin
                  2    ctx_ddl.create_preference ('test_mcd', 'multi_column_datastore');
                  3    ctx_ddl.set_attribute ('test_mcd', 'columns', 'text1 text1, text2 text2');
                  4    ctx_ddl.create_section_group ('test_sg', 'basic_section_group');
                  5    ctx_ddl.add_field_section ('test_sg', 'text1', 'text1', true);
                  6    ctx_ddl.add_field_section ('test_sg', 'text2', 'text2', true);
                  7  end;
                  8  /
                
                PL/SQL procedure successfully completed.
                
                SCOTT@orcl_11gR2> --index:
                SCOTT@orcl_11gR2> create index IX_test_sh4 on test_sh4 (text3)
                  2  indextype is ctxsys.context
                  3  parameters
                  4        ('datastore test_mcd
                  5          section group  test_sg');
                
                Index created.
                
                SCOTT@orcl_11gR2> -- query:
                SCOTT@orcl_11gR2> VARIABLE search_string VARCHAR2(100)
                SCOTT@orcl_11gR2> EXEC :search_string := 'GOLF COURSE MAP'
                
                PL/SQL procedure successfully completed.
                
                SCOTT@orcl_11gR2> COLUMN text2 FORMAT A45 WORD_WRAPPED
                SCOTT@orcl_11gR2> SELECT SCORE(1) score, t.*
                  2  FROM   test_sh4 t
                  3  WHERE  CONTAINS
                  4             (text3,
                  5              '<query>
                  6              <textquery>
                  7                <progression>
                  8                  <seq>(' || REPLACE (:search_string, ' ', '% ') || '%) within text1</seq>
                  9                  <seq>(' || REPLACE (:search_string, ' ', '% ') || '%) within text2</seq>
                 10                </progression>
                 11              </textquery>
                 12              <score datatype="FLOAT" algorithm="default"/>
                 13            </query>',
                 14              1) > 0
                 15  ORDER  BY score(1) DESC;
                
                     SCORE
                ----------
                TEXT1
                --------------------------------------------------------------------------------
                TEXT2
                ---------------------------------------------
                TEXT3
                --------------------------------------------------------------------------------
                       1.5
                Garmin Approach G5 Golf GPS
                Give your game a boost of confidence with
                Approach G5, a rugged,
                waterproof, touchscreen golf GPS packed with
                thousands of preloaded
                golf course maps. Approach uses a
                high-sensitivity GPS receiver to
                measure individual shot distances and show
                the exact yardage to
                fairways, hazards and greens. Approach
                displays and updates your
                exact position on stunningly detailed,
                preloaded course maps throughout
                the United States and Canada. Approach's
                highly sensitive GPS receiver
                pinpoints your position and removes guesswork
                from your game. And as
                you move, Approach automatically updates your
                position, so you'll
                always know your yardage. There are no
                subscriptions or setup fees,
                and Approach is compliant with USGA and R&A
                rules.
                y
                
                
                1 row selected.
                • 6. Re: Wild Card search not working as intended
                  876250
                  Thanks Barbara,


                  I kind of understood what we can achieve using Oracle text. What we were trying to achieve is, Even if user inputs GOLFCOURSEMAP as the search string, Results should be returned that was the reason we were trying to eliminate white spaces and punctuation marks and storing as a virtual column. I had got this technique from your earlier posts {thread:id=2254805} and {thread:id=2319384}, That time I forgot to mention that we were using wild card search as well.



                  Thanks & Regards,
                  Vikas
                  • 7. Re: Wild Card search not working as intended
                    876250
                    ndata would have handled this scenario well but our database version is 11.1.0.7 and cannot use that feature here.