7 Replies Latest reply: May 8, 2012 7:55 AM by Roger Ford-Oracle RSS

    single quote in basic lexer preference

    876250
      Hi Gurus,

      Using basic_lexer preference I wanted to consider columns having single quotes *'* also as white space. Please help me to achieve it.
      Original Code
      -------------------------
          ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
          ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&');
      
      I tried using q operator
      
      SQL>  begin
       ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
       ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|\'-_+&q'');\
       end;  2    3    4
        5  /
      Enter value for q:
      
      
      I also tried 
      
      SQL>
      SQL>  begin
       ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
       ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&\'');
       end;  2    3    4
        5  /
      ERROR:
      ORA-01756: quoted string not properly terminated
      
      I tried this but it is not indexing as intended.
      
      SQL>  begin
       ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
       ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&'''); --translating special characters as white space using lexer
       end;  2    3    4
        5  /
      
      PL/SQL procedure successfully completed
      please help me here.
        • 1. Re: single quote in basic lexer preference
          876250
          Also is there any way to convey to lexer saying consider only Alphanumeric characters and consider any other characters as *"whitespace"*
          • 2. Re: single quote in basic lexer preference
            Barbara Boehmer
            It looks like the last method works, as shown below. Note that if you are trying to search for a single letter to the right of the quote mark, that such individual letters are stop words, unless you use an empty stoplist or a user-defined stoplist that does not include them. If this does not solve your problem, then please post a test case, as I have done below, that includes table structure, sample data, index creation, and query that demonstrates the problem.
            SCOTT@orcl_11gR2> create table test_tab (test_col  clob)
              2  /
            
            Table created.
            
            SCOTT@orcl_11gR2> insert all
              2  into test_tab (test_col) values ('It isn''t Oracle''s fault.')
              3  into test_tab (test_col) values ('other data without quotes')
              4  select * from dual
              5  /
            
            2 rows created.
            
            SCOTT@orcl_11gR2> select * from test_tab
              2  /
            
            TEST_COL
            --------------------------------------------------------------------------------
            It isn't Oracle's fault.
            other data without quotes
            
            2 rows selected.
            
            SCOTT@orcl_11gR2> begin
              2    ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
              3    ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&''');
              4  end;
              5  /
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> create index test_idx on test_tab (test_col)
              2  indextype is ctxsys.context
              3  parameters
              4    ('lexer        test_lex1
              5        stoplist  ctxsys.empty_stoplist')
              6  /
            
            Index created.
            
            SCOTT@orcl_11gR2> select token_text from dr$test_idx$i
              2  /
            
            TOKEN_TEXT
            ----------------------------------------------------------------
            DATA
            FAULT
            ISN
            IT
            ORACLE
            OTHER
            QUOTES
            S
            T
            WITHOUT
            
            10 rows selected.
            
            SCOTT@orcl_11gR2> select * from test_tab where contains (test_col, 'isn t') > 0
              2  /
            
            TEST_COL
            --------------------------------------------------------------------------------
            It isn't Oracle's fault.
            
            1 row selected.
            
            SCOTT@orcl_11gR2> select * from test_tab where contains (test_col, 'Oracle s') > 0
              2  /
            
            TEST_COL
            --------------------------------------------------------------------------------
            It isn't Oracle's fault.
            
            1 row selected.
            • 3. Re: single quote in basic lexer preference
              Barbara Boehmer
              873247 wrote:
              Also is there any way to convey to lexer saying consider only Alphanumeric characters and consider any other characters as *"whitespace"*
              Just by declaring all of the non-alphanumeric characters as whitespace. There may be some overriding conflicts where, by default, such things have other meanings that may need to be modified as well, such as punctuations.
              • 4. Re: single quote in basic lexer preference
                876250
                Thanks a lot Barabara for quick reply.


                Let me paste my code here.
                create table test_sh4 (text1 clob,text2 clob,text3 clob);
                 
                
                insert  into test_sh4  values ('ORACLE''s','It isn''t Oracle''s fault.','y');
                
                
                 
                begin
                   ctx_ddl.create_preference ('test_mcd', 'multi_column_datastore'); -- utilizing the same index for multiple columns
                    ctx_ddl.set_attribute
                      ('test_mcd',
                       'columns',
                       'replace (text1, '' '', '''') nd1, -- virtual column to eliminate white spaces in text1 column 
                        text1 text1,  
                        replace (text2, '' '', '''') nd2,  -- virtual column to eliminate white spaces in text2 column 
                        text2 text2');
                    ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
                    ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&'''); --translating special characters as white space using lexer
                    ctx_ddl.create_section_group ('test_sg', 'basic_section_group'); -- creating section group to search within sections.
                    ctx_ddl.add_field_section ('test_sg', 'text1', 'text1', true);
                    ctx_ddl.add_field_section ('test_sg', 'nd1', 'nd1', true);
                    ctx_ddl.add_field_section ('test_sg', 'text2', 'text2', true);
                    ctx_ddl.add_field_section ('test_sg', 'nd2', 'nd2', true);
                     ctx_ddl.create_preference('SUBSTRING_PREF', 'BASIC_WORDLIST');
                    ctx_ddl.set_attribute('SUBSTRING_PREF','SUBSTRING_INDEX','TRUE');
                     ctx_ddl.set_attribute('SUBSTRING_PREF','PREFIX_INDEX','TRUE');
                     ctx_ddl.set_attribute('SUBSTRING_PREF','PREFIX_MIN_LENGTH', '3');
                     ctx_ddl.set_attribute('SUBSTRING_PREF','PREFIX_MAX_LENGTH', '8');
                     ctx_ddl.create_preference('textstore', 'BASIC_STORAGE');
                    ctx_ddl.set_attribute('textstore', 'I_TABLE_CLAUSE','tablespace TEST_INDEX storage (initial 64K)');
                    ctx_ddl.set_attribute('textstore', 'K_TABLE_CLAUSE','tablespace TEST_INDEX storage (initial 64K)');
                    ctx_ddl.set_attribute('textstore', 'R_TABLE_CLAUSE','tablespace TEST_INDEX storage (initial 64K)');
                    ctx_ddl.set_attribute('textstore', 'N_TABLE_CLAUSE','tablespace TEST_INDEX storage (initial 64K)');
                    ctx_ddl.set_attribute('textstore', 'I_INDEX_CLAUSE','tablespace TEST_INDEX storage (initial 64K)');
                    ctx_ddl.set_attribute('textstore', 'P_TABLE_CLAUSE','tablespace TEST_INDEX storage (initial 64K)');
                end;
                
                SQL>  create index IX_test_sh4  on test_sh4 (text3)
                  2  indextype is ctxsys.context
                  3  parameters
                  4      ('datastore test_mcd
                  5        lexer          test_lex1
                  6        section group  test_sg
                  7       wordlist SUBSTRING_PREF MEMORY 50M
                  8       SYNC ( ON COMMIT)
                  9       storage textstore'
                 10       );
                
                
                
                SQL> select count(*) from (
                 SELECT SCORE(1) score,t.* FROM test_sh4 t
                WHERE CONTAINS (text3, 'ORACLE S' ,1) > 1
                order by score desc
                )  2    3    4    5
                  6  /
                
                  COUNT(*)
                ----------
                         1
                
                I was thinking that combination of lexer preference and virtual column nd1 when indexed will take care of following  part. But it is not 
                
                SQL> select count(*) from (
                 SELECT SCORE(1) score,t.* FROM test_sh4 t
                WHERE CONTAINS (text3, 'ORACLES' ,1) > 1
                order by score desc
                )  2    3    4    5
                  6  /
                
                  COUNT(*)
                ----------
                         0
                
                
                I had to correct the code to 
                
                    ctx_ddl.set_attribute
                      ('test_mcd',
                       'columns',
                       'regexp_replace(text1,''[[:punct:]*|[[:space:]*]'') nd1,
                        text1 text1,  
                        regexp_replace(text2,''[[:punct:]*|[[:space:]*]'') nd2,  
                        text2 text2');
                
                Now it is returning rows as expected.
                The requirement was to consider any special characters as white spaces, Can we do that in Lexer preference as the logic we are performing iin virtual columns nd1 and nd2.
                • 5. Re: single quote in basic lexer preference
                  Barbara Boehmer
                  It looks like the replaces for the multi_column_datastore are applied before the whitespaces for the lexer, so I think you have found the best solution that you can.
                  • 6. Re: single quote in basic lexer preference
                    876250
                    Thanks Barbara.
                    • 7. Re: single quote in basic lexer preference
                      Roger Ford-Oracle
                      Sorry, I've come late to this discussion, but all non-alphunumeric characters are treated as break characters by default. Break characters and whitespace are pretty much synonymous, so it should work by default.

                      Is that actually what you want? Or do you want all punctuation removed so that "oracle's" is indexed as "oracles"?

                      Bear in mind that "s" is normally a stopword, which might confuse your tests.