This discussion is archived
7 Replies Latest reply: May 8, 2012 5:55 AM by Roger Ford RSS

single quote in basic lexer preference

876250 Newbie
Currently Being Moderated
Hi Gurus,

Using basic_lexer preference I wanted to consider columns having single quotes *'* also as white space. Please help me to achieve it.
Original Code
-------------------------
    ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
    ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&');

I tried using q operator

SQL>  begin
 ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
 ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|\'-_+&q'');\
 end;  2    3    4
  5  /
Enter value for q:


I also tried 

SQL>
SQL>  begin
 ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
 ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&\'');
 end;  2    3    4
  5  /
ERROR:
ORA-01756: quoted string not properly terminated

I tried this but it is not indexing as intended.

SQL>  begin
 ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
 ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&'''); --translating special characters as white space using lexer
 end;  2    3    4
  5  /

PL/SQL procedure successfully completed
please help me here.
  • 1. Re: single quote in basic lexer preference
    876250 Newbie
    Currently Being Moderated
    Also is there any way to convey to lexer saying consider only Alphanumeric characters and consider any other characters as *"whitespace"*
  • 2. Re: single quote in basic lexer preference
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    It looks like the last method works, as shown below. Note that if you are trying to search for a single letter to the right of the quote mark, that such individual letters are stop words, unless you use an empty stoplist or a user-defined stoplist that does not include them. If this does not solve your problem, then please post a test case, as I have done below, that includes table structure, sample data, index creation, and query that demonstrates the problem.
    SCOTT@orcl_11gR2> create table test_tab (test_col  clob)
      2  /
    
    Table created.
    
    SCOTT@orcl_11gR2> insert all
      2  into test_tab (test_col) values ('It isn''t Oracle''s fault.')
      3  into test_tab (test_col) values ('other data without quotes')
      4  select * from dual
      5  /
    
    2 rows created.
    
    SCOTT@orcl_11gR2> select * from test_tab
      2  /
    
    TEST_COL
    --------------------------------------------------------------------------------
    It isn't Oracle's fault.
    other data without quotes
    
    2 rows selected.
    
    SCOTT@orcl_11gR2> begin
      2    ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
      3    ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&''');
      4  end;
      5  /
    
    PL/SQL procedure successfully completed.
    
    SCOTT@orcl_11gR2> create index test_idx on test_tab (test_col)
      2  indextype is ctxsys.context
      3  parameters
      4    ('lexer        test_lex1
      5        stoplist  ctxsys.empty_stoplist')
      6  /
    
    Index created.
    
    SCOTT@orcl_11gR2> select token_text from dr$test_idx$i
      2  /
    
    TOKEN_TEXT
    ----------------------------------------------------------------
    DATA
    FAULT
    ISN
    IT
    ORACLE
    OTHER
    QUOTES
    S
    T
    WITHOUT
    
    10 rows selected.
    
    SCOTT@orcl_11gR2> select * from test_tab where contains (test_col, 'isn t') > 0
      2  /
    
    TEST_COL
    --------------------------------------------------------------------------------
    It isn't Oracle's fault.
    
    1 row selected.
    
    SCOTT@orcl_11gR2> select * from test_tab where contains (test_col, 'Oracle s') > 0
      2  /
    
    TEST_COL
    --------------------------------------------------------------------------------
    It isn't Oracle's fault.
    
    1 row selected.
  • 3. Re: single quote in basic lexer preference
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    873247 wrote:
    Also is there any way to convey to lexer saying consider only Alphanumeric characters and consider any other characters as *"whitespace"*
    Just by declaring all of the non-alphanumeric characters as whitespace. There may be some overriding conflicts where, by default, such things have other meanings that may need to be modified as well, such as punctuations.
  • 4. Re: single quote in basic lexer preference
    876250 Newbie
    Currently Being Moderated
    Thanks a lot Barabara for quick reply.


    Let me paste my code here.
    create table test_sh4 (text1 clob,text2 clob,text3 clob);
     
    
    insert  into test_sh4  values ('ORACLE''s','It isn''t Oracle''s fault.','y');
    
    
     
    begin
       ctx_ddl.create_preference ('test_mcd', 'multi_column_datastore'); -- utilizing the same index for multiple columns
        ctx_ddl.set_attribute
          ('test_mcd',
           'columns',
           'replace (text1, '' '', '''') nd1, -- virtual column to eliminate white spaces in text1 column 
            text1 text1,  
            replace (text2, '' '', '''') nd2,  -- virtual column to eliminate white spaces in text2 column 
            text2 text2');
        ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
        ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&'''); --translating special characters as white space using lexer
        ctx_ddl.create_section_group ('test_sg', 'basic_section_group'); -- creating section group to search within sections.
        ctx_ddl.add_field_section ('test_sg', 'text1', 'text1', true);
        ctx_ddl.add_field_section ('test_sg', 'nd1', 'nd1', true);
        ctx_ddl.add_field_section ('test_sg', 'text2', 'text2', true);
        ctx_ddl.add_field_section ('test_sg', 'nd2', 'nd2', true);
         ctx_ddl.create_preference('SUBSTRING_PREF', 'BASIC_WORDLIST');
        ctx_ddl.set_attribute('SUBSTRING_PREF','SUBSTRING_INDEX','TRUE');
         ctx_ddl.set_attribute('SUBSTRING_PREF','PREFIX_INDEX','TRUE');
         ctx_ddl.set_attribute('SUBSTRING_PREF','PREFIX_MIN_LENGTH', '3');
         ctx_ddl.set_attribute('SUBSTRING_PREF','PREFIX_MAX_LENGTH', '8');
         ctx_ddl.create_preference('textstore', 'BASIC_STORAGE');
        ctx_ddl.set_attribute('textstore', 'I_TABLE_CLAUSE','tablespace TEST_INDEX storage (initial 64K)');
        ctx_ddl.set_attribute('textstore', 'K_TABLE_CLAUSE','tablespace TEST_INDEX storage (initial 64K)');
        ctx_ddl.set_attribute('textstore', 'R_TABLE_CLAUSE','tablespace TEST_INDEX storage (initial 64K)');
        ctx_ddl.set_attribute('textstore', 'N_TABLE_CLAUSE','tablespace TEST_INDEX storage (initial 64K)');
        ctx_ddl.set_attribute('textstore', 'I_INDEX_CLAUSE','tablespace TEST_INDEX storage (initial 64K)');
        ctx_ddl.set_attribute('textstore', 'P_TABLE_CLAUSE','tablespace TEST_INDEX storage (initial 64K)');
    end;
    
    SQL>  create index IX_test_sh4  on test_sh4 (text3)
      2  indextype is ctxsys.context
      3  parameters
      4      ('datastore test_mcd
      5        lexer          test_lex1
      6        section group  test_sg
      7       wordlist SUBSTRING_PREF MEMORY 50M
      8       SYNC ( ON COMMIT)
      9       storage textstore'
     10       );
    
    
    
    SQL> select count(*) from (
     SELECT SCORE(1) score,t.* FROM test_sh4 t
    WHERE CONTAINS (text3, 'ORACLE S' ,1) > 1
    order by score desc
    )  2    3    4    5
      6  /
    
      COUNT(*)
    ----------
             1
    
    I was thinking that combination of lexer preference and virtual column nd1 when indexed will take care of following  part. But it is not 
    
    SQL> select count(*) from (
     SELECT SCORE(1) score,t.* FROM test_sh4 t
    WHERE CONTAINS (text3, 'ORACLES' ,1) > 1
    order by score desc
    )  2    3    4    5
      6  /
    
      COUNT(*)
    ----------
             0
    
    
    I had to correct the code to 
    
        ctx_ddl.set_attribute
          ('test_mcd',
           'columns',
           'regexp_replace(text1,''[[:punct:]*|[[:space:]*]'') nd1,
            text1 text1,  
            regexp_replace(text2,''[[:punct:]*|[[:space:]*]'') nd2,  
            text2 text2');
    
    Now it is returning rows as expected.
    The requirement was to consider any special characters as white spaces, Can we do that in Lexer preference as the logic we are performing iin virtual columns nd1 and nd2.
  • 5. Re: single quote in basic lexer preference
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    It looks like the replaces for the multi_column_datastore are applied before the whitespaces for the lexer, so I think you have found the best solution that you can.
  • 6. Re: single quote in basic lexer preference
    876250 Newbie
    Currently Being Moderated
    Thanks Barbara.
  • 7. Re: single quote in basic lexer preference
    Roger Ford Expert
    Currently Being Moderated
    Sorry, I've come late to this discussion, but all non-alphunumeric characters are treated as break characters by default. Break characters and whitespace are pretty much synonymous, so it should work by default.

    Is that actually what you want? Or do you want all punctuation removed so that "oracle's" is indexed as "oracles"?

    Bear in mind that "s" is normally a stopword, which might confuse your tests.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points