6 Replies Latest reply: May 23, 2012 2:55 PM by 876250 RSS

    Dealing with alternate spelling in English

    876250
      Hi Gurus,

      How can we retrieve words like Colur and Labour when we use color and labor as search terms other than using stem operator.

      Thanks,
      Vikas
        • 1. Re: Dealing with alternate spelling in English
          BSalesRashid
          You mean in a select statement ?
          • 2. Re: Dealing with alternate spelling in English
            876250
            Please find code below.
            -- creating a table 
            create table test_sh4 (text1 clob,text2 clob,text3 clob);
             
            -- inserting some data into table
            INSERT INTO TEST_SH4 VALUES ('colour','colour  labour','y');
            INSERT INTO TEST_SH4 VALUES ('color','color  labor','y');
            commit;
             
            -- creating preferences 
             
            begin
               ctx_ddl.create_preference ('test_mcd', 'multi_column_datastore'); -- utilizing the same index for multiple columns
                ctx_ddl.set_attribute
                  ('test_mcd',
                   'columns',
                   'regexp_replace(text1,''[[:punct:]*|[[:space:]*]'') nd1, -- virtual column to eliminate white spaces and punctuation marks  in text1 column 
                    text1 text1,  
                    regexp_replace(text2,''[[:punct:]*|[[:space:]*]'') nd2,  -- virtual column to eliminate white spaces and punctuation marks   in text2 column 
                    text2 text2');
                ctx_ddl.create_preference ('test_lex1', 'basic_lexer');
                ctx_ddl.set_attribute ('test_lex1', 'whitespace', '/\|-_+&'); --translating special characters as white space using lexer
                ctx_ddl.create_section_group ('test_sg', 'basic_section_group'); -- creating section group to search within sections.
                ctx_ddl.add_field_section ('test_sg', 'text1', 'text1', true);
                ctx_ddl.add_field_section ('test_sg', 'nd1', 'nd1', true);
                ctx_ddl.add_field_section ('test_sg', 'text2', 'text2', true);
                ctx_ddl.add_field_section ('test_sg', 'nd2', 'nd2', true);
                 ctx_ddl.create_preference('SUBSTRING_PREF', 'BASIC_WORDLIST');
                ctx_ddl.set_attribute('SUBSTRING_PREF','SUBSTRING_INDEX','TRUE');
                 ctx_ddl.set_attribute('SUBSTRING_PREF','PREFIX_INDEX','TRUE');
                 ctx_ddl.set_attribute('SUBSTRING_PREF','PREFIX_MIN_LENGTH', '3');
                 ctx_ddl.set_attribute('SUBSTRING_PREF','PREFIX_MAX_LENGTH', '8');
                 ctx_ddl.create_preference('textstore', 'BASIC_STORAGE');
                ctx_ddl.set_attribute('textstore', 'I_TABLE_CLAUSE','tablespace CAPT_DATA storage (initial 64K next 64K)');
                ctx_ddl.set_attribute('textstore', 'K_TABLE_CLAUSE','tablespace CAPT_DATA storage (initial 64K next 64K)');
                 ctx_ddl.set_attribute('textstore', 'R_TABLE_CLAUSE','tablespace CAPT_DATA storage (initial 64K) lob (data) store as (cache)');
                ctx_ddl.set_attribute('textstore', 'N_TABLE_CLAUSE','tablespace CAPT_DATA storage (initial 64K)');
                ctx_ddl.set_attribute('textstore', 'I_INDEX_CLAUSE','tablespace CAPT_INDEX storage (initial 64K) compress 2');
                ctx_ddl.set_attribute('textstore', 'P_TABLE_CLAUSE','tablespace CAPT_DATA storage (initial 64K)');
            end;
            / 
             
             
            create index capone.IX_test_sh4  on test_sh4 (text3)
            indextype is ctxsys.context
            parameters
                ('datastore test_mcd
                  lexer          test_lex1
                  section group  test_sg  
                   wordlist SUBSTRING_PREF  
                   SYNC ( ON COMMIT)
                   storage textstore
                   MEMORY 50M '
                   );
                   
             -- In normal scenario colour doesn't come up
             
            SQL> set lines 100;
            SQL>    SELECT SCORE(1) SCORE, T.*
              2   FROM   test_sh4 t
              3   WHERE  CONTAINS
              4           (text3,
              5            '<query>
              6               <textquery>
              7                 <progression>
              8                   <seq> color within text2</seq>
              9                 </progression>
             10               </textquery>
             11               <score datatype="FLOAT" algorithm="default"/>
             12             </query>',
             13            1) > 0
             14   ORDER  BY score(1) DESC;
            
            
                 SCORE TEXT1                TEXT2                T
            ---------- -------------------- -------------------- -
                3.9031 color                color  labor         y
            
                 
             -- If we use stem operator it comes up but it will also retrun lot of other irelevant results
                 
            SQL>    SELECT SCORE(1) SCORE, T.*
              2   FROM   test_sh4 t
             WHERE  CONTAINS
              3    4              (text3,
              5            '<query>
              6               <textquery>
              7                 <progression>
              8                   <seq> $color within text2</seq>
              9                 </progression>
             10               </textquery>
             11               <score datatype="FLOAT" algorithm="default"/>
             12             </query>',
             13            1) > 0
             14   ORDER  BY score(1) DESC;
            
                 SCORE TEXT1                TEXT2                T
            ---------- -------------------- -------------------- -
                3.9031 colour               colour  labour       y
                3.9031 color                color  labor         y
                 
                 Is there any way apart from stem operators to get these words.
            • 3. Re: Dealing with alternate spelling in English
              Roger Ford-Oracle
              There's no direct support for alternate spelling in English. You could use the fuzzy operator, which would probably come up with some false hits, or you could define COLOUR and COLOR as synonyms, and use the syn() operator to find either spelling.

              We do provide alternate spelling support for German. Germany through a process of spelling reform rather more recently than the reforms which Noah Webster inconsistently applied to American English. In theory the same mechanism could be used, and I raised an enhancement request to this effect some years back, but as yet it hasn't made it into the product.
              • 5. Re: Dealing with alternate spelling in English
                Barbara Boehmer
                Please see the following discussion of the same problem, that contains some examples.

                Ignore spelling differences?
                • 6. Re: Dealing with alternate spelling in English
                  876250
                  Thanks Barbara.

                  Thanks,
                  Vikas