This discussion is archived
6 Replies Latest reply: Jul 20, 2012 1:46 PM by Sky13 RSS

Oracle Text Query of abbreviated word / name

Richard C Evans Newbie
Currently Being Moderated
I'm new to Oracle Text so please excuse the (probably) simple question. I want to be able to create a search that excludes (includes?) special characters and/or spaces between an abbreviated name. I'm not sure if it's possible but I would like to be able to return all of the below results if someone queried for "ABC" in one form or another.

Would this be something I'd add to a thesaurus? I see there is a STOPLIST but I'm not sure if there is the opposite of a stoplist.

Thanks in advance!

Regards,
Rich

set def off;

drop table docs;

CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(200));

INSERT INTO docs VALUES(1, 'ABC are my favorite letters.');
INSERT INTO docs VALUES(2, 'My favorite letters are A,B,C');
INSERT INTO docs VALUES(3, 'The best letters are A.B.C.');
INSERT INTO docs VALUES(4, 'Three of the word letters are A-B-C.');
INSERT INTO docs VALUES(5, 'A B C are great letters.');
INSERT INTO docs VALUES(6, 'AB and C are easy letters to remember');
INSERT INTO docs VALUES(7, 'What if we used A, B, & C?');

commit;

begin
ctx_ddl.drop_preference('english_lexar');
end;
/

begin
ctx_ddl.create_preference('english_lexar', 'BASIC_LEXER');
ctx_ddl.set_attribute('english_lexar', 'printjoins', '_-');
ctx_ddl.set_attribute('english_lexar', 'skipjoins', '-.');
--ctx_ddl.set_attribute ( 'english_lexar', 'index_themes', 'YES');
ctx_ddl.set_attribute ( 'english_lexar', 'index_text', 'YES'); 
ctx_ddl.set_attribute ( 'english_lexar', 'index_stems', 'SPANISH'); 
ctx_ddl.set_attribute ( 'english_lexar', 'mixed_case', 'YES');
ctx_ddl.set_attribute ( 'english_lexar', 'base_letter', 'YES');
end;
/

begin
 ctx_ddl.drop_preference('STEM_FUZZY_PREF');
end;
/

begin 
  ctx_ddl.create_preference('STEM_FUZZY_PREF', 'BASIC_WORDLIST'); 
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_MATCH','ENGLISH');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_SCORE','0');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','FUZZY_NUMRESULTS','5000');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','SUBSTRING_INDEX','TRUE');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','PREFIX_INDEX','TRUE');
  ctx_ddl.set_attribute('STEM_FUZZY_PREF','STEMMER','ENGLISH');
end; 
/

begin
 ctx_ddl.drop_preference('wildcard_pref');
end;
/

begin 
    Ctx_Ddl.create_Preference('wildcard_pref', 'BASIC_WORDLIST'); 
    ctx_ddl.set_attribute('wildcard_pref', 'wildcard_maxterms', 100) ;
end; 
/

DROP index myindex;

create index myindex on docs (text) 
  indextype is ctxsys.context 
  parameters ( 'LEXER english_lexar Wordlist wildcard_pref' ); 

EXEC CTX_DDL.SYNC_INDEX('myindex', '2M');

SELECT SCORE(1), id, text FROM docs WHERE CONTAINS(text, 'ABC', 1) > 0;
It may be that my SQL statement isn't taking advantage of the Text options -- i.e. I'm forgetting something obvious :)
  • 1. Re: Oracle Text Query of abbreviated word / name
    Roger Ford Expert
    Currently Being Moderated
    Oracle Text indexes are "word based". You can change what defines a "word" by setting PRINTJOINS, SKIPJOINS etc, but you will never get ABC to match A B C. You can choose to make "A-B-C" match either "ABC" or "A B C" but not both.

    The only exception to this is NDATA sections (in 11gR2), designed primarily for searching names, where different segmentation is allowed for example "Marylou De Bon" might well match "Mary Lou DeBon".

    Depending on what you actually want to do, NDATA might help you.
  • 2. Re: Oracle Text Query of abbreviated word / name
    Richard C Evans Newbie
    Currently Being Moderated
    Thank you for the feedback Roger! I'll check out NDATA
  • 3. Re: Oracle Text Query of abbreviated word / name
    Richard C Evans Newbie
    Currently Being Moderated
    I was hoping I could add a word to the list. For example, if we knew our customers often-times wrote A-B-C then we would like to be able to index that to the other "spellings" (A.B.C. ABC, etc.).

    A better example might be WalMart. Some folks may write Wal*Mart because that's what the logo looks like. Others may write Wal-Mart, Walmart, WalMart, WALMART, or even add a space: Wal Mart. I can make it case-insensitive so the WALMART and WalMart are OK but I'm not sure hot to get Wal-Mart, Wal*Mart, and Wal Mart to all return in the search results. It sounds like NDATA may be what I'm looking for?

    Did I explain that well? I'm not sure it's clear in my head yet :-)
  • 4. Re: Oracle Text Query of abbreviated word / name
    Roger Ford Expert
    Currently Being Moderated
    Indexes are case-insensitive by default, so let's ignore that.

    You can make wal-mart and wal*mart match walmart by defining "-" and "*" as SKIPJOINS characters. However, you cannot make wal mart match walmart, other than by using NDATA.

    NDATA does seem to work - any variation of wal mart walmart wal*mart and wal-mart do manage to match both walmart and wal mart. See example:
    SQL> create table testcase (text varchar2(2000));
    Table created.
    
    SQL> insert into testcase values ('<nd>walmart</nd>');
    1 row created.
    
    SQL> insert into testcase values ('<nd>wal mart</nd>');
    1 row created.
    
    SQL> exec ctx_ddl.drop_section_group('tcsg')
    PL/SQL procedure successfully completed.
    
    SQL> exec ctx_ddl.create_section_group('tcsg', 'xml_section_group')
    PL/SQL procedure successfully completed.
    
    SQL> exec ctx_ddl.add_ndata_section('tcsg', 'nd', 'nd')
    PL/SQL procedure successfully completed.
    
    SQL> create index testcase_index on testcase(text)
      2  indextype is ctxsys.context
      3  parameters ('section group tcsg')
      4  /
    Index created.
    
    SQL> select * from testcase where contains (text, 'ndata(nd, wal mart)') > 0;
    
    TEXT
    --------------------------------------------------------------------------------
    <nd>walmart</nd>
    <nd>wal mart</nd>
    
    SQL> select * from testcase where contains (text, 'ndata(nd, wal-mart)') > 0;
    
    TEXT
    --------------------------------------------------------------------------------
    <nd>walmart</nd>
    <nd>wal mart</nd>
    
    SQL> select * from testcase where contains (text, 'ndata(nd, wal*mart)') > 0;
    
    TEXT
    --------------------------------------------------------------------------------
    <nd>walmart</nd>
    <nd>wal mart</nd>
    
    SQL> select * from testcase where contains (text, 'ndata(nd, walmart)') > 0;
    
    TEXT
    --------------------------------------------------------------------------------
    <nd>walmart</nd>
    <nd>wal mart</nd>
    Edited by: Roger Ford on Jun 21, 2012 10:22 AM
  • 5. Re: Oracle Text Query of abbreviated word / name
    Richard C Evans Newbie
    Currently Being Moderated
    Roger, Thanks very much for the prompt follow-up and the example! I'll explore NDATA more.

    Edited by: Richard C Evans on Jun 21, 2012 10:39 AM
  • 6. Re: Oracle Text Query of abbreviated word / name
    Sky13 Newbie
    Currently Being Moderated
    Hi Richard,
    I am new to using "Text" so I am in this forum to post my own question. I was looking at some threads to try and learn a little more so I did not sound so lost. I was reading yours and thought this might help.
    select 'WallMart' as TEXT, soundex('WallMart') as soundex_value from dual
    union all
    select 'WalMart' as TEXT, soundex('WalMart') as soundex_value from dual 
    union all
    select 'Wall-Mart' as TEXT, soundex('Wall-Mart') as soundex_value from dual
    union all
    select 'Wal-Mart' as TEXT, soundex('Wal-Mart') as soundex_value from dual
    union all
    select 'Wall*Mart' as TEXT, soundex('Wall*Mart') as soundex_value from dual
    union all
    select 'Wal*Mart' as TEXT, soundex('Wal*Mart') as soundex_value from dual
    union all
    select 'Wal Mart' as TEXT, soundex('Wal Mart') as soundex_value from dual;
    
    TEXT,SOUNDEX_VALUE
    WallMart,M456
    WalMart,M456
    Wall-Mart,M456
    Wal-Mart,M456
    Wall*Mart,M456
    Wal*Mart,M456
    Wal Mart,M456
    I have used the Soudnex function in Oracle with great success. In the early days I even wrote my own for Paradox, dbase and a few others :)

    It works great with a function based index.

    Edited by: Sky13 on Jul 20, 2012 1:45 PM
    Fixed Mal to Wal

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points