This discussion is archived
2 Replies Latest reply: Feb 11, 2013 5:51 AM by orausern RSS

Is Oracle Text the right solution for this need of a specific search!

orausern Explorer
Currently Being Moderated
Hi ,

We are on Oracle 11.2.0.2 on Solaris 10. We have the need to be able to do search on data that are having diacritical marks and we should be able to do the serach ignoring this diacritical marks. That is the requirement. Now I got to hear that Oracle Text has a preference called BASIC_LEXER which can bypass the diacritical marks and so solely due to this feature I implemented Oracle Text and just for this diacritical search and no other need.

I mean I set up preference like this:
  ctxsys.ctx_ddl.create_preference ('cust_lexer', 'BASIC_LEXER');
  ctxsys.ctx_ddl.set_attribute ('cust_lexer', 'base_letter', 'YES'); -- removes diacritics


With this I set up like this:

CREATE TABLE TEXT_TEST
(
  NAME  VARCHAR2(255 BYTE)
);

--created Oracle Text index

CREATE INDEX TEXT_TEST_IDX1 ON TEXT_TEST
(NAME)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('LEXER cust_lexer WORDLIST cust_wl SYNC (ON COMMIT)');
--sample data to illustrate the problem

Insert into TEXT_TEST
   (NAME)
 Values
   ('muller');
Insert into TEXT_TEST
   (NAME)
 Values
   ('müller');
Insert into TEXT_TEST
   (NAME)
 Values
   ('MULLER');
Insert into TEXT_TEST
   (NAME)
 Values
   ('MÜLLER');
Insert into TEXT_TEST
   (NAME)
 Values
   ('PAUL HERNANDEZ');
Insert into TEXT_TEST
   (NAME)
 Values
   ('CHRISTOPHER Phil');
COMMIT;

--Now there is an alternative solution that is there,  instead of thee Oracle Text which is just a plain function given below (and it seems to work neat for my simple need of removing diacritical characters effect in search)
--I need to evaluate which is better given my specific needs -the function below or Oracle Text.

CREATE OR REPLACE FUNCTION remove_dia(p_value IN VARCHAR2, p_doUpper IN VARCHAR2 := 'Y')
RETURN VARCHAR2 DETERMINISTIC
IS
OUTPUT_STR VARCHAR2(4000);
begin
IF (p_doUpper = 'Y') THEN
   OUTPUT_STR := UPPER(p_value);
ELSE
   OUTPUT_STR := p_value;
END IF;

OUTPUT_STR := TRANSLATE(OUTPUT_STR,'ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ', 'AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy');

RETURN (OUTPUT_STR);
end;
/



--now I query for which name stats with  a P%:
--Below query gets me unexpected result of one row as I am using Oracle Text where each word is parsed for search using CONTAINS...
SQL> select * from text_test where contains(name,'P%')>0;

NAME
--------------------------------------------------------------------------------
PAUL HERNANDEZ
CHRISTOPHER Phil

--Below query gets me the right and expected result of one row...
SQL> select * from text_test where name like 'P%';

NAME
--------------------------------------------------------------------------------
PAUL HERNANDEZ


--Below query gets me the right and expected result of one row...
SQL>  select * from text_test where remove_dia(name) like remove_dia('P%');

NAME
--------------------------------------------------------------------------------
PAUL HERNANDEZ
My entire need was only to be able to do a search that bypasses diacritical characters. To implement Oracle Text for that reason, I am wondering if that was the right choice! More so when I am now finding that the functionality of LIKE is not available in Oracle Text - the Oracle text search are based on tokens or words and they are different from output of the LIKE operator. So may be should I have just used a simple function like below and used that for my purpose instead of using Oracle Text:

This function (remove_dia) just removes the diacritical characters and may be for my need this is all that is needed. Can someone help to review that given my need I am better of not using Oracle Text? I need to continue using the functionality of Like operator and also need to bypass diacritical characters so the simple function that I have meets my need whereas Oracle Text causes a change in behaviour of search queries.

Thanks,
OrauserN
  • 1. Re: Is Oracle Text the right solution for this need of a specific search!
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    If all you need is LIKE functionality and you do not need any of the complex search capabilities of Oracle Text, then I would not use Oracle Text. I would create a function-based index on your name column that uses your function that removes the diacritical marks, so that your searches will be faster. Please see the demonstration below.
    SCOTT@orcl_11gR2> CREATE TABLE TEXT_TEST
      2    (NAME  VARCHAR2(255 BYTE))
      3  /
    
    Table created.
    
    SCOTT@orcl_11gR2> Insert all
      2  into TEXT_TEST (NAME) Values ('muller')
      3  into TEXT_TEST (NAME) Values ('müller')
      4  into TEXT_TEST (NAME) Values ('MULLER')
      5  into TEXT_TEST (NAME) Values ('MÜLLER')
      6  into TEXT_TEST (NAME) Values ('PAUL HERNANDEZ')
      7  into TEXT_TEST (NAME) Values ('CHRISTOPHER Phil')
      8  select * from dual
      9  /
    
    6 rows created.
    
    SCOTT@orcl_11gR2> CREATE OR REPLACE FUNCTION remove_dia
      2    (p_value   IN VARCHAR2,
      3       p_doUpper IN VARCHAR2 := 'Y')
      4    RETURN VARCHAR2 DETERMINISTIC
      5  IS
      6    OUTPUT_STR VARCHAR2(4000);
      7  begin
      8    IF (p_doUpper = 'Y') THEN
      9        OUTPUT_STR := UPPER(p_value);
     10    ELSE
     11        OUTPUT_STR := p_value;
     12    END IF;
     13    RETURN
     14        TRANSLATE
     15          (OUTPUT_STR,
     16           'ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ',
     17           'AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy');
     18  end;
     19  /
    
    Function created.
    
    SCOTT@orcl_11gR2> show errors
    No errors.
    SCOTT@orcl_11gR2> CREATE INDEX text_test_remove_dia_name
      2  ON text_test (remove_dia (name))
      3  /
    
    Index created.
    
    SCOTT@orcl_11gR2> set autotrace on explain
    SCOTT@orcl_11gR2> select * from text_test
      2  where  remove_dia (name) like remove_dia ('mü%')
      3  /
    
    NAME
    ------------------------------------------------------------------------------------------------------------------------
    muller
    müller
    MULLER
    MÜLLER
    
    4 rows selected.
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3139591283
    
    ---------------------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |                           |     1 |  2131 |     2   (0)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| TEXT_TEST                 |     1 |  2131 |     2   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | TEXT_TEST_REMOVE_DIA_NAME |     1 |       |     1   (0)| 00:00:01 |
    ---------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('mü%'))
           filter("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('mü%'))
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    SCOTT@orcl_11gR2> select * from text_test
      2  where  remove_dia (name) like remove_dia ('P%')
      3  /
    
    NAME
    ------------------------------------------------------------------------------------------------------------------------
    PAUL HERNANDEZ
    
    1 row selected.
    
    
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 3139591283
    
    ---------------------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |                           |     1 |  2131 |     2   (0)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| TEXT_TEST                 |     1 |  2131 |     2   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | TEXT_TEST_REMOVE_DIA_NAME |     1 |       |     1   (0)| 00:00:01 |
    ---------------------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('P%'))
           filter("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('P%'))
    
    Note
    -----
       - dynamic sampling used for this statement (level=2)
    
    SCOTT@orcl_11gR2>

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points