2 Replies Latest reply: Feb 11, 2013 7:51 AM by orausern RSS

    Is Oracle Text the right solution for this need of a specific search!

    orausern
      Hi ,

      We are on Oracle 11.2.0.2 on Solaris 10. We have the need to be able to do search on data that are having diacritical marks and we should be able to do the serach ignoring this diacritical marks. That is the requirement. Now I got to hear that Oracle Text has a preference called BASIC_LEXER which can bypass the diacritical marks and so solely due to this feature I implemented Oracle Text and just for this diacritical search and no other need.

      I mean I set up preference like this:
        ctxsys.ctx_ddl.create_preference ('cust_lexer', 'BASIC_LEXER');
        ctxsys.ctx_ddl.set_attribute ('cust_lexer', 'base_letter', 'YES'); -- removes diacritics
      
      
      With this I set up like this:
      
      CREATE TABLE TEXT_TEST
      (
        NAME  VARCHAR2(255 BYTE)
      );
      
      --created Oracle Text index
      
      CREATE INDEX TEXT_TEST_IDX1 ON TEXT_TEST
      (NAME)
      INDEXTYPE IS CTXSYS.CONTEXT
      PARAMETERS('LEXER cust_lexer WORDLIST cust_wl SYNC (ON COMMIT)');
      --sample data to illustrate the problem
      
      Insert into TEXT_TEST
         (NAME)
       Values
         ('muller');
      Insert into TEXT_TEST
         (NAME)
       Values
         ('müller');
      Insert into TEXT_TEST
         (NAME)
       Values
         ('MULLER');
      Insert into TEXT_TEST
         (NAME)
       Values
         ('MÜLLER');
      Insert into TEXT_TEST
         (NAME)
       Values
         ('PAUL HERNANDEZ');
      Insert into TEXT_TEST
         (NAME)
       Values
         ('CHRISTOPHER Phil');
      COMMIT;
      
      --Now there is an alternative solution that is there,  instead of thee Oracle Text which is just a plain function given below (and it seems to work neat for my simple need of removing diacritical characters effect in search)
      --I need to evaluate which is better given my specific needs -the function below or Oracle Text.
      
      CREATE OR REPLACE FUNCTION remove_dia(p_value IN VARCHAR2, p_doUpper IN VARCHAR2 := 'Y')
      RETURN VARCHAR2 DETERMINISTIC
      IS
      OUTPUT_STR VARCHAR2(4000);
      begin
      IF (p_doUpper = 'Y') THEN
         OUTPUT_STR := UPPER(p_value);
      ELSE
         OUTPUT_STR := p_value;
      END IF;
      
      OUTPUT_STR := TRANSLATE(OUTPUT_STR,'ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ', 'AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy');
      
      RETURN (OUTPUT_STR);
      end;
      /
      
      
      
      --now I query for which name stats with  a P%:
      --Below query gets me unexpected result of one row as I am using Oracle Text where each word is parsed for search using CONTAINS...
      SQL> select * from text_test where contains(name,'P%')>0;
      
      NAME
      --------------------------------------------------------------------------------
      PAUL HERNANDEZ
      CHRISTOPHER Phil
      
      --Below query gets me the right and expected result of one row...
      SQL> select * from text_test where name like 'P%';
      
      NAME
      --------------------------------------------------------------------------------
      PAUL HERNANDEZ
      
      
      --Below query gets me the right and expected result of one row...
      SQL>  select * from text_test where remove_dia(name) like remove_dia('P%');
      
      NAME
      --------------------------------------------------------------------------------
      PAUL HERNANDEZ
      My entire need was only to be able to do a search that bypasses diacritical characters. To implement Oracle Text for that reason, I am wondering if that was the right choice! More so when I am now finding that the functionality of LIKE is not available in Oracle Text - the Oracle text search are based on tokens or words and they are different from output of the LIKE operator. So may be should I have just used a simple function like below and used that for my purpose instead of using Oracle Text:

      This function (remove_dia) just removes the diacritical characters and may be for my need this is all that is needed. Can someone help to review that given my need I am better of not using Oracle Text? I need to continue using the functionality of Like operator and also need to bypass diacritical characters so the simple function that I have meets my need whereas Oracle Text causes a change in behaviour of search queries.

      Thanks,
      OrauserN
        • 1. Re: Is Oracle Text the right solution for this need of a specific search!
          Barbara Boehmer
          If all you need is LIKE functionality and you do not need any of the complex search capabilities of Oracle Text, then I would not use Oracle Text. I would create a function-based index on your name column that uses your function that removes the diacritical marks, so that your searches will be faster. Please see the demonstration below.
          SCOTT@orcl_11gR2> CREATE TABLE TEXT_TEST
            2    (NAME  VARCHAR2(255 BYTE))
            3  /
          
          Table created.
          
          SCOTT@orcl_11gR2> Insert all
            2  into TEXT_TEST (NAME) Values ('muller')
            3  into TEXT_TEST (NAME) Values ('müller')
            4  into TEXT_TEST (NAME) Values ('MULLER')
            5  into TEXT_TEST (NAME) Values ('MÜLLER')
            6  into TEXT_TEST (NAME) Values ('PAUL HERNANDEZ')
            7  into TEXT_TEST (NAME) Values ('CHRISTOPHER Phil')
            8  select * from dual
            9  /
          
          6 rows created.
          
          SCOTT@orcl_11gR2> CREATE OR REPLACE FUNCTION remove_dia
            2    (p_value   IN VARCHAR2,
            3       p_doUpper IN VARCHAR2 := 'Y')
            4    RETURN VARCHAR2 DETERMINISTIC
            5  IS
            6    OUTPUT_STR VARCHAR2(4000);
            7  begin
            8    IF (p_doUpper = 'Y') THEN
            9        OUTPUT_STR := UPPER(p_value);
           10    ELSE
           11        OUTPUT_STR := p_value;
           12    END IF;
           13    RETURN
           14        TRANSLATE
           15          (OUTPUT_STR,
           16           'ÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöøùúûüýÿ',
           17           'AAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy');
           18  end;
           19  /
          
          Function created.
          
          SCOTT@orcl_11gR2> show errors
          No errors.
          SCOTT@orcl_11gR2> CREATE INDEX text_test_remove_dia_name
            2  ON text_test (remove_dia (name))
            3  /
          
          Index created.
          
          SCOTT@orcl_11gR2> set autotrace on explain
          SCOTT@orcl_11gR2> select * from text_test
            2  where  remove_dia (name) like remove_dia ('mü%')
            3  /
          
          NAME
          ------------------------------------------------------------------------------------------------------------------------
          muller
          müller
          MULLER
          MÜLLER
          
          4 rows selected.
          
          
          Execution Plan
          ----------------------------------------------------------
          Plan hash value: 3139591283
          
          ---------------------------------------------------------------------------------------------------------
          | Id  | Operation                   | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
          ---------------------------------------------------------------------------------------------------------
          |   0 | SELECT STATEMENT            |                           |     1 |  2131 |     2   (0)| 00:00:01 |
          |   1 |  TABLE ACCESS BY INDEX ROWID| TEXT_TEST                 |     1 |  2131 |     2   (0)| 00:00:01 |
          |*  2 |   INDEX RANGE SCAN          | TEXT_TEST_REMOVE_DIA_NAME |     1 |       |     1   (0)| 00:00:01 |
          ---------------------------------------------------------------------------------------------------------
          
          Predicate Information (identified by operation id):
          ---------------------------------------------------
          
             2 - access("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('mü%'))
                 filter("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('mü%'))
          
          Note
          -----
             - dynamic sampling used for this statement (level=2)
          
          SCOTT@orcl_11gR2> select * from text_test
            2  where  remove_dia (name) like remove_dia ('P%')
            3  /
          
          NAME
          ------------------------------------------------------------------------------------------------------------------------
          PAUL HERNANDEZ
          
          1 row selected.
          
          
          Execution Plan
          ----------------------------------------------------------
          Plan hash value: 3139591283
          
          ---------------------------------------------------------------------------------------------------------
          | Id  | Operation                   | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
          ---------------------------------------------------------------------------------------------------------
          |   0 | SELECT STATEMENT            |                           |     1 |  2131 |     2   (0)| 00:00:01 |
          |   1 |  TABLE ACCESS BY INDEX ROWID| TEXT_TEST                 |     1 |  2131 |     2   (0)| 00:00:01 |
          |*  2 |   INDEX RANGE SCAN          | TEXT_TEST_REMOVE_DIA_NAME |     1 |       |     1   (0)| 00:00:01 |
          ---------------------------------------------------------------------------------------------------------
          
          Predicate Information (identified by operation id):
          ---------------------------------------------------
          
             2 - access("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('P%'))
                 filter("SCOTT"."REMOVE_DIA"("NAME") LIKE "REMOVE_DIA"('P%'))
          
          Note
          -----
             - dynamic sampling used for this statement (level=2)
          
          SCOTT@orcl_11gR2>