8 Replies Latest reply: Jun 28, 2011 3:53 PM by Roger Ford-Oracle RSS

    ORACLE TEXT INDEX ON VARCHAR2 COLUMN

    693241
      Hello All,
      I find a search in our application very slow so i thought of using ORACLE TEXT CTXCAT index based search but i find certain inconsistencies . How can this be avoided....The following query should not return result if i can replace with oracle text but i find few values....why is that...i have also given few sample results below....

      SELECT first_name
      FROM uc_partner_ms
      WHERE
      Upper(first_name) LIKE '%WIE%'
      minus
      SELECT first_name
      FROM uc_partner_ms
      WHERE CATSEARCH (first_name,'*wie*', null) > 0

      RESULTS ....

      Hans-Werner Mrowiec
      Heinz Oesterwiemann GmbH
      Helmut Froitzheim GmbH, Neuwied
      Heribert Schwies
      Hermann Twieling GmbH & Co. KG
      Horst Breitwieser
      Horst-Dieter Swie

      The script used for creating index is

      begin
      ctx_ddl.create_preference('mylex', 'BASIC_LEXER');
      ctx_ddl.set_attribute ( 'mylex', 'index_themes', 'NO');
      ctx_ddl.set_attribute ( 'mylex', 'mixed_case', 'NO');
      end;

      CREATE INDEX partner_index ON uc_partner_ms (first_name)
      INDEXTYPE IS CTXSYS.CTXCAT
      parameters ( 'LEXER mylex' );


      Where am i wrong i could not guess a trend in the results other than all being in lower case.....
        • 1. Re: ORACLE TEXT INDEX ON VARCHAR2 COLUMN
          Barbara Boehmer
          Catsearch does not support leading wildcards. As a workaround, you can use a query template with context grammar. Please see the reproduction and solution below.
          SCOTT@orcl_11g> -- test environment:
          SCOTT@orcl_11g> CREATE TABLE uc_partner_ms
            2    (first_name  VARCHAR2 (60))
            3  /
          
          Table created.
          
          SCOTT@orcl_11g> SET DEFINE OFF
          SCOTT@orcl_11g> INSERT ALL
            2  INTO uc_partner_ms VALUES ('Hans-Werner Mrowiec')
            3  INTO uc_partner_ms VALUES ('Heinz Oesterwiemann GmbH')
            4  INTO uc_partner_ms VALUES ('Helmut Froitzheim GmbH, Neuwied')
            5  INTO uc_partner_ms VALUES ('Heribert Schwies')
            6  INTO uc_partner_ms VALUES ('Hermann Twieling GmbH & Co. KG')
            7  INTO uc_partner_ms VALUES ('Horst Breitwieser')
            8  INTO uc_partner_ms VALUES ('Horst-Dieter Swie')
            9  SELECT * FROM DUAL
           10  /
          
          7 rows created.
          
          SCOTT@orcl_11g> begin
            2    ctx_ddl.create_preference('mylex', 'BASIC_LEXER');
            3    ctx_ddl.set_attribute ( 'mylex', 'index_themes', 'NO');
            4    ctx_ddl.set_attribute ( 'mylex', 'mixed_case', 'NO');
            5  end;
            6  /
          
          PL/SQL procedure successfully completed.
          
          SCOTT@orcl_11g> CREATE INDEX partner_index ON uc_partner_ms (first_name)
            2  INDEXTYPE IS CTXSYS.CTXCAT
            3  parameters ( 'LEXER mylex' )
            4  /
          
          Index created.
          
          SCOTT@orcl_11g> -- reproduction:
          SCOTT@orcl_11g> SELECT first_name
            2  FROM uc_partner_ms
            3  WHERE
            4  Upper(first_name) LIKE '%WIE%'
            5  minus
            6  SELECT first_name
            7  FROM uc_partner_ms
            8  WHERE CATSEARCH (first_name,'*wie*', null) > 0
            9  /
          
          FIRST_NAME
          ------------------------------------------------------------
          Hans-Werner Mrowiec
          Heinz Oesterwiemann GmbH
          Helmut Froitzheim GmbH, Neuwied
          Heribert Schwies
          Hermann Twieling GmbH & Co. KG
          Horst Breitwieser
          Horst-Dieter Swie
          
          7 rows selected.
          
          SCOTT@orcl_11g> -- solution:
          SCOTT@orcl_11g> SELECT first_name
            2  FROM uc_partner_ms
            3  WHERE
            4  Upper(first_name) LIKE '%WIE%'
            5  minus
            6  SELECT first_name
            7  FROM   uc_partner_ms
            8  WHERE  CATSEARCH
            9             (first_name,
           10              '<query>
           11              <textquery grammar="CONTEXT">
           12                %wie%
           13              </textquery>
           14            </query>',
           15              null) > 0
           16  /
          
          no rows selected
          
          SCOTT@orcl_11g> 
          • 2. Re: ORACLE TEXT INDEX ON VARCHAR2 COLUMN
            693241
            Hello,
            In first place thanks for te solution.
            1) What is the difference <query> tag brings in the query?
            2) Is it a good idea to replace catsearch for like operator with leading %?
            3) Also i have read in 10g the maximum number of rows returned is only 15000. is it true ?

            Thanks
            Vijay G
            • 3. Re: ORACLE TEXT INDEX ON VARCHAR2 COLUMN
              Barbara Boehmer
              1) What is the difference <query> tag brings in the query?
              That is the QUERY TEMPLATE that allows you to use the CONTEXT GRAMMAR on a CTXCAT index as if you were using CONTAINS. There are examples in the documentation here:

              http://download.oracle.com/docs/cd/B19306_01/text.102/b14218/csql.htm#i1000002

              2) Is it a good idea to replace catsearch for like operator with leading %?
              The wildcard character for a CATSEARCH operator on a CTXCAT index is the asterisk. The wildcard character for a CONTAINS operator on a CONTEXT index is percent sign. CATSEARCH does not support leading wildcards, so you can use a word followed by an asterisk, but not preceded by an asterisk. The percent sign is not a valid wildcard with CATSEARCH. However, when you use the query template and specify GRAMMAR="CONTEXT". then it is as if you were using a CONTAINS operator with a CONTEXT index, so you can use a leading wildcard and the percent sign is the wildcard. You can use LIKE with percent signs before and after, but it is likely to be very slow, because it cannot use an index. If usage of leading wildcards is going to be a frequent requirement, then you would probably be better off with a CONTEXT index with a BASIC_WORDLIST with SUBSTRING_INDEX set to true for best speed.

              3) Also i have read in 10g the maximum number of rows returned is only 15000. is it true ?
              No. You were probably reading about the WILDCARD_MAXTERMS limit of 15,000 in 10g. That is the maximum number of distinct tokens (not rows) that a wildcard can match without producing an error. In 11g the maximum is 50,000. The defaults are lower. If you intend to allow a lot of wildcard searches that will match a lot of tokens, then you should probably set the WILDCARD_MAXTERMS to the maximum. However, there will always be some queries that are too general, so you should plan to trap the error and return an understandable message to your user, instructing them that their search is too broad and to enter a more specific search value.
              • 4. Re: ORACLE TEXT INDEX ON VARCHAR2 COLUMN
                693241
                Hello,
                Yeah I have read it.
                i will try implementing the index to maximum extend .
                Thanks for the timely and precious reply.

                Thanks
                Vijay G
                • 5. Re: ORACLE TEXT INDEX ON VARCHAR2 COLUMN
                  868762
                  Why not use SELECT first_name FROM uc_partner_ms WHERE CATSEARCH(first_name, '**wie*', null) > 0
                  ???
                  • 6. Re: ORACLE TEXT INDEX ON VARCHAR2 COLUMN
                    Barbara Boehmer
                    user3267003,

                    I see this is your first post. Welcome to the OTN forums and to the Text sub-forum.

                    The online documentation states that catsearch does not support leading wildcards or "left truncation" as they call it, providing examples that you can search for term* or a*b but not term.  So, I was quite surprised to find that using two leading wildcards as *term does work, as shown in my test below. Is this just a workaround that you figured out or did I miss it somewhere in the documentation?

                    -- test:
                     
                    SCOTT@orcl_11gR2> -- test environment with table, data, and index:
                    SCOTT@orcl_11gR2> CREATE TABLE uc_partner_ms
                      2    (first_name  VARCHAR2 (60))
                      3  /
                    
                    Table created.
                    
                    SCOTT@orcl_11gR2> SET DEFINE OFF
                    SCOTT@orcl_11gR2> INSERT ALL
                      2  INTO uc_partner_ms VALUES ('Hans-Werner Mrowiec')
                      3  INTO uc_partner_ms VALUES ('Heinz Oesterwiemann GmbH')
                      4  INTO uc_partner_ms VALUES ('Helmut Froitzheim GmbH, Neuwied')
                      5  INTO uc_partner_ms VALUES ('Heribert Schwies')
                      6  INTO uc_partner_ms VALUES ('Hermann Twieling GmbH & Co. KG')
                      7  INTO uc_partner_ms VALUES ('Horst Breitwieser')
                      8  INTO uc_partner_ms VALUES ('Horst-Dieter Swie')
                      9  INTO uc_partner_ms VALUES ('other')
                     10  SELECT * FROM DUAL
                     11  /
                    
                    8 rows created.
                    
                    SCOTT@orcl_11gR2> CREATE INDEX partner_index ON uc_partner_ms (first_name)
                      2  INDEXTYPE IS CTXSYS.CTXCAT
                      3  /
                    
                    Index created.
                    
                    SCOTT@orcl_11gR2> -- catsearch does not support search with one leading wildcard:
                    SCOTT@orcl_11gR2> SELECT first_name
                      2  FROM   uc_partner_ms
                      3  WHERE  CATSEARCH (first_name, '*wie*', null) > 0
                      4  /
                    
                    no rows selected
                    
                    SCOTT@orcl_11gR2> -- catsearch with two leading wildcards suprisingly works:
                    SCOTT@orcl_11gR2> SELECT first_name
                      2  FROM   uc_partner_ms
                      3  WHERE  CATSEARCH (first_name, '**wie*', null) > 0
                      4  /
                    
                    FIRST_NAME
                    ------------------------------------------------------------
                    Hans-Werner Mrowiec
                    Heinz Oesterwiemann GmbH
                    Helmut Froitzheim GmbH, Neuwied
                    Heribert Schwies
                    Hermann Twieling GmbH & Co. KG
                    Horst Breitwieser
                    Horst-Dieter Swie
                    
                    7 rows selected.
                    
                    SCOTT@orcl_11gR2>
                    • 7. Re: ORACLE TEXT INDEX ON VARCHAR2 COLUMN
                      871869
                      Hi. I have got some problems with CATSEARCH and need your help.
                      I have got something like that:

                      CREATE TABLE EXAMPLE (ID INTEGER, ZNAK_PISMA VARCHAR2(100));

                      INSERT INTO EXAMPLE VALUES(1, 'Adm-VI-08121/1/08');
                      INSERT INTO EXAMPLE VALUES(2, 'Adm-VI-08121/1/08');
                      INSERT INTO EXAMPLE VALUES(3, 'Adm.VI.08121/23/09');
                      INSERT INTO EXAMPLE VALUES(4, 'Adm.VI.08121/22/09');
                      INSERT INTO EXAMPLE VALUES(5, 'Adm.VI.08121/24/09');
                      INSERT INTO EXAMPLE VALUES(6, 'Adm.VI.08121/24/09');
                      INSERT INTO EXAMPLE VALUES(7, 'Adm.VI.08121/23/09');
                      INSERT INTO EXAMPLE VALUES(8, 'Adm.VI.08121/23/09');
                      INSERT INTO EXAMPLE VALUES(9, 'Adm.VI.08121/7/10');
                      INSERT INTO EXAMPLE VALUES(10, 'Adm.VI.08121/7/10');

                      CREATE INDEX ctx_example ON EXAMPLE(ZNAK_PISMA) INDEXTYPE IS CTXSYS.CTXCAT;

                      After:
                      SELECT * FROM EXAMPLE WHERE CATSEARCH(ZNAK_PISMA, 'Adm-VI-08121', NULL) > 0;

                      Oracle returns all records from table EXAMPLE, but I didn't use any wildcards in my query. Can someone tell me, how can I fix this?

                      This one query also returns all records:
                      SELECT * FROM EXAMPLE WHERE CATSEARCH(ZNAK_PISMA, '"Adm-VI-08121"', NULL) > 0;
                      • 8. Re: ORACLE TEXT INDEX ON VARCHAR2 COLUMN
                        Roger Ford-Oracle
                        "." and "-" are break characters by default, so all of those strings are broken up into
                        Adm VI 08121
                        followed by the other digits.

                        You can define "." and "-" as join characters, thus:
                        CREATE TABLE EXAMPLE (ID INTEGER, ZNAK_PISMA VARCHAR2(100));
                        
                        INSERT INTO EXAMPLE VALUES(1, 'Adm-VI-08121/1/08');
                        INSERT INTO EXAMPLE VALUES(2, 'Adm-VI-08121/1/08');
                        INSERT INTO EXAMPLE VALUES(3, 'Adm.VI.08121/23/09');
                        INSERT INTO EXAMPLE VALUES(4, 'Adm.VI.08121/22/09');
                        INSERT INTO EXAMPLE VALUES(5, 'Adm.VI.08121/24/09');
                        INSERT INTO EXAMPLE VALUES(6, 'Adm.VI.08121/24/09');
                        INSERT INTO EXAMPLE VALUES(7, 'Adm.VI.08121/23/09');
                        INSERT INTO EXAMPLE VALUES(8, 'Adm.VI.08121/23/09');
                        INSERT INTO EXAMPLE VALUES(9, 'Adm.VI.08121/7/10');
                        INSERT INTO EXAMPLE VALUES(10, 'Adm.VI.08121/7/10');
                        
                        exec ctx_ddl.create_preference('mylex', 'basic_lexer')
                        exec ctx_ddl.set_attribute('mylex', 'printjoins', '.-')
                        
                        CREATE INDEX ctx_example ON EXAMPLE(ZNAK_PISMA) INDEXTYPE IS CTXSYS.CTXCAT
                        PARAMETERS ('lexer mylex');
                        
                        SELECT * FROM EXAMPLE WHERE CATSEARCH(ZNAK_PISMA, 'Adm-VI-08121', NULL) > 0;
                        Which gives you
                        SQL>  SELECT * FROM EXAMPLE WHERE CATSEARCH(ZNAK_PISMA, 'Adm-VI-08121', NULL) > 0;
                        
                             ID ZNAK_PISMA
                        ---------- --------------------------------------------------
                              1 Adm-VI-08121/1/08
                              2 Adm-VI-08121/1/08
                        You need to decide whether or not you should include "/" in the list of join characters.