7 Replies Latest reply: Dec 19, 2012 6:00 AM by Roger Ford-Oracle RSS

    Oracle Text and MINUS character

    S-Max
      Hi all,

      I have following problem:
      - I have created a Oracle Text Index on the VARCHAR2 column:
      BEGIN
        ctx_ddl.create_preference('SUBSTRING_PREF','BASIC_WORDLIST');
        ctx_ddl.set_attribute('SUBSTRING_PREF','SUBSTRING_INDEX','TRUE');
      END;
      /
      
      CREATE INDEX IDX_TEXT_1 ON MY_TABLE
      (COLUMN1)
      INDEXTYPE IS CTXSYS.CONTEXT
      PARAMETERS('wordlist SUBSTRING_PREF memory 50m')
      NOPARALLEL;
      - I execute the following SELECT:
      SELECT mt.*
      FROM  MY_TABLE mt
      WHERE  contains(mt.COLUMN1, 'test%') > 0;
      It returns all records where the column1 contains the entry "test" + something else.
      BUT the records where column1 contains the entries "my-test" + something else or "owr-test" + something else, a.s.o.

      It should return only the records with "test" + something else entries.

      How can I change the Index or query to achieve it?

      Best regards
        • 1. Re: Oracle Text and MINUS character
          Barbara Boehmer
          By default, the hyphen is treated as a break character and the words on either side are indexed as separate tokens. To change this behavior, you can create a lexer, set the printjoins attribute of the lexer to include the hyphen, then use that lexer in your index parameters. Then, strings of characters containing a hyphen will be indexed as one token, including the hyphen. Please see the example below.
          SCOTT@orcl_11gR2> CREATE TABLE my_table (column1  VARCHAR2(60))
            2  /
          
          Table created.
          
          SCOTT@orcl_11gR2> INSERT ALL
            2  INTO my_table VALUES ('test')
            3  INTO my_table VALUES ('testing')
            4  INTO my_table VALUES ('my-test')
            5  INTO my_table VALUES ('owr-test')
            6  SELECT * FROM DUAL
            7  /
          
          4 rows created.
          
          SCOTT@orcl_11gR2> BEGIN
            2    ctx_ddl.create_preference('SUBSTRING_PREF','BASIC_WORDLIST');
            3    ctx_ddl.set_attribute('SUBSTRING_PREF','SUBSTRING_INDEX','TRUE');
            4    CTX_DDL.CREATE_PREFERENCE ('test_lex', 'BASIC_LEXER');
            5    CTX_DDL.SET_ATTRIBUTE ('test_lex', 'PRINTJOINS', '-');
            6  END;
            7  /
          
          PL/SQL procedure successfully completed.
          
          SCOTT@orcl_11gR2> CREATE INDEX IDX_TEXT_1 ON MY_TABLE (COLUMN1)
            2  INDEXTYPE IS CTXSYS.CONTEXT
            3  PARAMETERS
            4    ('wordlist  SUBSTRING_PREF
            5        LEXER        test_lex
            6        memory    50m')
            7  NOPARALLEL
            8  /
          
          Index created.
          
          SCOTT@orcl_11gR2> SELECT token_text FROM dr$idx_text_1$i
            2  /
          
          TOKEN_TEXT
          ----------------------------------------------------------------
          MY-TEST
          OWR-TEST
          TEST
          TESTING
          
          4 rows selected.
          
          SCOTT@orcl_11gR2> SELECT mt.*
            2  FROM   MY_TABLE mt
            3  WHERE  contains (mt.COLUMN1, 'test%') > 0
            4  /
          
          COLUMN1
          ------------------------------------------------------------
          test
          testing
          
          2 rows selected.
          • 2. Re: Oracle Text and MINUS character
            S-Max
            Hello Barbara,

            thank you for your answer!
            This works. But now I have another question: the entries "my test" or "owr test" (with blank between the words) will be found as well. How can I define the index, that only the entries "test%" (not "%test%") will be returned?

            Best regards
            • 3. Re: Oracle Text and MINUS character
              Roger Ford-Oracle
              By using a standard B-tree index. A CONTEXT index is designed for word-based searches. If you just want to search for strings starting with certain characters, you're using the wrong technology.

              You CAN force such queries in a CONTEXT index, but it's not what it's designed for.
              • 4. Re: Oracle Text and MINUS character
                S-Max
                Hi Roger,

                thank you. How can I define the Text Index for my purpose?

                Regards
                • 5. Re: Oracle Text and MINUS character
                  Roger Ford-Oracle
                  So do you ONLY want to find strings that start with a particular word or partial word? Or do you need to do other types of search?
                  • 6. Re: Oracle Text and MINUS character
                    S-Max
                    The search should return the entries that starts with the search string. In my example this can be the entries like "test" or "test abcd" or "testabcd", but not "abcd test" or "abcdtest".
                    • 7. Re: Oracle Text and MINUS character
                      Roger Ford-Oracle
                      You could mark the start of the string with some special characters. You could do that with the MULTI_COLUMN_DATASTORE like this:
                      DROP TABLE my_table
                      /
                      CREATE TABLE my_table (column1  VARCHAR2(60))
                      / 
                      INSERT ALL
                       INTO my_table VALUES ('test')
                       INTO my_table VALUES ('testing')
                       INTO my_table VALUES ('my-test')
                       INTO my_table VALUES ('owr-test')
                       SELECT * FROM DUAL
                      / 
                      
                      exec ctx_ddl.drop_preference('SUBSTRING_PREF')
                      exec ctx_ddl.drop_preference('mcds')
                      
                      BEGIN
                         ctx_ddl.create_preference('SUBSTRING_PREF','BASIC_WORDLIST');
                         ctx_ddl.set_attribute('SUBSTRING_PREF','SUBSTRING_INDEX','TRUE');
                         CTX_DDL.CREATE_PREFERENCE ('test_lex', 'BASIC_LEXER');
                         CTX_DDL.SET_ATTRIBUTE ('test_lex', 'PRINTJOINS', '-');
                      END;
                      / 
                      
                      BEGIN
                        ctx_ddl.create_preference('mcds', 'MULTI_COLUMN_DATASTORE');
                        ctx_ddl.set_attribute('mcds', 'COLUMNS', '''XZX''||COLUMN1');
                      END;
                      /
                      
                      CREATE INDEX IDX_TEXT_1 ON MY_TABLE (COLUMN1)
                       INDEXTYPE IS CTXSYS.CONTEXT
                       PARAMETERS
                         ('wordlist  SUBSTRING_PREF
                             LEXER        test_lex
                               DATASTORE mcds
                            memory    50m')
                       NOPARALLEL
                      / 
                       
                      SELECT token_text FROM dr$idx_text_1$i
                      / 
                      SELECT mt.*
                        FROM   MY_TABLE mt
                        WHERE  contains (mt.COLUMN1, 'XZXtest%') > 0
                      /