4 Replies Latest reply: Dec 3, 2013 3:16 PM by Andi Heusser RSS

    Query for text that includes reserved keywords

    Andi Heusser

      I'm trying to figure out how I can make sure that a user defined text query that may contain reserved key words (e.g. NOT, ABOUT, WITHIN) actually ends up searching for those strings rather than having the special behavior applied. I've tried various ways to escape then but nothing seems to work or I don't understand why Oracle returns what it returns. So here's a little example I used to try and test this out:

       

      drop index tbl_test_text_index;

      drop table tbl_test;

       

      create table tbl_test (tid number, text clob);

      create index tbl_test_text_index on tbl_test(text) indextype is CTXSYS.CONTEXT PARAMETERS ('SYNC (EVERY "sysdate+(30/(24*60*60))")');

       

      insert into tbl_test (tid, text) values (1, 'hi');

      insert into tbl_test (tid, text) values (2, 'hi2');

      insert into tbl_test (tid, text) values (3, 'hi 2');

      insert into tbl_test (tid, text) values (4, 'hi not');

      insert into tbl_test (tid, text) values (5, 'hi again');

      insert into tbl_test (tid, text) values (6, 'hi back');

      insert into tbl_test (tid, text) values (7, 'hi again');

      insert into tbl_test (tid, text) values (8, 'bye');

       

      Then I've tried to create a search query that would end up returning only row with id 4 'hi not'.

       

      When I try

           select * from tbl_test where contains(text, 'hi not') > 0;

      then this obviously fails because 'not' is a reserved keyword.

       

      When I try

           select * from tbl_test where contains(text, '"hi not"') > 0;

           select * from tbl_test where contains(text, 'hi "not"') > 0;

           select * from tbl_test where contains(text, 'hi {not}') > 0;

      they all return all rows except row 2 and 8.

       

      When I try

           select * from tbl_test where contains(text, 'hi again') > 0;

           select * from tbl_test where contains(text, 'hi "again"') > 0;

           select * from tbl_test where contains(text, 'hi {again}') > 0;

      they all return just the two rows I'd expect which is rows 5 and 7.

       

      The Oracle version I'm using is:

      Oracle Database 11g Release 11.2.0.1.0 - 64bit Production                       

      PL/SQL Release 11.2.0.1.0 - Production                                          

      CORE 11.2.0.1.0 Production                                                        

      TNS for Linux: Version 11.2.0.1.0 - Production                                  

      NLSRTL Version 11.2.0.1.0 - Production 

       

      I've also tried the same on Oracle XE

      Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production       

      PL/SQL Release 11.2.0.2.0 - Production                                          

      CORE 11.2.0.2.0 Production                                                        

      TNS for Linux: Version 11.2.0.2.0 - Production                                  

      NLSRTL Version 11.2.0.2.0 - Production  

       

      Why is this not working as expected, i.e. only return rows with 'hi' followed by 'not'? Is this a bug in Oracle?

        • 1. Re: Query for text that includes reserved keywords
          Roger Ford-Oracle

          No, it's not a bug. "not" is a stopword as well as a reserved word, so is ignored when you search for it.

           

          hi {not}

           

          will be transformed to just

           

          hi

           

          See Stopword Transformations In Oracle Text

          • 2. Re: Query for text that includes reserved keywords
            Barbara Boehmer

            Roger beat me to responding, but I might as well add to that and post the example I put together.  If you do not use a stoplist, then Oracle uses a default stoplist.  The following example uses an empty stoplist.  There are two ways to escape things, either by enclosing words or phrases with { and } or escaping each problem character with \.  The following uses { and } around each word and also demonstrates a method of automatically putting such around each word in a bind variable.  However, you may not want to do that with every word.  So, you might want to write a function to loop through a table of just reserved words and enclose those or some such thing.

             

            SCOTT@orcl> -- version"

            SCOTT@orcl> select banner from v$version;

             

            BANNER

            --------------------------------------------------------------------------------

            Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

            PL/SQL Release 11.2.0.1.0 - Production

            CORE    11.2.0.1.0    Production

            TNS for 64-bit Windows: Version 11.2.0.1.0 - Production

            NLSRTL Version 11.2.0.1.0 - Production

             

            5 rows selected.

             

            SCOTT@orcl> -- table and data for testing:

            SCOTT@orcl> create table tbl_test (tid number, text clob);

             

            Table created.

             

            SCOTT@orcl> insert all

              2  into tbl_test (tid, text) values (1, 'hi')

              3  into tbl_test (tid, text) values (2, 'hi2')

              4  into tbl_test (tid, text) values (3, 'hi 2')

              5  into tbl_test (tid, text) values (4, 'hi not')

              6  into tbl_test (tid, text) values (5, 'hi again')

              7  into tbl_test (tid, text) values (6, 'hi back')

              8  into tbl_test (tid, text) values (7, 'hi again')

              9  into tbl_test (tid, text) values (8, 'bye')

            10  select * from dual;

             

            8 rows created.

             

            SCOTT@orcl> -- index that uses an empty stoplist:

            SCOTT@orcl> create index tbl_test_text_index on tbl_test(text) indextype is CTXSYS.CONTEXT

              2    PARAMETERS

              3       ('SYNC     (EVERY "sysdate+(30/(24*60*60))")

              4         STOPLIST CTXSYS.EMPTY_STOPLIST');

             

            Index created.

             

            SCOTT@orcl> -- tokens indexed:

            SCOTT@orcl> select token_text from dr$tbl_test_text_index$i

              2  /

             

            TOKEN_TEXT

            ----------------------------------------------------------------

            2

            AGAIN

            BACK

            BYE

            HI

            HI2

            NOT

             

            7 rows selected.

             

            SCOTT@orcl> column text format a60

            SCOTT@orcl> -- simple query:

            SCOTT@orcl> select * from tbl_test where  contains (text, '{hi} {not}') > 0;

             

                   TID TEXT

            ---------- ------------------------------------------------------------

                     4 hi not

             

            1 row selected.

             

            SCOTT@orcl> -- usage of bind variable with method that escapes all words:

            SCOTT@orcl> variable search_string varchar2(100)

            SCOTT@orcl> exec :search_string := 'hi not'

             

            PL/SQL procedure successfully completed.

             

            SCOTT@orcl> select * from tbl_test where  contains (text, '{' || replace (:search_string, ' ', '} {') || '}') > 0;

             

                   TID TEXT

            ---------- ------------------------------------------------------------

                     4 hi not

             

            1 row selected.

            • 3. Re: Query for text that includes reserved keywords
              Andi Heusser

              Ah well, I should've tried some of the other reserved keywords as well . Just happened to try 'not' first.

               

              So when I tried adding data with 'within' or 'about' and did the search with braces around those words, then it worked as expected. Just have to figure out how to deal with the 'not' now.

               

              Thanks.

              • 4. Re: Query for text that includes reserved keywords
                Andi Heusser

                Yeah I just realized that myself as I was reading through the documentation and once I defined my own, empty stoplist, it worked as expected. I have to see if we really want no stop words at all but at least I know now what the issue was.

                 

                Thanks.