6 Replies Latest reply: Oct 16, 2012 5:07 AM by James OConnor RSS

    Match ANY word

    James OConnor
      Is there a way to match ANY word in a string using CONTAINS?
      - e.g. Given a set of columns holding variations of text borrowed from above

      'you can optionally edit your post'
      'you can optionally preview your post'
      'you can optionally delete your post'

      and then match all 3 with something syntactically like

      and CONTAINS(<column>, 'optionally % your post') >0

      I predict using % is a bad idea as expansion will kill it as the index grows, but is there an alternative that will match a single word gap (so not using NEAR)?
        • 1. Re: Match ANY word
          damorgan
          I do not have experience with CONTAINS but you can do it with normal SQL.

          http://docs.oracle.com/cd/E11882_01/server.112/e26088/conditions005.htm#SQLRF52122
          http://www.morganslibrary.org/reference/conditions.html
          • 2. Re: Match ANY word
            Barbara Boehmer
            Although not intended for the purpose, using a stopword has the effect of requiring that it match any word. You could use an existing stopword like "the" or you could create your own stoplist and your own stopword for the purpose like "anyword", as demonstrated below. The stopword is not indexed, but its location is recorded, so a word must be present in that spot. I suspect it might be faster than % or NEAR or other methods.
            SCOTT@orcl_11gR2> CREATE TABLE test_tab
              2    (test_col  VARCHAR2(60))
              3  /
            
            Table created.
            
            SCOTT@orcl_11gR2> INSERT ALL
              2  INTO test_tab VALUES ('you can optionally edit your post')
              3  INTO test_tab VALUES ('you can optionally preview your post')
              4  INTO test_tab VALUES ('you can optionally delete your post')
              5  INTO test_tab VALUES ('some other data')
              6  SELECT * FROM DUAL
              7  /
            
            4 rows created.
            
            SCOTT@orcl_11gR2> BEGIN
              2    CTX_DDL.CREATE_STOPLIST ('test_stoplist', 'BASIC_STOPLIST');
              3    CTX_DDL.ADD_STOPWORD ('test_stoplist', 'ANYWORD');
              4  END;
              5  /
            
            PL/SQL procedure successfully completed.
            
            SCOTT@orcl_11gR2> CREATE INDEX test_idx
              2  ON test_tab (test_col)
              3  INDEXTYPE IS CTXSYS.CONTEXT
              4  PARAMETERS ('STOPLIST test_stoplist')
              5  /
            
            Index created.
            
            SCOTT@orcl_11gR2> SELECT token_text FROM dr$test_idx$i
              2  /
            
            TOKEN_TEXT
            ----------------------------------------------------------------
            CAN
            DATA
            DELETE
            EDIT
            OPTIONALLY
            OTHER
            POST
            PREVIEW
            SOME
            YOU
            YOUR
            
            11 rows selected.
            
            SCOTT@orcl_11gR2> SELECT * FROM test_tab
              2  WHERE  CONTAINS (test_col, 'optionally ANYWORD your post') > 0
              3  /
            
            TEST_COL
            ------------------------------------------------------------
            you can optionally edit your post
            you can optionally preview your post
            you can optionally delete your post
            
            3 rows selected.
            • 3. Re: Match ANY word
              James OConnor
              Nice little trick, the only problem I see with it right now is if it's potentially a noise word you are trying to skip.
              e.g. adding the row

              INSERT INTO test_tab VALUES ('you can optionally have your post')

              This wouldn't get matched using the stopword trick, but hopefully it turns out to be an edge case scenario. We will never really know in advance what those crazy/pesky users will get up to when you aren't looking!

              Thanks for the detailed reply though, it is very much appreciated as it's better than using % or nothing, and so simple to implement. I think we will try this for now and see how it goes.
              • 4. Re: Match ANY word
                Roger Ford-Oracle
                Yes, it would. A stopword in a query matches any word - stopword or non-stopword - in the text. I agree it's a nice trick.
                • 5. Re: Match ANY word
                  Barbara Boehmer
                  The following just demonstrates that, like Roger said, it will work with a noise word like "have", as long as that word is in your stoplist.
                  SCOTT@orcl_11gR2> CREATE TABLE test_tab
                    2    (test_col  VARCHAR2(60))
                    3  /
                  
                  Table created.
                  
                  SCOTT@orcl_11gR2> INSERT ALL
                    2  INTO test_tab VALUES ('you can optionally edit your post')
                    3  INTO test_tab VALUES ('you can optionally preview your post')
                    4  INTO test_tab VALUES ('you can optionally delete your post')
                    5  INTO test_tab VALUES ('some other data')
                    6  INTO test_tab VALUES ('you can optionally have your post')
                    7  SELECT * FROM DUAL
                    8  /
                  
                  5 rows created.
                  
                  SCOTT@orcl_11gR2> BEGIN
                    2    CTX_DDL.CREATE_STOPLIST ('test_stoplist', 'BASIC_STOPLIST');
                    3    CTX_DDL.ADD_STOPWORD ('test_stoplist', 'ANYWORD');
                    4    CTX_DDL.ADD_STOPWORD ('test_stoplist', 'have');
                    5  END;
                    6  /
                  
                  PL/SQL procedure successfully completed.
                  
                  SCOTT@orcl_11gR2> CREATE INDEX test_idx
                    2  ON test_tab (test_col)
                    3  INDEXTYPE IS CTXSYS.CONTEXT
                    4  PARAMETERS ('STOPLIST test_stoplist')
                    5  /
                  
                  Index created.
                  
                  SCOTT@orcl_11gR2> SELECT token_text FROM dr$test_idx$i
                    2  /
                  
                  TOKEN_TEXT
                  ----------------------------------------------------------------
                  CAN
                  DATA
                  DELETE
                  EDIT
                  OPTIONALLY
                  OTHER
                  POST
                  PREVIEW
                  SOME
                  YOU
                  YOUR
                  
                  11 rows selected.
                  
                  SCOTT@orcl_11gR2> SELECT * FROM test_tab
                    2  WHERE  CONTAINS (test_col, 'optionally ANYWORD your post') > 0
                    3  /
                  
                  TEST_COL
                  ------------------------------------------------------------
                  you can optionally edit your post
                  you can optionally preview your post
                  you can optionally delete your post
                  you can optionally have your post
                  
                  4 rows selected.
                  • 6. Re: Match ANY word
                    James OConnor
                    I apologise for wasting your time here, I am sure that when I tried it yesterday it wasn't working when I added a sentence with have, which was just a word chosen at random from the default stopwords list. When I ran thru it again today though it worked exactly as you say.
                    My best guess is I had added the row but not committed it (I use sqldeveloper, and don't have it set to autocommit, so make that mistake a lot !)

                    Thanks again for your rapid help, its very much appreciated.