4 Replies Latest reply on Sep 7, 2006 11:39 AM by 531177

    "and" and Ampersand or special characters

    531177
      I have words in my database that contain statements like fish and chips, fish & chips.

      If your looking for the fish & chips it will return fine with "fish and chips" , "fish \& chips",fish & chips", "fish \and chips"

      However, if your looking for fish and chips it works fine with all of the ones listed above except the \& and I'm unsure why it does not return because special words are supposed to be escaped

      I would like to be able to type in either of the above statements and get both of them back.

      I know that if you use the default stoplist it doesn't index the "and" so I used an empty stoplist and it still does not work. I tried to use the thesaurus to link the word and to & but that didn't work either. I also set my preferences to use the printjoin for the & so that means that it will be index. Maybe I'm not understanding something.

      Can someone please help?

      Thanks
      Jeff
        • 1. Re: "and" and Ampersand or special characters
          Barbara Boehmer
          "And" and "&" have special meaning in Oracle Text. If you search for "word1 and word2" or "word1 & word2" it means that it should find anything that contains both word1 and word2, not the phrase "word1 and word2", so it might also find "word1 other words word2". When you escape a character, it tells it to ignore the special meaning and search for that exact character. So, if you search for "word1 \& word2", it searches for anything containing the exact phrase "word1 & word2". It is doing exactly what you asked it to.
          • 2. Re: "and" and Ampersand or special characters
            531177
            Barbara
            Thank You for you answer. I'm still having trouble with this situation though.

            If I do this query
            SELECT DISTINCT wordnumber, textwords
            FROM sometable
            WHERE CONTAINS (TeXTWORDS,' salad \& Soup',1) > 0

            Results- I did expect this
            ==========================
            WORDNUMBER TEXTWORDS
            ---------- ------------------------------
            6 salad & Soup

            SELECT DISTINCT wordnumber, textwords
            FROM sometable
            WHERE CONTAINS (TeXTWORDS,' salad \and Soup',1) > 0

            Results--I didn't expect because it is not an exact match
            ==========================
            WORDNUMBER TEXTWORDS
            ---------- ------------------------------
            6 salad & Soup

            It works for both the \and and the \& but what I understand from your message earlier it should only return results for the first query because it is an exact match.

            Now if I have

            SELECT DISTINCT wordnumber, textwords
            FROM sometable
            WHERE CONTAINS (TeXTWORDS,'BEAR \AND RABBIT',1) > 0


            Results --Which I expect this because it is an exact match
            ===============
            WORDNUMBER TEXTWORDS
            ---------- ------------------------------
            11 Bear and Rabbit


            SELECT DISTINCT wordnumber, textwords
            FROM sometable
            WHERE CONTAINS (TeXTWORDS,'BEAR \& RABBIT',1) > 0

            Results-Which I expected because there is not an exact match
            ===========================
            no rows selected

            It seems like the apersand is related to the word "and"
            like salad & soup = salad and soup , salad & soup

            but "and" is not related to ampersand
            bear and rabbit= bear and rabbit but is not equal to bear & rabbit

            I hope I make sense and this is not something I'm overlooking.

            Also I changed the lexer to
            begin
            ctx_ddl.create_preference('PREF_SOUNDEX', 'BASIC_WORDLIST');
            ctx_ddl.set_attribute('PREF_SOUNDEX','PREFIX_INDEX','TRUE');
            ctx_ddl.set_attribute('PREF_SOUNDEX','PREFIX_MIN_LENGTH',3);
            ctx_ddl.set_attribute('PREF_SOUNDEX','PREFIX_MAX_LENGTH',15);
            ctx_ddl.create_preference('MAF_LEXER','BASIC_LEXER');
            ctx_ddl.set_attribute('MAF_LEXER','printjoins','-()/.,');
            end;

            and
            I don't have anything in my thesaurus to link & to and because I don't think it works.

            Thanks Very Much
            • 3. Re: "and" and Ampersand or special characters
              Barbara Boehmer
              You were correct in your original thinking that "and" and "&" must not be stopwords if you want to search for them and "&" must also be a printjoin. Then they can be tokenized, indexed, and searchable. The backslash is only good for escaping one character at a time. You can use curly braces to escape a whole word. You can set "and" and "&" as synonyms in a thesaurus. When you search using syn, it rewrites the query to use curly braces, so you don't need to include them. You can use syn to search for "and" or "&", but it gets a bit tricky when you try to combine searching for a synonym of a word with special meaning with multiple words. I have demonstrated usage of some functions for that below. I set things up so that all you have to do is pass in a search string and it will return contains queries that match all possible combinations of synonyms of all words contained in the search string, automatically escaping all of them.
              SCOTT@10gXE> -- test table:
              SCOTT@10gXE> CREATE TABLE SomeTable (TextWords VARCHAR2 (60))
                2  /

              Table created.

              SCOTT@10gXE> -- test data:
              SCOTT@10gXE> SET DEFINE OFF SCAN OFF
              SCOTT@10gXE> INSERT ALL
                2  INTO SomeTable VALUES ('salad soup')
                3  INTO SomeTable VALUES ('salad and soup')
                4  INTO SomeTable VALUES ('salad & soup')
                5  INTO SomeTable VALUES ('salad bread soup')
                6  INTO SomeTable VALUES ('salad and bread and soup')
                7  INTO SomeTable VALUES ('salad & bread & soup')
                8  INTO SomeTable VALUES ('soup salad')
                9  INTO SomeTable VALUES ('soup and salad')
              10  INTO SomeTable VALUES ('soup & salad')
              11  INTO SomeTable VALUES ('soup bread salad')
              12  INTO SomeTable VALUES ('soup and bread and salad')
              13  INTO SomeTable VALUES ('soup & bread & salad')
              14  SELECT * FROM DUAL
              15  /

              12 rows created.

              SCOTT@10gXE> -- preferences:
              SCOTT@10gXE> BEGIN
                2    CTX_DDL.create_preference ('PREF_SOUNDEX', 'BASIC_WORDLIST');
                3        CTX_DDL.set_attribute      ('PREF_SOUNDEX', 'PREFIX_INDEX', 'TRUE');
                4        CTX_DDL.set_attribute      ('PREF_SOUNDEX', 'PREFIX_MIN_LENGTH', 3);
                5        CTX_DDL.set_attribute      ('PREF_SOUNDEX', 'PREFIX_MAX_LENGTH', 15);
                6    CTX_DDL.create_preference ('MAF_LEXER',       'BASIC_LEXER');
                7        CTX_DDL.set_attribute      ('MAF_LEXER',       'printjoins', '-()/.,&');
                8  END;
                9  /

              PL/SQL procedure successfully completed.

              SCOTT@10gXE> -- index:
              SCOTT@10gXE> CREATE INDEX your_index_name ON SomeTable (TextWords)
                2  INDEXTYPE IS CTXSYS.CONTEXT
                3  PARAMETERS
                4    ('WORDLIST pref_soundex
                5        LEXER       maf_lexer
                6        STOPLIST CTXSYS.EMPTY_STOPLIST')
                7  /

              Index created.

              SCOTT@10gXE> -- This is what is tokenized, indexed, and searchable;
              SCOTT@10gXE> -- Everything else is ignored:
              SCOTT@10gXE> SELECT DISTINCT token_text FROM dr$your_index_name$i
                2  /

              TOKEN_TEXT
              ----------------------------------------------------------------
              &
              AND
              BRE
              BREA
              BREAD
              SAL
              SALA
              SALAD
              SOU
              SOUP

              10 rows selected.

              SCOTT@10gXE> -- thesaurus that makes "and" and "&" synonyms:
              SCOTT@10gXE> EXEC CTX_THES.CREATE_THESAURUS ('your_thes')

              PL/SQL procedure successfully completed.

              SCOTT@10gXE> EXEC CTX_THES.CREATE_RELATION ('your_thes', 'and', 'SYN', '&')

              PL/SQL procedure successfully completed.

              SCOTT@10gXE> SELECT CTX_THES.SYN ('and', 'your_thes') FROM DUAL
                2  /

              CTX_THES.SYN('AND','YOUR_THES')
              ----------------------------------------------------------------------------------------------------
              {AND}|{&}

              SCOTT@10gXE> SELECT CTX_THES.SYN ('&', 'your_thes') FROM DUAL
                2  /

              CTX_THES.SYN('&','YOUR_THES')
              ----------------------------------------------------------------------------------------------------
              {&}|{AND}

              SCOTT@10gXE> -- queries:
              SCOTT@10gXE> SELECT * FROM SomeTable
                2  WHERE  CONTAINS (TextWords, 'salad {and} soup', 1) > 0
                3  /

              TEXTWORDS
              ------------------------------------------------------------
              salad and soup

              SCOTT@10gXE> SELECT * FROM SomeTable
                2  WHERE  CONTAINS (TextWords, 'salad {&} soup', 1) > 0
                3  /

              TEXTWORDS
              ------------------------------------------------------------
              salad & soup

              SCOTT@10gXE> SELECT * FROM SomeTable
                2  WHERE  CONTAINS (TextWords, '(salad {and} soup) OR (salad {&} soup)', 1) > 0
                3  /

              TEXTWORDS
              ------------------------------------------------------------
              salad and soup
              salad & soup

              SCOTT@10gXE> SELECT * FROM SomeTable
                2  WHERE  CONTAINS (TextWords, 'SYN (and, your_thes)', 1) > 0
                3  /

              TEXTWORDS
              ------------------------------------------------------------
              salad and soup
              salad & soup
              salad and bread and soup
              salad & bread & soup
              soup and salad
              soup & salad
              soup and bread and salad
              soup & bread & salad

              8 rows selected.

              SCOTT@10gXE> SELECT * FROM SomeTable
                2  WHERE  CONTAINS (TextWords, 'SYN (&, your_thes)', 1) > 0
                3  /

              TEXTWORDS
              ------------------------------------------------------------
              salad and soup
              salad & soup
              salad and bread and soup
              salad & bread & soup
              soup and salad
              soup & salad
              soup and bread and salad
              soup & bread & salad

              8 rows selected.

              SCOTT@10gXE> -- selecting all combinations of all synonyms in a standard way
              SCOTT@10gXE> -- using functions (that you could put in a package if you like)
              SCOTT@10gXE> -- and a bind variable for the string of words to search:
              SCOTT@10gXE> CREATE OR REPLACE FUNCTION remove_extra_seps
                2    (p_string    IN VARCHAR2,
                3       p_sep         IN VARCHAR2 DEFAULT ' ')
                4    RETURN VARCHAR2
                5  AS
                6    v_string VARCHAR2 (32767) := LTRIM (RTRIM (p_string, p_sep), p_sep);
                7  BEGIN
                8    WHILE INSTR (v_string, p_sep || p_sep) > 0 LOOP
                9 
              10        v_string := REPLACE (v_string, p_sep || p_sep, p_sep);
              11    END LOOP;
              12    RETURN v_string;
              13  END remove_extra_seps;
              14  /

              Function created.

              SCOTT@10gXE> SHOW ERRORS
              No errors.
              SCOTT@10gXE> CREATE OR REPLACE FUNCTION list_element
                2    (p_string    VARCHAR2,
                3       p_element   INTEGER,
                4       p_separator VARCHAR2 DEFAULT ' ')
                5    RETURN         VARCHAR2
                6  AS
                7    v_string      VARCHAR2 (32767);
                8  BEGIN
                9    v_string := remove_extra_seps (p_string, p_separator) || p_separator;
              10    FOR i IN 1 .. p_element - 1 LOOP
              11        v_string := SUBSTR (v_string,
              12                      INSTR (v_string, p_separator)
              13                        + LENGTH (p_separator));
              14    END LOOP;
              15    RETURN SUBSTR (v_string, 1, INSTR (v_string, p_separator) - 1);
              16  END list_element;
              17  /

              Function created.

              SCOTT@10gXE> SHOW ERRORS
              No errors.
              SCOTT@10gXE> CREATE OR REPLACE FUNCTION get_syns
                2    (p_words          IN VARCHAR2,
                3       p_num_syns_per_word IN NUMBER DEFAULT 100) -- allows up to 100 synonyms per word
                4    RETURN VARCHAR2
                5  AS
                6    V_words VARCHAR2 (32767);
                7  BEGIN
                8    FOR r IN
                9        (SELECT '('  || SYS_CONNECT_BY_PATH (syns, ' ')  || ')' AS syns
              10         FROM      (SELECT word_num, syn_num, syns
              11              FROM      (SELECT word_num, syn_num,
              12                       list_element (synonyms, syn_num, '|') AS syns
              13                   FROM      (SELECT ROWNUM AS word_num,
              14                            CTX_THES.SYN (list_element (p_words, ROWNUM, ' '),
              15                                       'your_thes') AS synonyms
              16                        FROM      DUAL
              17                        CONNECT BY LEVEL <= (LENGTH (remove_extra_seps (p_words, ' '))
              18                                - LENGTH (REPLACE (remove_extra_seps (p_words, ' '), ' ', ''))) + 1),
              19                       (SELECT ROWNUM AS syn_num
              20                        FROM      DUAL
              21                        CONNECT BY LEVEL <= p_num_syns_per_word)
              22                   WHERE  synonyms <> '{}')
              23              WHERE  syns IS NOT NULL)
              24         WHERE  LEVEL = (LENGTH (remove_extra_seps (p_words, ' '))
              25             - LENGTH ( REPLACE (remove_extra_seps (p_words, ' '), ' ', ''))) + 1
              26         START  WITH word_num = 1
              27         CONNECT BY PRIOR word_num = word_num - 1)
              28    LOOP
              29        v_words := v_words || ' OR ' || r.syns;
              30    END LOOP;
              31    RETURN LTRIM (v_words, ' OR ');
              32  END get_syns;
              33  /

              Function created.

              SCOTT@10gXE> SHOW ERRORS
              No errors.
              SCOTT@10gXE> VARIABLE g_words VARCHAR2 (60)
              SCOTT@10gXE> EXEC :g_words := 'salad and soup'

              PL/SQL procedure successfully completed.

              SCOTT@10gXE> SELECT get_syns (:g_words) FROM DUAL
                2  /

              GET_SYNS(:G_WORDS)
              ----------------------------------------------------------------------------------------------------
              ( {SALAD} {AND} {SOUP}) OR ( {SALAD} {&} {SOUP})

              SCOTT@10gXE> SELECT * FROM SomeTable
                2  WHERE  CONTAINS (TextWords, get_syns (:g_words), 1) > 0
                3  /

              TEXTWORDS
              ------------------------------------------------------------
              salad and soup
              salad & soup

              SCOTT@10gXE>
              • 4. Re: "and" and Ampersand or special characters
                531177
                Thank you very much. Things are much clearer now.