7 Replies Latest reply: Jun 22, 2012 6:02 PM by Barbara Boehmer RSS

    How does ACCUM work?

    Gwydion
      Hi there,

      I have a problem understanding how ACCUM works. We are using a text index over XML documents and want to apply a very simple relevance weighting based on occurences within a certain tag. For example, if TAG1 has a weight of 10, each occurrence in TAG1 should add 10 to the score. We have 13 Tags that should be weighted and multiple occurences in different tags should be cumulated. So 3 occurences in TAG1 (weight 10) and one occurence in TAG2 (weight 7) should result in a score of 37.

      Just searching in TAG1 works as expected:
      ( DEFINESCORE( foo, OCCURRENCE ) WITHIN TAG1 ) * 10
      Three occurences score 30.

      Just searching in TAG2 works also as expected. One occurence scores 7. However, when I combine both searches
      (( DEFINESCORE( foo, OCCURRENCE ) WITHIN TAG1 ) * 10) ACCUM (( DEFINESCORE( foo, OCCURRENCE ) WITHIN TAG2 ) * 7)
      the score is 61 instead of 37. More interestingly, including all 13 tags (of which most do not contain the search word) lowers the score again.

      I ended up with searches where two documents containing completely different occurrences of a search word got exactly the same score.
        • 1. Re: How does ACCUM work?
          Roger Ford-Oracle
          ACCUM is designed to score higher the more terms that match. It does this by "bucketing" the results

          So if I do "a ACCUM b ACCUM c" then a match on just one of those terms will score in the range 1-33, two of them will score in the range 34-66 and all three will score in the range 67-100.

          Similarly for a ACCUM b ACCUM c ACCUM d the ranges will be 1-25, 26-50, 51-74 and 75-100.

          Within those "buckets", the actual value will be based on the number of matches and the inherent "value" of the search term (look up TF-IDF for more details)
          • 2. Re: How does ACCUM work?
            Roger Ford-Oracle
            To achieve what you want, you will need to use DEFINEMERGE as well:
            create table mytab(text varchar2(255));
            
            insert into mytab values ('
              <tag1> foo foo foo </tag1>
              <tag2> foo foo     </tag2> 
              <tag3> bar         </tag3>
            ');
            
            create index mytabindex on mytab (text) indextype is ctxsys.context
            parameters ('section group ctxsys.auto_section_group');
            
            select score(1) from mytab where contains(text, '
            DEFINEMERGE 
                 ( ( ( (DEFINESCORE( foo, OCCURRENCE) WITHIN tag1) *10 ), 
                     ( (DEFINESCORE( foo, OCCURRENCE) WITHIN tag2) *7  ),
                     ( (DEFINESCORE( foo, OCCURRENCE) WITHIN tag3) *5  ) ), 
                   OR, ADD )
            ',1 ) > 0;
            This will score 44: 30 points from three occurrences in tag1, 14 points from two occurrences in tag2 and nothing from tag3.

            Edited - please note that if you received a notification based on my first attempt, the code there is simply wrong. I used AND rather than OR in the DEFINEMERGE, and a syntax error had crept in when I made a change but didn't properly test it.

            Edited by: Roger Ford on Jun 21, 2012 3:03 AM
            • 3. Re: How does ACCUM work?
              Gwydion
              Thanks a lot!
              • 4. Re: How does ACCUM work?
                Gwydion
                I still have another question: Is it possible to use the "AND NOT" operator with DEFINEMERGE?
                • 5. Re: How does ACCUM work?
                  Roger Ford-Oracle
                  You can certainly add a NOT clause to query which uses DEFINEMERGE:
                  select score(1) from mytab where contains(text, '
                  DEFINEMERGE 
                       ( ( ( (DEFINESCORE( foo, OCCURRENCE) WITHIN tag1) *10 ), 
                           ( (DEFINESCORE( foo, OCCURRENCE) WITHIN tag2) *7  ),
                           ( (DEFINESCORE( foo, OCCURRENCE) WITHIN tag3) *5  ) ), 
                         OR, ADD )
                  NOT (baz within tag4)
                  ',1 ) > 0;
                  Does that answer it?
                  • 6. Re: How does ACCUM work?
                    Gwydion
                    Yes and no :-)
                    The background of my question is that we want to extend the search component of our application with an optional score ranking. The component itself is already rather complex. We currently support an arbitrarily complex search tree (with AND, OR, ANDNOT), many operators for the leaf nodes (ANY, ALL, PHRASE, NEAR, WITH...), truncation, synonyms, hierarchical searches and so on.
                    Now we have the requirement to extend this with weigthed ranking. Now I'm asking myself what is the minimal invasive way to do that? For example, a non-leaf node in our search expression tree simply results in <left expression><operator><right expression>. I thought that for a ranked query I could replace that with something like DEFINEMERGE( ( <left expression>, <right expression> ), <operator>, ADD). However, NOT is not supported that way. Of course I know that it doesn't make sense to determine the score of a negated expression. I just wanted to handle all operators in a generic way without the need for a special case for NOT expressions.
                    But maybe its about time for a fundamental refactoring of our search engine. We are currently at a point where adding features like scoring is going to break a lot of things. Do you know of any resources (papers, books, etc.) which cover fundamental principles of building flexible and extensible search components? Something oracle-related would be best, but something generic would do.
                    • 7. Re: How does ACCUM work?
                      Barbara Boehmer
                      I usually prefer to use a function to format the search string. I make sure that the function has default values for any nulls. In the following demonstration, I added some data to Roger's example, created a function that accepts four sets of words/phrases, tags, and weights, and one operator, with default values, and builds the search string. I then demonstrated what the function returns, based on some sample values for the variables, then usage of the function in a query. I made the NOT part of the words/phrases variables, but you could separate that into four separate variables as well.

                      -- table, data, and index:
                      SCOTT@orcl_11gR2> create table mytab(text varchar2(255));
                      
                      Table created.
                      
                      SCOTT@orcl_11gR2> insert into mytab values ('
                        2    <tag0>
                        3        <tag1> foo foo foo </tag1>
                        4        <tag2> foo foo     </tag2>
                        5        <tag3> bar         </tag3>
                        6        <tag4> baz         </tag4>
                        7    </tag0>
                        8  ');
                      
                      1 row created.
                      
                      SCOTT@orcl_11gR2> insert into mytab values ('
                        2    <tag0>
                        3        <tag1> foo foo foo </tag1>
                        4        <tag2> foo foo     </tag2>
                        5        <tag3> bar         </tag3>
                        6        <tag4> bar         </tag4>
                        7    </tag0>
                        8  ');
                      
                      1 row created.
                      
                      SCOTT@orcl_11gR2> create index mytabindex on mytab (text) indextype is ctxsys.context
                        2  parameters ('section group ctxsys.auto_section_group');
                      
                      Index created.
                      -- function:
                      SCOTT@orcl_11gR2> CREATE OR REPLACE FUNCTION format_search
                        2    (p_word1    IN VARCHAR2,
                        3       p_tag1        IN VARCHAR2,
                        4       p_weight1  IN NUMBER,
                        5       p_word2    IN VARCHAR2,
                        6       p_tag2        IN VARCHAR2,
                        7       p_weight2  IN NUMBER,
                        8       p_word3    IN VARCHAR2,
                        9       p_tag3        IN VARCHAR2,
                       10       p_weight3  IN NUMBER,
                       11       p_word4    IN VARCHAR2,
                       12       p_tag4        IN VARCHAR2,
                       13       p_weight4  IN NUMBER,
                       14       p_operator IN VARCHAR2)
                       15    RETURN           VARCHAR2
                       16  AS
                       17    v_search_string      VARCHAR2(32767);
                       18  BEGIN
                       19    v_search_string := 'DEFINEMERGE((';
                       20    IF p_word1 IS NOT NULL AND p_word1 NOT LIKE 'NOT %' THEN
                       21        v_search_string := v_search_string
                       22        || CHR(10) || '((DEFINESCORE(' || p_word1 || ',OCCURRENCE) WITHIN '
                       23        || NVL (p_tag1, 'tag0') || ')*' || NVL (p_weight1, 1) || '),';
                       24    END IF;
                       25    IF p_word2 IS NOT NULL AND p_word2 NOT LIKE 'NOT %' THEN
                       26        v_search_string := v_search_string
                       27        || CHR(10) || '((DEFINESCORE(' || p_word2 || ',OCCURRENCE) WITHIN '
                       28        || NVL (p_tag2, 'tag0') || ')*' || NVL (p_weight2, 1) || '),';
                       29    END IF;
                       30    IF p_word3 IS NOT NULL AND p_word3 NOT LIKE 'NOT %' THEN
                       31        v_search_string := v_search_string
                       32        || CHR(10) || '((DEFINESCORE(' || p_word3 || ',OCCURRENCE) WITHIN '
                       33        || NVL (p_tag3, 'tag0') || ')*' || NVL (p_weight3, 1) || '),';
                       34    END IF;
                       35    IF p_word4 IS NOT NULL AND p_word4 NOT LIKE 'NOT %' THEN
                       36        v_search_string := v_search_string
                       37        || CHR(10) || '((DEFINESCORE(' || p_word4 || ',OCCURRENCE) WITHIN '
                       38        || NVL (p_tag4, 'tag0') || ')*' || NVL (p_weight4, 1) || '),';
                       39    END IF;
                       40    v_search_string := RTRIM (v_search_string, ',') || CHR(10) || '),'
                       41        || NVL (p_operator, 'OR') || ',ADD)';
                       42    IF p_word1 LIKE 'NOT %' THEN
                       43        v_search_string := v_search_string
                       44        || CHR(10) || 'NOT (' || SUBSTR (p_word1, 5) || ' WITHIN '
                       45        || NVL (p_tag1, 'tag0') || ')';
                       46    END IF;
                       47    IF p_word2 LIKE 'NOT %' THEN
                       48        v_search_string := v_search_string
                       49        || CHR(10) || 'NOT (' || SUBSTR (p_word2, 5) || ' WITHIN '
                       50        || NVL (p_tag2, 'tag0') || ')';
                       51    END IF;
                       52    IF p_word3 LIKE 'NOT %' THEN
                       53        v_search_string := v_search_string
                       54        || CHR(10) || 'NOT (' || SUBSTR (p_word3, 5) || ' WITHIN '
                       55        || NVL (p_tag3, 'tag0') || ')';
                       56    END IF;
                       57    IF p_word4 LIKE 'NOT %' THEN
                       58        v_search_string := v_search_string
                       59        || CHR(10) || 'NOT (' || SUBSTR (p_word4, 5) || ' WITHIN '
                       60        || NVL (p_tag4, 'tag0') || ')';
                       61    END IF;
                       62    RETURN v_search_string;
                       63  END format_search;
                       64  /
                      
                      Function created.
                      
                      SCOTT@orcl_11gR2> SHOW ERRORS
                      No errors.
                      -- variables and values::
                      SCOTT@orcl_11gR2> VARIABLE g_word1    VARCHAR2(100)
                      SCOTT@orcl_11gR2> VARIABLE g_tag1     VARCHAR2(100)
                      SCOTT@orcl_11gR2> VARIABLE g_weight1  NUMBER
                      SCOTT@orcl_11gR2> VARIABLE g_word2    VARCHAR2(100)
                      SCOTT@orcl_11gR2> VARIABLE g_tag2     VARCHAR2(100)
                      SCOTT@orcl_11gR2> VARIABLE g_weight2  NUMBER
                      SCOTT@orcl_11gR2> VARIABLE g_word3    VARCHAR2(100)
                      SCOTT@orcl_11gR2> VARIABLE g_tag3     VARCHAR2(100)
                      SCOTT@orcl_11gR2> VARIABLE g_weight3  NUMBER
                      SCOTT@orcl_11gR2> VARIABLE g_word4    VARCHAR2(100)
                      SCOTT@orcl_11gR2> VARIABLE g_tag4     VARCHAR2(100)
                      SCOTT@orcl_11gR2> VARIABLE g_weight4  NUMBER
                      SCOTT@orcl_11gR2> VARIABLE g_operator VARCHAR2(100)
                      SCOTT@orcl_11gR2> BEGIN
                        2    :g_word1 := 'foo';
                        3    :g_tag1 := 'tag1';
                        4    :g_weight1 := 10;
                        5    :g_word2 := 'foo';
                        6    :g_tag2 := 'tag2';
                        7    :g_weight2 := 7;
                        8    :g_word3 := 'foo';
                        9    :g_tag3 := 'tag3';
                       10    :g_weight3 := 5;
                       11    :g_word4 := 'NOT baz';
                       12    :g_tag4 := 'tag4';
                       13    :g_operator    := 'OR';
                       14  END;
                       15  /
                      
                      PL/SQL procedure successfully completed.
                      -- what the function returns:
                      SCOTT@orcl_11gR2> SELECT format_search
                        2             (:g_word1, :g_tag1, :g_weight1,
                        3              :g_word2, :g_tag2, :g_weight2,
                        4              :g_word3, :g_tag3, :g_weight3,
                        5              :g_word4, :g_tag4, :g_weight4,
                        6              :g_operator)
                        7  FROM   DUAL
                        8  /
                      
                      FORMAT_SEARCH(:G_WORD1,:G_TAG1,:G_WEIGHT1,:G_WORD2,:G_TAG2,:G_WEIGHT2,:G_WORD3,:
                      --------------------------------------------------------------------------------
                      DEFINEMERGE((
                      ((DEFINESCORE(foo,OCCURRENCE) WITHIN tag1)*10),
                      ((DEFINESCORE(foo,OCCURRENCE) WITHIN tag2)*7),
                      ((DEFINESCORE(foo,OCCURRENCE) WITHIN tag3)*5)
                      ),OR,ADD)
                      NOT (baz WITHIN tag4)
                      
                      
                      1 row selected.
                      -- query using function:
                      SCOTT@orcl_11gR2> SELECT SCORE(1) FROM mytab
                        2  WHERE  CONTAINS
                        3             (text,
                        4              format_search
                        5             (:g_word1, :g_tag1, :g_weight1,
                        6              :g_word2, :g_tag2, :g_weight2,
                        7              :g_word3, :g_tag3, :g_weight3,
                        8              :g_word4, :g_tag4, :g_weight4,
                        9              :g_operator),
                       10              1 ) > 0
                       11  /
                      
                        SCORE(1)
                      ----------
                              44
                      
                      1 row selected.