2 Replies Latest reply: Nov 6, 2012 9:15 PM by jmcnaug2 RSS

    Identifying which tokens/columns matched in a MULTI_COLUMN_DATASTORE Index

      Hello there,

      I have a multi-column datastore text index. I am searching this index with a variant of the following relaxed progressive search:
        FROM TEST_MV
       WHERE country_code = 'AU'
         AND CONTAINS(vehicle_text,
                          <seq>{ford}*3 AND {falcon}*3 AND {2012}*3 AND {SEDAN}*3 AND {blah1} AND {BLAH2} AND {xr6} AND {turbo} AND {test} AND {manual} AND {limited} AND {edition}</seq>
                          <seq>{ford}*3 ACCUM {falcon}*3 ACCUM {2012}*3 ACCUM {SEDAN} ACCUM {xr6}*4 ACCUM {turbo} ACCUM {test} ACCUM {manual} ACCUM {limited} ACCUM {edition}</seq>
                       </query>', 1) > 40
      Each column in the underlying table will correspond to one or more of the words in the CONTAINS function above. I won't necessarily know which words will correspond to which column at runtime. It is quite possible that the entire text string will be populated with unknown words that will attempt to be matched to the text index. Once a match is found, I would like to know which of the underlying column names:contained text were actually matched.

      I am new to text indexing, and am trying to find the most elegant way to accomplish this goal. The following OTN post has given me an idea:

      How can I find out what word fuzzy search matched

      I just wondered if anybody else has encountered this requirement, and what kind of solution you chose? Also curious if Oracle support/development staff have any opinions and/or elegant solution recommendations?

      Many thanks in advance for any responses.

        • 1. Re: Identifying which tokens/columns matched in a MULTI_COLUMN_DATASTORE Index
          Barbara Boehmer
          The link that you provided shows a method for determining the tokens matched. The second page of the thread in the following link shows some methods for obtaining the columns matched.

          Text index for UNION queries.
          • 2. Re: Identifying which tokens/columns matched in a MULTI_COLUMN_DATASTORE Index
            Hi Barbera,

            Thanks for such a quick response. I was actually thinking of a solution similar to the Materialised View "Pivot" Solution you provided in that post, i.e.

            Another method would be to take an entirely different approach and use a materialized view, as demonstrated below with two columns of two tables. I didn't miss your other questions. I just hadn't gotten to them yet. I respond inbetween doing other things and sometimes it takes a while to think about something and put together a good example.

            Rather than text indexing a pivoted materialised view, I am thinking of taking the strongest matching row from the text query result set against the multi-column data store, pivoting that into a global temp table containing column name and value columns, and then just matching the matched words against the global temp table in memory in order to get the column names. One of a number of reasons I want the column names is for the building of an IOT RefData table I can hit before the text index, which will contain known column names and values along with an associated (and configurable) weighting value to give each word when searching the text index. Additionally, if I can build up enough RefData, I may be able to avoid needing to hit the text index in the first place (Can't really properly explain that statement without going into a lot of detail about what I'm doing though).

            I did think about pivoting the entire data set on which I've created the text index (to build a column_name:column_value materialised view). However, based on the nature of the data I'm matching, I'm thinking this may be overkill for what I'm doing.

            Many thanks for your response. It is always interesting and insightful to discuss these issues on here. It would be great if Oracle were to provide a built-in that would return the multi-column datastore column names that were matched though. ;)