This discussion is archived
4 Replies Latest reply: Jan 15, 2013 2:48 AM by 984246 RSS

Searching a column with comma separated values using ora-text

984246 Newbie
Currently Being Moderated
I am using Oracle 11gR2 XE and Oracle Text for a web search engine.

I have now created & text-indexed a CLOB column Keywords which contains space-separated words. This allowed me to extend the search, as Oracle Text will return the rows that have one or more keywords stored in that column. The contents of the column are hidden from the user, and is used only to "extend" the search. This is working as intended.

But now I need to support multiple words or even complete sentences. With the current configuration, Oracle Text will search only for individual keyword. How do I need to store the phrases and configure Oracle Text so that it will search for whole phrases (exact match is preferred, but fuzzy matching is fine too)?

Column content example of two rows(semi-colon seperated values):
"hello, hello; is there anybody out there?; nope;"
"just the; basic facts;"
I found a similar question: Searching a column with comma seperated values, except that I need a solution for Oracle 11g with it's freetext search functionality.

Possible solutions:
1st solution: I was thinking of redesigning the DB as follows. I'd make a new table Keywords(pkID NUMBER, nonUniqueID NUMBER, singlePhrase VARCHAR2(100 BYTE)). And I'd change the previous column Keyword to KeywordNonUniqueID, which would hold the ID (instead of a list of values). At search-time I'd INNER JOIN with the new Keyword table. The problem with this solution is that I'll get multiple rows that contains the same data except the phrase. I assume this will destroy the ranking?

2nd solution: Is it possible to store phrases as a XML in the original Keyword column, and somehow tell Oracle Text to search within the XML?

3rd solution: Separating the individual phrases with spaces, but replacing the spaces within the phrases with underscore or something (making it a single word). So a phrase "Why hello there, Johnny!" would be stored as "Why_hello_there,_Johnny!".

4th solution: ?

Note that, generally, there won't be a lot of phrases (less than 100), nor will they be long (a single phrase will have up to 5 words).

Also note that I am currently using CONTAINS, and a few of its operators, for my full-text searching needs.
  • 1. Re: Searching a column with comma separated values using ora-text
    Barbara Boehmer Oracle ACE
    Currently Being Moderated
    I think you may be searching for a solution to a non-existent problem. Oracle Text using CONTAINS is not limited to searching for just individual words and can be used to search for complete phrases or sentences. All you need to do is put the whole phrase in your search string. However, you need to not include any things that are not indexed and have special meaning, such as commas and question marks. If you are having a problem, please post an example of what you want to search for, what results you want based on that search, and what you are currently getting that is not what you want.
  • 2. Re: Searching a column with comma separated values using ora-text
    984246 Newbie
    Currently Being Moderated
    Barbara Boehmer wrote:
    All you need to do is put the whole phrase in your search string. However, you need to not include any things that are not indexed and have special meaning, such as commas and question marks.
    That looks like a step forward, but how do I give precedence to phrases instead of individual words that are included in a phrase?

    Let's say a row exists with the following phrases:"just and kind, kind & loving"
    And let's say we want to search for the phrase "{just and kind}". The row will be correctly returned.
    But what if we search just for "{kind}"? Then the same row will, again, be returned, even though there is no exact match for a single phrase "kind", only "just and kind" and "kind & loving"
  • 3. Re: Searching a column with comma separated values using ora-text
    Roger Ford Expert
    Currently Being Moderated
    When you mention "phrase", do you mean "a list of words separated from other phrases by a comma"?

    That's not the definition of "phrase" used by Oracle Text, where it simply means "a list of words in a defined order".

    If I understand your requirement, you want to have data like:

    "aa bb cc, dd"
    "aa dd, ee ff"

    and prioritize the first over the second if someone searches for "dd".

    Firstly, in order to search within the comma separated list of words, you would need to search within a section. You can either explictly define field sections such as
    <sect>aa bb cc</sect> <sect>dd</sect>
    Or you can use the special section SENTENCE and define sentence delimiters appropriately. This is done with the BASIC_LEXER attribute PUNCTUATIONS

    Then you have the issue that you only want to find words where they are the only words in the section. That's much the same problem as I address in the last post of this forum entry:
    Contains: exactly match

    So our solution will be much the same, of surrounding the text with special markers, and then prioritizing a phrase search with those special markers either side of the word.
    We need to do some extra processing, though, as we need to surround each "phrase" (in your terminology) with the special markers. I've done this by surrounding the entire text with "XX1 ... XX2" then replacing each comma with "XX2, XX1" as part of a MULTI_COLUMN_DATASTORE:

    drop table names;
    create table names (id number primary key, text varchar2(50));
    
    insert into names values( 1, 'just and kind, kind and loving' );
    insert into names values( 2, 'just, kind' );
    
    exec ctx_ddl.drop_preference  ( 'mylex' )
    exec ctx_ddl.create_preference( 'mylex', 'BASIC_LEXER' )
    exec ctx_ddl.set_attribute    ( 'mylex', 'PUNCTUATIONS', ',' )
    
    exec ctx_ddl.drop_preference  ( 'mcds' )
    exec ctx_ddl.create_preference( 'mcds',  'MULTI_COLUMN_DATASTORE' )
    exec ctx_ddl.set_attribute    ( 'mcds', 'COLUMNS', '''XX1 ''||replace(text, '','',''XX2, XX1'')||'' XX2''' )
    
    exec ctx_ddl.drop_preference  ( 'mywl' )
    exec ctx_ddl.create_preference( 'mywl', 'BASIC_WORDLIST' )
    exec ctx_ddl.set_attribute    ( 'mywl', 'SUBSTRING_INDEX', 'YES' )
    
    create index namesindex on names(text)
    indextype is ctxsys.context
    parameters( 'datastore mcds wordlist mywl' )
    /
    
    select score(1),id,text from names where contains( text, '
    <query>
      <textquery>
        <progression>
          <seq> XX1 kind XX2 </seq>
          <seq> kind</seq>
        </progression>
      </textquery>
    </query>
    ', 1) > 0 
    order by score(1) desc
    /
    Output of this is:
      SCORE(1)        ID TEXT
    ---------- ---------- --------------------------------------------------
         52         2 just, kind
          2         1 just and kind, kind and loving
  • 4. Re: Searching a column with comma separated values using ora-text
    984246 Newbie
    Currently Being Moderated
    Roger Ford wrote:
    When you mention "phrase", do you mean "a list of words separated from other phrases by a comma"?
    Correct.

    Your solution is pretty much what I need, thanks! I'll try it out.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points