Forum Stats

  • 3,825,765 Users
  • 2,260,558 Discussions
  • 7,896,665 Comments

Discussions

How can you boost the score for exact matches?

1053685
1053685 Member Posts: 19
edited Aug 3, 2017 10:00AM in Text

Hi,

I have a quick question that hopefully has a simple (and positive) answer. I could not find anything helpful in the documentation. I am trying to move "exact matches" to the top of the search results, which are of course sorted by score. So I'm looking for a ways to boost the score for "exact matches". What do I mean by "exact matches"? I mean that the search terms/phrases match the entire indexed string and not just part of it. When searching for "embryogenesis", for instance, a document with the title "embryogenesis" should be considered an exact match, but a document titled "Environmental magnetic fields: influences on early embryogenesis" should not. However, I would still like to take advantage of Oracle Text's stemming, base letter conversion and whatnot, so a simple additional query for UPPER(title) = UPPER(query) is out of the question. Is there a way to adjust the score accordingly?

Any help is greatly appreciated. Thanks in advance.

Best Answer

  • Roger Ford-Oracle
    Roger Ford-Oracle Member Posts: 1,132 Employee
    edited Jul 24, 2017 12:00PM Answer ✓

    No simple way to do this. Oracle Text does not mark the beginning nor the end of a text, so it can't tell the difference between a "complete" match and a partial match on words in the middle.  You would have to modify your indexed text - perhaps by a USER_DATASTORE or a MULTI_COLUMN_DATASTORE to add markers at the start and end of the text, then search for the these markers as part of a phrase search.  See the following example, which wraps the text with tags xxstart and xxend, then in the search uses a score multiplier to score higher for the term if it's a complete match:

    SQL> create table example(text varchar2(255));Table created.SQL> insert into example values ('embryogenesis');1 row created.SQL> insert into example values ('Environmental magnetic fields: influences on early embryogenesis');1 row created.SQL> exec ctx_ddl.create_preference('myds', 'MULTI_COLUMN_DATASTORE')PL/SQL procedure successfully completed.SQL> exec ctx_ddl.set_attribute('myds', 'COLUMNS', '''xxstart ''||text||'' xxend''')PL/SQL procedure successfully completed.SQL> create index exampleindex on example(text)  2  indextype is ctxsys.context  3  parameters('datastore myds');Index created.SQL> select score(0),text from example where contains(text, '(xxstart embryogenesis xxend)*2 OR embryogenesis', 0) > 0;  SCORE(0) TEXT---------- --------------------------------------------------    6      embryogenesis    3      Environmental magnetic fields: influences on early embryogenesis

Answers

  • Roger Ford-Oracle
    Roger Ford-Oracle Member Posts: 1,132 Employee
    edited Jul 24, 2017 12:00PM Answer ✓

    No simple way to do this. Oracle Text does not mark the beginning nor the end of a text, so it can't tell the difference between a "complete" match and a partial match on words in the middle.  You would have to modify your indexed text - perhaps by a USER_DATASTORE or a MULTI_COLUMN_DATASTORE to add markers at the start and end of the text, then search for the these markers as part of a phrase search.  See the following example, which wraps the text with tags xxstart and xxend, then in the search uses a score multiplier to score higher for the term if it's a complete match:

    SQL> create table example(text varchar2(255));Table created.SQL> insert into example values ('embryogenesis');1 row created.SQL> insert into example values ('Environmental magnetic fields: influences on early embryogenesis');1 row created.SQL> exec ctx_ddl.create_preference('myds', 'MULTI_COLUMN_DATASTORE')PL/SQL procedure successfully completed.SQL> exec ctx_ddl.set_attribute('myds', 'COLUMNS', '''xxstart ''||text||'' xxend''')PL/SQL procedure successfully completed.SQL> create index exampleindex on example(text)  2  indextype is ctxsys.context  3  parameters('datastore myds');Index created.SQL> select score(0),text from example where contains(text, '(xxstart embryogenesis xxend)*2 OR embryogenesis', 0) > 0;  SCORE(0) TEXT---------- --------------------------------------------------    6      embryogenesis    3      Environmental magnetic fields: influences on early embryogenesis
  • 1053685
    1053685 Member Posts: 19
    edited Aug 2, 2017 6:35AM

    Hello again,

    I ran into a problem with that solution. We noticed during testing that for some queries EVERY record that consists of a single word is returned, even if it has nothing to do with the entered query. I found out that this happens, for instance, when the word "of" is in the query. I suspect the problem is that "of" is a stopword. But I don't fully understand why Oracle behaves that way. So I did some testing. When searching for "{xxstart of xxend}", indeed every record consisting of "xxstart <any word here> xxend" is returned. However, "{xxstart xxend}" returns nothing, as it should. Why is this happening? How exactly do the stopwords factor into the equation? And more importantly, how can I fix this?

  • Roger Ford-Oracle
    Roger Ford-Oracle Member Posts: 1,132 Employee
    edited Aug 2, 2017 7:15AM

    A stopword in a query matches any word - stop word or not. So:

    xxstart stopword xxend

    matches "xxstart followed by any word followed by xxend".

    It does NOT match xxstart xxend as the word position numbers don't work - there must be a word between the start and end tokens.

  • 1053685
    1053685 Member Posts: 19
    edited Aug 3, 2017 10:00AM

    Alright, thank you very much for your help. I fixed it by eliminating exact match queries that consist solely of stopwords. Now it's working as expected.

This discussion has been closed.