Forum Stats

  • 3,814,004 Users
  • 2,258,806 Discussions
  • 7,892,488 Comments

Discussions

Token indexed, but CONTAINS cannot retrieve it

DannyS-Oracle
DannyS-Oracle Member Posts: 165
edited Feb 28, 2018 6:03PM in Text

Hi everyone,

I am building a CONTEXT index for my Apex app, with a prinjoin character '-' (hyphen) added within it. After building the index, the DR$INDEX_NAME$I table has successfully indexed all tokens containing '-', for example 'WEB-SITE', 'PSEUDO-SCIENCE'; but whenever I searched the tokens with the CONTAINS, my query result returns nothing. Any suggestions on how I can debug or fix the index?

select count(*)from   MDT_BASEwhere  contains(MDT_OTSEARCH_COLS, 'web-site')>0

The above search returns 'no data found.'

Since I can not share the sensitive table data, here is the description of the index:

CEMS_INDEX_SEARCH

=== INDEX DESCRIPTION ===

name: "CHACE"."CEMS_INDEX_SEARCH" index id: 69035 index type: context base table: "CHACE"."MDT_BASE" primary key column: MDT_INDEX text column: MDT_OTSEARCH_COLS text column type: VARCHAR2(1) language column: format column: charset column: configuration column: Query Stats Enabled: NO status: INDEXED full optimize token: full optimize count: docid count: 17851 nextid: 17852

=== INDEX OBJECTS ===

datastore: MULTI_COLUMN_DATASTORE columns: FISCAL_QUARTER_NAME, APPROVAL_ID, APPROVAL_VERSION, END_USER_NAME, APPROVAL_ITEM, DISCUSSION_BY, DEAL_STATUS, REGION, Q_DEAL_ID, APPROVAL_ITEM_ACTION, TAGS, DAS_TAGS filter: N,N,N,N,N,N,N,N,N,N,N,N filter: NULL_FILTER section group: BASIC_SECTION_GROUP field section: SECTION_TAGS section tag: TAGS visible: Y field id: 16 lexer: BASIC_LEXER printjoins: - startjoins: # mixed_case: NO index_themes: YES index_stems: ENGLISH wordlist: BASIC_WORDLIST stemmer: ENGLISH fuzzy_match: GENERIC storage: BASIC_STORAGE r_table_clause: lob (data) store as (cache) i_index_clause: compress 2

I cut the portion of BASIC_STOPLIST because it is too long.

Please let me know if you have any suggestions or questions. I am using Oracle DB v12.

Tagged:
DannyS-Oracle

Best Answer

  • Roger Ford-Oracle
    Roger Ford-Oracle Member Posts: 1,132 Employee
    edited Feb 28, 2018 4:30PM Answer ✓

    As well as defining it as a PRINTJOINS character, you must escape it in the query :  web\-site or {web-site}

    If you don't, then the '-' character acts as the MINUS operator, and subtracts the score of 'site' from the score of 'web' (neither of which match since the indexed term is actually 'web-site').

    Roger

    DannyS-Oracle

Answers

  • Roger Ford-Oracle
    Roger Ford-Oracle Member Posts: 1,132 Employee
    edited Feb 28, 2018 4:30PM Answer ✓

    As well as defining it as a PRINTJOINS character, you must escape it in the query :  web\-site or {web-site}

    If you don't, then the '-' character acts as the MINUS operator, and subtracts the score of 'site' from the score of 'web' (neither of which match since the indexed term is actually 'web-site').

    Roger

    DannyS-Oracle
  • DannyS-Oracle
    DannyS-Oracle Member Posts: 165
    edited Feb 28, 2018 5:34PM

    Hi @Roger Ford-Oracle, thank you for the suggestion! Adding brackets around the term to escape the '-' is working

    As a follow-up, do you know how to enable partial search? When I am searching for '{web-site}', it works. But when I am searching for '{web}', it does not return the 'web-site'. Is there a way to tell my lexer to tokenize both web-site and also break it into web and site?

    I am trying to make a search page for my user, and I do not want them to know too many caveats about Oracle Text, they should be able to get results with simple keywords. For example, I imagine that when a user search for 'pseudo', the query results should also return 'pseudo-science', because my user might be interested to search for all documents that contain 'pseudo'.  Asking them to enter the keywords completely will be too much I guess ... Let me know if you have any other suggestions?

    UPDATE:

    Instead of enclosing the bracket with {}, I am enclosing the terms with wildcard character '%%'. Now searching for '%web%' will also return 'web-site'. For now, it seems this is the way to go.

  • Roger Ford-Oracle
    Roger Ford-Oracle Member Posts: 1,132 Employee
    edited Feb 28, 2018 6:03PM

    You can't have it indexed as both web-site and web site (at least, not without significant effort) but if you don't define hypen as a PRINTJOINS character, then it will be indexed as two words, "web" and "site".

    Then a search for any of these will find it:

    web

    site

    {web-site}

    web site

    Oracle Text search syntax isn't really designed for the end user - you usually need to do some processing on the user search term.  You might want to look at this blog entry: https://blogs.oracle.com/searchtech/oracle-text-query-parser

    DannyS-OracleDannyS-Oracle
This discussion has been closed.