Oracle Database Discussions

Text Index Lexer

Tom Claffy, Jun 28 2019 (edited Jun 28 2019)

I am testing text search for our Windows application by creating and querying, through Developer, some data that I loaded from actual documents in our system. Although I have years of SQL query experience in both MSSQL and Oracle, full-text indexing is completely new to me. We currently store some file attachments in an NCLOB as a base64 string, which works because the data can be any type of file. Going forward we plan to use a varbinary(max) column in MSSQL and a BLOB in Oracle.

MSSQL is fairly straightforward here: it detects the language of the PDF stored in the database and returns the results I expect for Arabic, Chinese, English, and French. Oracle also returns results, but I am confused about which lexer I need. If I create the index with the world or auto lexer, I only get results for the single-byte languages, even though I query in the language of the document with a word I know is in the document. If I use the Chinese lexer, I get results for all four languages. I have tried this with and without a language column and a charset column specified in the index parameters, and I seem to get the correct results without these columns.

I expected the name "World" to imply a larger character set than "Chinese". Is this result from the Chinese lexer expected, or am I doing something wrong with the World lexer?

exec ctx_ddl.create_preference('MYLEXER', 'WORLD_LEXER');

-- RETURNS ONLY ENGLISH AND FRENCH RESULTS
CREATE INDEX my_docs_doc_idx ON my_docs(doc)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER MYLEXER');

-- Drop the first index before re-creating it under the same name
DROP INDEX my_docs_doc_idx;

exec ctx_ddl.create_preference('CHINESE', 'CHINESE_LEXER');

-- RETURNS ARABIC, CHINESE, ENGLISH AND FRENCH RESULTS
CREATE INDEX my_docs_doc_idx ON my_docs(doc)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('LEXER CHINESE');
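
One way to compare the two lexers, offered as a rough sketch rather than a diagnosis, is to look at what each index actually stored and then run the same query against it. The statements below assume the index and table names from the post; the search term 'database' is a placeholder for a word known to be in one of the documents, and the 100-row limit is arbitrary.

```sql
-- Sketch only: run these once after building the index with each lexer
-- and compare the output.

-- 1. Tokens the lexer produced. Oracle Text stores them in the
--    DR$<index_name>$I table in the index owner's schema.
SELECT token_text, token_count
FROM   dr$my_docs_doc_idx$i
WHERE  ROWNUM <= 100;

-- 2. Documents that raised errors while the index was built.
--    ERR_TEXTKEY holds the ROWID of the affected row in my_docs.
SELECT err_textkey, err_text
FROM   ctx_user_index_errors
WHERE  err_index_name = 'MY_DOCS_DOC_IDX'
ORDER  BY err_timestamp DESC;

-- 3. The query itself. Replace 'database' (a placeholder) with a known
--    word in Arabic, Chinese, English, or French.
SELECT ROWID, SCORE(1) AS relevance
FROM   my_docs
WHERE  CONTAINS(doc, 'database', 1) > 0
ORDER  BY SCORE(1) DESC;
```

If the Arabic or Chinese words are missing from the token table under one lexer but present under the other, the difference lies in tokenization at index time rather than in the query.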

Added on Jun 28 2019

#general-database-discussions, #text
