
HTML content not handled as expected

GB_CHUV, Jun 30 2020 (edited Jun 30 2020)

Dear community,

My document is an HTML document, but its content is split across multiple rows of a table.

Most of the time the text contains HTML tags, but sometimes it has none and contains only HTML character references (mostly for accented characters).

For example: "Coronarographie &#233;lective" (which stands for "Coronarographie élective").

It appears that my index treats such text as if it were not HTML.

As a result, the following CONTAINS query does not find this record: CONTAINS(PARAGRAPH_CONTENT, 'élective') > 0

However, if the text is wrapped in opening and closing HTML tags ("Coronarographie &#233;lective" becoming "<html>Coronarographie &#233;lective</html>"), the index handles it correctly.
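One way to confirm this is to look at the tokens the index actually stored. Oracle Text keeps the word list of a CONTEXT index in its DR$<index name>$I table, so for TEST_HTML_IDX from the test code below (queried as the index owner, after the CREATE INDEX has run) a query along these lines should show whether a token for "élective" was produced at all, and with which document counts:

```sql
-- Word tokens stored by the CONTEXT index TEST_HTML_IDX.
-- TOKEN_TYPE 0 = ordinary word tokens (sections and themes use other values).
SELECT TOKEN_TEXT, TOKEN_COUNT
FROM   DR$TEST_HTML_IDX$I
WHERE  TOKEN_TYPE = 0
ORDER  BY TOKEN_TEXT;
```

Comparing the output with and without the second (<html>-wrapped) row in the table would show how the first row was tokenized.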

Is there a way to force this content to be treated as HTML?

Thanks for the help.

TEST CODE:

CREATE TABLE MyTEST (PARAGRAPH_CONTENT CLOB);

-- Row 1: character reference only, no HTML tag
INSERT INTO MyTEST VALUES ('Coronarographie &#233;lective');
-- Row 2: same text wrapped in <html> tags
INSERT INTO MyTEST VALUES ('<html>Coronarographie &#233;lective</html>');

EXEC CTX_DDL.CREATE_PREFERENCE('TEST_HTML_LXR', 'BASIC_LEXER')
EXEC CTX_DDL.SET_ATTRIBUTE('TEST_HTML_LXR', 'PRINTJOINS', '_')

-- No blank lines inside the statement: SQL*Plus ends statement entry on a blank line by default
CREATE INDEX TEST_HTML_IDX ON MyTEST(PARAGRAPH_CONTENT)
INDEXTYPE IS ctxsys.context
PARAMETERS ('
  datastore      CTXSYS.DEFAULT_DATASTORE
  filter         CTXSYS.NULL_FILTER
  lexer          TEST_HTML_LXR
  section group  CTXSYS.HTML_SECTION_GROUP
')
PARALLEL 16;

SELECT * FROM MyTEST WHERE CONTAINS(PARAGRAPH_CONTENT, 'Coronarographie') > 0;              --> 2 rows
SELECT * FROM MyTEST WHERE CONTAINS(PARAGRAPH_CONTENT, 'Coronarographie AND élective') > 0;  --> 1 row (the <html>-wrapped one)

-- Cleanup
EXEC CTX_DDL.DROP_PREFERENCE('TEST_HTML_LXR')
DROP INDEX TEST_HTML_IDX;
DROP TABLE MyTEST;
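A possible way to get the behavior observed with the <html> wrapper, without changing the stored data, is a USER_DATASTORE whose procedure wraps each row in <html> tags at index time. This is an untested sketch: MYTEST_WRAP_HTML and TEST_HTML_DS are hypothetical names, the CREATE INDEX is meant to replace the one in the test code (run before its cleanup statements, since TEST_HTML_LXR and MyTEST must still exist), and it assumes the HTML section group treats the generated wrapper the same way as the literal one in the second row.

```sql
-- Hypothetical user datastore procedure (untested sketch): Oracle Text calls it
-- once per row and indexes whatever it writes into TLOB.
CREATE OR REPLACE PROCEDURE MYTEST_WRAP_HTML (rid IN ROWID, tlob IN OUT NOCOPY CLOB) IS
BEGIN
  FOR r IN (SELECT PARAGRAPH_CONTENT FROM MyTEST WHERE ROWID = rid) LOOP
    -- Opening tag, so the HTML section group sees a tagged document
    DBMS_LOB.WRITEAPPEND(tlob, LENGTH('<html>'), '<html>');
    -- DBMS_LOB.APPEND raises an error on a NULL source, so skip NULL rows
    IF r.PARAGRAPH_CONTENT IS NOT NULL THEN
      DBMS_LOB.APPEND(tlob, r.PARAGRAPH_CONTENT);
    END IF;
    DBMS_LOB.WRITEAPPEND(tlob, LENGTH('</html>'), '</html>');
  END LOOP;
END;
/

EXEC CTX_DDL.CREATE_PREFERENCE('TEST_HTML_DS', 'USER_DATASTORE')
EXEC CTX_DDL.SET_ATTRIBUTE('TEST_HTML_DS', 'PROCEDURE', 'MYTEST_WRAP_HTML')

-- Same index as in the test code, but reading documents through the user datastore
CREATE INDEX TEST_HTML_IDX ON MyTEST(PARAGRAPH_CONTENT)
INDEXTYPE IS ctxsys.context
PARAMETERS ('
  datastore      TEST_HTML_DS
  filter         CTXSYS.NULL_FILTER
  lexer          TEST_HTML_LXR
  section group  CTXSYS.HTML_SECTION_GROUP
');

-- Extra cleanup for the objects introduced here
EXEC CTX_DDL.DROP_PREFERENCE('TEST_HTML_DS')
DROP PROCEDURE MYTEST_WRAP_HTML;
```

With this datastore the second test row would be wrapped twice (<html><html>...</html></html>); the sketch assumes that is harmless for the HTML section group.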

EDIT 14:35: My first example had additional specifics. I simplified the example and improved the test code.

Added on Jun 30 2020