HTML content not handled as expected
Dear community,
My document is an HTML document, but its content is split across multiple rows of a table.
Most often the text content contains HTML tags, but sometimes it contains no tags at all, only character entities (mostly for accented characters).
For example: "Coronarographie &eacute;lective" (which stands for "Coronarographie élective").
It appears that my index is dealing with it as if it was not HTML.
As a result, the following CONTAINS query does not find this record: CONTAINS(PARAGRAPH_CONTENT, 'élective') > 0
However, if I add an opening and closing HTML tag ("Coronarographie &eacute;lective" becoming "<html>Coronarographie &eacute;lective</html>"), the index handles it correctly.
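A minimal sketch of that workaround, assuming it is acceptable to modify the stored text: wrap every row that does not already start with an <html> tag, then synchronize the index. It reuses the MyTEST table and TEST_HTML_IDX index from the test code; that wrapping is sufficient in every case is an assumption based only on the behaviour described above.

```sql
-- Assumption: rewriting PARAGRAPH_CONTENT in place is acceptable.
-- Wrap rows that do not already begin with an <html> tag
-- (the LIKE test is case-sensitive, so <HTML> would not match).
UPDATE MyTEST
   SET PARAGRAPH_CONTENT = '<html>' || PARAGRAPH_CONTENT || '</html>'
 WHERE PARAGRAPH_CONTENT NOT LIKE '<html>%';

COMMIT;

-- CONTEXT indexes are not updated on commit by default,
-- so synchronize before querying again.
EXEC CTX_DDL.SYNC_INDEX('TEST_HTML_IDX')

SELECT * FROM MyTEST WHERE CONTAINS(PARAGRAPH_CONTENT, 'élective') > 0;
```

This only hides the problem; I would still prefer an index setting that treats the content as HTML without touching the data.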
Is there a way to force this content to be treated as HTML?
Thanks for the help.
TEST CODE:
-- SQL*Plus: stop & from being read as a substitution variable
SET DEFINE OFF

CREATE TABLE MyTEST (PARAGRAPH_CONTENT CLOB);
INSERT INTO MyTEST VALUES ('Coronarographie &eacute;lective');
INSERT INTO MyTEST VALUES ('<html>Coronarographie &eacute;lective</html>');

EXEC CTX_DDL.CREATE_PREFERENCE('TEST_HTML_LXR', 'BASIC_LEXER')
EXEC CTX_DDL.SET_ATTRIBUTE('TEST_HTML_LXR', 'PRINTJOINS', '_')

CREATE INDEX TEST_HTML_IDX ON MyTEST(PARAGRAPH_CONTENT)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('
    datastore CTXSYS.DEFAULT_DATASTORE
    filter CTXSYS.NULL_FILTER
    lexer TEST_HTML_LXR
    section group CTXSYS.HTML_SECTION_GROUP
  ')
  PARALLEL 16;

SELECT * FROM MyTEST WHERE CONTAINS(PARAGRAPH_CONTENT, 'Coronarographie') > 0; --> 2 rows
SELECT * FROM MyTEST WHERE CONTAINS(PARAGRAPH_CONTENT, 'Coronarographie AND élective') > 0; --> 1 row

-- Cleanup
EXEC CTX_DDL.DROP_PREFERENCE('TEST_HTML_LXR')
DROP INDEX TEST_HTML_IDX;
DROP TABLE MyTEST;
EDIT 14:35: My first example had additional specifics. I made the example simpler and improved the test code.