printjoins/skipjoin : middle dot
Hi,
I'm using Oracle Text (11.0.2.4).
A context index is created on a CLOB column that contains French text. This text contains middle dots (U+00B7).
I want to add the middle dot to the skipjoins characters:
BEGIN
CTX_DDL.CREATE_PREFERENCE('LEXER_GPSR_ACI', 'BASIC_LEXER');
CTX_DDL.SET_ATTRIBUTE('LEXER_GPSR_ACI', 'BASE_LETTER', 'YES');
CTX_DDL.SET_ATTRIBUTE('LEXER_GPSR_ACI', 'MIXED_CASE', 'NO');
CTX_DDL.SET_ATTRIBUTE('LEXER_GPSR_ACI', 'BASE_LETTER_TYPE','SPECIFIC');
CTX_DDL.SET_ATTRIBUTE('LEXER_GPSR_ACI', 'skipjoins','·'); -- middle dot U+00B7
END;
Oracle Text doesn't index this correctly; skipjoins is not taken into account. With or without the skipjoins character, words containing a middle dot are split into two tokens: abc·def is indexed as abc and def, whereas I want abcdef.
Is it the case that not every character can be added to skipjoins?
How can I change this?
Thanks
Regards
Fabrice
Answers
-
I haven't used the lexer for something like this, but here are some ideas:
Is the keyword 'skipjoin' or 'skipjoins'? I think you have it right (skipjoins), but I saw one reference that used the singular skipjoin.
Check that your middle dot matches the actual character code in your text. I would use the actual character code in the lexer skipjoins definition to be explicit. That may require you to parse your text and report the actual character code of the middle dot character it contains.
Finally, if the above doesn't help (although it should work), either contact Oracle Support or, as a workaround, use a filter in the index creation (procedure_filter) that looks for the middle dot character and removes it prior to indexing, although the lexer skipjoins would be preferred:
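One way to check the character code, as a rough sketch: DUMP with format 1016 reports the character set and the stored bytes in hex, and ASCIISTR shows non-ASCII characters as \XXXX escapes. The table name my_docs and column doc_text are placeholders, not names from the original post. Because DUMP does not accept a CLOB directly, the sketch takes a short VARCHAR2 slice with DBMS_LOB.SUBSTR.

```sql
-- Assumption: my_docs / doc_text stand in for the real table and CLOB column.
-- Look at the first 50 characters of one document: raw bytes and \XXXX escapes.
SELECT DUMP(DBMS_LOB.SUBSTR(doc_text, 50, 1), 1016) AS stored_bytes,
       ASCIISTR(DBMS_LOB.SUBSTR(doc_text, 50, 1))   AS escaped_text
FROM   my_docs
WHERE  ROWNUM = 1;

-- For comparison: U+00B7 converted to the database character set.
SELECT DUMP(TO_CHAR(UNISTR('\00B7')), 1016) AS middle_dot_bytes
FROM   dual;
```

In an AL32UTF8 database a true middle dot is stored as the two bytes c2,b7. If the bytes in the text differ, the character is probably a look-alike rather than U+00B7.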
https://docs.oracle.com/cd/B28359_01/text.111/b28304/cdatadic.htm#CCREF1957
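A minimal sketch of the filter workaround, under several assumptions: the names strip_middle_dot, FILTER_STRIP_DOT, my_docs, doc_text and my_docs_idx are invented for illustration, the column is a CLOB, and the procedure takes the CLOB-in, CLOB-out form. Check the PROCEDURE_FILTER documentation for your release (including any ownership or privilege requirements for the procedure) before relying on it.

```sql
-- Hypothetical filter procedure: passes the document on with U+00B7 removed.
CREATE OR REPLACE PROCEDURE strip_middle_dot (
  p_in  IN            CLOB,
  p_out IN OUT NOCOPY CLOB
) AS
BEGIN
  -- REPLACE with no third argument deletes every occurrence of the character.
  p_out := REPLACE(p_in, TO_CHAR(UNISTR('\00B7')));
END;
/

BEGIN
  CTX_DDL.CREATE_PREFERENCE('FILTER_STRIP_DOT', 'PROCEDURE_FILTER');
  CTX_DDL.SET_ATTRIBUTE('FILTER_STRIP_DOT', 'PROCEDURE', 'strip_middle_dot');
  CTX_DDL.SET_ATTRIBUTE('FILTER_STRIP_DOT', 'INPUT_TYPE', 'CLOB');
  CTX_DDL.SET_ATTRIBUTE('FILTER_STRIP_DOT', 'OUTPUT_TYPE', 'CLOB');
END;
/

-- Assumption: my_docs / doc_text are placeholders for the real table and column.
CREATE INDEX my_docs_idx ON my_docs (doc_text)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('FILTER FILTER_STRIP_DOT LEXER LEXER_GPSR_ACI');
```

The filter only changes what is indexed, not the stored text. Query strings do not pass through the filter, so a search term would presumably have to be written without the middle dot (abcdef) to match.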
-
It is SKIPJOINS, not SKIPJOIN.
I can reproduce this and there doesn't seem to be a simple workaround. I tried inserting the middle-dot character as an explicit UTF-8 string (0xC2B7) but that didn't help.
drop table middledot;
create table middledot (x varchar2(50));
insert into middledot values ('abc'|| chr(to_number('C2B7','xxxx')) ||'def');
exec ctx_ddl.drop_preference('mdlexer')
exec ctx_ddl.create_preference('mdlexer', 'BASIC_LEXER')
exec ctx_ddl.set_attribute('mdlexer', 'SKIPJOINS', chr(to_number('C2B7', 'xxxx')))
create index mdi on middledot ( x ) indextype is ctxsys.context parameters('lexer mdlexer');
select * from middledot;
select token_text from dr$mdi$i;
I would recommend you call support and ask them to raise a bug so it can be investigated further.
-
Hi Roger,
Thanks for testing.
I have the same problem with other characters (the non-breaking hyphen, for example).
I will contact support.
Fabrice