printjoins/skipjoin : middle dot

FabriceC Member Posts: 42 Blue Ribbon
edited Nov 21, 2017 10:59AM in Text

Hi,

I'm using Oracle Text (11.2.0.4).

A CONTEXT index is created on a CLOB column containing French text. This text contains middle dots (U+00B7).

I want to add the middle dot to the SKIPJOINS characters:

BEGIN
  CTX_DDL.CREATE_PREFERENCE('LEXER_GPSR_ACI', 'BASIC_LEXER');
  CTX_DDL.SET_ATTRIBUTE('LEXER_GPSR_ACI', 'BASE_LETTER', 'YES');
  CTX_DDL.SET_ATTRIBUTE('LEXER_GPSR_ACI', 'MIXED_CASE', 'NO');
  CTX_DDL.SET_ATTRIBUTE('LEXER_GPSR_ACI', 'BASE_LETTER_TYPE', 'SPECIFIC');
  CTX_DDL.SET_ATTRIBUTE('LEXER_GPSR_ACI', 'SKIPJOINS', '·'); -- middle dot U+00B7
END;
/

Oracle Text does not index this as expected: the SKIPJOINS setting is ignored. With or without the middle dot in SKIPJOINS, words containing a middle dot are split into two tokens: abc·def is indexed as abc and def, whereas I want abcdef.
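To make the expected behavior concrete, here is a toy Python model of the documented BASIC_LEXER join rules. It is not Oracle Text's lexer, only a sketch under simplifying assumptions: alphanumeric characters build tokens, a SKIPJOINS character is dropped and the text on either side is joined, a PRINTJOINS character stays inside the token, and any other character separates tokens. The function `tokenize` and its parameters are hypothetical; the upper-casing approximates MIXED_CASE = NO.

```python
def tokenize(text: str, skipjoins: str = "", printjoins: str = "",
             mixed_case: bool = False) -> list[str]:
    """Toy model of BASIC_LEXER join handling (hypothetical, not Oracle code)."""
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch.isalnum() or ch in printjoins:
            # Letters, digits and print-join characters stay in the token.
            current.append(ch)
        elif ch in skipjoins:
            # Skip-join character: drop it and keep building the same token.
            continue
        else:
            # Any other character (space, punctuation) ends the current token.
            if current:
                tokens.append("".join(current))
                current = []
    if current:
        tokens.append("".join(current))
    # Approximation of MIXED_CASE = NO: tokens are stored upper-cased.
    return tokens if mixed_case else [t.upper() for t in tokens]


MIDDLE_DOT = "\u00b7"
print(tokenize("abc" + MIDDLE_DOT + "def"))                        # ['ABC', 'DEF']
print(tokenize("abc" + MIDDLE_DOT + "def", skipjoins=MIDDLE_DOT))  # ['ABCDEF']
```

With the middle dot declared as a skip-join character, the model yields the single token ABCDEF that is wanted here; the reported problem is that the real index keeps behaving like the first call.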

Can only certain characters be added to SKIPJOINS?

How can I change this?

Thanks

Regards

Fabrice

Answers

  • User_QX1CQ Member Posts: 25 Blue Ribbon
    edited Nov 21, 2017 10:35AM

    I haven't used the lexer for something like this, but here are some ideas:

    Is the keyword 'skipjoin' or 'skipjoins'? I think you have it right (skipjoins), but I saw one reference that used the singular skipjoin rather than the plural.

    Check that your middle dot matches the actual character code in your text. I would use the actual character code in the lexer SKIPJOINS definition to be explicit. That may require parsing your text and reporting the actual character code of its middle dot.

    Finally, if none of the above works (although it should), either contact Oracle Support or, as a workaround, use a filter at index creation time (PROCEDURE_FILTER) that finds the middle dot and removes it before indexing (although the lexer SKIPJOINS setting would be preferable):

    https://docs.oracle.com/cd/B28359_01/text.111/b28304/cdatadic.htm#CCREF1957

  • Roger Ford-Oracle Member Posts: 1,132 Employee
    edited Nov 21, 2017 10:32AM

    It is SKIPJOINS, not SKIPJOIN.

    I can reproduce this, and there doesn't seem to be a simple workaround. I tried specifying the middle-dot character explicitly as the UTF-8 byte sequence 0xC2B7, but that didn't help.

    drop table middledot;
    create table middledot (x varchar2(50));
    insert into middledot values ('abc' || chr(to_number('C2B7', 'xxxx')) || 'def');
    exec ctx_ddl.drop_preference('mdlexer')
    exec ctx_ddl.create_preference('mdlexer', 'BASIC_LEXER')
    exec ctx_ddl.set_attribute('mdlexer', 'SKIPJOINS', chr(to_number('C2B7', 'xxxx')))
    create index mdi on middledot ( x ) indextype is ctxsys.context parameters('lexer mdlexer');
    select * from middledot;
    select token_text from dr$mdi$i;

    I would recommend you call support and ask them to raise a bug so it can be investigated further.

  • FabriceC Member Posts: 42 Blue Ribbon
    edited Nov 21, 2017 10:59AM

    Hi Roger

    Thanks for testing.

    I have the same problem with other characters (the non-breaking hyphen, for example).

    I will contact Support.

    Fabrice
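Following the first answer's advice to check the actual character code, here is a small Python sketch, meant to be run outside the database on a sample of the text, that lists each character that is neither alphanumeric nor whitespace with its code point, Unicode name and UTF-8 bytes. It confirms that 0xC2B7, the value Roger passes to CHR(), is exactly the UTF-8 encoding of U+00B7 MIDDLE DOT (assuming an AL32UTF8 database character set), and it separates the middle dot from look-alikes such as U+2022 BULLET or U+22C5 DOT OPERATOR. The helper names `describe` and `report` are hypothetical; inside the database, DUMP(x, 1016) gives a comparable byte-level view.

```python
import unicodedata


def describe(ch: str) -> str:
    """Code point, Unicode name and UTF-8 bytes (hex) of a single character."""
    utf8_hex = ch.encode("utf-8").hex().upper()
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')} utf8=0x{utf8_hex}"


def report(text: str) -> list[str]:
    """Describe each distinct character that is neither alphanumeric nor whitespace."""
    seen: list[str] = []
    for ch in text:
        if not ch.isalnum() and not ch.isspace() and ch not in seen:
            seen.append(ch)
    return [describe(ch) for ch in seen]


# Sample containing a middle dot and a non-breaking hyphen (U+2011).
for line in report("abc\u00b7def non\u2011breaking"):
    print(line)
# U+00B7 MIDDLE DOT utf8=0xC2B7
# U+2011 NON-BREAKING HYPHEN utf8=0xE28091

# Look-alike dots that would not match a U+00B7 entry in SKIPJOINS.
for ch in "\u2022\u22c5\u2027":
    print(describe(ch))
```

Running the report on real column data (rather than on a character typed into a script) is what tells you which code point actually needs to go into SKIPJOINS.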
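The PROCEDURE_FILTER workaround from the first answer amounts to rewriting the text before the lexer sees it. Below is a minimal Python prototype of that transformation, assuming the middle dot should simply be deleted and the non-breaking hyphen mapped to an ASCII hyphen. The mapping table and the `prefilter` name are hypothetical; in Oracle Text the same logic would have to live in the stored procedure named by the PROCEDURE_FILTER preference (for example with REPLACE calls).

```python
# Hypothetical mapping: delete the middle dot (None) so "abc·def" becomes
# "abcdef", and turn the non-breaking hyphen into a plain ASCII hyphen.
JOIN_FIXES = str.maketrans({"\u00b7": None, "\u2011": "-"})


def prefilter(text: str) -> str:
    """Apply JOIN_FIXES to the text before it is handed to the lexer."""
    return text.translate(JOIN_FIXES)


print(prefilter("abc\u00b7def"))        # abcdef
print(prefilter("non\u2011breaking"))  # non-breaking
```

Mapping U+2011 to "-" instead of deleting it is a design choice: the ASCII hyphen can then be handled by an ordinary PRINTJOINS or SKIPJOINS entry. Because a filter only changes what is indexed, query strings containing these characters would presumably need the same normalization to match.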

This discussion has been closed.