Forum Stats

  • 3,815,708 Users
  • 2,259,070 Discussions
  • 7,893,213 Comments

Discussions

Query relaxation with wildcard and SYN function

dmak2709
dmak2709 Member Posts: 2 Green Ribbon
edited Apr 21, 2020 7:24AM in Text

Hi,

I found some trouble with setting up the query relaxation with SYN and wildcards in Oracle Text.

First let me show how the table and index was created:

BEGIN   /*   BEGIN        ctx_ddl.drop_preference('INVENTAR_DEV_LEXER');        ctx_ddl.drop_preference('INV_DEV_WORDLIST');   END;   */      /**    LEXER   */   ctx_ddl.create_preference('INVENTAR_DEV_LEXER','BASIC_LEXER');      ctx_ddl.set_attribute('INVENTAR_DEV_LEXER', 'numgroup',',');   ctx_ddl.set_attribute('INVENTAR_DEV_LEXER', 'numjoin','.');   ctx_ddl.set_attribute('INVENTAR_DEV_LEXER', 'printjoins','.-_%:;/,()?!*+');   /**    FUZZY   */   ctx_ddl.create_preference('INV_DEV_WORDLIST', 'BASIC_WORDLIST');   ctx_ddl.set_attribute('INV_DEV_WORDLIST','FUZZY_MATCH','GENERIC');   ctx_ddl.set_attribute('INV_DEV_WORDLIST','FUZZY_SCORE','70');   ctx_ddl.set_attribute('INV_DEV_WORDLIST','FUZZY_NUMRESULTS','10');        ctx_ddl.set_attribute('INV_DEV_WORDLIST','SUBSTRING_INDEX','FALSE');   ctx_ddl.set_attribute('INV_DEV_WORDLIST','STEMMER','NULL');   ctx_ddl.set_attribute('INV_DEV_WORDLIST','PREFIX_INDEX','TRUE');   ctx_ddl.set_attribute('INV_DEV_WORDLIST','PREFIX_MIN_LENGTH',3);   ctx_ddl.set_attribute('INV_DEV_WORDLIST','PREFIX_MAX_LENGTH',7);   /**    THESAURUS   */   ctx_thes.create_thesaurus('inventar_thes', FALSE);   CTX_THES.CREATE_RELATION('inventar_thes','el','SYN','electric');END;/** TABLE*/CREATE TABLE inventar_dev (id VARCHAR2(16), name VARCHAR2(255));/** INDEX*/CREATE INDEX inventar_dev_idx ON inventar_dev(name)        INDEXTYPE IS CTXSYS.CONTEXT        PARAMETERS (            'DATASTORE CTXSYS.DEFAULT_DATASTORE FILTER CTXSYS.NULL_FILTER LEXER INVENTAR_DEV_LEXER WORDLIST  INV_DEV_WORDLIST STOPLIST CTXSYS.EMPTY_STOPLIST'        );

And know when I run the following query, it works fine (I left only one <seq>).

SELECT /*+ FIRST_ROWS(150) */     i.nameFROM inventar_dev i                WHERE contains(i.name,             '<query>                 <textquery grammar="context">                    <progression>                        <seq>SYN({el}, inventar_thes)</seq>                    </progression>                </textquery>                <score datatype="FLOAT" algorithm="DEFAULT"/>                <order>                    <orderkey> Score DESC </orderkey>                </order>            </query>', 1) > 0;

But if I add one word in the <seq> as follow:

SELECT /*+ FIRST_ROWS(150) */OPISFROM david.inventar_dev i                WHERE contains(i.opis,             '<query>                 <textquery grammar="context">                    <progression>                        <seq>SYN({el}, inventar_thes) {kos}</seq>                    </progression>                </textquery>                <score datatype="FLOAT" algorithm="DEFAULT"/>                <order>                    <orderkey> Score DESC </orderkey>                </order>            </query>', 1) > 0;

I throws an exception:

ORA-29902: error in executing ODCIIndexStart() routine

ORA-20000: Oracle Text error:

DRG-50920: part of phrase not itself a phrase or equivalence

29902. 00000 -  "error in executing ODCIIndexStart() routine"

*Cause:    The execution of ODCIIndexStart routine caused an error.

*Action:   Examine the error messages produced by the indextype code and

           take appropriate action.

Also % doesn't seems to work:

SELECT /*+ FIRST_ROWS(150) */OPISFROM david.inventar_dev i                WHERE contains(i.opis,             '<query>                 <textquery grammar="context">                    <progression>                        <seq>{el}%</seq>                    </progression>                </textquery>                <score datatype="FLOAT" algorithm="DEFAULT"/>                <order>                    <orderkey> Score DESC </orderkey>                </order>            </query>', 1) > 0;

Update

I realized that SYN({something}, thes) doesn't work when there are multiple words devided by spaces. So you need to add an operator between those words.

The query works with the SYN if you remove the foolowing line from the text:

<span class="pun"><</span><span class="pln">seq</span><span class="pun">><</span><span class="pln">rewrite</span><span class="pun">></span><span class="pln">transform</span><span class="pun">((</span><span class="pln">TOKENS</span><span class="pun">,</span><span class="pln"> </span><span class="str">"FUZZY(SYN({"</span><span class="pun">,</span><span class="pln"> </span><span class="str">"}, inv_thes), 70, 10, weight)"</span><span class="pun">,</span><span class="pln"> </span><span class="str">" "</span><span class="pun">))</</span><span class="pln">rewrite</span><span class="pun">></</span><span class="pln">seq</span><span class="pun">></span><span class="pln"> </span>

But I'm still not sure what could be the reason.

In fact I have also removed the TRANSFORM and REWRITE function and wrote my custom query parsing function.