Forum Stats

  • 3,825,763 Users
  • 2,260,558 Discussions
  • 7,896,662 Comments

Discussions

Spanish stemming in Oracle Text

3505056
3505056 Member Posts: 3
edited Jul 18, 2017 12:09PM in Text

I'm trying to create an Oracle Text index to make FTS queries on some text columns in Spanish in the database.

According to Oracle docs I need to create a LEXER and a WORDLIST to enable stem and fuzzy queries:

    exec ctxsys.ctx_ddl.create_preference ('cust_lexer','BASIC_LEXER');

    exec ctxsys.ctx_ddl.set_attribute ('cust_lexer','base_letter','YES');

    exec ctxsys.ctx_ddl.set_attribute ('cust_lexer','index_stems','SPANISH');

    exec ctxsys.ctx_ddl.create_preference('cust_wordlist','BASIC_WORDLIST');

    exec ctxsys.ctx_ddl.set_attribute('cust_wordlist','stemmer','AUTO');

    exec ctxsys.ctx_ddl.set_attribute('cust_wordlist','fuzzy_match','AUTO');

And then create the index using those preferences:

    CREATE INDEX NOMBREACCION_CTX ON ACCION(NOMBRE_ACCION) INDEXTYPE IS CTXSYS.CONTEXT parameters ('LEXER cust_lexer WORDLIST cust_wordlist');

When I run a query using stem operator ($) I get the following error:

    ORA-20000: Oracle Text error:

    DRG-00100: internal error, arguments : [50935],[drpn.c],[1113],[],[]

    DRG-00100: internal error, arguments : [50935],[drpnw.c],[651],[],[]

    DRG-00100: internal error, arguments : [51002],[drwa.c],[597],[],[]

    DRG-00100: internal error, arguments : [51029],[drwas.c],[498],[ACCION],[]

    DRG-51023: stemmer file cannot be opened

    20000. 00000 -  "%s"

    *Cause:    The stored procedure 'raise_application_error'

               was called which causes this error to be generated. 

    *Action:   Correct the problem as described in the error message or contact

               the application administrator or DBA for more information.

According to Oracle docs stem feature should work for Spanish: http://docs.oracle.com/cd/B28359_01/text.111/b28304/amultlng.htm#CCREF2294

Also, this doesn't seem to be a missing feature in Oracle XE: http://docs.oracle.com/cd/E17781_01/doc.112/e21743/toc.htm#XERDM105

If i change 'SPANISH' for 'ENGLISH' it works OK. Has anyone managed to setup Spanish stemming in Oracle Text?

3505056

Answers

  • Barbara Boehmer
    Barbara Boehmer Member Posts: 4,625 Gold Trophy
    edited Jul 17, 2017 2:39AM

    I believe the relevant portion is:

    DRG-51023: stemmer file cannot be opened

        Cause: Unable to open OS file containing stemmer lexicon, or the file contents were not correct

        Action: Consult documentation for for correct location of the stemmer lexicons and check for file damage

    I believe you can find those files in <oracle_home>\ctx\data.  On my system there is a file <oracle_home>\ctx\data\eslx\dres.is.  Some things are contained in the "examples" file that is downloaded and installed separately from the database installation.  I don't know if this was one of those.  I don't know if the "examples" are compatible with express edition.

    The following demonstrates Spanish stemming:

    [email protected]_12.1.0.2.0> alter session set nls_language = spanish
      2  /

    Session altered.

    [email protected]_12.1.0.2.0> select banner from v$version
      2  /

    BANNER
    --------------------------------------------------------------------------------
    Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
    PL/SQL Release 12.1.0.2.0 - Production
    CORE 12.1.0.2.0 Production
    TNS for 64-bit Windows: Version 12.1.0.2.0 - Production
    NLSRTL Version 12.1.0.2.0 - Production

    5 rows selected.

    [email protected]_12.1.0.2.0> create table accion (nombre_accion varchar2(60))
      2  /

    Table created.

    [email protected]_12.1.0.2.0> insert all
      2  into accion values ('pensar')
      3  into accion values ('pensando')
      4  into accion values ('alguna cosa')
      5  select * from dual
      6  /

    3 rows created.

    [email protected]_12.1.0.2.0> begin
      2    ctxsys.ctx_ddl.create_preference ('cust_lexer','BASIC_LEXER');
      3    ctxsys.ctx_ddl.set_attribute ('cust_lexer','base_letter','YES');
      4    ctxsys.ctx_ddl.set_attribute ('cust_lexer','index_stems','SPANISH');
      5    ctxsys.ctx_ddl.create_preference('cust_wordlist','BASIC_WORDLIST');
      6    ctxsys.ctx_ddl.set_attribute('cust_wordlist','stemmer','AUTO');
      7    ctxsys.ctx_ddl.set_attribute('cust_wordlist','fuzzy_match','AUTO');
      8  end;
      9  /

    PL/SQL procedure successfully completed.

    [email protected]_12.1.0.2.0> CREATE INDEX NOMBREACCION_CTX ON ACCION(NOMBRE_ACCION) INDEXTYPE IS CTXSYS.CONTEXT
      2    parameters ('LEXER cust_lexer WORDLIST cust_wordlist')
      3  /

    Index created.

    [email protected]_12.1.0.2.0> select token_text, token_type from dr$nombreaccion_ctx$i
      2  /

    TOKEN_TEXT                                                       TOKEN_TYPE
    ---------------------------------------------------------------- ----------
    ALGUNA                                                                    0
    ALGUNO                                                                    9
    COSA                                                                      0
    COSER                                                                     9
    PENSANDO                                                                  0
    PENSAR                                                                    0
    PENSAR                                                                    9

    7 rows selected.

    [email protected]_12.1.0.2.0> select * from accion where contains (nombre_accion, '$pensar') > 0
      2  /

    NOMBRE_ACCION
    ------------------------------------------------------------
    pensar
    pensando

    2 rows selected.

    35050563505056
  • 3505056
    3505056 Member Posts: 3
    edited Jul 17, 2017 12:36PM

    Then maybe it is a missing feature of Oracle XE, since I can't find the directory that you mentioned, this is my listing of files in <oracle_home>\ctx\data.:

    total 2400

    drwxr-xr-x. 5 oracle dba    4096 abr 20 16:01 ./

    drwxr-xr-x. 9 oracle dba    4096 abr 20 16:01 ../

    -rwxr-xr-x. 1 oracle dba 1549695 ago 29  2011 drendict.dat*

    -rwxr-xr-x. 1 oracle dba   16295 ago 29  2011 drenrule.dat*

    -rwxr-xr-x. 1 oracle dba  865739 ago 29  2011 drentkdict.dat*

    drwxr-xr-x. 2 oracle dba    4096 abr 20 16:01 enlx/

    drwxr-xr-x. 2 oracle dba    4096 abr 20 16:01 jalx/

    drwxr-xr-x. 2 oracle dba    4096 abr 20 16:01 template/

    It seems that only English and Japanese dictionaries were installed in my system. I'll check if the installer has an option to add missing languages.

  • Roger Ford-Oracle
    Roger Ford-Oracle Member Posts: 1,132 Employee
    edited Jul 17, 2017 1:12PM

    I just tried Barbara's testcase on an 11.2 XE installation.

    I didn't get the error messages, but neither did I get the expected result - I only got "pensar" and not "pensando". And my tokens didn't include the type 9 stemming tokens.

    It does look like XE only installs the English lexicon, at least by default.

    35050563505056
  • 3505056
    3505056 Member Posts: 3
    edited Jul 18, 2017 12:09PM

    I can confirm that only English and Japanese dictionaries are available on a fresh XE installation in both Windows and Linux.

    I think documentation should be updated to reflect this Oracle XE limitations: Database Express Edition Readme - Contents

This discussion has been closed.