Forum Stats

  • 3,824,772 Users
  • 2,260,416 Discussions
  • 7,896,309 Comments

Discussions

Oracle Text Stemming question

spdpw1
spdpw1 Member Posts: 2
edited Jul 25, 2017 9:00PM in Text

I have an Oracle text question regarding stemming.  When I search for the term INSPECTOR with the stemming character, where contains(pcs.srch_string,'${INSPECTOR}',1) > 0  I get a bunch of rows returned with the word INS in it.  I'm puzzled by the correlation between the two as it relates to word stemming.

Anyone have any ideas on that?

Thanx.

Answers

  • Barbara Boehmer
    Barbara Boehmer Member Posts: 4,625 Gold Trophy
    edited Jul 25, 2017 9:00PM

    Are you using a lexer with index_stems?  The following demonstrates the results including INS without such a lexer, then the results without INS using a lexer with index_stems, which also produces the type 9 tokens.

    [email protected]_12.1.0.2.0> -- test environment:
    [email protected]_12.1.0.2.0> select banner from v$version
      2  /

    BANNER
    --------------------------------------------------------------------------------
    Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
    PL/SQL Release 12.1.0.2.0 - Production
    CORE 12.1.0.2.0 Production
    TNS for 64-bit Windows: Version 12.1.0.2.0 - Production
    NLSRTL Version 12.1.0.2.0 - Production

    5 rows selected.

    [email protected]_12.1.0.2.0> create table pcs (srch_string varchar2(30))
      2  /

    Table created.

    [email protected]_12.1.0.2.0> insert all
      2  into pcs values ('INSPECTOR')
      3  into pcs values ('INSPECTORS')
      4  into pcs values ('INS')
      5  into pcs values ('SOMETHING')
      6  select * from dual
      7  /

    4 rows created.

    [email protected]_12.1.0.2.0> -- without lexer using index_stems:
    [email protected]_12.1.0.2.0> create index test_idx on pcs (srch_string) indextype is ctxsys.context
      2  /

    Index created.

    [email protected]_12.1.0.2.0> select token_type, token_text from dr$test_idx$i
      2  /

    TOKEN_TYPE TOKEN_TEXT
    ---------- ----------------------------------------------------------------
            0 INS
            0 INSPECTOR
            0 INSPECTORS
            0 SOMETHING

    4 rows selected.

    [email protected]_12.1.0.2.0> select * from pcs
      2  where  contains(pcs.srch_string,'${INSPECTOR}',1) > 0
      3  /

    SRCH_STRING
    ------------------------------
    INSPECTOR
    INSPECTORS
    INS

    3 rows selected.

    [email protected]_12.1.0.2.0> -- with lexer using index_stems:
    [email protected]_12.1.0.2.0> drop index test_idx
      2  /

    Index dropped.

    [email protected]_12.1.0.2.0> begin
      2    ctxsys.ctx_ddl.create_preference ('test_lex', 'basic_lexer');
      3    ctxsys.ctx_ddl.set_attribute ('test_lex', 'index_stems', 'ENGLISH');
      4  end;
      5  /

    PL/SQL procedure successfully completed.

    [email protected]_12.1.0.2.0> create index test_idx on pcs (srch_string) indextype is ctxsys.context
      2    parameters ('lexer test_lex')
      3  /

    Index created.

    [email protected]_12.1.0.2.0> select token_type, token_text from dr$test_idx$i
      2  /

    TOKEN_TYPE TOKEN_TEXT
    ---------- ----------------------------------------------------------------
            0 INS
            0 INSPECTOR
            9 INSPECTOR
            0 INSPECTORS
            0 SOMETHING

    5 rows selected.

    [email protected]_12.1.0.2.0> select * from pcs
      2  where  contains(pcs.srch_string,'${INSPECTOR}',1) > 0
      3  /

    SRCH_STRING
    ------------------------------
    INSPECTOR
    INSPECTORS

    2 rows selected.

This discussion has been closed.