Discussions
Categories
- 17.9K All Categories
- 3.4K Industry Applications
- 3.3K Intelligent Advisor
- 62 Insurance
- 536K On-Premises Infrastructure
- 138.2K Analytics Software
- 38.6K Application Development Software
- 5.7K Cloud Platform
- 109.4K Database Software
- 17.5K Enterprise Manager
- 8.8K Hardware
- 71.1K Infrastructure Software
- 105.2K Integration
- 41.5K Security Software
Oracle Text Stemming question
I have an Oracle text question regarding stemming. When I search for the term INSPECTOR with the stemming character, where contains(pcs.srch_string,'${INSPECTOR}',1) > 0 I get a bunch of rows returned with the word INS in it. I'm puzzled by the correlation between the two as it relates to word stemming.
Anyone have any ideas on that?
Thanx.
Answers
-
Are you using a lexer with index_stems? The following demonstrates the results including INS without such a lexer, then the results without INS using a lexer with index_stems, which also produces the type 9 tokens.
[email protected]_12.1.0.2.0> -- test environment:
[email protected]_12.1.0.2.0> select banner from v$version
2 /BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
PL/SQL Release 12.1.0.2.0 - Production
CORE 12.1.0.2.0 Production
TNS for 64-bit Windows: Version 12.1.0.2.0 - Production
NLSRTL Version 12.1.0.2.0 - Production5 rows selected.
[email protected]_12.1.0.2.0> create table pcs (srch_string varchar2(30))
2 /Table created.
[email protected]_12.1.0.2.0> insert all
2 into pcs values ('INSPECTOR')
3 into pcs values ('INSPECTORS')
4 into pcs values ('INS')
5 into pcs values ('SOMETHING')
6 select * from dual
7 /4 rows created.
[email protected]_12.1.0.2.0> -- without lexer using index_stems:
[email protected]_12.1.0.2.0> create index test_idx on pcs (srch_string) indextype is ctxsys.context
2 /Index created.
[email protected]_12.1.0.2.0> select token_type, token_text from dr$test_idx$i
2 /TOKEN_TYPE TOKEN_TEXT
---------- ----------------------------------------------------------------
0 INS
0 INSPECTOR
0 INSPECTORS
0 SOMETHING4 rows selected.
[email protected]_12.1.0.2.0> select * from pcs
2 where contains(pcs.srch_string,'${INSPECTOR}',1) > 0
3 /SRCH_STRING
------------------------------
INSPECTOR
INSPECTORS
INS3 rows selected.
[email protected]_12.1.0.2.0> -- with lexer using index_stems:
[email protected]_12.1.0.2.0> drop index test_idx
2 /Index dropped.
[email protected]_12.1.0.2.0> begin
2 ctxsys.ctx_ddl.create_preference ('test_lex', 'basic_lexer');
3 ctxsys.ctx_ddl.set_attribute ('test_lex', 'index_stems', 'ENGLISH');
4 end;
5 /PL/SQL procedure successfully completed.
[email protected]_12.1.0.2.0> create index test_idx on pcs (srch_string) indextype is ctxsys.context
2 parameters ('lexer test_lex')
3 /Index created.
[email protected]_12.1.0.2.0> select token_type, token_text from dr$test_idx$i
2 /TOKEN_TYPE TOKEN_TEXT
---------- ----------------------------------------------------------------
0 INS
0 INSPECTOR
9 INSPECTOR
0 INSPECTORS
0 SOMETHING5 rows selected.
[email protected]_12.1.0.2.0> select * from pcs
2 where contains(pcs.srch_string,'${INSPECTOR}',1) > 0
3 /SRCH_STRING
------------------------------
INSPECTOR
INSPECTORS2 rows selected.