Since I'm dealing with a multilanguage environment, I'm planning to apply the following strategy:
- create a MULTI_LEXER preference with all the needed sub-lexers, and an index that uses it;
- for each document to index:
  - fetch its text using POLICY_FILTER;
  - detect the language with external (non-Oracle Text) tools;
  - index that text using NULL_FILTER and setting the language column, or alternatively:
    - compress the text with gzip and index it using AUTO_FILTER and the proper language setting.
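The strategy above could be sketched roughly as follows. Python is used here only to assemble the Oracle Text DDL as strings and to prepare each row; nothing talks to the database. The preference names (`global_lexer`, `english_lexer`, `german_lexer`), the table `docs(id, lang, text)` and the index name are hypothetical, and `detect_language` is a toy placeholder for whatever external detector is actually used.

```python
import gzip
from dataclasses import dataclass

# Hypothetical sub-lexers: language -> (preference name, BASIC_LEXER attributes).
SUB_LEXERS = {
    "english": ("english_lexer", {}),
    "german": ("german_lexer", {"COMPOSITE": "GERMAN"}),
}


def multi_lexer_ddl(name: str = "global_lexer", default: str = "english") -> str:
    """PL/SQL block: one BASIC_LEXER per language plus a MULTI_LEXER over them."""
    out = ["BEGIN"]
    for pref, attrs in SUB_LEXERS.values():
        out.append(f"  ctx_ddl.create_preference('{pref}', 'BASIC_LEXER');")
        for attr, value in attrs.items():
            out.append(f"  ctx_ddl.set_attribute('{pref}', '{attr}', '{value}');")
    out.append(f"  ctx_ddl.create_preference('{name}', 'MULTI_LEXER');")
    for lang, (pref, _) in SUB_LEXERS.items():
        # The 'default' sub-lexer handles rows whose language is NULL or unlisted.
        label = "default" if lang == default else lang
        out.append(f"  ctx_ddl.add_sub_lexer('{name}', '{label}', '{pref}');")
    out.append("END;")
    return "\n".join(out)


def create_index_ddl(compressed: bool, lexer: str = "global_lexer") -> str:
    """CONTEXT index on docs(text): NULL_FILTER for plain text, AUTO_FILTER for gzip."""
    filt = "ctxsys.auto_filter" if compressed else "ctxsys.null_filter"
    return (
        "CREATE INDEX docs_text_idx ON docs(text) INDEXTYPE IS ctxsys.context "
        f"PARAMETERS ('LEXER {lexer} LANGUAGE COLUMN lang FILTER {filt}')"
    )


@dataclass
class Row:
    lang: str       # value for the docs.lang language column
    payload: bytes  # UTF-8 text, or its gzip-compressed form


def detect_language(text: str) -> str:
    """Toy placeholder for the external detector: share of common German words."""
    words = text.lower().split()
    hits = sum(w in {"der", "die", "das", "und", "nicht"} for w in words)
    return "german" if words and hits * 5 >= len(words) else "english"


def prepare_row(filtered_text: str, compress: bool) -> Row:
    """filtered_text is assumed to be the plain text returned by POLICY_FILTER."""
    data = filtered_text.encode("utf-8")
    return Row(detect_language(filtered_text), gzip.compress(data) if compress else data)
```

For example, `prepare_row("der Hund und die Katze", compress=True)` gives a row tagged `german` whose payload decompresses back to the original text. The compressed variant assumes `docs.text` is a BLOB column, since gzip output is binary.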
Now, I wonder what the policy passed to POLICY_FILTER is actually used for. I suspect I could use an "empty" policy (BASIC_LEXER, BASIC_WORDLIST, EMPTY_STOPLIST) and get the same text block as with a real policy.
The same should also be true for POLICY_TOKENS.
Both procedures also take an input language (or NULL); I guess it is used to choose the proper sub-lexer, although I still don't see how it might influence the results.
The LEXER, WORDLIST and STOPLIST settings have no effect on POLICY_FILTER, so you can use an "empty" policy. The only setting that has any effect is the FILTER.
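A minimal sketch of such a filter-only policy, again with Python just assembling the PL/SQL. The policy name `filter_only_policy` and the bind variables `:doc` (source document, e.g. a BLOB) and `:result` (output CLOB) are hypothetical; the positional order of the `ctx_doc.policy_filter` arguments (policy, document, result CLOB, plaintext flag) is as I understand the CTX_DOC reference, so check it against your release.

```python
def filter_policy_statements(policy: str = "filter_only_policy",
                             filter_pref: str = "ctxsys.auto_filter") -> list[str]:
    """Two PL/SQL blocks: create a filter-only policy, then run POLICY_FILTER."""
    # Only FILTER is set; LEXER, WORDLIST and STOPLIST are left unset (NULL),
    # since they do not change what POLICY_FILTER returns.
    # Names are interpolated unescaped: they must be trusted constants.
    create = (
        "BEGIN\n"
        f"  ctx_ddl.create_policy(policy_name => '{policy}', filter => '{filter_pref}');\n"
        "END;"
    )
    # :doc is bound to the source document; :result receives the filtered text.
    run = (
        "DECLARE\n"
        "  txt CLOB;\n"
        "BEGIN\n"
        "  dbms_lob.createtemporary(txt, TRUE);\n"
        f"  ctx_doc.policy_filter('{policy}', :doc, txt, TRUE);  -- TRUE = plain text\n"
        "  :result := txt;\n"
        "END;"
    )
    return [create, run]
```

The first block is run once; the second can then be executed per document, binding `:doc` and reading `:result` back.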
For POLICY_TOKENS, I should think all of those settings will influence the output.
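For contrast, a sketch of a POLICY_TOKENS setup where the lexer and stoplist are set explicitly, since these are the settings expected to shape the token list. The names `tokens_policy` and `german_lexer` and the `:doc` bind are hypothetical; the `ctx_doc.token_tab` type, its `token`/`offset` fields and the argument order of `ctx_doc.policy_tokens` (policy, document, result table, language) reflect my reading of the CTX_DOC reference and should be verified.

```python
def tokens_policy_statements(policy: str = "tokens_policy",
                             lexer: str = "german_lexer",
                             lexer_attrs: tuple = (("COMPOSITE", "GERMAN"),),
                             stoplist: str = "ctxsys.default_stoplist",
                             language: str = "german") -> list[str]:
    """Two PL/SQL blocks: create a full policy, then print POLICY_TOKENS output."""
    lines = ["BEGIN", f"  ctx_ddl.create_preference('{lexer}', 'BASIC_LEXER');"]
    for attr, value in lexer_attrs:
        lines.append(f"  ctx_ddl.set_attribute('{lexer}', '{attr}', '{value}');")
    # Unlike POLICY_FILTER, the lexer and stoplist here should affect the tokens.
    lines.append(
        f"  ctx_ddl.create_policy(policy_name => '{policy}', "
        f"filter => 'ctxsys.null_filter', lexer => '{lexer}', stoplist => '{stoplist}');"
    )
    lines.append("END;")
    # Walk the result table with FIRST/NEXT so an empty result is handled safely.
    run = "\n".join([
        "DECLARE",
        "  toks ctx_doc.token_tab;",
        "  i    BINARY_INTEGER;",
        "BEGIN",
        f"  ctx_doc.policy_tokens('{policy}', :doc, toks, '{language}');",
        "  i := toks.FIRST;",
        "  WHILE i IS NOT NULL LOOP",
        "    dbms_output.put_line(toks(i).token || ' @ ' || toks(i).offset);",
        "    i := toks.NEXT(i);",
        "  END LOOP;",
        "END;",
    ])
    return ["\n".join(lines), run]
```

Running the second block with different `lexer`/`stoplist` choices on the same document is a simple way to see how much each setting changes the output.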
Edited by: Roger Ford on Feb 19, 2013 8:03 AM "think of those settings" -> "think all of those settings"