Since I'm dealing with a multilanguage environment, I'm planning to apply the following strategy:
- create a MULTI_LEXER index along with all needed sublexers.
- for each document to index:
  - fetch the text using POLICY_FILTER;
  - detect the language by means of external (non-Oracle Text) tools;
  - index that text using NULL_FILTER and setting the language column, or alternatively:
  - compress the text through gzip and index it using AUTO_FILTER and the proper language setting.
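
As a rough sketch, the index side of this strategy (the NULL_FILTER variant) might look like the following. The table `docs`, its columns `text` and `lang`, and all preference and index names are hypothetical; only two languages are shown, and the sub-lexer attributes would need tuning per language.

```sql
-- Hypothetical names throughout: docs, text, lang, *_lexer, docs_idx.
BEGIN
  -- One sub-lexer per language to be supported.
  ctx_ddl.create_preference('english_lexer', 'BASIC_LEXER');
  ctx_ddl.create_preference('german_lexer',  'BASIC_LEXER');
  ctx_ddl.set_attribute('german_lexer', 'composite', 'german');

  -- The MULTI_LEXER dispatches on the value of the language column.
  ctx_ddl.create_preference('global_lexer', 'MULTI_LEXER');
  ctx_ddl.add_sub_lexer('global_lexer', 'default', 'english_lexer');
  ctx_ddl.add_sub_lexer('global_lexer', 'german',  'german_lexer', 'ger');
END;
/

-- The text is already plain at this point, so no filtering is needed.
CREATE INDEX docs_idx ON docs (text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('FILTER ctxsys.null_filter
               LEXER global_lexer
               LANGUAGE COLUMN lang');
```

With this layout, the externally detected language would simply be stored in `docs.lang` before the row is indexed.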
Now, I wonder what the policy passed to POLICY_FILTER is actually used for. I suspect I could use an "empty" policy (BASIC_LEXER, BASIC_WORDLIST, EMPTY_STOPLIST) and get the same text block as I would with a real policy.
The same should be true also for POLICY_TOKENS.
Both procedures take an input language (or NULL). I guess it is there to select the proper sub-lexer, but I still don't see how it might influence the results.
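
To make the question concrete, here is a minimal sketch of the "empty policy" approach. The policy name `plain_policy`, the staging table `docs_staging`, and its columns are hypothetical, and whether the output really matches what a fully configured policy would produce is exactly the open point.

```sql
-- Hypothetical names: plain_policy, docs_staging, id, raw_doc.
BEGIN
  -- A policy built only from system-defined "basic" preferences.
  ctx_ddl.create_policy(
    policy_name => 'plain_policy',
    filter      => 'ctxsys.auto_filter',
    lexer       => 'ctxsys.basic_lexer',
    stoplist    => 'ctxsys.empty_stoplist',
    wordlist    => 'ctxsys.basic_wordlist');
END;
/

DECLARE
  v_doc  BLOB;
  v_text CLOB;
BEGIN
  SELECT raw_doc INTO v_doc FROM docs_staging WHERE id = 1;

  DBMS_LOB.CREATETEMPORARY(v_text, TRUE);

  -- The language is left NULL here because it is not known yet;
  -- it is only detected afterwards by the external tool.
  ctx_doc.policy_filter(
    policy_name => 'plain_policy',
    document    => v_doc,
    restab      => v_text,
    plaintext   => TRUE,
    language    => NULL);

  DBMS_OUTPUT.PUT_LINE(DBMS_LOB.SUBSTR(v_text, 200, 1));
  DBMS_LOB.FREETEMPORARY(v_text);
END;
/
```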