Have you seen the release notes for Commerce 3.1.1? The biggest Endeca change is the inclusion of "Oracle Language Technology" (OLT) language analysis, including for Japanese (Kanji, Hiragana, Romaji, and Katakana, both full- and half-width). One of the features of OLT is that stemming is generated dynamically using programmatic rules, alongside tokenization, segmentation, decompounding, and so on. The downside is that some Endeca features aren't supported for data indexed with OLT rather than the regular Latin-1 analysis, so if you need wildcard search, phrase search, search characters, or diacritic folding, you'll need to run two indexes.
You can find out more in Chapter 16 here: http://docs.oracle.com/cd/E38682_01/MDEX.640/pdf/AdvDevGuide.pdf
Hi Michael, thank you very much for the answer. I understand the behaviour described in the manual; let me add some background to my questions:
As for (1): in previous versions, n-grams were used to broaden the range of matching words. Does OLT have a similar concept or parameter for increasing the number of hits?
As for (2): for Japanese we use dynamic stemming, so is there any way to customize the dictionary as in the static case, or is that impossible?
For (1), if I've understood the new functionality correctly, you shouldn't need NGRAM at all, since you no longer use wildcarding (assuming you were following the old approach of segmenting characters and using wildcarding to work around the lack of whitespace in Japanese).
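To make the contrast concrete, here is a minimal, purely illustrative sketch (not Endeca code) of the old n-gram workaround: because Japanese has no whitespace between words, overlapping character bigrams were indexed, and a query matched a document if all of its bigrams appeared in the index. All names here are my own, for illustration only.

```python
def ngrams(text, n=2):
    """Return overlapping character n-grams; strings shorter than n yield themselves."""
    if len(text) < n:
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def matches(query, document, n=2):
    """A document matches if every query n-gram appears among its n-grams."""
    doc_grams = set(ngrams(document, n))
    return all(g in doc_grams for g in ngrams(query, n))

doc = "東京都庁"                  # "Tokyo Metropolitan Government"
print(ngrams(doc))               # ['東京', '京都', '都庁']
print(matches("東京", doc))      # True: the bigram 東京 is indexed
print(matches("京都", doc))      # also True: a false positive ("Kyoto"),
                                 # one reason morphological analysis like
                                 # OLT's is preferable to raw n-grams
```

The false positive on 京都 shows why rule-based segmentation (as OLT performs) can replace the n-gram/wildcarding trick rather than merely supplement it.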
For (2), yes, you can add an "auxiliary dictionary" (see the documentation for details). I think you should be able to use thesaurus entries as well.
Hope that helps.