3 Replies Latest reply: Jan 16, 2013 7:11 AM by Michael Peel-Oracle

    Regarding Ngram and Stemming

    user776266
      Hi all,
      We have a question regarding a statement in the Oracle Endeca Platform Services XML Reference:

      1) The Ngram setting seems to be ignored after version 6.1.2. Is that true? On the latest Endeca Commerce 3.1.1, is Ngram ignored?
      2) Is there a stemming dictionary for Japanese? If so, where is it stored in the standard Endeca modules?

      Any help will be greatly appreciated.

      Thanks,
      Yuki
        • 1. Re: Regarding Ngram and Stemming
          Michael Peel-Oracle
          Hi

          Have you seen the release notes for Commerce 3.1.1? The biggest Endeca change is the inclusion of "Oracle Language Technology" (OLT) language analysis, including for Japanese (Kanji, Hiragana, Romaji and Katakana (full- and half-width)). One of the features of OLT is that stemming is generated dynamically using programmatic rules, along with tokenization, segmentation, decompounding etc. The downside is that some Endeca features aren't supported for data indexed using OLT instead of regular Latin-1, so if you need wildcard search, phrase search, search characters or diacritic folding you'll need to run two indexes.

          You can find out more in Chapter 16 here: http://docs.oracle.com/cd/E38682_01/MDEX.640/pdf/AdvDevGuide.pdf .
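
          As a rough sketch (assuming a standard Deployment Template project - the id and host-id values below are just the common defaults, and I haven't verified the exact flags against 3.1.1), the language is usually passed to Dgidx and the Dgraph at index/startup time, e.g.:

              <!-- AppConfig.xml (Deployment Template) - illustrative only -->
              <dgidx id="Dgidx" host-id="ITLHost">
                <args>
                  <arg>-v</arg>
                  <arg>--lang</arg>  <!-- which language analysis Dgidx should apply -->
                  <arg>ja</arg>      <!-- Japanese; per-record tagging via the Endeca.Document.Language property is another option -->
                </args>
              </dgidx>

          Check Chapter 16 above for the exact flags and properties your MDEX version expects.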

          Michael
          • 2. Re: Regarding Ngram and Stemming
            user776266
            Hi Michael, thank you very much for the answer. I understand the behaviour described in the manual; let me add some background to our questions:

            As for 1), in previous versions Ngram was understood to broaden the range of words a search would hit. In OLT, is there a similar concept or parameter to increase the number of hits?

            As for 2), for Japanese we use dynamic stemming, so is there any way to customize the dictionary as in the static case, or is that not possible?

            Regards,
            Yuki
            • 3. Re: Regarding Ngram and Stemming
              Michael Peel-Oracle
              Hi

              For (1), if I've understood the new functionality correctly, it shouldn't be necessary to use NGRAM at all, because you no longer need wildcarding (assuming NGRAM was being used with the old approach of segmenting characters and relying on wildcarding to work around the lack of whitespace in Japanese).

              For (2), yes, you can add an "auxiliary dictionary" (see the documentation for details). I think you should be able to use thesaurus entries as well.
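
              If it helps, a hand-edited thesaurus file traditionally looks something like the sketch below - the element names are from memory of the Platform Services XML Reference, and in Commerce 3.1.1 entries are normally maintained through Workbench rather than edited directly, so treat it as illustrative only:

                  <!-- [app].thesaurus.xml - illustrative sketch only -->
                  <THESAURUS>
                    <THESAURUS_ENTRY>
                      <!-- two-way entry: each form matches the others at query time -->
                      <THESAURUS_ENTRY_SYNONYM>ノートパソコン</THESAURUS_ENTRY_SYNONYM>
                      <THESAURUS_ENTRY_SYNONYM>ラップトップ</THESAURUS_ENTRY_SYNONYM>
                    </THESAURUS_ENTRY>
                  </THESAURUS>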

              Hope that helps.

              Michael