I am evaluating Endeca for implementing search on a help-center style site that holds a list of questions and answers. The requirement is to provide search over this content: a guest user types a question or some search terms into the box to find the right question and, ultimately, its answer.
However, basic tests suggest there will be demand for natural-language search that matches across word forms. I understand that Endeca supports part of this through stemming, but the stemming dictionary is not a complete list. Having the business update word forms in the stemming dictionary again and again is not a scalable solution.
Are there any thoughts on the above problem? What are the possible ways to move ahead?
Here are a few tips:
- First, Endeca doesn't do natural language processing. It is very literal with the words that are entered and matched.
- By default, the MDEX limits the number of search terms to 10. If you enter 11 or more terms, term #11 onward is ignored.
- You can increase this by setting the --search_max flag (typically configured in the <dgraph-defaults> section of your appConfig.xml).
- You can increase this to be quite large (say 200 terms). Then, when a customer types in their problem description, no terms are ignored.
- Use the "Match Any" match mode. Now, this might give you a lot of search results, but that goes to my next point.
- Use the NTerms module as your first relevancy ranking module. This will order the results based on the number of search terms that matched.
- Make sure you have a good set of "Stop Words", such as "the", "a", "and", "it", "was", "is", and "of".
So now what will happen is that the customer will enter their problem, and you'll submit the whole thing to the MDEX. The MDEX will throw out the stop words, match any of the remaining terms, and then order the results based on the number of terms that matched.
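To make that flow concrete, here is a minimal Python sketch of the idea. It is not Endeca code and does not call any Endeca API; it only imitates the behaviour described above (drop stop words, cap the term count, match any term, rank by number of matched terms). The function names, the stop-word set, and the sample articles are all made up for illustration.

```python
import re

# Hypothetical stop-word list and term cap, mirroring the tips above.
STOP_WORDS = {"the", "a", "and", "it", "was", "is", "of"}
MAX_TERMS = 200

def tokenize(text):
    """Lowercase the text and split it into literal word tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def query_terms(text, max_terms=MAX_TERMS):
    """Drop stop words and duplicates, then keep at most max_terms terms."""
    terms = []
    for term in tokenize(text):
        if term not in STOP_WORDS and term not in terms:
            terms.append(term)
    return terms[:max_terms]

def search(query, articles):
    """Match-any search: keep articles with at least one matching term,
    ordered by how many distinct query terms they contain."""
    terms = query_terms(query)
    scored = []
    for title, body in articles.items():
        words = set(tokenize(title + " " + body))
        hits = sum(1 for term in terms if term in words)
        if hits > 0:
            scored.append((hits, title))
    # Most matched terms first; title breaks ties so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]

articles = {
    "Reset your password": "How to reset a forgotten password for your account",
    "Update billing address": "Change the billing address on your account",
    "Cancel an order": "Steps to cancel an order before it ships",
}
results = search("I forgot the password of my account and it was locked", articles)
print(results)  # ['Reset your password', 'Update billing address']
```

Note that "forgot" does not match "forgotten" here, which is the same literal matching you would see without a stemming entry for that word form.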
Some other relevancy modules to think about:
- WFreq / Weighted Frequency. This scores results by how often the search terms occur in them, giving more weight to the terms that are more important.
- Stay away from Phrase and Exact. They'll almost certainly be too slow.
- Use a Static module based on the view count of an article, so that more popular articles will show up first.
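The effect of putting a view-count based Static module after a term-count module can be pictured with a small Python sketch. This is only an illustration of the ordering, not Endeca configuration; the field names `matched_terms` and `view_count` and the sample records are hypothetical.

```python
def rank_results(results):
    """Order by matched term count first, then use view count as the tie-breaker."""
    return sorted(results, key=lambda r: (-r["matched_terms"], -r["view_count"]))

# Hypothetical search results with a per-article view count.
results = [
    {"title": "Reset your password", "matched_terms": 2, "view_count": 120},
    {"title": "Unlock your account", "matched_terms": 2, "view_count": 950},
    {"title": "Update billing address", "matched_terms": 1, "view_count": 4000},
]
ranked = [r["title"] for r in rank_results(results)]
print(ranked)
# ['Unlock your account', 'Reset your password', 'Update billing address']
```

The popular article only moves ahead of others that matched the same number of terms; a heavily viewed article that matched fewer terms still ranks lower.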
You might also want to turn off these MDEX features:
--wb_noibrk Disables word-break insertion analysis.
--wb_norbrk Disables word-break removal analysis.
If you're going to submit big sets of search terms, those features will probably be too slow.