This discussion is archived
4 Replies Latest reply: Aug 14, 2013 9:50 PM by endeca_learner

Endeca Stemming Issue

endeca_learner Newbie

I am having trouble with stemming in my application when I run baseline_update on a Windows box, and would appreciate any help from the Endeca gurus here. My pipeline is simple: it has only the default properties and dimensions, nothing I created myself. Stemming was originally enabled for English only (the default); I added more languages while experimenting after the problem first appeared. Currently four languages are selected: Chinese (simplified), Chinese (traditional), English and Spanish. The file C:\Endeca\Apps\MyApp\data\dgraphs\Dgraph1\dgraph_input/MyApp-aspell.spelldat does not exist at the location named in the Dgraph log. Do I need to create it, and if so, how?

 

Here is what I see in my command prompt and the log file:

 

baseline_update error:

[08.12.13 21:20:37] INFO: [MDEXHost] Starting backup utility 'backup_log_dir_for_component_Dgraph1'.

[08.12.13 21:20:38] INFO: [MDEXHost] Starting component 'Dgraph1'.

[08.12.13 21:20:57] SEVERE: Server component 'Dgraph1' failed to start. Refer to component logs in C:\Endeca\Apps\MyApp\config\script\..\..\.\logs\dgraphs\Dgraph1 on host MDEXHost.

Occurred while executing line 5 of valid BeanShell script:

[[

 

2|

3|    DgraphCluster.cleanDirs();

4|    DgraphCluster.copyIndexToDgraphServers();

5|    DgraphCluster.applyIndex();

6|

7|

]]

 

[08.12.13 21:20:57] SEVERE: Error executing valid BeanShell script.

Occurred while executing line 35 of valid BeanShell script:

[[

 

32|        Dgidx.run();

33|

34|        // distributed index, update Dgraphs

35|        DistributeIndexAndApply.run();

36|

37|        // if Web Studio is integrated, update Web Studio with latest

38|        // dimension values

 

]]

 

[08.12.13 21:20:57] SEVERE: Caught an exception while invoking method 'run' on object 'BaselineUpdate'. Releasing locks.

 

Caused by java.lang.reflect.InvocationTargetException sun.reflect.NativeMethodAccessorImpl invoke0 - null

Caused by com.endeca.soleng.eac.toolkit.exception.AppControlException com.endeca.soleng.eac.toolkit.script.Script runBeanShellScript - Error executing valid BeanShell script.

Caused by com.endeca.soleng.eac.toolkit.exception.AppControlException  com.endeca.soleng.eac.toolkit.script.Script runBeanShellScript - Error executing valid BeanShell script.

Caused by com.endeca.soleng.eac.toolkit.exception.EacComponentControlException com.endeca.soleng.eac.toolkit.component.ServerComponent startInParallel - Server component 'Dgraph1' failed to start. Refer to component logs in C:\Endeca\Apps\MyApp\config\script\..\..\.\logs\dgraphs\Dgraph1 on host MDEXHost.

 

[08.12.13 21:20:57] INFO: Released lock 'update_lock'.

 

Error in Dgraph log:

Stemming should be enabled for 4 languages

WARN 08/13/13 04:48:06.592 UTC (1376369286592) DGRAPH {dgraph,baseline} couldn't stat binary word forms file C:\Endeca\Apps\MyApp\data\dgraphs\Dgraph1\dgraph_input\MyApp.zh-CN.wfdat [err=`No such file or directory',errno=2]

WARN 08/13/13 04:48:06.608 UTC (1376369286592) DGRAPH {dgraph,baseline} couldn't stat binary word forms file C:\Endeca\Apps\MyApp\data\dgraphs\Dgraph1\dgraph_input\MyApp.zh-TW.wfdat [err=`No such file or directory',errno=2]

ERROR 08/13/13 04:48:07.639 UTC (1376369287639) DGRAPH {dgraph,baseline} OptiSpell, error creating pspell manager, "The file "C:\Endeca\Apps\MyApp\data\dgraphs\Dgraph1\dgraph_input/MyApp-aspell.spelldat" can not be opened for reading."

FATAL 08/13/13 04:48:07.639 UTC (1376369287639) DGRAPH {dgraph,baseline} Errors initializing aspell module.  This error is most likely due to an incorrect configuration of aspell. Please correct any previous errors and restart the dgraph.

 

When I deselect all the languages under Stemming in Dev Studio, the Dgraph log shows the following:

Stemming should be enabled for 0 languages

ERROR 08/13/13 05:18:16.444 UTC (1376371096444) DGRAPH {dgraph,baseline} OptiSpell, error creating pspell manager, "The file "C:\Endeca\Apps\MyApp\data\dgraphs\Dgraph1\dgraph_input/MyApp-aspell.spelldat" can not be opened for reading."

FATAL 08/13/13 05:18:16.444 UTC (1376371096444) DGRAPH {dgraph,baseline} Errors initializing aspell module.  This error is most likely due to an incorrect configuration of aspell. Please correct any previous errors and restart the dgraph.
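
For readers triaging a similar failure, the log excerpts above can be reduced to a list of severities and the files the Dgraph could not open. The following is a minimal Python sketch, not an Endeca tool: the line layout is inferred only from the entries quoted in this post, and the names `LINE_RE`, `PATH_RE` and `summarize` are hypothetical.

```python
import re

# Layout inferred from the quoted log lines (an assumption, not a documented format):
# LEVEL MM/DD/YY HH:MM:SS.mmm UTC (epoch_ms) DGRAPH {tags} message
LINE_RE = re.compile(
    r"^(?P<level>WARN|ERROR|FATAL)\s+"
    r"\d{2}/\d{2}/\d{2}\s+[\d:.]+\s+UTC\s+"
    r"\(\d+\)\s+DGRAPH\s+\{[^}]*\}\s+(?P<message>.*)$"
)

# Windows-style absolute path; the log mixes "\" and "/" separators.
PATH_RE = re.compile(r"[A-Za-z]:[\\/][^\s\"]+")


def summarize(lines):
    """Return (level, path_or_None) for every recognizable Dgraph log line."""
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # e.g. "Stemming should be enabled for 4 languages"
        path = PATH_RE.search(match.group("message"))
        results.append((match.group("level"), path.group(0) if path else None))
    return results


# Lines copied from the Dgraph log quoted in this post.
SAMPLE = [
    "Stemming should be enabled for 4 languages",
    r"WARN 08/13/13 04:48:06.592 UTC (1376369286592) DGRAPH {dgraph,baseline} couldn't stat binary word forms file C:\Endeca\Apps\MyApp\data\dgraphs\Dgraph1\dgraph_input\MyApp.zh-CN.wfdat [err=`No such file or directory',errno=2]",
    r'ERROR 08/13/13 04:48:07.639 UTC (1376369287639) DGRAPH {dgraph,baseline} OptiSpell, error creating pspell manager, "The file "C:\Endeca\Apps\MyApp\data\dgraphs\Dgraph1\dgraph_input/MyApp-aspell.spelldat" can not be opened for reading."',
    r"FATAL 08/13/13 04:48:07.639 UTC (1376369287639) DGRAPH {dgraph,baseline} Errors initializing aspell module.  This error is most likely due to an incorrect configuration of aspell. Please correct any previous errors and restart the dgraph.",
]

if __name__ == "__main__":
    for level, path in summarize(SAMPLE):
        print(level, path or "(no file named)")
```

In the quoted log the word-forms files only produce WARN entries, while the missing spelldat file is the one followed by the FATAL entry, so that is the file worth chasing first.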

  • 1. Re: Endeca Stemming Issue
    Michael Peel Journeyer

    Have a look at Chapter 16, "Using Internationalized Data", in the Advanced Development Guide (http://docs.oracle.com/cd/E38682_01/MDEX.640/pdf/AdvDevGuide.pdf); it provides instructions on using internationalized data. As you are using Chinese, you should maintain a separate index for each language (so you get correct linguistic processing) and use the "espell" module instead of "aspell". You can change this in ./config/script/AppConfig.xml (or ./config/script/DataIngest.xml, I think, if you are using a recent deployment template): search for <run-aspell> and change the node value from "true" to "false".

     

    HTH

     

    Michael

  • 2. Re: Endeca Stemming Issue
    endeca_learner Newbie

    Michael,

     

    Thanks for the reply. I do not want Chinese or any other language except English. I selected only English for stemming and also changed <run-aspell> to false as you suggested, but there is still no change. I am still seeing the following error in the Dgraph log:

     

    Stemming should be enabled for 1 languages

    ERROR 08/14/13 21:56:54.375 UTC (1376517414365) DGRAPH {dgraph,baseline} OptiSpell, error creating pspell manager, "The file "C:\Endeca\Apps\MyApp\data\dgraphs\Dgraph1\dgraph_input/MyApp-aspell.spelldat" can not be opened for reading."

    FATAL 08/14/13 21:56:54.375 UTC (1376517414365) DGRAPH {dgraph,baseline} Errors initializing aspell module.  This error is most likely due to an incorrect configuration of aspell. Please correct any previous errors and restart the dgraph.

     

    Any other changes you can suggest?

  • 3. Re: Endeca Stemming Issue
    endeca_learner Newbie

    When I deselect all the languages under Stemming, I see a similar error:

     

    Stemming should be enabled for 0 languages

    ERROR 08/15/13 03:07:23.108 UTC (1376536043108) DGRAPH {dgraph,baseline} OptiSpell, error creating pspell manager, "The file "C:\Endeca\Apps\MyApp\data\dgraphs\Dgraph1\dgraph_input/MyApp-aspell.spelldat" can not be opened for reading."

    FATAL 08/15/13 03:07:23.108 UTC (1376536043108) DGRAPH {dgraph,baseline} Errors initializing aspell module.  This error is most likely due to an incorrect configuration of aspell. Please correct any previous errors and restart the dgraph.

  • 4. Re: Endeca Stemming Issue
    endeca_learner Newbie

    Michael,

     

    Finally, I followed your advice, read through the internationalization chapter in the development guide, and found the problem. I just needed to create a MyApp.spell_config.xml file to force my application to use espell for English, and that did the trick. Thank you so much!
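
Reply 1 suggests searching the deployment template scripts for <run-aspell> and changing its value from "true" to "false". As a rough illustration of that step only, here is a Python sketch that flips every such element in an XML string. The element name comes from the reply; the surrounding sample document and the function names are hypothetical, and the real AppConfig.xml or DataIngest.xml will look different. Note that reply 2 reports this change alone did not clear the error; the spell_config.xml file mentioned in reply 4 is what resolved it.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def disable_run_aspell(xml_text):
    """Set every <run-aspell>true</run-aspell> to false.

    Returns the modified XML text and the number of elements changed.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for element in root.iter():
        is_flag = local_name(element.tag) == "run-aspell"
        if is_flag and (element.text or "").strip().lower() == "true":
            element.text = "false"
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical stand-in for the relevant part of a deployment template script.
SAMPLE = """<config>
  <dgidx id="Dgidx">
    <run-aspell>true</run-aspell>
  </dgidx>
</config>"""

if __name__ == "__main__":
    new_xml, count = disable_run_aspell(SAMPLE)
    print(count, "element(s) changed")
    print(new_xml)
```

ElementTree drops comments and may rewrite namespace prefixes when serializing, so for a real script file it is probably safer to use a sketch like this to locate the setting and then edit the file by hand.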
