7 Replies Latest reply: May 9, 2014 8:10 AM by Greg E.

    Performance Issue-Endeca



      We are facing a performance issue due to the large amount of data to be indexed. We are currently testing our setup with 1 million records, and the baseline takes around 3.5 hours to complete. We need to run the baseline for around 12 million records, which we expect to take far longer, so we are looking for performance tuning tips.

      The Oracle documentation mentions that increasing the number of threads (multithreading) improves performance. Can anyone tell me how to change the thread settings to increase that number on both Windows and Linux? Also, how can I check how many threads it is currently running with?

      Please help.




        • 1. Re: Performance Issue-Endeca
          Greg E.

          A few things:


          - You can increase the number of threads used by Dgidx.  In later versions of Endeca, you can add <arg>--threads</arg><arg>4</arg> to the "Dgidx" section of your /config/script/DataIngest.xml file.  In older versions, this section was in /config/script/AppConfig.xml.

          - You can set the threads to be the number of CPU cores.

          - This won't vastly improve your indexing, however.  Only certain parts of building the index use the extra cores.  But it won't hurt!
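
A small sketch of the "threads = CPU cores" suggestion above, assuming Python is available on the indexing host. The helper name `dgidx_thread_args` is hypothetical; it only builds the two `<arg>` elements quoted above and does not read or modify any Endeca file. Note that `os.cpu_count()` reports logical CPUs, which can be higher than the number of physical cores.

```python
import os


def dgidx_thread_args(threads=None):
    """Build the <arg> pair for the Dgidx section (hypothetical helper).

    If no thread count is given, fall back to the number of logical CPUs
    reported by the operating system (works on Windows and Linux).
    """
    if threads is None:
        # os.cpu_count() may return None on unusual platforms; use 1 then.
        threads = os.cpu_count() or 1
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return f"<arg>--threads</arg><arg>{threads}</arg>"


if __name__ == "__main__":
    # Print the fragment to paste into the Dgidx section by hand.
    print(dgidx_thread_args())
```

Calling `dgidx_thread_args(4)` gives the same fragment as in the first bullet; calling it with no argument uses whatever the host reports.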


          What you should really examine is how many fields are enabled for search.    In your pipeline folder, find the file with a name like "discover.recsearch_indexes.xml".   (Replace "discover" with your app name).


          In there are all of the various fields enabled for keyword search.  Removing entries will reduce indexing time.  Also, with your quantity of data you should pretty much make sure no fields are enabled for wildcard searching.    Look for <WILDCARD_INDEX/> entries and consider removing them.
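
To get a before/after count while pruning, something like the following rough sketch may help. It assumes the file is well-formed XML with one child element per search-enabled field directly under the root; that layout is an assumption, not something confirmed here. Only the `WILDCARD_INDEX` tag name comes from the description above. The function name and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def summarize_search_config(xml_text):
    """Count top-level entries and WILDCARD_INDEX elements (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    # Assumption: each direct child of the root is one search-enabled field.
    entries = len(list(root))
    # Count WILDCARD_INDEX elements anywhere in the document.
    wildcard = sum(1 for _ in root.iter("WILDCARD_INDEX"))
    return {"entries": entries, "wildcard": wildcard}


# Made-up sample; the element names other than WILDCARD_INDEX are
# placeholders, not the real Endeca schema.
SAMPLE = """
<INDEXES>
  <ENTRY NAME="title"><WILDCARD_INDEX/></ENTRY>
  <ENTRY NAME="description"/>
</INDEXES>
"""

if __name__ == "__main__":
    print(summarize_search_config(SAMPLE))
```

For a real file, read its contents with `open(path, encoding="utf-8").read()` and pass the text in; run it once before and once after removing entries to see how the counts change alongside the indexing time.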


          One last suggestion:  In your Dgidx section (which I mentioned above) consider removing --compoundDimSearch.  This is only useful for certain kinds of typeahead searches.  Removing this will also help with indexing time.


          If you're not doing typeahead searches at all, consider turning off wildcard searches for dimension searches by updating "discover.dimsearch_index.xml".

          • 2. Re: Performance Issue-Endeca

            Look at the breakdown of time taken by each step of the baseline process, then apply the tips that Greg mentioned. Also see whether you can do some processing/massaging of the data outside the pipeline to reduce the Forge times.


            Another thing to think about at a higher level: do all your customers need the full data set (12 million records)? If not, you could consider sharding your Endeca instance.




            • 3. Re: Performance Issue-Endeca

              Thanks for the reply Greg and Pankaj.


              I cannot see <arg>--threads</arg><arg>4</arg> in the "DataIngest.xml" file. Instead, I can see it in the Dgraph_defaults.xml file, where it is set to 2 by default. Is that the same setting you mentioned? I will also try the other tips you mentioned. Thanks a lot.



              Yes, we need all 12 million records; we control which data each user sees via navigation queries.





              • 4. Re: Performance Issue-Endeca

                One more point to add: my Forge completes quickly, but Dgidx (indexing) takes a long time. So any input on improving Dgidx performance would be helpful.




                • 5. Re: Performance Issue-Endeca
                  Greg E.

                  You have to add the <arg>--threads</arg><arg>4</arg> to the Dgidx block. It is not there by default. 


                  If Dgidx is taking a long time, adding the threads and revisiting which fields are enabled for keyword search will help with that.  Please post how many fields are enabled for search before and after, with the corresponding times.



                  • 6. Re: Performance Issue-Endeca

                    Thanks a lot Greg.

                    I increased the number of threads for Dgidx to 12, and my baseline now runs better than before. We haven't enabled all the properties and dimensions for search. We have enabled only a few properties for record search, which was the client's requirement, so they can't be disabled. Is there any other way to further improve indexing speed?




                    • 7. Re: Performance Issue-Endeca
                      Greg E.

                      Which versions of Endeca are you using?  I would strongly recommend looking at removing the --compoundDimSearch flag as a quick win.  Beyond that, there might not be any more tweaks I can think of.