5 Replies Latest reply: Jul 25, 2013 9:48 AM by Michael Peel-Oracle RSS

    ATG Endeca Integration Forge and Dgidx Performance Issue

    Arjun99


      Hi All,

       

      I am using ATG Endeca Product Catalog Integration to integrate our B2C application with Endeca Guided Search. My Pipeline consists of 12 Record Adapters out of which 11 read data from data feed files(txt files) and the other one record adapter reads data from my CAS Record store.

       

      The issue I am facing is that, my endeca application takes around 1 hour 2complete the "Forge and then takes more than 6 HOURS is taken my dgidx!!!!! I am running this on a dedicated unix dev box...I my local 64 bit windows machine it takes much longer.

       

      Below are the total stats:

      These are the versions ATG10.1.2, MDEX- 6.4.0, PlatformServices-6.1.3, CAS- 3.1.1, ToolsAndFrameWorks-3.1.1.

      I have 90 endeca properties, 43 Auto gen dimensions, 2 Internal dimensions and totally around 1 million records read from CAS data record store.

      None of the above mentioned 43  dimensions are searchable. The two internal dims have around 50-100 dimvals each. They are searchable(Not wildcard though).

       

      PFB the basline update log:

       

      CONFIG: Loading beanshell.imports from /a/endeca/apps/MyAppen/config/script/beanshell.imports

      Jul 22, 2013 3:42:02 PM com.endeca.soleng.eac.toolkit.Controller setUpBeanShellInterpreters

      CONFIG: Loading beanshell.imports from /a/endeca/apps/MyAppen/config/script/beanshell.imports

      Jul 22, 2013 3:42:02 PM com.endeca.soleng.eac.toolkit.Controller provision

      INFO: Checking definition from AppConfig.xml against existing EAC provisioning.

      Jul 22, 2013 3:42:03 PM com.endeca.soleng.eac.toolkit.Controller provision

      INFO: Definition has not changed.

      Jul 22, 2013 3:42:03 PM sun.reflect.NativeMethodAccessorImpl invoke0

      INFO: Starting baseline update script.

      Jul 22, 2013 3:42:03 PM com.endeca.soleng.eac.toolkit.base.LockManager acquireLock

      INFO: Acquired lock 'update_lock'.

      Jul 22, 2013 3:42:03 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ITLHost] Starting shell utility 'cleanDir_processing'.

      Jul 22, 2013 3:42:05 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ITLHost] Starting shell utility 'move_-_to_processing'.

      Jul 22, 2013 3:42:06 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ITLHost] Starting copy utility 'fetch_config_to_input_for_forge_Forge'.

      Jul 22, 2013 3:42:07 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ITLHost] Starting backup utility 'backup_log_dir_for_component_ConfigurationGeneratorForge'.

      Jul 22, 2013 3:42:08 PM com.endeca.soleng.eac.toolkit.component.Component start

      INFO: [ITLHost] Starting component 'ConfigurationGeneratorForge'.

      Jul 22, 2013 3:44:30 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ITLHost] Starting backup utility 'backup_log_dir_for_component_Forge'.

      Jul 22, 2013 3:44:31 PM com.endeca.soleng.eac.toolkit.component.Component start

      INFO: [ITLHost] Starting component 'Forge'.

      Jul 22, 2013 4:27:58 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ITLHost] Starting backup utility 'backup_log_dir_for_component_Dgidx'.

      Jul 22, 2013 4:27:59 PM com.endeca.soleng.eac.toolkit.component.Component start

      INFO: [ITLHost] Starting component 'Dgidx'.

      Jul 22, 2013 10:27:45 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [AuthoringMDEXHost] Starting shell utility 'cleanDir_local-dgraph-input'.

      Jul 22, 2013 10:27:46 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [AuthoringMDEXHost] Starting copy utility 'copy_index_to_host_AuthoringMDEXHost_AuthoringDgraph'.

      Jul 22, 2013 10:31:39 PM com.endeca.soleng.eac.toolkit.component.cluster.DgraphCluster applyIndex

      INFO: Applying index to dgraphs in restart group 'A'.

      Jul 22, 2013 10:31:39 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [AuthoringMDEXHost] Starting shell utility 'mkpath_dgraph-input-new'.

      Jul 22, 2013 10:31:40 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [AuthoringMDEXHost] Starting copy utility 'copy_index_to_temp_new_dgraph_input_dir_for_AuthoringDgraph'.

      Jul 22, 2013 10:34:02 PM com.endeca.soleng.eac.toolkit.component.Component stop

      INFO: Stopping component 'AuthoringDgraph'.

      Jul 22, 2013 10:34:10 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [AuthoringMDEXHost] Starting shell utility 'move_dgraph-input_to_dgraph-input-old'.

      Jul 22, 2013 10:34:17 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [AuthoringMDEXHost] Starting shell utility 'move_dgraph-input-new_to_dgraph-input'.

      Jul 22, 2013 10:34:18 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [AuthoringMDEXHost] Starting backup utility 'backup_log_dir_for_component_AuthoringDgraph'.

      Jul 22, 2013 10:34:28 PM com.endeca.soleng.eac.toolkit.component.Component start

      INFO: [AuthoringMDEXHost] Starting component 'AuthoringDgraph'.

      Jul 22, 2013 10:35:07 PM sun.reflect.NativeMethodAccessorImpl invoke0

      INFO: Publishing Workbench 'authoring' configuration to MDEX 'AuthoringDgraph'

      Jul 22, 2013 10:35:07 PM com.endeca.soleng.eac.toolkit.component.IFCRComponent pushAuthoringContentToDgraphById

      INFO: Pushing authoring content to dgraph: AuthoringDgraph

      Jul 22, 2013 10:35:08 PM com.endeca.soleng.eac.toolkit.component.IFCRComponent pushAuthoringContentToDgraphById

      INFO: Finished pushing content to dgraph.

      Jul 22, 2013 10:35:08 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [AuthoringMDEXHost] Starting shell utility 'rmdir_dgraph-input-old'.

      Jul 22, 2013 10:35:10 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [LiveMDEXHostA] Starting shell utility 'cleanDir_local-dgraph-input'.

      Jul 22, 2013 10:35:27 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [LiveMDEXHostA] Starting copy utility 'copy_index_to_host_LiveMDEXHostA_DgraphA1'.

      Jul 22, 2013 10:40:48 PM com.endeca.soleng.eac.toolkit.component.cluster.DgraphCluster applyIndex

      INFO: Applying index to dgraphs in restart group '1'.

      Jul 22, 2013 10:40:48 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [LiveMDEXHostA] Starting shell utility 'mkpath_dgraph-input-new'.

      Jul 22, 2013 10:40:49 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [LiveMDEXHostA] Starting copy utility 'copy_index_to_temp_new_dgraph_input_dir_for_DgraphA1'.

      Jul 22, 2013 10:42:30 PM com.endeca.soleng.eac.toolkit.component.Component stop

      INFO: Stopping component 'DgraphA1'.

      Jul 22, 2013 10:42:33 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [LiveMDEXHostA] Starting shell utility 'move_dgraph-input_to_dgraph-input-old'.

      Jul 22, 2013 10:42:39 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [LiveMDEXHostA] Starting shell utility 'move_dgraph-input-new_to_dgraph-input'.

      Jul 22, 2013 10:42:40 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [LiveMDEXHostA] Starting backup utility 'backup_log_dir_for_component_DgraphA1'.

      Jul 22, 2013 10:42:43 PM com.endeca.soleng.eac.toolkit.component.Component start

      INFO: [LiveMDEXHostA] Starting component 'DgraphA1'.

      Jul 22, 2013 10:43:57 PM sun.reflect.NativeMethodAccessorImpl invoke0

      INFO: Publishing Workbench 'live' configuration to MDEX 'DgraphA1'

      Jul 22, 2013 10:43:57 PM com.endeca.soleng.eac.toolkit.component.IFCRComponent pushLiveContentToDgraphById

      INFO: Pushing live content to dgraph: DgraphA1

      Jul 22, 2013 10:43:57 PM com.endeca.soleng.eac.toolkit.component.IFCRComponent pushLiveContentToDgraphById

      INFO: Finished pushing content to dgraph.

      Jul 22, 2013 10:43:57 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [LiveMDEXHostA] Starting shell utility 'rmdir_dgraph-input-old'.

      Jul 22, 2013 10:43:59 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ITLHost] Starting shell utility 'cleanDir_temp'.

      Jul 22, 2013 10:44:01 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ITLHost] Starting copy utility 'fetch_post_forge_dimensions_to_ws_temp_dir_-a-ecom-endeca-apps-MyAppen-data-workbench-temp'.

      Jul 22, 2013 10:44:02 PM com.endeca.soleng.eac.toolkit.component.WorkbenchManagerComponent updateWsDimensions

      INFO: Uploading post-Forge Dimensions to Workbench.

      Jul 22, 2013 10:44:02 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ITLHost] Starting shell utility 'emgr_update_set_post_forge_dims'.

      Jul 22, 2013 10:44:19 PM com.endeca.soleng.eac.toolkit.component.WorkbenchManagerComponent updateWsConfig

      INFO: Uploading config files to Workbench.

      Jul 22, 2013 10:44:19 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ITLHost] Starting shell utility 'emgr_update_update_mgr_settings'.

      Jul 22, 2013 10:44:21 PM com.endeca.soleng.eac.toolkit.component.WorkbenchManagerComponent updateWsConfig

      INFO: Finished uploading config files to Workbench.

      Jul 22, 2013 10:44:21 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ITLHost] Starting backup utility 'backup_state_dir_for_component_Forge'.

      Jul 22, 2013 10:44:22 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ITLHost] Starting backup utility 'backup_index_Dgidx'.

      Jul 22, 2013 10:49:10 PM com.endeca.soleng.eac.toolkit.component.Component stop

      INFO: Stopping component 'LogServer'.

      Jul 22, 2013 10:49:11 PM com.endeca.soleng.eac.toolkit.utility.Utility start

      INFO: [ReportGenerationHost] Starting backup utility 'backup_log_dir_for_component_LogServer'.

      Jul 22, 2013 10:49:12 PM com.endeca.soleng.eac.toolkit.component.Component start

      INFO: [ReportGenerationHost] Starting component 'LogServer'.

      Jul 22, 2013 10:49:13 PM com.endeca.soleng.eac.toolkit.base.LockManager releaseLock

      INFO: Released lock 'update_lock'.

      Jul 22, 2013 10:49:13 PM sun.reflect.NativeMethodAccessorImpl invoke0

      INFO: Baseline update script finished.

      Please let me know any way I can improve the performance of dgidx and reduce the total time taken for a baseline update.

       

      Thanks in advance,

      Arjun

        • 1. Re: ATG Endeca Integration Forge and Dgidx Performance Issue
          Branchbird - Pat

          Hi Arjun,

           

          I think we'd need your dgidx.log file to troubleshoot this effectively.  Understandable if you weren't able to share it (since it has property names, app names, etc.) but this would help illuminate details about your configuration and where time is being spent.

           

          How large (in GB) is your dgidx_output folder after Dgidx completes?

           

          Regards,

           

          Patrick Rafferty

          http://branchbird.com

          • 2. Re: ATG Endeca Integration Forge and Dgidx Performance Issue
            TimK

            I'm going to add to what Patrick said - in my experience, issues like this can point to malformed joins - you may have Keys setup on caches inappropriately that are causing the forge/dgidx to do way too much work or other join issues.

            First we want to look at the dgidx log as it should indicate that this kind of issue is happening. Then perhaps your pipeline.epx file if that is the direction it takes us. Again, do not share anything sensitive or proprietary.

            -Tim

            • 3. Re: ATG Endeca Integration Forge and Dgidx Performance Issue
              Michael Peel-Oracle

              How is memory utilisation looking during the dgidx step?  My guess would be the server is running out of physical memory, and when this happens the dgidx pretty much grinds to a halt.

              • 4. Re: ATG Endeca Integration Forge and Dgidx Performance Issue
                Arjun99

                Hi Michael,Tim and Pat,

                 

                Thanks for replying to my query!!

                 

                I was able to bring the time taken by dgidx down just over an hour!

                 

                I am unable to attach the dgidx logs as it over 1 GB(verbose logging) in size and the total size of my app's dgidx_output folder is 17.50 GB.

                 

                PFB the memory stats while running dgidx:

                    65968572  total memory

                     65786008  used memory

                     32091080  active memory

                     32067732  inactive memory

                       182564  free memory

                       158576  buffer memory

                     52967140  swap cache

                      8388600  total swap

                          264  used swap

                      8388336  free swap

                     12826169 non-nice user cpu ticks

                          378 nice user cpu ticks

                       517949 system cpu ticks

                  14908738257 idle cpu ticks

                      1288895 IO-wait cpu ticks

                         8262 IRQ cpu ticks

                        84898 softirq cpu ticks

                            0 stolen cpu ticks

                    109814562 pages paged in

                    688548068 pages paged out

                            0 pages swapped in

                           66 pages swapped out

                   2012364733 interrupts

                   1299354110 CPU context switches

                   1368541891 boot time

                      1247490 forks

                 

                 

                I did the following changes to dgidx conf. file.

                 

                1. Removed the compoundDimSearch arguement.

                2. Added the --threads arg with value of 6. My unix dev box is running on intel xeon E-2630 @ 2.3 GHz. Please let me know if I need to change this. I assumed that the the no. of threads is equal to the number of physical CPU cores.

                 

                This brought down the time taken to run dgidx to One hour!

                 

                I will post any updates if I have!

                Thanks,

                Arjun

                • 5. Re: ATG Endeca Integration Forge and Dgidx Performance Issue
                  Michael Peel-Oracle

                  "182564  free memory" - it looks like your changes caused the memory utilisation to just about stay under the maximum.  It is probably the compoundDimSearch change, as the multi-threading only applies to the sorting (unless you have hundreds of sortable properties).  Try reviewing your list of properties/dimensions, and considering whether:

                  1) only properties that are definitely being used for sorting are enabled for offline sort indexing (note you can still sort against them, it just means the sort index is built on request rather than during the dgidx)

                  2) you have any properties/dimensions flagged as searchable that don't need to be

                  3) if you are using wildcard searching, definitely make sure only the minimum properties/dimensions are flagged as wildcard-enabled

                  4) if you have any extremely large dimensions (check the ./data/state/autogen_dimensions.xml file to find out, look for any with 10k+) consider consolidating them, for example changing from unique values to ranges/groups

                  5) only map attributes to dimensions if you will be using them for navigation (any system usage can often be managed using record filters against the properties)

                   

                  HTH

                   

                  Michael