2 Replies Latest reply on Nov 5, 2012 1:26 PM by Branchbird - Pat

    Msg regarding "Dgraph did not start in startup timeout of 120 seconds"

      Hi All,

      The baseline update failed with the following error message:
      "WARNING: Component 'Dgraph1' did not start in startup timeout of 120 seconds.
      SEVERE: Server component 'Dgraph1' did not start in the allotted startup time.Refer to component logs in /usr/local/endeca/EndAppR7/./logs/dgraphs/Dgraph1 on host MDEXHost1.
      Occurred while executing line 5 of valid BeanShell script:

      3| DgraphCluster.cleanDirs();
      4| DgraphCluster.copyIndexToDgraphServers();
      5| DgraphCluster.applyIndex();

      I've checked the Dgraph1 logs and I don't see anything different from the usual logs, such as the ones I had yesterday:
      "all dgraph transactions completed"
      "Shutting down dgraph (pid=17398)"
      It had been running fine as a daily scheduled job for the past week.
      There is very little data to index, just a few MB.
      I've checked the main application log, the dgraph logs, the forge logs just in case, and process.0.log in PlatformServices. I don't see any detailed information that would help find the cause of this issue.
      There was also no lock flag set on the application.

      To resolve the issue, I just started Dgraph1 from Workbench and ran the baseline update again to make sure it would not fail. It ran fine.
      But I'm wondering why it happened.

      Can anyone kindly tell me the possible cause, or where I can find the root cause?


      Edited by: julia on Oct 31, 2012 11:06 AM
        • 1. Re: Msg regarding "Dgraph did not start in startup timeout of 120 seconds"
          Hi!

          Did you try a remote ./runcommand.sh Dgraph1 start (or a shutdown) to see how the process responds from the ITL?
          Some guesses:

          - Any network issues? See how the Dgraph responds on port 15000 (if this is the correct port) and how the EAC agent responds on port 8888.
          - A full file system? Check the disk space on your MDEX host.
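The two guesses above can be checked quickly with a script. This is only a minimal sketch: the function names `port_open` and `free_fraction` are made up for illustration, and the ports (15000 for the Dgraph, 8888 for the EAC agent) are the ones mentioned in this thread and may differ in your setup.

```python
import shutil
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def free_fraction(path: str) -> float:
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

Run from the ITL server, `port_open("MDEXHost1", 15000)` and `port_open("MDEXHost1", 8888)` show whether both ports are reachable. `free_fraction` only looks at the local disk, so it has to be run on the MDEX host itself, for example against the application directory.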

          How many Dgraphs are in your Dgraph cluster? Just one? Try configuring another Dgraph on another host, based on the same appconfig, to check whether the issue is with the index or with the Dgraph.

          Check whether you need to extend the 120-second startup timeout (though I fail to see why, if there were no major changes).

          Hope that helps.

          • 2. Re: Msg regarding "Dgraph did not start in startup timeout of 120 seconds"
            Branchbird - Pat

            After you issue the start command (either as part of a baseline update or with a runcommand.sh Dgraph1 start), you should see your Dgraph1.log file (in [your_app_folder]/data/dgraphs/Dgraph1/) get updated with information related to dgraph startup.

            Look there to see whether your dgraph is actually starting within the 2-minute window. The dgraph is up when you see the following line:

            pid=[SOME_NUMBER] listening for HTTP connections on port [SOME_NUMBER] at [SOME_DATE_TIME]

            If you don't see that, there is either an error starting the dgraph (you should see that in the same log file) or an error sending the command to start the dgraph (port 8888 not open, EAC Agent not running on the MDEX server, etc.).
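To avoid reading the log by eye, the check for the startup line can be scripted. A minimal sketch, assuming the line follows the pattern quoted above; the helper name `find_startup` is hypothetical and the exact log wording may vary between MDEX versions.

```python
import re
from typing import Iterable, Optional, Tuple

# Pattern taken from the startup line quoted in this reply.
LISTENING = re.compile(r"pid=(\d+) listening for HTTP connections on port (\d+)")


def find_startup(lines: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Return (pid, port) from the last 'listening' line, or None if absent."""
    found = None
    for line in lines:
        match = LISTENING.search(line)
        if match:
            found = (int(match.group(1)), int(match.group(2)))
    return found
```

Usage would be something like `with open(path_to_dgraph_log) as f: print(find_startup(f))`, where `path_to_dgraph_log` points at your Dgraph1.log. A result of `None` means the dgraph never reported that it was listening.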

            If you do see a message showing that the dgraph started successfully, it's likely that the EAC Central Server (i.e. probably where you ran the command) was not able to determine that the dgraph had started. It usually does this by issuing an admin?op=ping request to the dgraph port, so it's likely that port 15000 (or wherever your dgraph is listening) is blocked between those two servers.
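The same ping can be reproduced by hand from the EAC Central Server to see whether the port is blocked. A minimal sketch, assuming the dgraph answers plain HTTP on its port and treats `admin?op=ping` as described above; `ping_url` and `dgraph_ping` are hypothetical helper names, and treating HTTP 200 as "up" is an assumption of this sketch.

```python
import http.client
import urllib.error
import urllib.request


def ping_url(host: str, port: int) -> str:
    """Build the admin ping URL for a dgraph at host:port."""
    return f"http://{host}:{port}/admin?op=ping"


def dgraph_ping(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if the ping URL answers with HTTP 200 within timeout."""
    # An empty ProxyHandler bypasses any configured HTTP proxy, so the
    # result reflects the direct path between the two servers.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(ping_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused, timed out, unresolvable host, or a malformed reply.
        return False
```

If `dgraph_ping("MDEXHost1", 15000)` returns False from the EAC Central Server while the dgraph log shows it listening, a firewall or routing problem between the two hosts is the likely suspect.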

            Hopefully, that's enough for you to go on.