Seeing a similar issue where my managed servers are showing as Ok but the application deployments are showing as down. Any ideas on where to look besides the emoms.log file and what might be causing this?
Managed servers attempt to connect to the admin server at startup and periodically thereafter. To be successful, the correct admin server listen address, listen port, protocol, and security credentials need to be specified. Usually connection issues such as this can be traced to an issue in one of these areas. Depending on how you are starting the server, these may be specified in a startup command file or in the config.xml.
Look at the managed server logs. You should see connections to the admin server being attempted, and those log messages include details on the listen port and address attempted. If they are correct, look at the logs for any unexpected exceptions from those attempts.
If your servers are up and healthy but an application deployed to those servers is unexpectedly down, looking into the logs is probably the best way to find the cause. You may see an issue reported on the console if you select the app and try to start it, but it really depends on what the issue is and when it is detected.
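As a rough sketch of the log check described above, a small Python helper can pull out the lines worth reading first. The marker strings below are assumptions, not actual WebLogic message text, so adjust them to whatever your managed server logs contain (for example, the admin server address or port you expect to see).

```python
# Hypothetical markers; replace them with strings that actually appear in your logs.
DEFAULT_MARKERS = ("Exception", "Error", "refused")


def find_suspect_lines(lines, markers=DEFAULT_MARKERS):
    """Return (line_number, text) pairs for lines containing any marker.

    Matching is a case-insensitive substring test; line numbers start at 1.
    """
    lowered = [m.lower() for m in markers]
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(m in text.lower() for m in lowered):
            hits.append((number, text))
    return hits


def scan_log(path, markers=DEFAULT_MARKERS):
    """Scan a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_suspect_lines(handle, markers)
```

Calling `scan_log` with the path to a managed server log returns the matching lines with their line numbers, which makes it easier to compare the same time window across nodes.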
Thanks Loren for the response.
The admin server listen address, port, protocol and security credentials are correct. If one was wrong, I would expect to see this issue sooner, and on all nodes.
The config.xml file is the same on all machines.
The server starts one of two ways:
1) Server Reboot - The Node Manager will start the nodes, using the nmStart(server) command.
2) Manual Restart - using the Adminserver to restart the nodes.
I do know that this issue has come up after a server reboot. For example, I'll check it when I come in in the morning and everything will be "OK", but by the afternoon one or more nodes will not display "OK", yet will still be Running and serving data.
Did you ever solve this issue? We are running into the same issue, where nodes will just drop out of the cluster. They will still show running, but the health is not "OK".
Sadly we have not.
We had this issue and it was because we created our managed server through the quickstart. When we create it through the console everything shows as up and we can see stats on everything.
This can happen if the server dropped out of the cluster because of a cluster issue and the connection between the admin server and the managed server broke. In that state, even if the server is healthy, it will not appear in the admin server or will be shown as failed.
To start with, did you first check whether your server is up and taking requests? You can do this in any of the following ways:
1. Ping the managed server. It will show whether there is any packet loss.
2. Check the server access log to confirm that it is serving requests at runtime.
3. "telnet" to the ip:port pair, and also check that the process ID is alive.
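If telnet is not installed on the box, the ip:port check in step 3 can be approximated with a few lines of Python. This is only a sketch: it confirms that something accepts TCP connections on that port, not that WebLogic is healthy, and any host name and port you pass in are your own values.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For example, `port_is_open("managed1.example.com", 7003)` (a hypothetical host and port) returns False if the listen port is closed or unreachable from where you run it.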
Also check the server logs for cluster errors. If the server is healthy but not shown in the admin console, then only restarting the server will re-establish communication between the admin server and the managed server, which will then display the correct status on the console. And if the server itself is down, go for a server restart.