Traceback (innermost last):
File "<console>", line 1, in ?
File "<iostream>", line 704, in ls
File "<iostream>", line 1847, in raiseWLSTException
WLSTException: Error occured while performing ls : Error while retrieving attribute names and values : javax.management.RuntimeMBeanException: MBean getAttribute failed: java.lang.IllegalStateException: Admin server identity is unavailable. The managed server may not be connected to the admin server
Use dumpStack() to view the full stacktrace
When connecting to other nodes on the same machine, we see all of the expected information. This tells me that the node is unable to talk to the Admin server correctly. Yet, stopping and starting the node from the Admin server works fine (and is our current way to resolve the issue).
Ideas on where to look for the underlying cause of this problem?
Or how to force the node to attempt to reconnect to the Admin server?
I'm seeing a similar issue where my managed servers show as OK but the application deployments show as down. Any ideas on where to look besides the emoms.log file, and what might be causing this?
Managed servers attempt to connect to the admin server at startup and periodically thereafter. To succeed, they need the correct admin server listen address, listen port, protocol, and security credentials. Connection issues like this can usually be traced to one of those areas. Depending on how you start the server, these settings may be specified in a startup command file or in config.xml.
Look at the managed server logs. You should see connection attempts to the admin server being logged, and those messages include the listen address and port that were tried. If those are correct, look for any unexpected exceptions from the attempts.
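To pull those connection messages out of a large managed server log, a quick filter can help. This is just an illustrative sketch; the keywords are assumptions, so adjust them to match the actual message text in your own log files:

```python
def admin_connect_lines(log_text, keywords=('admin server', 'Exception')):
    """Return log lines that mention the admin server connection or an
    exception. Keyword list is an assumption - tune it to your logs."""
    return [line for line in log_text.splitlines()
            if any(k.lower() in line.lower() for k in keywords)]

# Example with a synthetic log fragment:
sample = (
    "<Info> Server started in RUNNING mode\n"
    "<Info> Connecting to the Admin Server at t3://adminhost:7001\n"
    "<Error> java.net.ConnectException: Connection refused\n"
)
for line in admin_connect_lines(sample, ('admin server', 'ConnectException')):
    print(line)
```

Checking the host and port printed in those lines against what the admin server is actually listening on is usually the fastest way to spot a misconfiguration.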
If your servers are up and healthy but an application deployed to those servers is unexpectedly down, looking into the logs is probably the best way to find the cause. You may see an issue reported on the console if you select the app and try to start it, but it really depends on what the issue is and when it is detected.
The admin server listen address, port, protocol and security credentials are correct. If one was wrong, I would expect to see this issue sooner, and on all nodes.
The config.xml file is the same on all machines.
The server starts one of two ways:
1) Server reboot - Node Manager starts the nodes, using the nmStart(server) command.
2) Manual restart - using the Admin Server to restart the nodes.
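For reference, option 1 can be driven by hand from a WLST session. This is only a sketch: the credentials, host names, ports, domain name, and paths below are placeholders, not values from this environment:

```python
# WLST (Jython) sketch - run inside wlst.sh, not plain Python.
# All credentials, hosts, ports, and paths here are placeholder assumptions.
def restart_via_node_manager():
    # Connect to the Node Manager on the machine hosting the managed server.
    nmConnect('weblogic', 'welcome1', 'managedhost', '5556',
              'mydomain', '/path/to/domains/mydomain', 'ssl')
    nmStart('ManagedServer1')   # same call Node Manager issues after a reboot
    nmDisconnect()
```

Starting the server this way (rather than from the Admin Server console) can help isolate whether the reconnection problem lies with Node Manager or with the admin server channel.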
I do know that this issue has come up after a server reboot. For example, when I check in the morning everything will be "OK", but by the afternoon one or more nodes will no longer display "OK", yet will still be Running and serving data.
It could be that the server dropped out of the cluster due to a cluster issue, which broke the connection between the admin server and the managed server. In that case, even if the server is healthy, it will not appear in the admin server, or will be shown as failed.
To start with, check whether your server is up and taking requests. You can do that in any of the following ways:
1. Ping the managed server; this will show whether there is any packet loss.
2. Check the server access log to confirm it is serving requests at runtime.
3. "telnet" to the ip:port pair, and also check that the process ID is alive.
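The telnet-style check in step 3 can be scripted so it is easy to run against several managed servers at once. A minimal sketch using only the standard library (host and port values would come from your own domain configuration):

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Equivalent of a quick `telnet host port`: returns True if a TCP
    connection to the managed server's listen port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Usage (placeholder host/port):
#   if not port_is_open('managedhost', 7003):
#       print('managed server listen port is not reachable')
```

A successful connect only proves the listen port is reachable; combine it with the access-log check to confirm requests are actually being served.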
Also check the server logs for cluster errors. If the server is healthy but not shown in the admin console, only restarting the server will re-establish communication between the admin server and the managed server, after which the console will display the correct status. And if the server itself is down, restart it.