I have a WLS domain discovered in OEM (12c). I used the same agent configuration and setup on all 3 servers in the cluster. Servers 1 and 2 show as up, as do the cluster and the overall domain. Server 3 shows as down, with numerous incidents flagged "Agent down" and "Weblogic Server Is Down". In the WLS Console, all three servers are Running and working properly.
What emctl commands can I use to verify that the agent on Server 3 is communicating properly with the WLS? Or are there steps within OEM to verify the communication? I have checked the config files and setup procedures, and Server 3 is identical to Servers 1 and 2, aside from the server number and the corresponding hostnames. Looking for any sort of input...
This is a problem with agent-OMS communication.
- check the agent status: <AGENT_BASE_DIR>/core/<AGENT_VERSION>/bin/emctl status agent
- check the All Targets page in the console to confirm the agent is reported down
- check the agent log: <AGENT_BASE_DIR>/agent_inst/sysman/log/gcagent.log
Was the target ever up? If not, you might need to resecure the agent:
<AGENT_BASE_DIR>/core/<AGENT_VERSION>/bin/emctl secure agent
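For the log check above, a small script can pull out only the most recent problem lines rather than reading the whole file. This is a minimal sketch, not an Oracle tool: the function name `recent_matches` is hypothetical, and it assumes the log marks problem lines with a level keyword such as ERROR or SEVERE, which may not match every log format.

```python
from collections import deque


def recent_matches(path, needles=("ERROR", "SEVERE"), limit=20):
    """Return the last `limit` lines of a log file that contain any needle.

    `needles` are plain substrings; the defaults are an assumption about
    how the log labels problem lines.
    """
    hits = deque(maxlen=limit)  # keeps only the newest `limit` matches
    with open(path, errors="replace") as log:
        for line in log:
            if any(needle in line for needle in needles):
                hits.append(line.rstrip("\n"))
    return list(hits)


# Example call (path placeholder as in the steps above):
# for line in recent_matches("<AGENT_BASE_DIR>/agent_inst/sysman/log/gcagent.log"):
#     print(line)
```

The same function can be pointed at a WebLogic server log with a different `needles` tuple if a specific message is suspected.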
That is why I am confused. The agent is actually up. It is secured and sending metrics for the box perfectly. I can run ./emctl upload agent successfully. When I navigate to Host Targets, the box shows as up. But when I navigate to Middleware Targets, Server 3 in the WLS domain, which is on that box, shows as down. The "agent unreachable" status is cleared on that page, but it still shows the WebLogic Server as down.
When I access the WLS console, the server is up, running and working fine. That is why I am lost when it comes to my next step. If the agent is working fine and the setup is identical to Servers 1 and 2, why is it down when the rest are up? I noticed that the Server3.log file is being spammed with a certificate error: "CERTIFICATE_UNKNOWN alert was received from *Server 3*. The peer has an unspecified issue with the certificate." I am getting the cert warning from Server 3 on Server 3. So is there a problem with some certificate, and is that why the agent on Server 3 cannot communicate with the WLS?
I am leaning towards this error as the root cause of my issues. If the agent sitting on Server 3 is trying to connect to the WLS on Server 3 over and over again, and a certificate is missing somewhere for SSL, that would explain the constant spam of errors in the Server 3 log. The other 2 logs are empty. The question comes down to this: what certificate is it looking for that Server 2 has, but Server 3 is obviously missing? The agent is missing some vital piece it needs to talk to WLS.
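One way to narrow down the certificate question is to compare the certificate each managed server actually presents on its SSL port. The sketch below uses only the Python standard library. The host names and port in the usage comment are hypothetical placeholders, and a matching fingerprint would not by itself prove the agent trusts the certificate; it only shows whether Server 3 presents something different from Server 2.

```python
import hashlib
import ssl


def pem_fingerprint(pem):
    """Return the SHA-256 fingerprint (hex) of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()


def server_fingerprint(host, port):
    """Fetch the certificate a server presents, without validating it,
    and return its SHA-256 fingerprint."""
    pem = ssl.get_server_certificate((host, port))
    return pem_fingerprint(pem)


def compare_servers(servers):
    """Map each server name to its fingerprint, or to an error string
    if the TLS handshake fails. `servers` maps name -> (host, port)."""
    results = {}
    for name, (host, port) in servers.items():
        try:
            results[name] = server_fingerprint(host, port)
        except OSError as exc:  # ssl.SSLError is a subclass of OSError
            results[name] = "handshake failed: %s" % exc
    return results


# Hypothetical hosts and SSL listen port; substitute your own values.
# print(compare_servers({
#     "server2": ("host2.example.com", 7002),
#     "server3": ("host3.example.com", 7002),
# }))
```

If the fingerprints differ, the next thing to check would be whether the certificate (or its issuing CA) presented by Server 3 is present in the trust store the agent uses.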