A week ago I finally got 12c Grid Control up with all the associated targets added, which made me very happy: I had used 11g Grid Control previously but didn't have it at my new job until now. We have a scheduled monthly downtime for Windows upgrades and patching. After yesterday's scheduled downtime, once everything came back up, one of my 4 monitored servers started acting up. Here is the loop it is currently going through:
1) Message=Agent Unreachable (REASON = Unable to connect to the agent at https://ora-9001.mynetwork:3872/emd/main/ [Connection refused: connect]). Host is reachable.
2) Message=Agent Unreachable is cleared. The current status of the target is METRIC ERROR.
3) Message=Metric evaluation error start - Received an exception when evaluating sev_eval_proc for:Target name = PRI1_ora-9001.mynetwork_sys, metric_name = Response, metric_column = Status; Error msg = Target encountered metric erros; at least one member in in metric error
I'm also getting one more message, which doesn't seem to be part of the "loop":
Internal error detected: java.lang.Throwable:oracle.sysman.gcagent.tmmain.execution.LongOpManager$ZombieDetection:719.
The agent continually goes down and then comes back up without any intervention, as if in some kind of loop. If it's down I can start it manually; it runs for a while, then goes down again. I've checked that the credentials and settings are correct, but I'm not sure how to troubleshoot from this point on, especially since all the other systems are running without issues.
All input is appreciated.
It looks like your agent is stuck in a crash-and-restart loop. This usually points to something wrong with the Java or OS layer. The required debugging information will be in the agent log files. I suggest you open a Service Request with support. You can post the SR # here for follow-up.
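While collecting the logs, it may help to record exactly when the agent stops and resumes listening, so those times can be matched against the log entries. The following is only a minimal sketch: the function names `agent_port_open` and `watch` are made up for illustration, and the host and port are the ones from the "Agent Unreachable" message above. It checks nothing more than whether a TCP connection to the agent port succeeds, which is not the same as the agent being healthy.

```python
import socket
import time
from datetime import datetime


def agent_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and name resolution failures.
        return False


def watch(host, port, interval=30.0, checks=None,
          probe=agent_port_open, sleep=time.sleep):
    """Poll the port and record a timestamped entry on every state change.

    checks=None polls forever; pass a number to stop after that many probes.
    probe and sleep are parameters only so the loop can be exercised offline.
    """
    transitions = []
    last_state = None
    done = 0
    while checks is None or done < checks:
        state = probe(host, port)
        if state != last_state:
            stamp = datetime.now().isoformat(timespec="seconds")
            label = "UP" if state else "DOWN"
            transitions.append((stamp, label))
            print(stamp, label)
            last_state = state
        done += 1
        if checks is None or done < checks:
            sleep(interval)
    return transitions
```

Run on the OMS host (or any machine that can reach the agent), a call such as `watch("ora-9001.mynetwork", 3872)` prints one line each time the port flips between reachable and unreachable. Those timestamps can then be lined up with the agent logs and the Windows event log to see what happens just before each drop.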
I did quite a bit of testing in our test environment before installing into production, and one thing I noticed was that after a reboot I often had issues with the agent and had to intervene manually to get everything talking again. The initial agent push to this server went off without issue and all was working like a charm until the reboot. It seems that if there were an issue with the Java or OS layer, it would have shown itself at that point. I'm not arguing that there isn't a problem with the Java or OS layer, just that it so often doesn't make sense: everything is working great, then a reboot, and boom.
Can you please upload the trace files for the agent on which you are seeing the error?
Can you please also provide more information about the error below, i.e. where you are seeing it, a screenshot of the screen, etc.?
Message=Metric evaluation error start - Received an exception when evaluating sev_eval_proc for:Target name = PRI1_ora-9001.mynetwork_sys, metric_name = Response, metric_column = Status; Error msg = Target encountered metric erros; at least one member in in metric error
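If many of these notifications pile up, splitting each one into its fields makes it easier to see which target and metric are involved each time. This is a rough sketch only: `parse_metric_error` is a hypothetical helper, and the pattern assumes the message layout is exactly as quoted above.

```python
import re

# Pattern written against the quoted message; other layouts will not match.
METRIC_ERROR = re.compile(
    r"Target name = (?P<target>[^,]+), "
    r"metric_name = (?P<metric>[^,]+), "
    r"metric_column = (?P<column>[^;]+); "
    r"Error msg = (?P<error>.*)"
)


def parse_metric_error(message):
    """Return the message fields as a dict, or None if the layout differs."""
    match = METRIC_ERROR.search(message)
    return match.groupdict() if match else None
```

For the message above this yields the target `PRI1_ora-9001.mynetwork_sys`, the metric `Response` and the column `Status`. The error text says that at least one member is in metric error, so the member targets are worth checking individually.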
Right now I'm working with our system admins, since it appears their Windows patches/updates might have created a problem with the Java install on the machine in question. Unfortunately, this group doesn't move too quickly on issues, so I'm stuck waiting on a reply to yesterday's email to them.