6 replies. Latest reply: Jan 4, 2010 10:57 PM by 807737

    Alarms not appearing for events - Agent Not Responding

    807567
      Hi

      I'm hoping someone may be able to help with a frustrating issue.

      I have a setup with SunMC 4.0 monitoring approximately 15 SunMC agents. The problem is that some of these agents do not produce alarms for events that occur on their hosts.

      For example, if I trigger an alarm on a host by filling up a filesystem, on some hosts an alarm appears in the Alarm tab as expected, but on other hosts no alarm appears, so there is nothing for me to acknowledge.

      After doing a bit of digging I have found the following:

      1) When the event is triggered on the client, the trap is sent to the server.

      2) The server then tries to contact the agent on the client, and I see this error in event.log:

      [012578fd 0086 ]info Jul 28 14:59:06 event eventmgr: sending snmp collection request to XXXX 1161 XXXX for startline 0
      [012578ff 0098 ]error Jul 28 14:59:06 event eventmgr: event collection failed for XXXX 1161 XXXX line 0 - error {} {Agent Not Responding}

      3) If I snoop between the two hosts, I can see traffic between the server and the client on port 1161, so I don't believe it is a network issue.

      4) I have tried uninstalling and reinstalling the agent. Strangely, this cured it on one affected server, but on another the problem persists even after reinstalling the agent several times.

      5) I have logged into the PostgreSQL database and can see that the events are not being recorded in the events table (hence they are not displayed in the SunMC console).

      6) I have tried reseeding the agent, and that makes no difference.

      7) I have ensured the agent listens on only one network interface by adding snmpBindAnyAddr = "0" to the agent section of the domain-config.x file.
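      To see whether the "Agent Not Responding" failures cluster on particular agents, event.log can be tallied per host and port. The sketch below is a hypothetical helper, not part of SunMC: it assumes the failure lines look exactly like the ones quoted in step 2 (another SunMC release may format them differently), and the names FAILURE_RE, failed_agents and scan_file are illustrative. The field after the port (redacted as XXXX above) is captured but not interpreted.

```python
import re
import sys
from collections import Counter

# Assumed pattern, modelled only on the event.log lines quoted in step 2.
FAILURE_RE = re.compile(
    r"\]\w+\s+\w{3}\s+\d+\s+[\d:]{8}\s+event eventmgr: "
    r"event collection failed for (?P<host>\S+) (?P<port>\d+) (?P<extra>\S+) "
    r"line (?P<line>\d+) - error \{.*?\} \{(?P<reason>[^}]*)\}"
)


def failed_agents(lines, reason="Agent Not Responding"):
    """Count event-collection failures per (host, port) with the given reason."""
    counts = Counter()
    for line in lines:
        m = FAILURE_RE.search(line)
        if m and m.group("reason") == reason:
            counts[(m.group("host"), int(m.group("port")))] += 1
    return counts


def scan_file(path):
    """Tally failures from a log file; undecodable bytes are replaced, not fatal."""
    with open(path, errors="replace") as fh:
        return failed_agents(fh)


if __name__ == "__main__":
    # Usage: python scan_eventlog.py /path/to/event.log [more logs...]
    for path in sys.argv[1:]:
        for (host, port), n in sorted(scan_file(path).items()):
            print(f"{host}:{port} {n} failure(s)")
```

      For example, with two made-up log lines, failed_agents(sample) returns Counter({('hostA', 1161): 1}); the "info" request line is ignored because only the failure pattern is matched. If the affected hosts match the ones that never show alarms, that at least confirms the problem is in the server-to-agent collection step rather than in trap delivery.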

      I am not sure where else to look for pointers as to what is going on. What beats me is that the servers the agents run on all have identical builds from JumpStart and are on the same network, yet some work and some don't!

      Any help from anyone would be gratefully received.

      thanks