2 Replies Latest reply on Jul 26, 2012 4:13 PM by 924908

    strange error message

      Hi @all,

      Can anyone guide me to a solution for these error messages:

      Jul 25 03:23:52 krn630 SC[,SUNW.gds:6,sybase-rg,UC4Alias_SYB_ENTW,gds_probe]: [ID 419301 daemon.error] The probe command </opt/cluster/UC4Alias/monitor_hostgroup.sh KRN633 E> timed out
      Jul 25 03:23:52 krn630 SC[,SUNW.gds:6,sybase-rg,UC4Alias_SYB_ENTW,gds_probe]: [ID 335591 daemon.error] Failed to retrieve the resource group property RG_is_frozen: resource, resource type or resource group has been updated since last scha__open call.

      This is a special resource for which we have disabled PMF process monitoring with

      /usr/cluster/bin/pmfadm -s ${RESOURCEGROUP},${RESOURCE},0.svc

      as described here: https://blogs.oracle.com/TF/entry/disabling_pmf_action_script_with

      There are several servers on which this construct works perfectly, but on this one server
      the message appears once every night. The only thing I have figured out is that a database reorg runs
      at the same time as the error message appears, and when I skip the database reorg no error message appears.
      Could high CPU consumption be the problem?

      Solaris 10, Sun Cluster 3.3



      Edited by: 921905 on 25.07.2012 11:41
        • 1. Re: strange error message
          I cannot tell you what the 2nd error message means, but the first one is easy to explain. There are usually two monitoring processes that watch the health of a service: a process monitor, implemented using pmf (process monitoring facility), and the "real" agent, which usually does more sophisticated checks to find out whether your HA service is still alive and doing useful work.
          Now, you switched off process monitoring but not the agent-based monitor. It seems that when the DB reorg runs, this monitor just does not finish in time, e.g. due to insufficient CPU cycles. To handle this better, you should do a "clrs show -v <resource name>" and look at the various timeout values. I think you have to increase the PROBE_TIMEOUT value. The default value is, IIRC, 30 seconds.
          Handle timeout settings with care!
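
Before raising the timeout, it may help to measure how long the probe command actually takes while the reorg is running. The following is only a rough sketch: `time_probe` is a hypothetical helper, not part of Sun Cluster, and it assumes bash is installed (it relies on the shell's built-in `SECONDS` counter). It reports the elapsed time against a budget in seconds but does not enforce any timeout itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Sun Cluster tool): run a command and report how
# long it took compared with a budget given in whole seconds.
# The command is never killed; this only measures.
time_probe() {
    local budget=$1
    shift
    local start=$SECONDS          # bash counter: seconds since shell start
    local rc=0
    "$@" || rc=$?                 # run the command, remember its exit status
    local elapsed=$((SECONDS - start))
    if [ "$elapsed" -ge "$budget" ]; then
        echo "SLOW rc=$rc elapsed=${elapsed}s budget=${budget}s"
    else
        echo "OK rc=$rc elapsed=${elapsed}s budget=${budget}s"
    fi
    return "$rc"
}

# Example call, using the probe command from the log and the 30 second default:
#   time_probe 30 /opt/cluster/UC4Alias/monitor_hostgroup.sh KRN633 E
```

Running this by hand a few times during the reorg window and a few times outside it should show whether the probe is merely slow under load or hangs completely, which in turn tells you whether a larger timeout can help at all.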
          • 2. Re: strange error message

            I've already changed this parameter from 30 to 60, but the problem still exists.

            Today I installed a monitoring script to collect data on CPU consumption.
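
For illustration, a monitoring script of this kind could look roughly like the sketch below. This is not the script mentioned above, just one possible shape: `stamp_vmstat` is a hypothetical helper that prefixes each numeric `vmstat` data line with a syslog-style timestamp, so the samples can be lined up with the `gds_probe` messages in the messages file. It assumes bash, and the log path in the usage comment is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read vmstat output on stdin, drop header lines and
# prefix every data line with a timestamp in syslog style (e.g. "Jul 25 03:23:52").
stamp_vmstat() {
    local line
    while IFS= read -r line; do
        case "$line" in
            *[!0-9[:space:]]*) continue ;;   # header or other non-numeric text
            *[0-9]*) printf '%s %s\n' "$(date '+%b %d %H:%M:%S')" "$line" ;;
        esac
    done
}

# Example call, sampling every 5 seconds (log path is an assumption):
#   vmstat 5 | stamp_vmstat >> /var/tmp/cpu_samples.log
```

One caveat: if `vmstat` buffers its output when writing to a pipe, the timestamps will lag behind the real sample times, so it is worth checking that lines arrive at the expected interval before trusting the correlation.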