2 Replies. Latest reply: Jul 26, 2012 9:13 AM by 924908

strange error message

924908 Newbie
Hi @all,

Is there anyone who can point me to a solution for this error message:

Jul 25 03:23:52 krn630 SC[,SUNW.gds:6,sybase-rg,UC4Alias_SYB_ENTW,gds_probe]: [ID 419301 daemon.error] The probe command </opt/cluster/UC4Alias/monitor_hostgroup.sh KRN633 E> timed out
Jul 25 03:23:52 krn630 SC[,SUNW.gds:6,sybase-rg,UC4Alias_SYB_ENTW,gds_probe]: [ID 335591 daemon.error] Failed to retrieve the resource group property RG_is_frozen: resource, resource type or resource group has been updated since last scha_*_open call.

This is a special resource where we have disabled the monitoring of the process with

/usr/cluster/bin/pmfadm -s ${RESOURCEGROUP},${RESOURCE},0.svc

as described here: https://blogs.oracle.com/TF/entry/disabling_pmf_action_script_with
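
As an illustration only, this is how the tag in the pmfadm call would expand for the resource named in the syslog lines above. The helper name pmf_tag is made up for this sketch, and the command is only printed, not executed. It assumes a POSIX shell (ksh or /usr/xpg4/bin/sh rather than the legacy Solaris /bin/sh).

```shell
#!/usr/xpg4/bin/sh
# Hypothetical helper: build the PMF name tag "<resource group>,<resource>,0.svc".
pmf_tag() {
    # $1 = resource group, $2 = resource
    printf '%s,%s,0.svc\n' "$1" "$2"
}

# Resource group and resource as they appear in the syslog lines above.
RESOURCEGROUP=sybase-rg
RESOURCE=UC4Alias_SYB_ENTW
TAG=$(pmf_tag "$RESOURCEGROUP" "$RESOURCE")

# Dry run: print the command instead of running it.
echo /usr/cluster/bin/pmfadm -s "$TAG"
```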

This construct works perfectly on several other servers; only on this server does the message appear, once every night.
The only thing I have figured out is that a database reorg runs at the same time
as the error message appears, and when I skip the database reorg, no error message appears.
Could high CPU consumption be the problem?

Solaris 10, Sun Cluster 3.3

Thanks!

Regards,
Heinz

Edited by: 921905 on 25.07.2012 11:41
  • 1. Re: strange error message
    HartmutStreppel Explorer
    I cannot tell you what the second error message means, but the first one is easy to explain. There are usually two monitoring processes that watch the health of a service: a process monitor, implemented using PMF (process monitoring facility), and the "real" agent, which usually does something more sophisticated to find out whether your HA service is still alive and doing useful work.
    You switched off process monitoring, but not the agent-based monitor. It seems that while the DB reorg runs, this monitor simply does not finish in time, e.g. due to insufficient CPU cycles. To handle this better, run "clrs show -v <resource name>" and look at the various timeout values. I think you have to increase the PROBE_TIMEOUT value. IIRC, the default is 30 seconds.
    Handle timeout settings with care!
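
A rough way to check whether the probe itself is slow during the reorg is to time the probe command by hand and compare the result with the configured timeout. The sketch below is only that: check_probe is a made-up helper, it assumes a POSIX shell (ksh or /usr/xpg4/bin/sh, not the legacy Solaris /bin/sh), and it assumes a date command that understands +%s, which the stock Solaris 10 /usr/bin/date may not, so another clock source may be needed there.

```shell
#!/usr/xpg4/bin/sh
# Hypothetical helper: run a command, measure wall-clock seconds,
# and report whether it reached the given limit.
# Usage: check_probe <limit-seconds> <command> [args...]
check_probe() {
    limit=$1
    shift
    start=$(date +%s)            # assumption: date supports +%s
    "$@" >/dev/null 2>&1         # run the probe, discard its output
    status=$?
    end=$(date +%s)
    elapsed=$((end - start))
    if [ "$elapsed" -ge "$limit" ]; then
        echo "SLOW elapsed=${elapsed}s limit=${limit}s exit=${status}"
    else
        echo "OK elapsed=${elapsed}s limit=${limit}s exit=${status}"
    fi
}
```

On the affected node this could be called during the reorg window as: check_probe 60 /opt/cluster/UC4Alias/monitor_hostgroup.sh KRN633 E. If the probe turns out to need longer than the timeout, the value would then be raised with clresource, along the lines of "clrs set -p Probe_timeout=<seconds> <resource name>" (syntax from memory; check clresource(1CL) first).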
  • 2. Re: strange error message
    924908 Newbie
    Hartmut,

    I've already changed this parameter from 30 to 60, but the problem still exists.

    Today I installed a monitoring script to collect data about CPU consumption.

    Regards,
    Heinz
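
The monitoring script itself is not shown in the thread. A minimal sketch of that kind of script, under the assumption that timestamped samples in syslog format are enough to line CPU load up with the 03:23 probe timeout, might look like this. The helper name log_sample and the log path in the usage note are invented for illustration, and a POSIX shell is assumed.

```shell
#!/usr/xpg4/bin/sh
# Hypothetical helper: run a sampling command and append its output to a log,
# prefixing each line with a syslog-style timestamp (e.g. "Jul 25 03:23:52").
# Usage: log_sample <logfile> <command> [args...]
log_sample() {
    logfile=$1
    shift
    stamp=$(date '+%b %d %H:%M:%S')
    "$@" 2>&1 | while IFS= read -r line; do
        echo "$stamp $line"
    done >> "$logfile"
}
```

Called from cron or a sleep loop around the reorg window, for example as log_sample /var/tmp/cpu.log uptime or log_sample /var/tmp/cpu.log vmstat 5 2, this gives lines that can be compared directly with the timestamps in /var/adm/messages.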
