I have a GDS resource ending with the following status:
Resource Name                 Node Name   State    Status Message
---------------------------   ---------   ------   --------------
Resource: CleanUpClient-res   node2       Online   Faulted - Application faulted, but not restarted. Probe quitting ..
I can see that no processes are controlled by PMF for this resource, which means that the application is no longer running.
The question is:
Why does the cluster framework set the state to Online and not to Offline?
The status message, by contrast, does reflect reality.
For this resource we configured:
Hi User446058-OC,
This status shows that the application started OK, but the probe you supplied to GDS
exited with a non-zero status and there are no conditions left under which to take an action. "Online" means
that the stop method was not called, i.e. the cluster RGM (Resource Group Manager) did not run the stop method.
You obviously did not want this resource to fail over to another node, as Failover_enabled is false.
You can try different settings to see which is better for your environment (see r_properties(5) in the Sun Cluster Reference Manual for Solaris OS).
The Retry_count is set to 0 which will result in continuous restarts with no limits.
If you want to know more, we can give you more details. Just post the output of
# clrs show -v CleanUpClient-res
and some relevant errors from /var/adm/messages so we can see what happened.
Actually, setting Retry_count=0 results in no restarts of the resource.
And combined with the setup of Failover_enabled=false, the resource will also not trigger a failover.
As such the observed behavior is what is expected to happen.
Setting Retry_count to a negative number (e.g. Retry_count=-1) would result in unlimited restarts.
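
To make the combinations easier to reason about, here is a small Python sketch of the decision described in this thread. It is only a toy model, not the actual RGM or GDS fault monitor code: the function name, the restarts_in_interval argument and the returned strings are made up for illustration, and it assumes that a positive Retry_count allows that many restarts within the retry interval before anything else is tried.

```python
def action_after_probe_failure(retry_count, failover_enabled, restarts_in_interval=0):
    """Toy model of what happens after the probe reports a failure.

    retry_count          -- value of the Retry_count property
    failover_enabled     -- value of the Failover_enabled property
    restarts_in_interval -- hypothetical counter: restarts already done
                            inside the current retry interval
    """
    # A negative Retry_count means unlimited restarts.
    if retry_count < 0:
        return "restart"

    # A positive Retry_count allows restarts until the limit is reached.
    # With Retry_count=0 this condition is never true, so no restart happens.
    if restarts_in_interval < retry_count:
        return "restart"

    # No restarts left: fail over only if the resource is allowed to.
    if failover_enabled:
        return "failover"

    # Neither restart nor failover: the stop method is not called, so the
    # state stays Online while the status message reports Faulted.
    return "stay faulted"


# The configuration discussed in this thread.
print(action_after_probe_failure(retry_count=0, failover_enabled=False))
```

With Retry_count=0 and Failover_enabled=false the sketch ends in the last branch, which corresponds to the Online / Faulted combination from the original question. Changing retry_count to -1 makes it always choose a restart.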