I have a simple question, actually more of a doubt. I hope it will get cleared up here.
I have a two-node Sun Cluster running.
The application recently failed over from node 1 to node 2.
The failed node 1 rebooted, came back up, and is in the cluster again (I think).
The Failback policy for the RG is set to false.
Is it required to fail the application back manually to node 1 now, or can it continue to run on node 2, with the application failing over to node 1 again if node 2 fails in the future?
Is there a way to check that the application will fail over back to node 1 if such a situation arises?
The resource group (RG) property Failback is configurable and defaults to false.
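As a sketch, you can inspect and change this property with the clresourcegroup command (app-rg below is a hypothetical RG name; substitute your own):

```shell
# Show the current Failback setting of the resource group
# (app-rg is a placeholder name for your RG)
clresourcegroup show -p Failback app-rg

# Enable automatic failback for the RG, if that is what you want
clresourcegroup set -p Failback=true app-rg
```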
As such, it is up to the administrator to decide whether an immediate failback of the RG is desired - mainly a question of business need and the required-uptime policy for the service.
Note that an automated failback will introduce a short outage of the service (the time to stop the RG and to start it on the node that came back).
Typically Failback=true makes sense where such a short outage is acceptable and where multiple RGs are configured. If the full capacity of the cluster is available (i.e. all cluster nodes are up), you might want to distribute the multiple RGs across the cluster nodes - and only in failure situations accept that they run combined on the remaining nodes.
Through RG affinities, RG dependencies, and RG load limits you can even exercise more fine-grained control over which RGs are allowed on a given node, which ones should get offloaded, etc.
But it is also perfectly OK to just leave the RG running on node 2 after node 1 has failed (and later come back). That is why you have a cluster and get high availability. By setting Failback=false you give priority to the availability of that service, and you can manually switch the RG whenever a downtime of the application/service this RG manages is acceptable to your business needs.
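A manual switchover during a maintenance window could look like this (sketch; app-rg is a hypothetical RG name and node1 a placeholder node name):

```shell
# Manually switch the RG (and the application it manages) back to node 1
clresourcegroup switch -n node1 app-rg

# Confirm where the RG is now online
clresourcegroup status app-rg
```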
You can verify the status of the cluster with "cluster status". If the failed node has come back, you would expect to see the node status to be online.
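For example:

```shell
# Overall cluster status: nodes, quorum, device groups, RGs, resources
cluster status

# Node status only - the recovered node should show as Online
clnode status
```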
If the RG ran successfully on node 1 before, it can fail back to that node.
You can also check the status of the storage devices and network IPMP groups for that node, to verify that those are online too.
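For instance (sketch):

```shell
# Status of device groups and DID devices
cldevicegroup status
cldevice status

# IPMP group status per node
clnode status -m
```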
You could also run "cluster check -C S6708689 -n <node1>" to verify whether all cluster resources can be validated on node 1:
S6708689 : (Variable) Can all Solaris Cluster resources be validated?
For a full list of available checks have a look at "cluster list-checks".
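For example (node1 is a placeholder for the actual node name):

```shell
# List all available cluster checks with their check IDs
cluster list-checks

# Run the resource-validation check against node 1 only
cluster check -C S6708689 -n node1
```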