When you say "backup" I assume you are talking about your own mechanism that you have built to back the caches up to binary files on disc, and not about Coherence's built-in backup, where it holds a copy of the data on another node.
To automatically trigger an import of data when a node starts you are going to have to build something. There are a few places you could hook this code into depending on the type of caches you have. We typically do this using some sort of singleton that executes when we know the cluster is in a "ready" state ("ready" being all the nodes have joined and all the cache service partitions are allocated). There are a few things to bear in mind though.
Normally all the nodes in a cluster do not all start at exactly the same time.
Consequently the partition allocation for a node will change as more nodes join the cluster, so the node that owns the data being loaded will keep changing. It is a bad idea to be loading data at this point.
You can stop this with a custom quorum that will not allow partitions to move until all the members have joined - you need to know the expected size of the cluster (which most people do - I don't know many apps that have a randomly sized cluster).
Even with a quorum the partitions will start to be allocated when all the nodes have joined but you do not want to start loading data until the partition allocation has finished.
We do this by looking at the JMX stats for the cache service and waiting for PartitionsEndangered and PartitionsUnbalanced to drop down to zero.
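The "wait for the counts to drop to zero" step is essentially a polling loop with a timeout. Below is a minimal sketch of that idea in plain Java; the class name `ReadinessPoller` and its parameters are made up for illustration, and the `check` you pass in is assumed to be whatever reads the JMX attributes in your own setup.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of Coherence): polls a condition until it
// becomes true or a timeout expires.
final class ReadinessPoller {
    private ReadinessPoller() {}

    // Returns true as soon as check passes, false on timeout or interrupt.
    static boolean waitUntil(BooleanSupplier check, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the condition never became true in time
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

You would call it with something like `ReadinessPoller.waitUntil(() -> partitionsSettled(), 300_000, 1_000)`, where `partitionsSettled()` is your own method that reads the service MBean.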
In our application we have a class that monitors all of the above things so it knows when the cluster is "ready". This class has a blocking method that can be called by other threads and that will block until the cluster is ready. We then fire off various initialization threads from the main class of our app (which is pretty much an extension of DefaultCacheServer). These threads wait until the cluster is "ready" and then run to do various bits of initialization, such as pre-loading caches.
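To give an idea of the shape of such a class, here is a rough sketch of a blocking "ready" gate built on a CountDownLatch. This is not our actual code; the name `ClusterReadyGate` and its methods are invented for the example, and the monitoring thread that decides when to call `markReady()` is left out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical gate: initialization threads block here until the
// monitoring code declares the cluster "ready".
final class ClusterReadyGate {
    private final CountDownLatch ready = new CountDownLatch(1);

    // Called once by the monitoring code when all readiness checks pass.
    void markReady() {
        ready.countDown();
    }

    // Blocks the calling thread until the cluster is ready.
    void awaitReady() throws InterruptedException {
        ready.await();
    }

    // Blocks up to the timeout; returns false on timeout or interrupt.
    boolean awaitReady(long timeout, TimeUnit unit) {
        try {
            return ready.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    boolean isReady() {
        return ready.getCount() == 0;
    }
}
```

Each initialization thread would call `gate.awaitReady()` first and only then start its pre-load work. A latch fits because "ready" is a one-way transition here; if you need to go back to "not ready" you would want a different primitive.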
For efficiency, when loading data it might be better to get each node to load the binary data from disc for the partitions that it owns rather than getting a single node to load everything.
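As a rough illustration of the per-node idea, the sketch below picks out only the backup files for the partitions a node owns. It assumes one file per partition with a naming scheme of `partition-<id>.bin`, which is purely hypothetical; your own backup layout will differ. The set of owned partition ids is passed in as plain integers; on a real node I would expect to get it from the partitioned cache service (I believe `PartitionedService.getOwnedPartitions` is the call, but check the Javadoc for your version).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: maps the partitions a node owns to the backup
// files that node should load.
final class OwnedPartitionFiles {
    private OwnedPartitionFiles() {}

    // Assumed naming scheme, one file per partition; not a Coherence convention.
    static String fileNameFor(int partitionId) {
        return "partition-" + partitionId + ".bin";
    }

    // Returns the file names for the owned partitions, in ascending partition order.
    static List<String> filesToLoad(Set<Integer> ownedPartitions) {
        List<String> files = new ArrayList<>();
        for (int partitionId : new TreeSet<>(ownedPartitions)) {
            files.add(fileNameFor(partitionId));
        }
        return files;
    }
}
```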
How did you write the class that monitors the cluster status? I'm trying to connect to the cluster with JMX but I don't know which MBeans I have to check. Do you know of any tutorial or how-to? I haven't found one.
On the other hand, do you know if there is any kind of event or listener I can hook into on startup of the cluster's servers?
The MBean is the service MBean for the cache service you want to monitor. So the ObjectName would be something like this
Where <service-name> is the name of the cache service and node is the id of the node.
Once you have the MBean you need to check these attributes:
PartitionsEndangered = 0
PartitionsUnbalanced = 0
PartitionsVulnerable = 0
StatusHA = MACHINE-SAFE
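Put together, the check above might look something like the following sketch using the standard javax.management API. The class and method names are made up for the example. The ObjectName pattern `Coherence:type=Service,name=<service-name>,nodeId=<node>` is the naming I would expect for the service MBean, but treat it as an assumption and confirm it in JConsole against your own cluster. You also need an `MBeanServerConnection` that can see the Coherence MBeans, which depends on how management is enabled in your setup.

```java
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper that reads the service MBean attributes listed above.
final class ServiceReadiness {
    private ServiceReadiness() {}

    // Builds the service MBean name; the pattern is an assumption, verify it in JConsole.
    // Service names containing special characters may need ObjectName.quote.
    static ObjectName serviceMBean(String serviceName, int nodeId) {
        try {
            return new ObjectName("Coherence:type=Service,name=" + serviceName + ",nodeId=" + nodeId);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Bad service name: " + serviceName, e);
        }
    }

    // Pure check: all partition counts are zero and StatusHA matches the required level.
    static boolean isSettled(int endangered, int unbalanced, int vulnerable,
                             String statusHA, String requiredStatusHA) {
        return endangered == 0 && unbalanced == 0 && vulnerable == 0
                && requiredStatusHA.equals(statusHA);
    }

    // Reads the four attributes over JMX and applies the check.
    static boolean isSettled(MBeanServerConnection server, ObjectName name,
                             String requiredStatusHA) throws Exception {
        int endangered = ((Number) server.getAttribute(name, "PartitionsEndangered")).intValue();
        int unbalanced = ((Number) server.getAttribute(name, "PartitionsUnbalanced")).intValue();
        int vulnerable = ((Number) server.getAttribute(name, "PartitionsVulnerable")).intValue();
        String statusHA = String.valueOf(server.getAttribute(name, "StatusHA"));
        return isSettled(endangered, unbalanced, vulnerable, statusHA, requiredStatusHA);
    }
}
```

The required StatusHA value is a parameter because MACHINE-SAFE is only reachable when the members run on more than one machine; on a single box you would pass whatever level your cluster can actually reach.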
I guess this will be useful for getting the status of the services. Wouldn't it be enough to get the ClusterMBean (Coherence:type=Cluster) and then check its Running attribute?
Do I have to check each service separately? If so, can't I just check the Running attribute on the ServiceMBean?
In theory you could just check to see if a service is running or the cluster is running. We check all of those attributes because we need to know that all the members we expect have joined (which we can get from the member count) and then we need to know that Coherence has finished moving all of the partitions around, so we wait until all those counts get to zero (we have a custom quorum that does not let partitions move until the required number of members have joined). We used to have a lot of issues where our Ops people would not wait for the cluster to finish starting before they started all the other processes, one of which is the database load job. As our bulk DB load job pulls in quite a bit of data very quickly it is not good if partitions are still moving when it runs.