I have a Solaris 10 master server with OSB 10.4.2
1 SL500 with 1 LTO5
Sometimes, when I do a filesystem backup of several servers in the same dataset, the master server suddenly loses communication with a client because of network problems.
When this happens, the running backup hangs and others in the queue remain pending indefinitely.
So my question: is there a way to forcibly end the hung backup so that all the others can run? For example, is there a timeout I can set?
If your client hosts have network connectivity issues, you may want to consider having a separate dataset for each host. You can set the max retry policy which indicates how many times OSB should retry backing up a client if the first attempt failed due to connectivity etc:
I think the right answer here is to fix the underlying network issue and make your server stable. Maybe it's only on a 100Mb connection and needs to be provided with 1Gb.
On the OSB side there is no timeout of this kind. If the client is rebooted or drops off the network in this way, the backup job stays around until you kill it. OSB doesn't know the client has gone away; it might just be busy or slow, so backups aren't terminated automatically.
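Since OSB has no such timeout, one workaround is an external watchdog that cancels jobs which have been running too long. Below is a minimal Python sketch of that idea. Everything in it is an assumption for illustration: the job IDs and start times are made up, you would have to supply the real ones yourself (for example from your own job tracking), and the `obtool canceljob <job-id>` form should be checked against the documentation for your OSB release before you use it. The sketch only prints the commands; it does not run them.

```python
from datetime import datetime, timedelta


def overdue_jobs(started_at, now, limit):
    """Return the IDs of jobs that have been running longer than `limit`."""
    return sorted(
        job_id for job_id, start in started_at.items() if now - start > limit
    )


def cancel_command(job_id):
    # Assumed CLI form: `obtool canceljob <job-id>`. Verify it for your release.
    return ["obtool", "canceljob", job_id]


if __name__ == "__main__":
    # Hypothetical job IDs and start times, for illustration only.
    now = datetime(2024, 1, 1, 12, 0)
    started_at = {
        "admin/15.1": datetime(2024, 1, 1, 2, 0),    # running for 10 hours
        "admin/15.2": datetime(2024, 1, 1, 11, 30),  # running for 30 minutes
    }
    for job_id in overdue_jobs(started_at, now, timedelta(hours=6)):
        # Print the command instead of executing it, so nothing is cancelled
        # until you have reviewed the list.
        print(" ".join(cancel_command(job_id)))
```

Pick the limit generously: as noted above, a slow client and a dead client look the same from the master, so too short a limit will kill healthy backups.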
Thanks for the reply.
OK, so there isn't a timeout for running backups.
I read that the default value of the max retry policy (maxdataretries) is 6.
Should I decrease this value, for example to 2?
Also, I have one big dataset with 60 clients. I can't create 60 different datasets, but I could create 6 datasets with 10 clients in each. Do you think that would help solve the problem?
Yes, why not try adjusting the max retry, say to 3, and see how that works. I think reducing the number of clients per dataset would also help, since a problem with one host would then affect fewer other hosts.
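To plan the split discussed above (60 clients into 6 datasets of 10), a short Python sketch like the following can generate the groups. The client and dataset names are placeholders, not real hosts, and the script only prints a plan; creating the actual datasets in OSB is a separate step.

```python
def split_into_datasets(clients, per_dataset):
    """Split a list of client names into consecutive groups of `per_dataset`."""
    if per_dataset < 1:
        raise ValueError("per_dataset must be at least 1")
    return [
        clients[i:i + per_dataset] for i in range(0, len(clients), per_dataset)
    ]


# Placeholder client names client01 .. client60 (hypothetical).
clients = [f"client{n:02d}" for n in range(1, 61)]
groups = split_into_datasets(clients, 10)

for idx, group in enumerate(groups, start=1):
    # The dataset names here are illustrative, not an OSB naming requirement.
    print(f"dataset{idx:02d}: {len(group)} clients ({group[0]} .. {group[-1]})")
```

If some clients are known to be on flaky links, it may be worth grouping those together by hand instead of splitting alphabetically, so they only delay each other.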