I just had a disturbing conversation with a Senior OVM Manager. I've had an SR open for a few days (the second SR on this issue) about a storage rescan (to pick up newly presented Fibre Channel LUNs) causing OVS servers to reboot. I did a rescan and then the servers rebooted, killing all their guests.
The manager just told me that this is expected behavior: if there is any I/O running on the system when you initiate a rescan, it may cause a reboot. To mitigate this, you need to shut down your entire environment every time you add or remove storage.
I have requested documentation backing up this assertion and haven't gotten it yet. Am I crazy to think this answer is blatantly incorrect? Are people actually bringing their entire environment offline to add or remove storage?
This is 2013, right? :-)
I agree with you: your VM server should NOT reboot. That said, I've seen rescan problems before, even without virtualization. Just this past week a native Red Hat client refused to see a new LUN from an EMC array, even though it was running the latest version of PowerPath, and I ended up having to reboot the server. A LUN rescan can be disruptive. While I've never had a rescan reboot a server on me, I can see where it might happen.
In a production environment, I would recommend live migrating your VM guests to another server in your cluster before running a rescan on that server, then moving them back after the rescan. That way you avoid the potential issue.
Each server has its own "rescan physical disks" option. You can also run a "refresh" from the "storage" perspective, but that runs on the admin servers you have configured. You do sometimes have to run "refresh" to add a repository-type connection, but not for pass-through connections. It depends on which one you are doing; I should have made that distinction. Sorry.
I had the same problem with VNX on 3.1.1. With iSCSI it works perfectly; with FC, however, it is sometimes tricky.
When you run a full rescan from OVM Manager, for some reason it issues a LIP (Loop Initialization Primitive) on the fc_host, which can cause the reboot.
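For anyone unsure what separates the two operations, here is a minimal sketch, assuming the standard Linux sysfs layout (/sys/class/fc_host/hostN/issue_lip and /sys/class/scsi_host/hostN/scan). The function names and the root-directory parameter are hypothetical, added only so the sketch can be tried against a scratch directory. Whether OVM Manager really issues the LIP is only what I observed, not something documented.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical; sysfs paths assume a standard Linux layout.

# Issue a LIP on an FC HBA port. This re-initializes the FC link, which can
# interrupt I/O on that port while its paths drop and come back.
fc_issue_lip() {
  local root=$1 host=$2   # root: e.g. /sys/class/fc_host; host: e.g. host1
  echo 1 > "$root/$host/issue_lip"
}

# Ask the SCSI layer to probe an adapter for new devices without resetting the link.
# "- - -" is a wildcard for channel, target and LUN.
scsi_scan() {
  local root=$1 host=$2   # root: e.g. /sys/class/scsi_host
  echo "- - -" > "$root/$host/scan"
}
```

On a live server you would call, for example, `scsi_scan /sys/class/scsi_host host1` as root. The point is that the scan is the gentler of the two.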
At the time I also had an SR open (as usual, it got nowhere); it went straight to /dev/null.
I found a workaround: scanning the disks on each host myself worked for me.
Set up SSH keys between the servers so you are not prompted to log in (adjust host1, host2 ... hostN to match your configuration).
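host_list=(IP1 IP2 IP3 ... IPN)
for HOST in "${host_list[@]}"; do
  # Quote the remote command so the redirection runs on $HOST, not on the local machine
  ssh "$HOST" 'echo "- - -" > /sys/class/scsi_host/host1/scan'
  ssh "$HOST" 'echo "- - -" > /sys/class/scsi_host/host2/scan'
done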
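If your adapter numbers differ between servers, the loop above can be generalized so nothing is hardcoded. This is a minimal sketch, assuming bash is the remote login shell and that you SSH in as root (writing to sysfs needs it). scan_all_scsi_hosts and rescan_servers are hypothetical names, and IP1, IP2 below are the same placeholders as in the host list.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical; assumes bash on the remote side.

# Write the "- - -" wildcard to every hostN/scan file under the given root,
# so no adapter numbers need to be hardcoded.
scan_all_scsi_hosts() {
  local root=${1:-/sys/class/scsi_host}
  local scan
  for scan in "$root"/host*/scan; do
    [ -e "$scan" ] || continue   # glob matched nothing
    echo "- - -" > "$scan"
  done
}

# Rescan each server in turn over SSH. declare -f sends the function
# definition along with the call, so nothing has to be installed remotely.
rescan_servers() {
  local h
  for h in "$@"; do
    ssh "$h" "$(declare -f scan_all_scsi_hosts); scan_all_scsi_hosts"
  done
}
```

Usage would look like `rescan_servers IP1 IP2`. Running the servers one at a time means a bad rescan only affects a single host before you notice.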
EMC decided long ago to use its own product, PowerPath, for multipathing, and EMC arrays do not work well with most native MPIO stacks. I wouldn't even consider running FC connections to any OS without PowerPath. Just try finding supported multipath configurations for most Linux distributions... and even if you find one, expect your server logs to fill up with meaningless errors.