Redundant switch failover causing 25 second DB pause
We have a required test for our system in which we shutdown one of two redundant distribution switches to verify the network HA design.
After one of the switches that control the FCoE SAN connections and other network traffic to the database cluster was rebooted, the cluster’s three servers showed that one of their two network interfaces (eth3) interface went down, which was expected. The other network interface (eth2) stayed up and the bonded interface (bond0) stayed up but there was a 25 second pause where nothing could connect to the database. We could ping each server in the cluster while this test was running with no visible pause. We also ran an Iperf command while this test was running which showed a