I am setting up an OVM 3.1.1 environment at a customer site where an EMC filer is presenting iSCSI LUNs. I have a few questions:
* What is the proper way to discover a set of iSCSI LUNs when the storage unit has 4 unique IP addresses on 2 different VLANs? If I discover all 4 paths, they appear in the GUI as 4 separate SAN servers, with the LUNs scattered across all of them. Simple logic suggests that if I lost access to one of those SAN servers, the LUNs presented via it would disappear and become inaccessible. I know this isn't actually the case, because multipath -ll on the OVM server shows 4 distinct paths to each LUN I'm expecting to see. I've also verified that multipathing works by downing one of the two NICs allocated to iSCSI: two of the four paths show as failed, but I can still access the disk just fine. Is this just me not setting things up the right way in the GUI, or is the GUI poorly implemented here and in need of a redesign so it's clear to both me AND the customer?
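For context, the discovery I'm running against each portal looks roughly like this (the IPs are placeholders, not the customer's real addresses):

```
# Discover targets on a portal from each VLAN; the array typically
# reports all of its portals in a single sendtargets response.
iscsiadm -m discovery -t sendtargets -p 10.0.1.10:3260
iscsiadm -m discovery -t sendtargets -p 10.0.2.10:3260

# Log in to every discovered target/portal combination.
iscsiadm -m node --loginall=all

# Confirm each LUN shows 4 paths.
multipath -ll
```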
* Has anyone used the Storage Connect plugins for either iSCSI or Fibre Channel storage with OVM? What do they actually do for you, and are they easier to implement than unmanaged storage? Are they worth the hassle?
I am not using the iSCSI storage plugin, simply because I couldn't get it to work. I went the manual route: I set up the iSCSI connections on my VM servers and tweaked multipath.conf a bit. My current setup consists of FC-backed ASM servers, each of which exports one of the two paths to the shared ADVM; all of my VM servers picked up the two connections and sorted the multipathing out on their own.
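The multipath.conf tweaks were along these lines. This is only a sketch for an EMC CLARiiON/VNX-class array; the device stanza and values are assumptions, and the exact syntax varies between multipath-tools versions, so check them against your array vendor's documentation:

```
defaults {
    user_friendly_names yes
    # Fail paths quickly instead of queueing I/O indefinitely, so
    # cluster heartbeats don't stall when a path dies.
    no_path_retry       5
}

devices {
    device {
        vendor                 "DGC"
        product                ".*"
        path_grouping_policy   group_by_prio
        path_checker           emc_clariion
        hardware_handler       "1 emc"
        failback               immediate
    }
}
```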
Currently I am in the testing stage and will start to play around with this setup a bit, since a couple of timeouts need to be tuned in order to prevent OCFS2 from fencing my hosts once one of the paths becomes unavailable. Pretty interesting, though…
I also had to change the timeout from 120 seconds to 5 seconds to avoid node fencing. Something I had to experience firsthand, research on Google, and then implement. Why can't this stuff already be coded into OVM Server to begin with? Very frustrating!
I agree that connecting to a SAN presenting LUNs via iSCSI is something that OVM really doesn't do well, but SHOULD.
I had a SAN which I set up to present multiple LUNs via a single iSCSI target. OVM couldn't handle this.
It would have been great to have managed storage this way.
Instead, what I had to do was enable multipathd on the OVM servers by logging into them as root (which is something we shouldn't have to do, because technically these servers are "appliances").
From there OVMM saw the LUNs and we could use them this way. I also had to double-check that the CHAP settings were correct.
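For anyone checking the same thing: the CHAP session settings live in /etc/iscsi/iscsid.conf (they apply to future discoveries; already-discovered node records keep their own copies). The credentials here are placeholders:

```
# /etc/iscsi/iscsid.conf -- enable CHAP authentication for sessions
node.session.auth.authmethod = CHAP
node.session.auth.username = myinitiatoruser   # placeholder
node.session.auth.password = mysecret          # placeholder
```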
It just seems that there's too much hacking involved on the OVM servers to truly consider them "appliances". More documentation is required if people have to log into them as root to get things done; otherwise Oracle should fix the underlying issues.
Which timeout value did you change from 120 to 5 seconds? I'm experiencing random node fencing in a 3.0.3 environment that Oracle has not been able to explain. We're using an EqualLogic iSCSI SAN.
Are you referring to settings in iscsid.conf?
Here are the notes I had written down:
== change iSCSI default timeout in /etc/iscsi/iscsid.conf for any future connections ==
* change node.session.timeo.replacement_timeout from 120 to 5
#node.session.timeo.replacement_timeout = 120
node.session.timeo.replacement_timeout = 5
== identify iSCSI LUNs ==
# iscsiadm -m session
tcp:  xx.xx.xx.xx:3260,4 iqn.1992-04.com.emc:cx.apm00115000338.b9
tcp:  xx.xx.xx.xx:3260,3 iqn.1992-04.com.emc:cx.apm00115000338.b8
tcp:  xx.xx.xx.xx:3260,1 iqn.1992-04.com.emc:cx.apm00115000338.a8
tcp:  xx.xx.xx.xx:3260,2 iqn.1992-04.com.emc:cx.apm00115000338.a9
== confirm current active timeout value before the change ==
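My notes had no command under this heading; reading the current value back from the node record (target and portal as shown in the session list above, the grep is just a convenience) should look like:

```
iscsiadm -m node -T iqn.1992-04.com.emc:cx.apm00115000338.b9 -p xx.xx.xx.xx:3260 \
  | grep replacement_timeout
# The record should still show the default of 120 before the change.
```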
== manually change timeout on each iSCSI LUN for current active connections ==
iscsiadm -m node -T iqn.1992-04.com.emc:cx.apm00115000338.b9 -p xx.xx.xx.xx:3260 -o update -n node.session.timeo.replacement_timeout -v 5
iscsiadm -m node -T iqn.1992-04.com.emc:cx.apm00115000338.b8 -p xx.xx.xx.xx:3260 -o update -n node.session.timeo.replacement_timeout -v 5
iscsiadm -m node -T iqn.1992-04.com.emc:cx.apm00115000338.a8 -p xx.xx.xx.xx:3260 -o update -n node.session.timeo.replacement_timeout -v 5
iscsiadm -m node -T iqn.1992-04.com.emc:cx.apm00115000338.a9 -p xx.xx.xx.xx:3260 -o update -n node.session.timeo.replacement_timeout -v 5
== restart iscsi to make changes take effect ==
service iscsi stop
service iscsi start
NOTE: service iscsi restart and /etc/init.d/iscsi restart don't seem to work. Only by stopping and then explicitly starting the iscsi service does the change seem to apply consistently.
== restart multipathd ==
# service multipathd restart
Stopping multipathd daemon: [ OK ]
Starting multipathd daemon: [ OK ]
== Verify new timeout value on active sessions ==
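My notes ended at this heading. The check I use is below; the field name in the printed output is from memory, so double-check it against your version of open-iscsi:

```
iscsiadm -m session -P 2 | grep -i "recovery timeout"
# Look for "Recovery Timeout: 5" in the Timeouts section of each session.
```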