Anyone from Oracle care to take a stab at this one?
I am not using the iSCSI storage plugin, simply because I couldn't get it to work. I went the manual route: I set up the iSCSI connections on my VM servers and tweaked multipath.conf a bit. My current setup consists of FC-backed ASM servers, each of which exports one of the two paths to the shared ADVM. All of my VM servers picked up the two connections and sorted the multipathing out on their own.
Currently, I am in the testing stage and will start to play around with this setup a bit, since a couple of timeouts need to be tuned to prevent OCFS2 from fencing my hosts once one of the paths becomes unavailable. Pretty interesting, though…
I also had to change the timeout from 120 seconds to 5 seconds to avoid node fencing. It was something I had to experience firsthand, research on Google, and then implement. Why can't this stuff be already coded into OVM server to begin with? Very frustrating!
I agree that connecting to a SAN presenting LUNs via iSCSI is something that OVM really doesn't do well, but SHOULD.
I had a SAN which I set up to present multiple LUNs via a single iSCSI target. OVM couldn't handle this.
It would have been great to have managed storage this way.
Instead, what I had to do was enable multipathd on the OVM servers by logging into them as root (which is something that we shouldn't have to do, because technically these servers are "appliances").
From there OVMM saw the LUNs and we could use it this way. Also had to double check that CHAP settings were correct.
It just seems that there's too much hacking involved on the OVM servers to truly consider them as "appliances". More manual instruction is required if people have to log into them as root to get things done. Otherwise Oracle should do something about the issues.
Which timeout value did you change from 120 to 5 seconds? I'm experiencing random node fencing in a 3.0.3 environment that Oracle is not able to explain. We're using an EqualLogic iSCSI SAN.
Are you referring to settings in iscsid.conf?
Here are the notes I had written down:
== change iSCSI default timeout in /etc/iscsi/iscsid.conf for any future connections ==
* change node.session.timeo.replacement_timeout from 120 to 5
#node.session.timeo.replacement_timeout = 120
node.session.timeo.replacement_timeout = 5
== identify iSCSI LUNs ==
# iscsiadm -m session
tcp:  xx.xx.xx.xx:3260,4 iqn.1992-04.com.emc:cx.apm00115000338.b9
tcp:  xx.xx.xx.xx:3260,3 iqn.1992-04.com.emc:cx.apm00115000338.b8
tcp:  xx.xx.xx.xx:3260,1 iqn.1992-04.com.emc:cx.apm00115000338.a8
tcp:  xx.xx.xx.xx:3260,2 iqn.1992-04.com.emc:cx.apm00115000338.a9
== confirm current active timeout value before the change ==
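My notes don't include the command for this step. One way to do it, as a rough sketch: on Linux the kernel usually exposes the effective replacement timeout of each logged-in session as recovery_tmo under /sys/class/iscsi_session. The function name show_recovery_tmo and its optional directory argument are my own (hypothetical), and I am assuming the OVM server kernel exposes this sysfs attribute.

```shell
# Print the active recovery (replacement) timeout of every iSCSI session.
# Assumption: the kernel exposes /sys/class/iscsi_session/sessionN/recovery_tmo.
# The optional first argument overrides the sysfs directory (handy for testing).
show_recovery_tmo() {
    root="${1:-/sys/class/iscsi_session}"
    for f in "$root"/session*/recovery_tmo; do
        # Skip the unexpanded glob when there are no sessions
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$(basename "$(dirname "$f")")" "$(cat "$f")"
    done
}
```

Running show_recovery_tmo with no arguments before the change should list one line per session, each still showing the old value.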
== manually change timeout on each iSCSI target for current active connections ==
iscsiadm -m node -T iqn.1992-04.com.emc:cx.apm00115000338.b9 -p xx.xx.xx.xx:3260 -o update -n node.session.timeo.replacement_timeout -v 5
iscsiadm -m node -T iqn.1992-04.com.emc:cx.apm00115000338.b8 -p xx.xx.xx.xx:3260 -o update -n node.session.timeo.replacement_timeout -v 5
iscsiadm -m node -T iqn.1992-04.com.emc:cx.apm00115000338.a8 -p xx.xx.xx.xx:3260 -o update -n node.session.timeo.replacement_timeout -v 5
iscsiadm -m node -T iqn.1992-04.com.emc:cx.apm00115000338.a9 -p xx.xx.xx.xx:3260 -o update -n node.session.timeo.replacement_timeout -v 5
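With more than a handful of targets, typing one command per target gets tedious. Here is a rough sketch that builds the same update commands from the `iscsiadm -m session` listing. The function name gen_timeout_updates is hypothetical, and I am assuming each session line has the portal (with a trailing ",tag") immediately before the target name, as in the listing above. It only prints the commands so they can be reviewed first.

```shell
# Read "iscsiadm -m session" output on stdin and print one update command
# per session. First argument: new timeout in seconds (default 5).
# Assumption: the portal field directly precedes the iqn./eui. target name.
gen_timeout_updates() {
    timeout="${1:-5}"
    awk -v t="$timeout" '
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^(iqn|eui)\./) {
                    portal = $(i - 1)
                    sub(/,[0-9]+$/, "", portal)   # drop the ",tag" suffix
                    printf "iscsiadm -m node -T %s -p %s -o update -n node.session.timeo.replacement_timeout -v %s\n", $i, portal, t
                    break
                }
            }
        }'
}
```

Usage: `iscsiadm -m session | gen_timeout_updates 5` to see the commands, then append `| sh` once they look right.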
== restart iscsi to make changes take effect ==
service iscsi stop
service iscsi start
NOTE: service iscsi restart and /etc/init.d/iscsi restart don't seem to work. Only stopping, then explicitly starting the iscsi service seems to work consistently.
== restart multipathd ==
# service multipathd restart
Stopping multipathd daemon: [ OK ]
Starting multipathd daemon: [ OK ]
== Verify new timeout value on active sessions ==
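My notes don't include the command for this step either. A rough sketch of one way to verify: compare the recovery_tmo value of every logged-in session against the expected timeout. The function name check_recovery_tmo and its arguments are hypothetical, and I am assuming the kernel exposes /sys/class/iscsi_session/sessionN/recovery_tmo on the OVM server.

```shell
# Check that every active iSCSI session uses the expected recovery timeout.
# Usage: check_recovery_tmo EXPECTED [SYSFS_DIR]
# Returns 0 if all sessions match, 1 on any mismatch, 2 if no sessions exist.
# Assumption: the kernel exposes recovery_tmo under /sys/class/iscsi_session.
check_recovery_tmo() {
    expected="$1"
    root="${2:-/sys/class/iscsi_session}"
    found=0
    rc=0
    for f in "$root"/session*/recovery_tmo; do
        [ -r "$f" ] || continue
        found=1
        actual=$(cat "$f")
        if [ "$actual" != "$expected" ]; then
            echo "MISMATCH ${f}: ${actual} (expected ${expected})"
            rc=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no iSCSI sessions found under ${root}"
        return 2
    fi
    return "$rc"
}
```

Usage: `check_recovery_tmo 5 && echo "all sessions use 5 seconds"`. Any session still on the old value is printed as a MISMATCH line.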