6 Replies. Latest reply: Dec 9, 2012 9:37 PM by BillyVerreynne

scsi id

BillyVerreynne Oracle ACE
We had an interesting failure on an Oracle RAC used for development - due to different scsi devices (on different h/w storage servers) suddenly giving the same Device Identification Vital Product Data (page 0x83) serial number.

At least, that is what we think: to our knowledge, no config changes have been made recently to the RAC servers or the storage servers.

RHEL 5.3 installed (2.6.18-164.el5).

The default scsi id page setting used for udev/multipath disk naming was page 0x83, and this has worked for well over a year on this specific cluster (AFAIK). We added an additional Interconnect switch (the I/O fabric layer also runs over it using SRP) and this could have triggered the problem, which remains even after we removed the additional switch.

Here are the particular devices, /dev/sdm on storage server 2 and /dev/sdm on storage server 1, as seen from RAC node 1:
[root@dev1 ~]# lsscsi | grep sdm
[5:0:0:11]   disk    SCST_BIO scst2_sdm         300  /dev/sdx
[6:0:0:11]   disk    SCST_BIO scst2_sdm         300  /dev/sdy
[7:0:0:11]   disk    SCST_BIO scst2_sdm         300  /dev/sdaw
[8:0:0:11]   disk    SCST_BIO scst2_sdm         300  /dev/sdav
[9:0:0:11]   disk    SCST_BIO scst1_sdm         300  /dev/sdbu
[10:0:0:11]  disk    SCST_BIO scst1_sdm         300  /dev/sdbo
The first 4 entries are the 4 I/O paths to server 2's /dev/sdm and the last 2 entries are the 2 I/O paths to server 1's /dev/sdm.
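
The path count per storage device can be checked mechanically rather than by eye. The following is only a minimal sketch: it assumes the lsscsi columns are whitespace-separated exactly as in the output above (vendor and model without embedded spaces), and the function name `paths_by_target` is made up for illustration.

```python
from collections import defaultdict

# lsscsi output copied from the post above.
LSSCSI_OUTPUT = """\
[5:0:0:11]   disk    SCST_BIO scst2_sdm         300  /dev/sdx
[6:0:0:11]   disk    SCST_BIO scst2_sdm         300  /dev/sdy
[7:0:0:11]   disk    SCST_BIO scst2_sdm         300  /dev/sdaw
[8:0:0:11]   disk    SCST_BIO scst2_sdm         300  /dev/sdav
[9:0:0:11]   disk    SCST_BIO scst1_sdm         300  /dev/sdbu
[10:0:0:11]  disk    SCST_BIO scst1_sdm         300  /dev/sdbo
"""


def paths_by_target(lsscsi_text):
    """Group local device nodes by the model column reported by lsscsi."""
    groups = defaultdict(list)
    for line in lsscsi_text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or unexpected lines
        # Assumed columns: [H:C:T:L], type, vendor, model, revision, device node
        model, device = fields[3], fields[5]
        groups[model].append(device)
    return dict(groups)


paths = paths_by_target(LSSCSI_OUTPUT)
for model, devices in sorted(paths.items()):
    print(model, len(devices), devices)
```

With the output posted here this groups four local devices under scst2_sdm and two under scst1_sdm.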
 

I am using /dev/sdx (1st I/O path of server 2) and /dev/sdbo (2nd I/O path of server 1) as local device names for the following:
[root@dev1 ~]# ll  /dev/disk/by-path/ | grep sdx
lrwxrwxrwx 1 root root  9 Nov 22 17:12 pci-0000:04:00.0-scsi-2:0:0:11 -> ../../sdx

[root@dev1 ~]# ll  /dev/disk/by-path/ | grep sdbo
lrwxrwxrwx 1 root root 10 Nov 22 17:13 pci-0000:04:00.0-scsi-7:0:0:11 -> ../../sdbo
 

Using page 0x83 (Device Identification Vital Product Data) we are getting a duplicate identifier:
[root@dev1 ~]# /sbin/scsi_id -p 0x83 -g -u -s /block/sdx
26136623866316637

[root@dev1 ~]# /sbin/scsi_id -p 0x83 -g -u -s /block/sdbo
26136623866316637
 

Using page 0x80 (Unit Serial Number) we are getting unique identifiers:
[root@dev1 ~]# /sbin/scsi_id -p 0x80 -g -u -s /block/sdx
SSCST_BIOscst2_sdm_a6b8f1f7

[root@dev1 ~]# /sbin/scsi_id -p 0x80 -g -u -s /block/sdbo
SSCST_BIOscst1_sdm_92043a2d
 

We have switched udev/multipath naming config from 0x83 to 0x80 and this has resolved the name collision problem.
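
A collision like this can be flagged by grouping the reported identifiers by target and looking for any identifier that more than one target claims. This is a rough sketch using only the values posted above; the tuple layout and the helper name `colliding_ids` are assumptions for illustration, and in practice the rows would have to be collected from scsi_id on the node itself.

```python
from collections import defaultdict

# (local device, target name from lsscsi, page 0x83 ID, page 0x80 ID),
# copied from the scsi_id output posted above.
OBSERVED = [
    ("/dev/sdx", "scst2_sdm", "26136623866316637", "SSCST_BIOscst2_sdm_a6b8f1f7"),
    ("/dev/sdbo", "scst1_sdm", "26136623866316637", "SSCST_BIOscst1_sdm_92043a2d"),
]


def colliding_ids(rows, id_index):
    """Return {id: sorted target names} for IDs reported by more than one target."""
    targets = defaultdict(set)
    for row in rows:
        targets[row[id_index]].add(row[1])
    return {ident: sorted(names) for ident, names in targets.items() if len(names) > 1}


page83 = colliding_ids(OBSERVED, 2)
page80 = colliding_ids(OBSERVED, 3)
print("page 0x83 collisions:", page83)
print("page 0x80 collisions:", page80)
```

Multiple paths to the same target legitimately share an identifier, which is why the grouping is by target name and not by local device node.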

The questions now are:
a) why does 0x83 produce a duplicate for just 2 devices (out of 12+ scsi devices per storage server)?
b) has this always been an issue (and we were mistaken in thinking that 0x83 was in use on the cluster by udev/multipath), or can this be introduced by changes to the I/O fabric layer?

The new switch (Infiniband) was cascaded from the existing one (valid reasons for that). For some reason, this resulted in a Subnet Manager being started on the new cascaded switch, despite 2 existing Subnet Managers running. This could have impacted the I/O fabric. (Btw, the switches are fairly old; "new" here does not mean brand new kit.)

Comments and feedback will be appreciated. Thanks.
  • 1. Re: scsi id
    898553 Newbie
    Just to clarify, you're running 6 paths to the same LUN, correct?

    Would you be able to post the output of just '/sbin/scsi_id -g -u -s /block/sdx'? Out of curiosity, what type of array are you running on?
  • 2. Re: scsi id
    BillyVerreynne Oracle ACE
    theanswriz42 wrote:
    Just to clarify, you're running 6 paths to the same LUN, correct?
    No. The following shows 4 paths to scst2_sdm (2nd storage server) and 2 paths to scst1_sdm (1st storage server):
     [root@dev1 ~]# lsscsi | grep sdm
    [5:0:0:11]   disk    SCST_BIO scst2_sdm         300  /dev/sdx
    [6:0:0:11]   disk    SCST_BIO scst2_sdm         300  /dev/sdy
    [7:0:0:11]   disk    SCST_BIO scst2_sdm         300  /dev/sdaw
    [8:0:0:11]   disk    SCST_BIO scst2_sdm         300  /dev/sdav
    [9:0:0:11]   disk    SCST_BIO scst1_sdm         300  /dev/sdbu
    [10:0:0:11]  disk    SCST_BIO scst1_sdm         300  /dev/sdbo
    Would you be able to post the output of just '/sbin/scsi_id -g -u -s /block/sdx'?
    // scsi device from storage server 2
    [root@dev1 ~]# /sbin/scsi_id -g -u -s /block/sdx
    26136623866316637
    
    // scsi device from storage server 1
    [root@dev1 ~]# /sbin/scsi_id -g -u -s /block/sdbo
    26136623866316637
    Out of curiosity, what type of array are you running on?
    Self built.

    A storage server is basically a SuperMicro chassis, with 1TB SAS/SATA disks (max 24), and an HCA (Infiniband) card. RHEL/OL/CentOS is used as the o/s, with the OFED driver stack. The dual port HCA connects to one or two Infiniband switches. SCST is used over Infiniband. The port and switch layout determines the number of I/O paths.

    The only real problem with this is that there is no memory caching on the storage server side. So if you do tons of I/O, do not expect stellar performance. However, it is relatively cheap and easy to put together - ideal in our case for development. Max of 24TB per chassis (assuming 1TB drives are used - higher capacity drives are also now available). The expensive part in this setup is the disks.

    2 chassis enable you to use ASM to mirror diskgroups across both servers. So in the above case devices scst1_sdm and scst2_sdm are mirrored by ASM.

    There are commercial solutions also available - running 3rd party storage software (running on the storage server) that provides the typical SAN type management controls, allowing you to create striped and mirrored LUNs on a number of disks and then "publish" these across the I/O fabric layer (using the Infiniband SRP protocol). This of course provides memory caching and so on. However, we had our share of issues with such s/w and getting it to work in a robust fashion on an OFED driver stack with RAC/ASM. SCST is a lot simpler with fewer moving parts. It is also Open Source, which allows a certain level of hacking when needed.
  • 3. Re: scsi id
    898553 Newbie
    Nice, it sounds like you guys are doing some pretty cool stuff.

    I'm definitely scratching my head at this one though. If it's a completely different storage server, I'd expect the WWID to be different for each LUN/disk/whatever.

    I wonder if there's some hardware layer in the middle that's trying to default or normalize to that particular WWID. I'll look into it though, as it's piqued my curiosity.
  • 4. Re: scsi id
    BillyVerreynne Oracle ACE
    This has me confused too, as I was under the impression that WWIDs will always be unique - which turned out not to be the case. And from reading old postings to Linux kernel driver mailing lists, this seems to have been an issue in the past with certain h/w configs.

    I'm pretty certain we used the page 0x83 method in multipath.conf for generating WWIDs. And this particular RAC is more than a year old in its current incarnation. Which means that WWIDs were unique and working - and then suddenly not after we turned on another IB switch (that had been racked and wired some time ago). Did this cause the problem? Or was the problem there all along but masked by something else?

    We installed sg3_utils and had a look at how a RAC node sees these "identical" scsi devices. The Device Identification VPD page for these 2 devices shows the exact same designator:
    designator type: EUI-64 based,  code_set: Binary
          0x6136623866316637
    The vendor specific designators are, however, unique:
    vendor id: SCST_BIO
          vendor specific: a6b8f1f7-scst2_sdm
    // versus
    vendor id: SCST_BIO
          vendor specific: a6b8f1f7-scst1_sdm
    So did the EUI change (is that at all possible), or were the EUIs identical all along?

    Will have to put this down to a mystery for the time being. :-)
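
One detail in the output above may be worth a second look: the EUI-64 designator bytes are all printable ASCII. The snippet below is only a sketch that decodes the hex value as posted and compares it with the scsi_id string from earlier in the thread. It does not establish why the two devices report the same value, and the reading of the leading "2" in the scsi_id output as the designator type is an assumption.

```python
# Designator from the Device Identification VPD page output above (0x prefix dropped).
DESIGNATOR_HEX = "6136623866316637"
# Identifier printed by scsi_id -p 0x83 for both devices earlier in the thread.
SCSI_ID_OUTPUT = "26136623866316637"

# Interpret the 8 designator bytes as ASCII characters.
decoded = bytes.fromhex(DESIGNATOR_HEX).decode("ascii")
print(decoded)

# Here the scsi_id string is one leading character followed by the same hex
# digits (assumption: the leading "2" encodes the designator type).
same_digits = SCSI_ID_OUTPUT[1:] == DESIGNATOR_HEX
print(same_digits)
```

The decoded text is a6b8f1f7, the same string that prefixes both vendor specific values listed above. So the EUI-64 value looks like it is derived from that 8-character string and not from the device name part that differs between the two storage servers.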
  • 5. Re: scsi id
    898553 Newbie
    Out of curiosity, I was wondering if you happened to get to the bottom of this?
  • 6. Re: scsi id
    BillyVerreynne Oracle ACE
    No, unfortunately. We have very little time for doing proper post-mortems as there are always new and "exciting" problems to get into fights with.
