11 Replies · Latest reply on May 23, 2018 1:13 PM by Billy~Verreynne

    multipath does not persist across reboots

    3502190

      Hello,

      We have an issue with a backup server.

      We have created two multipath devices, mpathbb and mpathbc,

      and we created a RAID 1 on them using mdadm, like this:

      mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 /dev/mapper/mpathbbp1 /dev/mapper/mpathbcp1

      Everything was fine and the RAID was OK, but we had to reboot the device, and now all the multipath devices are gone and the md devices are faulty.
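      (An aside on the creation step above - a minimal sketch for pinning the array assembly by UUID, assuming this system reads /etc/mdadm.conf at boot; the DEVICE filter is an assumption, not part of the original setup:)

      # Restrict md scanning to multipath devices and record the array by UUID,
      # so assembly does not depend on whatever /dev/sdX names exist after a reboot
      echo "DEVICE /dev/mapper/*" >> /etc/mdadm.conf
      mdadm --detail --scan >> /etc/mdadm.conf
      # the scan appends a line like:
      # ARRAY /dev/md1 metadata=1.2 name=endor:1 UUID=23999c17:96087b94:0d04bed5:43af255e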

       

      mdadm --detail /dev/md1
      /dev/md1:
              Version : 1.2
        Creation Time : Wed May 16 11:33:11 2018
           Raid Level : raid1
           Array Size : 418211584 (398.84 GiB 428.25 GB)
        Used Dev Size : 418211584 (398.84 GiB 428.25 GB)
         Raid Devices : 2
        Total Devices : 2
          Persistence : Superblock is persistent

        Intent Bitmap : Internal

          Update Time : Tue May 22 11:52:06 2018
                State : active, degraded
       Active Devices : 1
      Working Devices : 1
       Failed Devices : 1
        Spare Devices : 0

                 Name : endor:1  (local to host endor)
                 UUID : 23999c17:96087b94:0d04bed5:43af255e
               Events : 120363

          Number   Major   Minor   RaidDevice State
             0       0        0        0      removed
             1       8      241        1      active sync   /dev/sdp1

             0      65       49        -      faulty   /dev/sdt1

      Trying to rediscover the multipaths gives these errors:

      multipath -r -v 2
      May 22 12:15:33 | mpath target must be >= 1.5.0 to have support for 'retain_attached_hw_handler'. This feature will be disabled
      May 22 12:15:33 | mpathbc: ignoring map
      May 22 12:15:33 | mpath target must be >= 1.5.0 to have support for 'retain_attached_hw_handler'. This feature will be disabled
      May 22 12:15:33 | mpathbc: ignoring map
      May 22 12:15:33 | mpath target must be >= 1.5.0 to have support for 'retain_attached_hw_handler'. This feature will be disabled
      May 22 12:15:33 | mpathbc: ignoring map
      May 22 12:15:33 | mpath target must be >= 1.5.0 to have support for 'retain_attached_hw_handler'. This feature will be disabled
      May 22 12:15:34 | mpathbc: ignoring map
      May 22 12:15:34 | mpath target must be >= 1.5.0 to have support for 'retain_attached_hw_handler'. This feature will be disabled
      May 22 12:15:34 | mpathbb: ignoring map
      May 22 12:15:34 | mpath target must be >= 1.5.0 to have support for 'retain_attached_hw_handler'. This feature will be disabled
      May 22 12:15:34 | mpathbb: ignoring map
      May 22 12:15:34 | mpath target must be >= 1.5.0 to have support for 'retain_attached_hw_handler'. This feature will be disabled
      May 22 12:15:34 | mpathbb: ignoring map
      May 22 12:15:34 | mpath target must be >= 1.5.0 to have support for 'retain_attached_hw_handler'. This feature will be disabled
      May 22 12:15:34 | mpathbb: ignoring map

      In dmesg we see the following:

      device-mapper: table: 252:15: multipath: error getting device
      device-mapper: ioctl: error adding target to table
      device-mapper: table: 252:15: multipath: error getting device
      device-mapper: ioctl: error adding target to table
      device-mapper: table: 252:15: multipath: error getting device
      device-mapper: ioctl: error adding target to table
      device-mapper: table: 252:15: multipath: error getting device
      device-mapper: ioctl: error adding target to table

       

      Any idea how to recover the multipath devices, fix the RAID that appears faulty (but really isn't), and prevent this from happening across reboots?

      Thanks

        • 1. Re: multipath does not persist across reboots
          Billy~Verreynne

          Kernel version?

          Contents of your server's /etc/multipath.conf file?

          Any customisation done to udev rules?

          • 2. Re: multipath does not persist across reboots
            Dude!

            Everything was fine and the RAID was OK, but we had to reboot the device, and now all the multipath devices are gone and the md devices are faulty.

            Reboot what device - what did you reboot exactly?

            I think it's rather obvious that your RAID mirror fails because one of the RAID members is no longer available. Hence you see only 1 active and working device, and 1 device failure. It looks to me like there is a problem with your storage system or its configuration. What changed since it worked?

            • 3. Re: multipath does not persist across reboots
              3502190

              Hello, the kernel is:

              2.6.39-400.298.7.el6uek.x86_64 #1 SMP Mon May 7 18:14:23 PDT 2018 x86_64 x86_64 x86_64 GNU/Linux

              We rebooted the server.

              What baffles me is that there are no multipath devices anymore after the reboot of the server, and that the md device has changed to another device like /dev/sdx, while I created it with /dev/mapper/mpathxx.
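              (For reference, one way to confirm which /dev/sdX paths belong to the same LUN is to compare their WWIDs, using the same scsi_id call that multipath.conf uses as its getuid_callout - sdp and sdt here are simply the path names from the mdadm output above:)

              # Paths that print the same WWID are the same LUN seen via different I/O paths
              /lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/sdp
              /lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/sdt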

              • 4. Re: multipath does not persist across reboots
                3502190

                These are the contents of /etc/multipath.conf:

                cat /etc/multipath.conf

                ## IMPORTANT for OVS do not blacklist all devices by default.
                #blacklist {
                #        devnode "*"
                #}

                ## By default, devices with vendor = "IBM" and product = "S/390.*" are
                ## blacklisted. To enable mulitpathing on these devies, uncomment the
                ## following lines.
                #blacklist_exceptions {
                #       device {
                #               vendor  "IBM"
                #               product "S/390.*"
                #       }
                #}
                #blacklist_exceptions {
                #       device {
                #               vendor  "DGC"
                #               product "VRAID*"
                #       }
                #}

                ## IMPORTANT for OVS this must be no. OVS does not support user friendly
                ## names and instead uses the WWIDs as names.
                defaults {
                        user_friendly_names yes
                        getuid_callout "/lib/udev/scsi_id --whitelisted --replace-whitespace --device=/dev/%n"
                        path_grouping_policy    multibus
                }

                # List of device names to discard as not multipath candidates
                #
                ## IMPORTANT for OVS do not remove the black listed devices.
                blacklist {
                #       devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st|nbd)[0-9]*"
                #       devnode "^hd[a-z][0-9]*"
                #       devnode "^sd[a-n][0-9]*"
                wwid 3600605b00422aa801cdddf382fef19c9
                wwid 3600605b00422aa801cdce5ea53cfb37f
                wwid 3600605b00422aa801cdcad7bf6b8ae52
                wwid 3600605b00422aa8016dc9b0117da9efb
                wwid 3600605b00422aa8016dc9b0b1875b07e
                wwid 3600605b00422aa8016dc9b171920aeac
                wwid 3600605b00422aa8016dc9b1f19a5f38f
                wwid 3600605b00422aa8016dc9b281a23ab1c
                wwid 3600605b00422aa8016dc9b301aa166f8
                wwid 3600605b00422aa8016dc9b381b1e4a3d
                wwid 3600605b00422aa8016dc9b401b9c0458
                wwid 3600605b00422aa8016dc9b481c18e680
                wwid 3600605b00422aa8016dc9b511c96a135
                wwid 3600605b00422aa8016dc9b591d1533c7
                wwid 3600605b00422aa801b734117b88ff849
                wwid 3600605b00422aa8016dc9b6a1e132b56
                wwid 3600605b00422aa801b725669bba7e06f
                #       devnode "^etherd"
                #       devnode "^nvme.*"
                #        %include "/etc/blacklisted.wwids"
                }

                ##
                ## Here is an example of how to configure some standard options.
                ##
                #
                #defaults {
                #       udev_dir                /dev
                #       polling_interval        10
                #       selector                "round-robin 0"
                #       path_grouping_policy    multibus
                #       getuid_callout          "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
                #       prio                    alua
                #       path_checker            readsector0
                #       rr_min_io               100
                #       max_fds                 8192
                #       rr_weight               priorities
                #       failback                immediate
                #       no_path_retry           fail
                #       user_friendly_names     no
                #}
                ##
                ## The wwid line in the following blacklist section is shown as an example
                ## of how to blacklist devices by wwid.  The 2 devnode lines are the
                ## compiled in default blacklist. If you want to blacklist entire types
                ## of devices, such as all scsi devices, you should use a devnode line.
                ## However, if you want to blacklist specific devices, you should use
                ## a wwid line.  Since there is no guarantee that a specific device will
                ## not change names on reboot (from /dev/sda to /dev/sdb for example)
                ## devnode lines are not recommended for blacklisting specific devices.
                ##
                #blacklist {
                #       wwid 26353900f02796769
                #       devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
                #       devnode "^hd[a-z]"
                #       devnode "^sd[ab]"
                #}
                #multipaths {
                #       multipath {
                #               wwid                    3600508b4000156d700012000000b0000
                #               alias                   yellow
                #               path_grouping_policy    multibus
                #               path_selector           "round-robin 0"
                #               failback                manual
                #               rr_weight               priorities
                #               no_path_retry           10
                #       }
                #       multipath {
                #               wwid                    1DEC_____321816758474
                #               alias                   red
                #       }
                #}
                devices {
                #       device {
                #               vendor                  "COMPAQ  "
                #               product                 "HSV110 (C)COMPAQ"
                #               path_grouping_policy    multibus
                #               getuid_callout          "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
                #               path_checker            readsector0
                #               path_selector           "round-robin 0"
                #               hardware_handler        "0"
                #               failback                15
                #               rr_weight               priorities
                #               no_path_retry           queue
                #       }
                #       device {
                #               vendor                  "COMPAQ  "
                #               product                 "MSA1000         "
                #               path_grouping_policy    multibus
                #       }
                #

                        #
                        # IBM DS4100 :: Active-Passive
                        # IBM storage expert says functionally equivalent to DS4300
                        #
                        device {
                                vendor                  "IBM"
                                product                 "1724-100"
                                hardware_handler        "1 rdac"
                                path_grouping_policy    group_by_prio
                                prio                    rdac
                                path_checker            rdac
                                no_path_retry           10
                        }

                        #
                        # IBM DS4400 (FAStT700) :: Active-Passive
                        # Verified @ Waltham, IBM
                        #
                        device {
                                vendor                  "IBM"
                                product                 "1742-900"
                                hardware_handler        "1 rdac"
                                path_grouping_policy    group_by_prio
                                prio                    "rdac"
                                failback                immediate
                                path_checker            rdac
                                no_path_retry           10
                        }

                        #
                        # IBM XIV Nextra - combined iSCSI/FC :: Active-Active
                        # Verified @ Waltham, IBM
                        #
                        device {
                                vendor                  "XIV"
                                product                 "NEXTRA"
                                path_grouping_policy    multibus
                                rr_min_io               1000
                                path_checker            tur
                                failback                immediate
                                no_path_retry           10
                        }

                        #
                        # Re-branded XIV Nextra
                        #
                        device {
                                vendor                  "IBM"
                                product                 "2810XIV"
                                path_grouping_policy    multibus
                                rr_min_io               1000
                                path_checker            tur
                                failback                immediate
                                no_path_retry           10
                        }

                        #
                        #       HP MSA1510i     :: Active-Active. Latest firmare (v2.00) supports Linux.
                        #       Tested @ HP, Marlboro facility.
                        #
                        device {
                                vendor                  "HP"
                                product                 "MSA1510i VOLUME"
                                path_grouping_policy    group_by_prio
                                path_checker            tur
                                prio                    "alua"
                                no_path_retry           10
                        }

                        #
                        #       DataCore SANmelody FC and iSCSI :: Active-Passive
                        #
                        device {
                                vendor                  "DataCore"
                                product                 "SAN*"
                                path_grouping_policy    failover
                                path_checker            tur
                                failback                10
                                no_path_retry           10
                        }

                        #
                        #       EqualLogic iSCSI :: Active-Passive
                        #
                        device {
                                vendor                  "EQLOGIC"
                                product                 "100E-00"
                                path_grouping_policy    failover
                                failback                immediate
                                no_path_retry           10
                        }

                        #
                        #       Compellent FC :: Active-Active
                        #
                        device {
                                vendor                  "COMPELNT"
                                product                 "Compellent *"
                                path_grouping_policy    multibus
                                path_checker            tur
                                failback                immediate
                                rr_min_io               1024
                                no_path_retry           10
                        }

                        #
                        #       FalconStor :: Active-Active
                        #
                        device {
                                vendor                  "FALCON"
                                product                 ".*"
                                path_grouping_policy    multibus
                                failback                immediate
                                no_path_retry           10
                        }

                        #
                        #     EMD FC (ES 12F) and iSCSI (SA 16i) :: Active-Active
                        #     Tested in-house.
                        device {
                                vendor                  "EMD.*"
                                product                 "ASTRA (ES 12F)|(SA 16i)"
                                path_grouping_policy    failover
                                failback                immediate
                                path_checker            tur
                                no_path_retry           10
                        }

                        #
                        #       Fujitsu :: Active-Passive (ALUA)
                        #
                        device {
                                vendor                  "FUJITSU"
                                product                 "E[234]000"
                                path_grouping_policy    group_by_prio
                                prio                    "alua"
                                failback                immediate
                                no_path_retry           10
                                path_checker            tur
                        }

                        #
                        #       Fujitsu :: Active-Active
                        #
                        device {
                                vendor                  "FUJITSU"
                                product                 "E[68]000"
                                path_grouping_policy    multibus
                                failback                immediate
                                no_path_retry           10
                                path_checker            tur
                        }

                        #
                        #       JetStor :: Active-Active
                        #       Tested in-house.
                        device {
                                vendor                  "AC&Ncorp"
                                product                 "JetStorSAS516iS"
                                path_grouping_policy    multibus
                                failback                15
                                no_path_retry           10
                                rr_weight               priorities
                                path_checker            tur
                        }

                        #
                        #       Xyratex/Overland :: Active-Active
                        #       Tested in-house
                        #
                        device {
                                vendor                  "XYRATEX"
                                product                 "F5402E|[FE]5412E|[FE]5404E|F6512E|[FEI]6500E"
                                path_grouping_policy    failover
                                failback                3
                                no_path_retry           10
                                path_checker            tur
                        }

                        device {
                                vendor "FUJITSU"
                                product "ETERNUS_DXM|ETERNUS_DXL|ETERNUS_DX400|ETERNUS_DX8000"
                                prio alua
                                path_grouping_policy group_by_prio
                                path_selector "round-robin 0"
                                failback immediate
                                no_path_retry 10
                        }

                        #
                        #       Revert to pre rel6 settings for OVS
                        #
                        device {
                                vendor "NETAPP"
                                product "LUN.*"
                                dev_loss_tmo 50
                        }

                        device {
                                vendor          "ATA*"
                                product         ".*"
                        }
                }

                • 5. Re: multipath does not persist across reboots
                  Billy~Verreynne

                  3502190 wrote:

                  What baffles me is that there are no multipath devices anymore after the reboot of the server, and that the md device has changed to another device like /dev/sdx, while I created it with /dev/mapper/mpathxx.

                  The kernel sees the SAN/NAS disks or LUNs as scsi devices - thus expect to see these as /dev/sdXX devices.

                  As a disk/LUN can be (and often is) seen via multiple I/O paths (e.g. dual-port fibre channels), the same disk will have 2 or more /dev/sdXX devices.

                  These dev device names also do not remain the same after each reboot - the sequence in which the fibre channels, for example, discover disks is not always the same.

                  This means
                  a) device names are dangerous to use directly, as the device names do not stay the same
                  b) with a device name, you are using a single specific I/O path to the disk - no redundancy and no load balancing/spreading via multiple I/O paths

                  Multipath deals with this by
                  a) discovering which dev devices refer to the same disk (are different I/O paths to the same disk)
                  b) creating a logical (mpath) device on top of these dev devices to address that specific storage disk/LUN

                  This is why the /etc/multipath.conf is important. It tells multipath how to identify the target disk at the end of an I/O path (dev device), the parameters to use for the mpath device (such as round-robin I/O to the underlying dev devices), etc.

                  To get the same mpath device for a disk/LUN, the WWID of that disk is mapped to the mpath device name to use.

                  Likewise, some disks/LUNs seen are not-to-be-used (e.g. admin LUNs) - these disks are blacklisted via their WWIDs.

                  It seems to me that your /etc/multipath.conf is mostly defaults, with a custom list of WWIDs to blacklist?

                  In my case - we have always used a custom /etc/multipath.conf file (since RHEL3). Configure the vendor parameters to use. Explicitly blacklist irrelevant WWIDs, and map relevant WWIDs to mpath devices.

                  My suggestion is to get a sane /etc/multipath.conf config, and verify it by flushing the maps (multipath -f) and rerunning discovery in verbose mode (multipath -v3), until it works as expected.
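                  (A minimal sketch of that verify loop, assuming no multipath maps are currently in use - multipath -F is the flush-all variant of -f:)

                  multipath -F     # flush all unused multipath maps
                  multipath -v3    # rerun discovery with verbose config and path output
                  multipath -ll    # verify the resulting multipath topology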

                  • 6. Re: multipath does not persist across reboots
                    3502190

                    The only thing that changed was rebooting the server to install a new kernel after a yum update.

                    The RAID device was created using the multipathed devices /dev/mapper/mpathxx; after the reboot it has changed to /dev/sdx.

                    All the disks are fine.
                    This is the output of blkid for the disks that carry the UUID of the RAID:

                    /dev/sdz1: UUID="23999c17-9608-7b94-0d04-bed543af255e" UUID_SUB="92ccede0-8baa-728d-3680-c409a0e59681" LABEL="endor:1" TYPE="linux_raid_member"
                    /dev/sdp1: UUID="23999c17-9608-7b94-0d04-bed543af255e" UUID_SUB="92ccede0-8baa-728d-3680-c409a0e59681" LABEL="endor:1" TYPE="linux_raid_member"
                    /dev/sdv1: UUID="23999c17-9608-7b94-0d04-bed543af255e" UUID_SUB="8d69ddf6-32c5-d0e2-1c70-2b988e5b3e47" LABEL="endor:1" TYPE="linux_raid_member"
                    /dev/sdab1: UUID="23999c17-9608-7b94-0d04-bed543af255e" UUID_SUB="8d69ddf6-32c5-d0e2-1c70-2b988e5b3e47" LABEL="endor:1" TYPE="linux_raid_member"

                    • 7. Re: multipath does not persist across reboots
                      Dude!

                      Can you reproduce the issue booting the previous kernel?

                      • 8. Re: multipath does not persist across reboots
                        Billy~Verreynne

                        What does multipath -l display?

                        • 9. Re: multipath does not persist across reboots
                          3502190

                          Hello,

                          multipath -l returns nothing, and so does multipath -ll.

                          multipath -ll -v 3 returns the following: https://pastebin.com/TkxpZB22

                          At the moment we are running backups, so we cannot test rebooting on an old kernel.

                          I will check how to configure the multipathing back without having to stop the RAID, which I fear would be the only option - but then how do I assemble the array back? mdadm --assemble did not work the last time I tested.
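                          (For reference, a sketch of that reassembly step, assuming the mpath devices have been restored first - the partition names are the ones from the original post:)

                          # Stop the degraded array, then reassemble it from the multipath
                          # partitions it was originally built on
                          mdadm --stop /dev/md1
                          mdadm --assemble /dev/md1 /dev/mapper/mpathbbp1 /dev/mapper/mpathbcp1
                          cat /proc/mdstat    # check the array state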

                          • 10. Re: multipath does not persist across reboots
                            Dude!

                            I would not. You are going to increase the odds of making things worse. I would suggest waiting for the backup to finish and restarting the system using the previous working kernel. This should only take minutes. If everything's fine, you know that the problem is not the configuration, and you can look at things like driver version, storage firmware, kernel parameters, etc. If it no longer works with the previous kernel, then your configuration or storage has changed.

                            • 11. Re: multipath does not persist across reboots
                              Billy~Verreynne

                              Your server's kernel is seeing LUNs from an EMC SAN (Clariion). These need to be handled by EMC's PowerPath (which taints the kernel), or by Multipath (default GPL Linux).

                              An lspci command should show a couple of fibre channel HBAs (8Gb if new, 2Gb if old).

                              I do not recommend PowerPath - I prefer an untainted Linux kernel.

                               

                              For Multipath you need to configure SAN/LUN access, e.g.

                              devices {
                                      device {
                                              vendor "DGC"
                                              product ".*"
                                              product_blacklist "LUNZ"
                                              path_grouping_policy group_by_prio
                                              getuid_callout "/lib/udev/scsi_id --page=0x83 --whitelisted --device=/dev/%n"
                                              path_checker emc_clariion
                                              path_selector "round-robin 0"
                                              features "1 queue_if_no_path"
                                              prio emc
                                              hardware_handler "1 emc"
                                              no_path_retry 60
                                              failback immediate
                                              rr_weight uniform
                                              rr_min_io 1000
                                      }
                              }
                              

                               

                              And then, for persistent mpath device names, you need to map a logical device name to the WWID of a LUN, e.g.

                              multipaths {
                                      multipath {
                                              wwid    360060160abf02e00b798ce428e1ce411
                                              alias   lun1
                                      }
                                      multipath {
                                              wwid    360060160abf02e00bbdc092e8e1ce411
                                              alias   lun2
                                      }
                              }
                              

                               

                              Ensure the rest are blacklisted.

                               

                              As for using EMC LUNs as local RAID devices - why? These LUNs are likely already RAID5 or RAID10 on the SAN. Why add another layer of RAID from the server's side?

                               

                              As for the errors you are seeing - likely the previously working multipath.conf file was trashed by the o/s upgrade.
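                              One way to check that last point: an upgrade that cannot merge a modified config file usually leaves an .rpmnew or .rpmsave copy next to it, e.g.

                              # Look for saved/new copies of the config left behind by the upgrade
                              ls -l /etc/multipath.conf*
                              # and compare, if an .rpmnew copy exists:
                              diff /etc/multipath.conf /etc/multipath.conf.rpmnew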