SAN Migration - What did I do wrong? — oracle-tech


Alan3 Posts: 317 Bronze Badge

Hello all,

I had a wild situation this weekend and could not figure out what went wrong. Luckily a reboot fixed everything, but this should not have happened.

Simplified scenario:

The old SAN (XIV) is off-lease and is being replaced with a new SAN (PURE).

The Oracle VM host (3.4.6) is connected to both SANs like so, with the local disks being used as the local repositories:

----------------

# multipath -ll
3600605b00f40fa20251e5dd73eb5b299 dm-1 Lenovo,RAID 530-8i
size=930G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
 `- 14:2:2:0 sdc 8:32 active ready running
3624a9370431f3587de4143400002e61c dm-4 PURE,FlashArray
size=750G features='0' hwhandler='1 alua' wp=rw
`-+- policy='queue-length 0' prio=50 status=active
 |- 15:0:5:1 sdh 8:112 active ready running
 |- 15:0:4:1 sdi 8:128 active ready running
 |- 15:0:2:1 sdj 8:144 active ready running
 |- 15:0:3:1 sdk 8:160 active ready running
 |- 16:0:2:1 sdl 8:176 active ready running
 |- 16:0:3:1 sdm 8:192 active ready running
 |- 16:0:4:1 sdn 8:208 active ready running
 `- 16:0:5:1 sdo 8:224 active ready running
3600605b00f40fa20251e5d7db7c80a96 dm-0 Lenovo,RAID 530-8i
size=830G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
 `- 14:2:1:0 sdb 8:16 active ready running
20017380032af145f dm-2 IBM,2810XIV
size=385G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
 |- 15:0:0:1 sdd 8:48 active ready running
 |- 16:0:0:1 sdf 8:80 active ready running
 |- 15:0:1:1 sde 8:64 active ready running
 `- 16:0:1:1 sdg 8:96 active ready running

--------------------------

There is ONE volume mapped from the XIV that is attached to a running VM on the host as shown here:

xvdc>xvdc1>XIVVG-XIVLV mounted to /DOCSRV
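(For reference, that chain can be confirmed from inside the guest with something along these lines; output omitted, device and VG names taken from above.)

# lsblk /dev/xvdc
# pvs /dev/xvdc1
# lvs -o lv_name,vg_name,devices XIVVG
# df -h /DOCSRV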

When the new PURE SAN was configured, we created a new, empty volume and attached it to the running VM like so:

xvdd>xvdd1>PUREVG-PURELV mounted to /DOCSRV-NEW

Both are formatted as ext4 filesystems. All of the above happened with all services running on the VM.
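The exact commands weren't posted, but building the new stack on the PURE volume would look roughly like this (the partitioning step that creates xvdd1 is skipped, and the lvcreate/mkfs options are assumptions):

# pvcreate /dev/xvdd1
# vgcreate PUREVG /dev/xvdd1
# lvcreate -n PURELV -l 100%FREE PUREVG
# mkfs.ext4 /dev/PUREVG/PURELV
# mkdir -p /DOCSRV-NEW
# mount /dev/PUREVG/PURELV /DOCSRV-NEW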

At this point we didn't want to chance any corruption, so we shut down the running services on the VM and copied all of the contents of /DOCSRV to /DOCSRV-NEW. To maintain all permissions and ownerships we used tar as the copy mechanism. No problems here; it went as expected.
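The exact tar invocation wasn't posted; a typical form that preserves ownership, permissions, and timestamps is a pipe like this:

# (cd /DOCSRV && tar cf - .) | (cd /DOCSRV-NEW && tar xpf -)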

Then we unmounted both and remounted PURELV at /DOCSRV (leaving XIVLV unmounted).
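Presumably something like the following, along with pointing the /DOCSRV entry in /etc/fstab at the new LV (the fstab edit is an assumption, but it matters for the systemd discussion in the comments):

# umount /DOCSRV-NEW
# umount /DOCSRV
# mount /dev/PUREVG/PURELV /DOCSRV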

All of this went as expected and services were brought online.

Fast-forward three weeks and we are ready to remove the XIV from service. All we need to do is remove XIVVG and xvdc.

I issued the following:

# lvchange -an /dev/XIVVG/XIVLV
# lvremove /dev/XIVVG/XIVLV
  Logical volume "XIVLV" successfully removed
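For completeness, the rest of the intended guest-side teardown would normally be the standard VG and PV removal (these are generic commands, not ones from the post):

# vgremove XIVVG
# pvremove /dev/xvdc1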

At this point, I LOST the /DOCSRV mount. I could still see that PURELV was there, but I COULD NOT MOUNT IT.

No errors, no message, just no mount. All I could do was grit my teeth and start shutting everything down.
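(In hindsight, given the systemd discussion in the comments below, a quick way to see what was actually blocking the mount might have been to check the LV and the generated mount unit, e.g.:)

# lvs PUREVG
# systemctl status DOCSRV.mount
# journalctl -b -u DOCSRV.mount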

I shut down all running VMs, shut down the host machine, unmapped the host from the XIV SAN, and restarted the host.

At this point, OVM came back normally. The physical disk from the XIV was now showing a 'Warning' in the OVM console since there was no mapping to it.

I removed the XIV disk from the VM config and started the VM.

Luckily, it came online normally. The PURELV was mounted as expected and everything was intact.

What I CAN'T figure out is why removing the XIVVG caused the PUREVG to lose its mount.

What did I do wrong and what should I do in the future to keep this from happening?

Comments

  • Alan3 Posts: 317 Bronze Badge

    I believe I've found the issue here: an ancient systemd problem.


  • Court_ Posts: 126 Blue Ribbon

    You have a lot of info, but from what I can gather you did a lot of extra work. I am assuming you are using LVM; you don't state it directly. You could have simply added the new disk to the volume group, then used pvmove or LVM mirroring to get the extents onto the new disk. Depending on the amount of data, I prefer mirroring as it can just run in the background and I/O performance isn't degraded. Once complete, you break the mirror, remove the old disk from the VG, and pvremove the LVM structures on the old disk. Then get the list of multipath disks, delete the multipath device, and echo 1 to /sys/block/sdX/device/delete.

    In this scenario everything is done online. No need to even take down the apps. I hate to admit it, but I have moved petabytes, if not exabytes, using these same methods over the past 10 years.
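    A rough sketch of that online workflow (pvmove variant) using the device names and XIV multipath WWID from the original post; exact device letters and syntax will depend on the environment:

    # vgextend XIVVG /dev/xvdd1          # add the new PURE-backed disk to the existing VG
    # pvmove /dev/xvdc1 /dev/xvdd1       # migrate the extents to the new disk, online
    # vgreduce XIVVG /dev/xvdc1          # drop the old disk from the VG
    # pvremove /dev/xvdc1                # wipe the LVM label from the old disk

    Then, on the OVM host, flush the XIV multipath map and remove the path devices:

    # multipath -f 20017380032af145f
    # echo 1 > /sys/block/sdd/device/delete    # repeat for sde, sdf, sdg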

    Systemd uses a process called systemd-fstab-generator to create unit files for each mount; fstab is mainly there for us old sysadmins, and I wouldn't want to create unit files for mounts by hand. Anyway, one issue I have hit is when you keep the same mount point but change the device file for the filesystem. You need to run systemctl daemon-reload to get the unit file updated to the new device. If not, mount/umount will reference what is in the unit file, not what is in fstab.
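    Assuming /DOCSRV is defined in /etc/fstab, the fix after swapping the underlying device would be roughly:

    # vi /etc/fstab                    # point the /DOCSRV entry at /dev/mapper/PUREVG-PURELV
    # systemctl daemon-reload          # regenerate DOCSRV.mount from the new fstab entry
    # systemctl status DOCSRV.mount    # confirm the unit now references the new device
    # mount /DOCSRV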
