My file server had a malfunctioning PSU that took out my mirrored SSD boot drives and some of the mechanical HDDs in my RAID-Z2 array. Enough member disks seem to have survived to rebuild, but I haven't verified that the zpool is actually intact, since I haven't gotten around to putting the disks in a new Solaris environment. The OS partition is gone, so I'd have to do a fresh Solaris install and then try to import the zpool with missing members. I was hoping to find out whether this is possible at all, or whether I need to send the dead boot drives and/or dead members out for data recovery. Thanks for the help.
Whether your RAIDZ pool can survive this failure depends on how many devices were UNAVAIL
and for how long. Do you know whether this is a RAIDZ1, RAIDZ2, or RAIDZ3 pool? The RAIDZ
level determines how many device failures each RAIDZ VDEV can survive.
You could boot from media to see if the pool will import. The state of the boot devices should
not affect the health of this pool. In most cases, if the member devices are AVAIL, you can
import a ZFS storage pool on any system that supports its pool version.
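A rough sketch of that check from a Solaris live/boot environment, assuming the surviving disks are attached; the pool name "tank" is a placeholder for whatever your pool is actually called:

```shell
# Scan attached disks for importable pools and report their state
# (ONLINE, DEGRADED, or UNAVAIL) without importing anything:
zpool import

# If the pool shows up as importable, a read-only import is the safest
# first attempt, since nothing gets written until the data is verified:
zpool import -o readonly=on tank

# Then check which members are ONLINE vs UNAVAIL:
zpool status tank
```

A RAIDZ2 vdev can import with up to two members missing, so the read-only attempt tells you immediately whether enough drives survived.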
Hi. Thanks for the information. All the drives are offline at the moment. It's a 6-disk RAIDZ2 pool (identical 1 TB disks). I just want to make sure I understand you correctly when you say that how long the devices were unavailable matters for survivability: as long as all the member drives stay offline, the pool shouldn't degrade further, correct? It's going to take me at least a week to put everything back together. In case the pool is in a faulted state, would repairing the dead member drives and putting them back online help? The dead drives seem repairable, since the failure is likely electronic only. Thank you very much.
It is difficult to predict what power failures do to devices. I've seen bad things happen.
Yes, if you bring all the devices back online, cabled and seated as they were in their
previous configuration, ZFS will have a better chance of reading the devices and the
pool info than if the hardware configuration also changes while the devices are being
replaced.
We had a pool go UNAVAIL last week after our lab manager detached the redundant
devices from a previously mirrored pool, because he wanted to make some changes
to the detached LUNs. He accidentally offlined a live pool LUN for about 60 seconds.
The recovery was to reboot the system so that the device info and the pool info were
reread. The pool was back to AVAIL, and only one file was clobbered after several
pool scrubs. That file (a script) was running when the LUN went UNAVAIL.
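The scrub-and-verify step described above can be sketched as follows; "tank" is again a placeholder pool name:

```shell
# Once the pool imports (or after a reboot that rereads the device info),
# scrub it to verify every block against its checksums and repair what
# the RAIDZ2 redundancy can still reconstruct:
zpool scrub tank

# When the scrub completes, the -v flag lists any files with
# permanent (unrepairable) errors, like the clobbered script above:
zpool status -v tank
```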
Thank you very much for all your help. I just have one final question: how important are the original port positions of the drives? Unfortunately, in the panic after the failure, I forgot to record the drive positions.
ZFS is pretty good about finding the devices (based on their devids) regardless of where they
are plugged back in. Just be sure to reconnect them while the pool is still offline, and try to
restore the original configuration as best you can.
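If you want to confirm which physical disk held which pool member after recabling, the on-disk labels can be read directly; the device path below is only an example:

```shell
# zdb -l reads the four ZFS labels stored on a vdev and prints the pool
# name, pool GUID, vdev GUID, and devid recorded there -- this is the
# metadata ZFS uses to reassemble the pool regardless of port position:
zdb -l /dev/dsk/c0t0d0s0
```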
Rereading this thread, I see you still don't know whether you have enough healthy drives to
import this pool, so that will be the key first step after you get everything back together.