This discussion is archived.
4 Replies · Latest reply: Jan 4, 2013 1:03 PM by cindys

zfs doesn't detect missing/failed hard drives

982267 Newbie
Hi all members of the Oracle community,
I am having a weird issue with Solaris 11 and hope somebody can help me out.
I have installed Solaris 11.1 with Napp-it and created a ZFS filesystem on a mirrored pool (8 vdevs).

I am testing the ability to detect missing/failed HDDs and the resilver time. However, zpool status does not report a missing HDD when I pull one out, even though cfgadm -al shows that the device is gone.
zpool status won't report any problem until I run a command that rescans the devices, such as format or cfgadm.
The chassis and motherboard are Supermicro, and the drive bays are hot-pluggable.
Two months ago I tested this functionality and it worked very well: removal and re-insertion were detected quickly. But I don't remember which OS that was (I have tested OpenIndiana, Solaris 11.0, and Solaris 11.1).
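
For reference, the sequence below is roughly what I run to make the pool notice the change (the pool name tank is just a placeholder for mine):

# zpool status tank   <- still shows every device ONLINE after the pull
# cfgadm -al          <- rescans attachment points; the slot now shows the disk as gone
# devfsadm -Cv        <- cleans up stale /dev links for the removed device
# zpool status tank   <- only after the rescan does the vdev show as missing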

Thanks for your help.
  • 1. Re: zfs doesn't detect missing/failed hard drives
    cindys Pro
    Hi--

    Yes, I think this is a known bug that is fixed in an upcoming SRU.
    I'll have access to the bug info on Wednesday.

    This is a bug in the notification of a hot-plug operation, not in actual
    disk-failure notification. So if a disk were to fail, ZFS would be
    notified.

    I can provide an example of ZFS detecting a failed disk later in the week.
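
    In the meantime, one way to see what FMA has observed is to check the error
    and fault logs. These are standard Solaris commands, and the output will
    vary by system:

    # fmdump -e      <- summary of logged error telemetry (ereports), with timestamps
    # fmadm faulty   <- lists any diagnosed faults; empty if nothing has been diagnosed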

    Thanks, Cindy
  • 2. Re: zfs doesn't detect missing/failed hard drives
    982267 Newbie
    Thanks for your reply.
    I'm looking forward to hearing news from you.

    I have tried a fresh OS installation, but no luck.
    It had worked well before.
    I thought it might be the chassis, motherboard, or LSI card, but I have tried different ones of each.

    OpenIndiana seems to respond to missing HDDs more quickly, but it still cannot detect a newly hot-plugged HDD.

    Anyway, how can I access this bug?
  • 3. Re: zfs doesn't detect missing/failed hard drives
    cindys Pro
    If you are using a system that uses the mpt_sas driver, then this is the bug I was referring to:

    Bug 15823614 - SUNBT7154192-SOLARIS_11U1 mpt_sas does not generate hot-plug sysevents for hot-remove

    This is fixed in S11.1 but I'm not sure in which SRU the fix exists.

    The above bug is that ZFS/FMA do not report a hot-remove operation. This means that your
    hot-plug testing is hit by this bug, but an actual failed disk will be reported as
    expected. See the example below.
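
    If you want to confirm that your HBA is bound to the mpt_sas driver, a quick generic
    check (the exact output depends on your hardware) is:

    # prtconf -D | grep mpt    <- shows which mpt variant is bound to the controller
    # modinfo | grep mpt_sas   <- confirms the mpt_sas module is loaded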

    I'm having trouble getting the right hardware set up today to provide an example of the
    correct failed-disk interaction, so I have included one below from October, when another
    customer hit the same failure during hot-remove testing.

    Thanks,
    Cindy

    # beadm list
    BE        Active Mountpoint Space Policy Created
    --        ------ ---------- ----- ------ -------
    s11u1_24b NR     /          4.00G static 2012-10-22 14:29

    # zpool status pond
      pool: pond
     state: ONLINE
      scan: none requested
    config:

            NAME                       STATE     READ WRITE CKSUM
            pond                       ONLINE       0     0     0
              mirror-0                 ONLINE       0     0     0
                c0t5000C500335DC60Fd0  ONLINE       0     0     0
                c0t5000C500335F907Fd0  ONLINE       0     0     0
              mirror-1                 ONLINE       0     0     0
                c0t5000C500335BD117d0  ONLINE       0     0     0
                c0t5000C500335E106Bd0  ONLINE       0     0     0
            spares
              c0t5000C500335FC3E7d0    AVAIL

    errors: No known data errors

    Fail a disk in pond:

    SUNW-MSG-ID: ZFS-8000-LR, TYPE: Fault, VER: 1, SEVERITY: Major
    EVENT-TIME: Mon Oct 22 15:20:41 MDT 2012
    PLATFORM: ORCL,SPARC-T3-4, CSN: 1120BDRCCD, HOSTNAME: gibberfish
    SOURCE: zfs-diagnosis, REV: 1.0
    EVENT-ID: 9e8cc30b-5b4f-c055-ef30-9e349f8e764c
    DESC: ZFS device 'id1,sd@n5000c500335e106b/a' in pool 'pond' failed to open.
    AUTO-RESPONSE: An attempt will be made to activate a hot spare if available.
    IMPACT: Fault tolerance of the pool may be compromised.
    REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Run 'zpool status -lx' for more information. Please refer to the associated reference document at http://support.oracle.com/msg/ZFS-8000-LR for the latest service procedures and policies regarding this diagnosis.
    root@gibberfish:/# zpool status pond
      pool: pond
     state: DEGRADED
    status: One or more devices are unavailable in response to persistent errors.
            Sufficient replicas exist for the pool to continue functioning in a
            degraded state.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or 'fmadm repaired', or replace the device
            with 'zpool replace'.
            Run 'zpool status -v' to see device specific details.
      scan: resilvered 43.4M in 0h0m with 0 errors on Mon Oct 22 15:20:45 2012
    config:

            NAME                         STATE     READ WRITE CKSUM
            pond                         DEGRADED     0     0     0
              mirror-0                   ONLINE       0     0     0
                c0t5000C500335DC60Fd0    ONLINE       0     0     0
                c0t5000C500335F907Fd0    ONLINE       0     0     0
              mirror-1                   DEGRADED     0     0     0
                c0t5000C500335BD117d0    ONLINE       0     0     0
                spare-1                  DEGRADED     0     0     0
                  c0t5000C500335E106Bd0  UNAVAIL      0     0     0
                  c0t5000C500335FC3E7d0  ONLINE       0     0     0
            spares
              c0t5000C500335FC3E7d0      INUSE

    errors: No known data errors

    # fmadm faulty
    --------------- ------------------------------------  -------------- ---------
    TIME            EVENT-ID                              MSG-ID         SEVERITY
    --------------- ------------------------------------  -------------- ---------
    Oct 22 15:20:41 9e8cc30b-5b4f-c055-ef30-9e349f8e764c  ZFS-8000-LR    Major

    Problem Status    : solved
    Diag Engine       : zfs-diagnosis / 1.0
    System
        Manufacturer  : unknown
        Name          : ORCL,SPARC-T3-4
        Part_Number   : unknown
        Serial_Number : 1120BDRCCD
        Host_ID       : 84a02d28

    ----------------------------------------
    Suspect 1 of 1 :
       Fault class  : fault.fs.zfs.open_failed
       Certainty    : 100%
       Affects      : zfs://pool=7e4c5f3e419b100a/vdev=65e0a4d47f39b86e/pool_name=pond/vdev_name=id1,sd@n5000c500335e106b/a
       Status       : faulted and taken out of service

       FRU
          Name      : "zfs://pool=7e4c5f3e419b100a/vdev=65e0a4d47f39b86e/pool_name=pond/vdev_name=id1,sd@n5000c500335e106b/a"
          Status    : faulty

    Description : ZFS device 'id1,sd@n5000c500335e106b/a' in pool 'pond' failed to open.

    Response    : An attempt will be made to activate a hot spare if available.

    Impact      : Fault tolerance of the pool may be compromised.

    Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
                  Run 'zpool status -lx' for more information. Please refer to the
                  associated reference document at
                  http://support.oracle.com/msg/ZFS-8000-LR for the latest service
                  procedures and policies regarding this diagnosis.
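
    For completeness, recovery after physically replacing the failed disk would look
    roughly like this. This is only a sketch based on the pool above, reusing the failed
    disk's device name:

    # zpool replace pond c0t5000C500335E106Bd0   <- resilver onto the new disk in the same slot
    # zpool clear pond                           <- clear error counters once the resilver completes
    # zpool status pond                          <- spare c0t5000C500335FC3E7d0 should return to AVAIL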
  • 4. Re: zfs doesn't detect missing/failed hard drives
    cindys Pro
    The fix for bug 15823614 is available in S11.1 SRU 2. You can verify the installed SRU as
    sketched below. Thanks, Cindy
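
    The SRU number is encoded in the version string of the 'entire' package (the exact
    format varies by release):

    # pkg info entire   <- the Version/Branch line identifies the installed SRU
    # pkg update        <- update the image to the latest SRU in the configured repository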
