
Re: Warning message about SAN disk (HITACHI)

Gwidion
I have a large setup: six distinct Solaris 10u10 hosts (no clustering), all at the newest CPU patch release.
Each system has two dual-port 8 Gb Oracle-branded QLogic HBAs:
Model: 371-4325-02
FCode/BIOS Version: BIOS: 2.10; fcode: 2.04; EFI: 2.04;

Kernel settings on every host (in /etc/system):
set ssd:ssd_max_throttle=8
set ssd:ssd_io_time=0x3c
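
For what it's worth, the values actually in effect can be confirmed on the running kernel with mdb (a minimal sketch; /D prints the variable as decimal):

# confirm the /etc/system settings took effect after reboot
echo "ssd_max_throttle/D" | mdb -k
echo "ssd_io_time/D" | mdb -k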

MPxIO is enabled, using the default round-robin load-balancing algorithm.
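
Multipath state can be inspected with mpathadm if that helps anyone reproduce this (the LU device path below is only an example built from the GUID in the logs):

# list all multipathed logical units
mpathadm list lu
# show load-balance policy and path states for one example LU
mpathadm show lu /dev/rdsk/c0t60060E8016012B000001012B000001B9d0s2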

Storage platform is Hitachi VSP

About 300 LUNs are presented to all HBAs (24 HBAs across the 6 hosts).

When I issue a zpool import command to scan all disks for ZFS pools available for import, I get a storm of SCSI errors sent from the VSP to all hosts (not just the one on which I ran the zpool import):

May 13 12:11:42 host scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60060e8016012b000001012b000001b9 (ssd251):
May 13 12:11:42 host Error for Command: read(10) Error Level: Retryable
May 13 12:11:42 host scsi: [ID 107833 kern.notice] Requested Block: 47360903 Error Block: 47360903
May 13 12:11:42 host scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 50 1012B01B9
May 13 12:11:42 host scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
May 13 12:11:42 host scsi: [ID 107833 kern.notice] ASC: 0x2a (parameters changed), ASCQ: 0x0, FRU: 0x0

This is just one example message; hundreds or thousands of these errors occur on each host, against various disks (ssd instance/serial number) and various blocks, as either a read(10) or a write(10).
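
To gauge the scale of the storm, something like this against /var/adm/messages gives a per-disk count (a rough sketch; adjust the log path and pattern as needed):

# count warnings per ssd instance, busiest first
grep "WARNING: /scsi_vhci" /var/adm/messages \
    | sed -n 's/.*(\(ssd[0-9]*\)).*/\1/p' \
    | sort | uniq -c | sort -rn | head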

SAN performance seems to be impacted while the errors are occurring (qualitatively observed).

Any guidance out there?
  • 1. Re: Warning message about SAN disk (HITACHI)
    user13412770

    Did you try matching the ZFS max pending setting? It needs to match the queue depth (ssd_max_throttle):

    set zfs:zfs_vdev_max_pending=8
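
    You can check the value currently in effect on a live system with mdb (a quick sketch; /D prints the variable as decimal):

        # read the current value from the running kernel
        echo "zfs_vdev_max_pending/D" | mdb -k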


    I have 8 T4-4s with about 400 LUNs each, and I had to play around with the settings until I found that the best I/O performance came with a queue depth of 4:

    set ssd:ssd_max_throttle=4
    set ssd:ssd_io_time=0x3c
    set zfs:zfs_vdev_max_pending=4
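
    If you want to experiment before committing to /etc/system, zfs_vdev_max_pending can be changed on a running kernel with mdb -kw (a sketch; 0t marks a decimal value). Note that ssd_max_throttle is generally read when a device attaches, so that one still needs /etc/system and a reboot:

        # takes effect immediately but is lost at reboot;
        # /etc/system makes it permanent
        echo "zfs_vdev_max_pending/W0t4" | mdb -kw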


    The problem I have now is that we expect over 1000 LUNs on each server, and following Hitachi's best practice my queue depth would have to drop to 2.


    Hitachi's formula for calculating queue depth:

    (number of LUs) × (queue_depth) ≤ 2048, and queue_depth ≤ 32

    1000 × 4 = 4000 (greater than 2048, so it fails)
    1000 × 2 = 2000 (this works, but am I then limited to only 1000 LUNs before I/O problems occur?)
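
    To turn the rule into something reusable, a small sh sketch (the 2048 and 32 limits come straight from the formula above):

        # largest per-LUN queue depth Hitachi's rule allows:
        #   depth = min(32, floor(2048 / number_of_LUs))
        luns=1000
        depth=`expr 2048 / $luns`
        if [ "$depth" -gt 32 ]; then
            depth=32
        fi
        echo "$luns LUNs -> max queue depth $depth"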


    Hope the zfs_vdev_max_pending helps.