1 Reply · Latest reply: Jun 19, 2013 2:14 PM by user13412770

    Re: Warning message about SAN disk (HITACHI)

    Gwidion
      I have a large setup: six distinct Solaris 10u10 hosts (no clustering), all at the newest CPU patch release.
      Each system has two dual-port 8Gb Oracle-branded QLogic HBAs:
      Model: 371-4325-02
      FCode/BIOS Version: BIOS: 2.10; fcode: 2.04; EFI: 2.04;

      Kernel settings everywhere (in /etc/system):
      set ssd:ssd_max_throttle=8
      set ssd:ssd_io_time=0x3c
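
      (For reference, the values the running kernel actually picked up can be confirmed with mdb -- a standard Solaris technique, assuming the ssd module is loaded:)

      # read the live values from the running kernel
      echo "ssd_max_throttle/D" | mdb -k
      echo "ssd_io_time/D" | mdb -k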

      MPxIO is enabled using the default round-robin load-balancing algorithm.
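
      (To be explicit about the config: that corresponds to the stock entry in /kernel/drv/scsi_vhci.conf on Solaris 10, shown here for reference only, not as a change:)

      # /kernel/drv/scsi_vhci.conf -- default load-balance policy
      load-balance="round-robin";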

      Storage platform is Hitachi VSP

      About 300 LUNs are presented to all HBAs (24 HBAs across 6 hosts).

      When I issue a zpool import command to scan all disks for ZFS pools available for import, I get a storm of SCSI errors sent from the VSP to all hosts (not just the one on which I am running the zpool import):

      May 13 12:11:42 host scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60060e8016012b000001012b000001b9 (ssd251):
      May 13 12:11:42 host Error for Command: read(10) Error Level: Retryable
      May 13 12:11:42 host scsi: [ID 107833 kern.notice] Requested Block: 47360903 Error Block: 47360903
      May 13 12:11:42 host scsi: [ID 107833 kern.notice] Vendor: HITACHI Serial Number: 50 1012B01B9
      May 13 12:11:42 host scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
      May 13 12:11:42 host scsi: [ID 107833 kern.notice] ASC: 0x2a (parameters changed), ASCQ: 0x0, FRU: 0x0

      This is just one example message; hundreds or thousands of these errors occur on each host, against various disks (ssd instance / serial number) and various blocks, as either a read(10) or a write(10). A quick way to see which LUNs are noisiest is sketched below.
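
      (A one-liner sketch for tallying the warnings per ssd instance, assuming the default /var/adm/messages log location:)

      # count SCSI warnings per ssd instance, noisiest first
      sed -n 's/.*(\(ssd[0-9]*\)).*/\1/p' /var/adm/messages | sort | uniq -c | sort -rn | head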

      SAN performance appears to degrade while the errors are occurring (a qualitative observation; I have no hard numbers).

      Any guidance out there?
        • 1. Re: Warning message about SAN disk (HITACHI)
          user13412770

          Did you try matching the ZFS max pending setting? It needs to match the ssd queue depth (ssd_max_throttle):


          set zfs:zfs_vdev_max_pending=8
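
          (That line goes in /etc/system like the ssd settings, so it needs a reboot to take effect; for testing first, the value can also be changed on a live kernel with mdb -- a standard ZFS tuning technique, use with care:)

          # set zfs_vdev_max_pending to 8 on the running kernel (takes effect immediately)
          echo "zfs_vdev_max_pending/W0t8" | mdb -kw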


          I have 8 T4-4s with about 400 LUNs each, and I had to play around with the settings until I found that the best I/O performance was with a queue depth of 4:

          set ssd:ssd_max_throttle=4

          set ssd:ssd_io_time=0x3c

          set zfs:zfs_vdev_max_pending=4


          The problem I have now is that we are expecting over 1000 LUNs on each server, and by following Hitachi's best practice my queue depth would have to drop to 2.


          Hitachi's formula for calculating queue depth:

          (number of LUs) × (queue_depth) ≤ 2048, and queue_depth ≤ 32

          1000 * 4 = 4000 (which is greater than 2048)

          1000 * 2 = 2000 (this works, but am I limited to only 1000 LUNs before I/O problems occur? See the sketch below.)
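
          (Just to make the arithmetic explicit, a trivial Bourne-shell sketch of Hitachi's formula -- a hypothetical helper for illustration, not anything from Hitachi:)

          #!/bin/sh
          # largest queue_depth satisfying (LUs x queue_depth) <= 2048, capped at 32
          luns=${1:-1000}
          qd=`expr 2048 / $luns`
          [ "$qd" -gt 32 ] && qd=32
          echo "max queue_depth for $luns LUNs: $qd"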


          Hope the zfs_vdev_max_pending helps.