1 Reply Latest reply on Dec 11, 2012 10:13 AM by 854078

    storage latency

      I've posted this question to the OpenSolaris storage mailing list already, but it looks rather inactive, so I am reposting it here:

      I have a Solaris 11 Express playbox with COMSTAR on an Emulex LP9802DC in target mode, used as a storage backend in a 2 Gb/s FC environment. It exposes ZFS volumes from the "tank" pool. The FC client hosts are Linux machines connecting through QLogic ISP2312-based HBAs. When accessing the target, I see considerable latency on each command issued. Running something like "dd if=/dev/sdc bs=1M of=/dev/null" together with "iostat -x 1 -m /dev/sdc" on a Linux machine gives me the following numbers most of the time:

      Device:   rrqm/s  wrqm/s    r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
      sdc      2666.00    0.00  86.00  0.00  10.75   0.00    256.00      1.97  22.63  11.57  99.50

      where "await" is the average wait time per command in milliseconds (it varies between 20 and 25 ms) and rMB/s is the read throughput, here just around 10 MB/s.

      A "zpool iostat -v" run in parallel on the Solaris machine shows rather low throughput numbers, probably due to the ARC:

                       capacity     operations    bandwidth
      pool           alloc   free   read  write   read  write
      tank           1.07T   571G      3     13  26.4K   106K
        c8t0d0       1.07T   571G      3      0  26.4K      0
        mirror       40.7M  1.90G      0     13      0   106K
          c7t0d0s0       -      -      0     13      0   106K
          c7t1d0s0       -      -      0     13      0   106K
      cache              -      -      -      -      -      -
        c7t2d0p2     53.5M  20.8G      0      6      0   436K
        c7t3d0p2     16.9M  20.8G      0      0      0      0

      What I've found out so far is that the rate is mainly limited by the latency and the request size: the maximum queue depth is 2 and the request size is 128 KiB, so combined with a latency of 22 ms you get 1000 ms/s * 2 commands / 22 ms ≈ 90 commands per second. Multiply this by 128 KiB per command and you get a maximum throughput of about 11 MB/s. I have no idea how to effectively change the queue depth and request size parameters, so I am stuck trying to find out what is causing the latency.
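      The arithmetic above can be sketched in a few lines (a back-of-the-envelope check; the variable names are mine, the numbers come from the iostat output above):

```python
# Throughput ceiling implied by latency, queue depth, and request size.
# Input values taken from the iostat output above.
queue_depth = 2              # avgqu-sz: ~2 commands in flight
latency_s = 0.022            # await: ~22 ms per command
request_bytes = 128 * 1024   # avgrq-sz: 256 sectors * 512 B = 128 KiB

commands_per_s = queue_depth / latency_s                   # ~90 commands/s
throughput_mib_s = commands_per_s * request_bytes / 2**20  # ~11 MiB/s

print(f"{commands_per_s:.0f} commands/s -> {throughput_mib_s:.1f} MiB/s")
```

      which matches the roughly 10-11 MB/s actually observed on the initiator.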

      Every 5 seconds, when Solaris flushes the caches and the ZIL, I see a considerable latency decrease and thus a throughput increase on the Linux host:

      Device:    rrqm/s  wrqm/s     r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
      sdc      21839.00    0.00  722.00  0.00  88.25   0.00    250.33      1.89   2.67   1.38  99.90
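      (As an aside: the 5-second rhythm is the ZFS transaction-group sync interval, which on builds of this vintage is governed by the zfs_txg_timeout tunable. Assuming that tunable is exposed on this build, it can be changed via /etc/system, though that only moves the periodic burst around rather than addressing the per-command latency. A sketch:)

      * /etc/system -- assumption: zfs_txg_timeout is exposed on this build
      set zfs:zfs_txg_timeout = 10

      A reboot is required for /etc/system changes to take effect.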

      And of course, the storage device itself performs quite okay when accessed from Solaris itself:

      root@san1:/export/ISOs# dd if=ubuntu-10.04.1-server-amd64.iso of=/dev/null & iostat -xD c8t0d0 1
      device    r/s   w/s     kr/s  kw/s  wait  actv  svc_t  %w  %b
      sd1     535.0   0.0  65735.5   0.0   0.0   3.1    5.8   1  33
      715644928 bytes (716 MB) copied, 8.46758 s, 84.5 MB/s

      Anybody with a clue for this? Any input greatly appreciated.


      Edited by: user13673072 on Apr 8, 2011 3:52 PM
        • 1. Re: storage latency
          Sorry, I forgot to update this question's status here:

          the power management (especially the CPU power management) on the Solaris machine was interfering so badly that all system latencies were ridiculously high. Disabling it (editing /etc/power.conf, setting "autopm" and "cpupm" to "disable", and running pmconfig afterwards) helped a lot.
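          For reference, the relevant /etc/power.conf entries ended up looking like this (a sketch of just the two changed lines; the rest of the file is left untouched):

          # /etc/power.conf -- disable system and CPU power management
          autopm          disable
          cpupm           disable

          After saving the file, apply the change by running pmconfig as root; no reboot is needed.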