
Solaris 11.1 ZFS Throughput vs Solaris 11.0

1006599 Newbie
Hi everyone,

I couldn't find anyone here asking about it, so I'll throw this out there. I've noticed in my testing that ZFS throughput falls off pretty sharply with Solaris 11.1, largely due to processor usage. For some reason, Solaris 11.1 appears to refuse to let ZFS use more than 50% of the processor (verified in mpstat and top, on two different bare-metal boxes and a VMware installation). The bare-metal installations were a Dell PowerEdge 2950 (yeah, I know, old) and a Supermicro box (Xeon E3-1240); the VMware installation used 6 vCPUs under ESXi 5.1, also on a Supermicro server (Xeon E5-2670). All exhibit the same strange behavior on Solaris 11.1, and none of them have this problem on Solaris 11.0.

The long and the short of it is that if you look at mpstat, you'll see either half the cores in the box pegged or all of the cores hovering around 50%. Being able to toggle Hyper-Threading readily on the E3 box bore this out pretty well: the difference between 650MB/s and 320MB/s (with Hyper-Threading on, I see four cores pegged; with it disabled, I see the four physical cores hovering at 50%).

Is this a bug or a "feature"?

If the former, does anyone know if it's fixed in an SRU? I have a potential storage project going on with sales that is being jeopardized by this, and I need to know either that it's a bug that has already been fixed or that a fix is on the way.

If it's the latter, is it tunable? If so, how?

Answers would be greatly appreciated, and if you've run into this as well that would be good to know.

-J
  • 1. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    Alex Fatkulin Explorer
    Can you tell us how your ZFS storage server is hooked up to the ESXi host? What kind of fabric and protocol?
  • 2. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    1006599 Newbie
    For my tests, I'm using a variety of hosts as clients (Windows XP/7/8, Mac OS X, ESXi 5.1). I am also using large dd commands locally for pool throughput testing (since the test boxes have 16GB of RAM, I have been writing anywhere between 30GB and 200GB for those).

    I have run Solaris 11.1 on bare metal and within virtual hardware (as noted above). For my tests using ESXi as the client, I used the Dell PowerEdge 2950 as a bare-metal host, directly connected over 1GbE, with NFS as the connection protocol. I have disabled sync on the pool to prevent sync writes from skewing my tests, focusing on throughput and raw CPU usage. I noted that CPU usage for ZFS appears capped at 50%, and confirmed that this behavior didn't exist in Solaris 11.0. On my Xeon E3-1240, I used to be able to push more than 1.2GB/s of throughput with SHA256; now I'm sitting at about 600MB/s.
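
    For reference, my local throughput test boils down to something like this (the pool name and sizes are just placeholders for whatever I have configured at the time):

    # disable sync writes on the test pool so the ZIL doesn't skew results
    zfs set sync=disabled super

    # one large sequential write, well past the 16GB of RAM in the box
    dd if=/dev/zero of=/super/ddtest bs=1024k count=65536

    # watch per-core load and pool throughput in other terminals
    mpstat 5
    zpool iostat -v super 5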

    As I said, it's pretty concerning when I'm trying to get an Oracle server, or SRS for an approved box, to do pure ZFS storage duties. I posted about this on Darren Moffat's blog, and he responded this morning saying that new code is being worked on to optimize SHA256 and AES on Intel processors. While I am extremely happy to hear this (and wait eagerly for it - Solaris already seems to have about a 25% more efficient SHA256 implementation than FreeBSD), if I can only use half of the CPU power of the box, then it's all for naught.

    I'm simply trying to get confirmation either that it's a bug that's been addressed (or is being worked on and will be fixed soon) or that it's a tunable that I'm not seeing. I'm not going to spend thousands of dollars on Solaris just to end up with egg on my face by only being able to use 50% of the CPU. This is due diligence more than anything else.
  • 3. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    cindys Pro
    I'm hardly a performance expert... the only S11.1 ZFS performance issue I'm aware of is with SSDs (bug 15804599, which has a workaround).

    Someone on comp.unix.solaris mentioned that S11.1 ZFS storage performance was slower (but seemingly unrelated to 15804599).
    I can't find this posting now.

    Are you saying that mpstat only shows 50% of the CPU as available in the VM, or when run on bare metal as well? The mpstat man page mentions zones as a virtual environment in which mpstat only displays data for the processor sets bound to the zone.

    I would gather more data on the bare-metal performance by using other tools, like iostat, lockstat, and fmdump, just to be sure.
    Do you have compression enabled?
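
    These are the sorts of invocations I mean (generic starting points; the pool name is just a placeholder, and you can adjust the intervals):

    iostat -xnz 5                      # per-device service times and %busy
    zpool iostat -v yourpool 5         # per-vdev throughput
    lockstat -kIW -D 20 sleep 30       # kernel profile: where CPU time goes
    fmdump -eV                         # check for recent error telemetry
    zfs get compression,checksum,sync yourpool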

    Thanks, Cindy
  • 4. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    Alex Fatkulin Explorer
    Try to do this test with NFS...

    Mount the same ZFS file system at several different NFS mount points on the client, then run a dd test against all of those mounts simultaneously (see the sketch below). Do you see better CPU utilization on the ZFS server when doing it that way, compared to just using a single NFS mount? Of course, keep sync=disabled.
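
    Roughly like this on the client (server name and paths are just examples; adjust the mount syntax for your client OS):

    # mount the same exported file system at four different mount points
    mount server:/export/test /mnt/t1
    mount server:/export/test /mnt/t2
    mount server:/export/test /mnt/t3
    mount server:/export/test /mnt/t4

    # one dd writer per mount point, all running at once
    i=0
    for m in /mnt/t1 /mnt/t2 /mnt/t3 /mnt/t4; do
        i=$((i+1))
        dd if=/dev/zero of=$m/ddtest.$i bs=1024k count=16384 &
    done
    wait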
  • 5. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    Alex Fatkulin Explorer
    cindys,

    I can tell you what I saw with iSCSI when upgrading from 11.0 to 11.1, which was very similar to what the OP is observing, but I don't know if that's directly relevant since he seems to be running NFS.

    Basically, after upgrading to 11.1 my iSCSI IOPS dropped by almost half compared to 11.0. What I discovered is that my client systems could not push the storage server's CPU utilization either: mpstat showed some of the cores pegged while the other cores sat completely idle. The number of cores utilized was directly proportional to the number of iSCSI connections in use, so with two iSCSI connections I could never utilize more than two cores on the storage server. What I did was set up more iSCSI connections over the same initiator/target from the clients using iSCSI MPIO, and that solved the problem by allowing more storage server cores to be utilized in 11.1. I know that 11.0 did not need this.

    So to me it appears that there has definitely been some sort of change made in 11.1 compared to 11.0 with regard to the network stack. I don't think it's specific to ZFS.
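
    In case it's useful: on a Solaris initiator, the extra sessions per target that I'm describing would be configured roughly like this (the target IQN is just a placeholder, and MPxIO needs to be enabled so the sessions get aggregated):

    # open 4 sessions to the same target instead of the default 1
    iscsiadm modify target-param -c 4 iqn.1986-03.com.sun:02:example-target

    # verify the configured session count
    iscsiadm list target-param -v iqn.1986-03.com.sun:02:example-target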
  • 6. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    1006599 Newbie
    Cindy,

    I do not have compression enabled on any datasets. I first noticed this when I ran a test scrub on the PowerEdge 2950, and the scrub took twice as long to complete as it did on 11.0 (the pool was using SHA256). In mpstat, each of the four CPU cores in this box was at about 50% utilization. There are no zones on this box, just raw ZFS. I also tested with dd and came up with the same results.

    I'm trying to avoid involving the network stack at all, since even scrubs seem affected by this behavior. This happens both virtualized and bare metal.

    As an example...

    SHA256, Solaris 11.0:    ~600MB/s, CPU cores pegged
    Fletcher4, Solaris 11.0: ~900MB/s, CPU ~25% per core
    SHA256, Solaris 11.1:    ~300MB/s, CPU ~50% per core
    Fletcher4, Solaris 11.1: ~900MB/s, CPU ~25% per core

    Looking at Alex's post about threads, I will whip up a quick dd test using many writers (probably 4, then 8 if necessary); a sketch of what I have in mind is at the end of this post. I am loading two Intel SSDs into the bare-metal PowerEdge 2950 installation right now (these can do 500MB/s apiece). I'm using an LSI 9201-16e with the mpt_sas driver on this machine.

    I will post back with results as soon as possible. I will also try capturing the iostat and lockstat output mentioned above, in addition to the obligatory zpool iostat and mpstat (I've never used fmdump before, but I will try).
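
    The multi-writer test I have in mind is roughly this (an untested sketch; sizes are placeholders, and the pool gets destroyed and recreated whenever I change the checksum):

    # recreate the striped test pool on the two SSDs
    zpool destroy super
    zpool create super c0t5001517BB28A615Fd0 c0t5001517BB28A6238d0
    zfs set checksum=sha256 super     # or checksum=fletcher4 for the comparison run
    zfs set sync=disabled super
    zfs set atime=off super

    # four concurrent writers from /dev/zero
    for i in 1 2 3 4; do
        dd if=/dev/zero of=/super/ddtest.$i bs=1024k count=16384 &
    done
    wait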
  • 7. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    cindys Pro
    Performance issues are not easy to resolve when you add iSCSI and networking too. You might file a service request if you have a support contract.

    I can't tell whether the SSDs were attached and being used on your system when you ran the previous S11.1 tests. If you add them now, they might make things worse due to 15804599.

    Thanks, Cindy
  • 8. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    1006599 Newbie
    Cindy,

    Here are some results... I couldn't include lockstat because I don't know what you want out of it (could you list the exact invocations you'd like run?).

    For SHA256...two Intel 330 Series SSD 120GB

    4x dd writers running simultaneously (same results with 1-4 writers, and I'd imagine 8 won't change it either)

    iostat

    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    2.0 1233.4 6.8 161526.3 0.0 1.1 0.0 0.9 1 56 c0t5001517BB28A615Fd0
    2.0 1233.8 6.8 161479.2 0.0 1.2 0.0 1.0 1 60 c0t5001517BB28A6238d0

    zpool iostat

    capacity operations bandwidth
    pool alloc free read write read write

    super 6.02G 216G 0 2.31K 0 310M
    c0t5001517BB28A615Fd0 3.01G 108G 0 1.16K 0 155M
    c0t5001517BB28A6238d0 3.01G 108G 0 1.15K 0 155M


    capacity operations bandwidth
    pool alloc free read write read write

    super 7.42G 215G 0 2.44K 0 320M
    c0t5001517BB28A615Fd0 3.71G 107G 0 1.22K 0 160M
    c0t5001517BB28A6238d0 3.71G 107G 0 1.21K 0 160M


    capacity operations bandwidth
    pool alloc free read write read write

    super 8.85G 213G 0 2.45K 0 316M
    c0t5001517BB28A615Fd0 4.43G 107G 0 1.22K 0 158M
    c0t5001517BB28A6238d0 4.43G 107G 0 1.22K 0 158M



    mpstat

    CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
    0 0 0 4648 12076 125 1840 50 110 357 0 169 0 52 0 47
    1 0 0 1809 18926 12 320 72 36 146 0 124 0 93 0 7
    2 0 0 2053 18726 31 1076 60 73 325 0 130 0 70 0 30
    3 0 0 42391 4274 2114 2289 134 123 414 0 338 0 46 0 53
    CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
    0 0 0 30703 9929 295 1826 46 125 336 0 277 0 49 0 51
    1 0 0 0 22823 5 220 80 25 305 0 139 0 96 0 4
    2 0 0 16193 15511 20 987 96 69 509 0 263 0 78 0 22
    3 0 0 7386 9230 2192 3186 101 162 442 0 400 0 27 0 73
    CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
    0 0 0 537 4166 230 2583 56 124 497 0 400 0 22 0 77
    1 0 0 3672 886 5 563 140 28 458 0 353 0 94 0 6
    2 0 0 0 4062 5 134 91 18 380 0 170 0 99 0 1
    3 0 0 5073 3287 2161 2582 66 126 515 0 455 0 28 0 71


    Pardon the mess... I cleaned up iostat, then gave up trying to make the rest pretty. One thing that did change when I used four dd writers (and it's revealed here in mpstat) is that there is more processor usage now, except it's spent on the additional dd writers reading from /dev/zero rather than on ZFS. ZFS throughput stays flat at around 320MB/s (about 50% CPU). So if I had to guess, the checksum processing is what's impacted by this. I will post back with my Fletcher4 results.
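
    Since I'm still not sure which lockstat invocation is wanted, my plan for the next run is an on-CPU kernel profile; I'm assuming something like this DTrace one-liner (or the usual lockstat -kIW -D 20 sleep 30) is close to what's needed:

    # sample the kernel at 997Hz for 30 seconds and count which functions are on CPU
    dtrace -n 'profile-997 /arg0/ { @[func(arg0)] = count(); } tick-30s { exit(0); }'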

    Edited by: Jason Keller on May 8, 2013 1:33 PM
  • 9. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    1006599 Newbie
    Here's what happens during a scrub...

    SHA256 pool (scrub)

    mpstat

    CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
    0 0 0 0 345 134 952 0 20 43 0 178 1 2 0 97
    1 0 0 0 59 0 4 53 0 29 0 0 0 100 0 0
    2 0 0 0 72 3 6 52 1 22 0 0 0 100 0 0
    3 0 0 0 741 630 4364 0 14 158 0 173 0 3 0 97


    zpool iostat

    capacity operations bandwidth
    pool alloc free read write read write

    super 80.0G 142G 826 0 320M 0
    c0t5001517BB28A615Fd0 40.0G 71.0G 415 0 160M 0
    c0t5001517BB28A6238d0 40.0G 71.0G 411 0 160M 0


    capacity operations bandwidth
    pool alloc free read write read write

    super 80.0G 142G 816 0 321M 0
    c0t5001517BB28A615Fd0 40.0G 71.0G 396 0 161M 0
    c0t5001517BB28A6238d0 40.0G 71.0G 420 0 160M 0


    iostat

    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    425.9 0.0 163830.8 0.0 0.0 0.6 0.0 1.4 1 35 c0t5001517BB28A615Fd0
    384.9 0.0 164344.2 0.0 0.0 0.6 0.0 1.5 1 34 c0t5001517BB28A6238d0
  • 10. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    1006599 Newbie
    And here are the Fletcher4 results (4x dd writers from /dev/zero)


    zpool iostat

    capacity operations bandwidth
    pool alloc free read write read write

    super 21.0G 201G 0 613 0 572M
    c0t5001517BB28A615Fd0 10.5G 101G 0 311 0 289M
    c0t5001517BB28A6238d0 10.5G 101G 0 302 0 284M


    capacity operations bandwidth
    pool alloc free read write read write

    super 23.0G 199G 0 552 0 530M
    c0t5001517BB28A615Fd0 11.5G 99.5G 0 282 0 271M
    c0t5001517BB28A6238d0 11.5G 99.5G 0 269 0 259M


    mpstat

    CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
    0 0 0 18227 15459 105 624 106 66 1810 0 485 0 62 0 37
    1 0 0 9414 17912 38 593 115 68 1596 0 678 0 62 0 37
    2 0 0 16570 14988 41 668 93 64 1752 0 586 0 61 0 39
    3 0 0 17979 15356 513 539 101 53 1273 0 439 0 68 0 32

    CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
    0 0 0 1 361 105 449 41 58 1169 0 225 0 49 0 51
    1 0 0 0 216 65 496 42 53 634 1 266 1 49 0 50
    2 0 0 1 214 61 411 47 49 1063 0 316 1 47 0 51
    3 0 0 1 549 427 471 55 58 862 0 359 1 48 0 51


    iostat

    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    2.0 294.9 6.8 288329.5 0.0 9.4 0.1 31.7 3 95 c0t5001517BB28A615Fd0
    2.0 303.6 6.8 293739.9 0.0 9.7 0.1 31.8 3 99 c0t5001517BB28A6238d0


    Look at the nice, even distribution we're getting now for CPU load versus SHA256. We're bursting a bit above 50% CPU on some runs (again due to the 4x dd writers), but the disks are already slammed with writes. Let's see what a scrub does...


    zpool iostat

    capacity operations bandwidth
    pool alloc free read write read write

    super 80.0G 142G 1.83K 27 1007M 164K
    c0t5001517BB28A615Fd0 40.0G 71.0G 913 11 504M 55.7K
    c0t5001517BB28A6238d0 40.0G 71.0G 957 15 504M 108K


    capacity operations bandwidth
    pool alloc free read write read write

    super 80.0G 142G 1.83K 24 990M 153K
    c0t5001517BB28A615Fd0 40.0G 71.0G 937 13 495M 102K
    c0t5001517BB28A6238d0 40.0G 71.0G 936 10 495M 51.4K


    mpstat

    CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
    0 0 0 0 399 109 3407 22 122 472 0 86 0 42 0 58
    1 0 0 0 122 3 4948 14 151 308 0 120 0 26 0 73
    2 0 0 0 107 7 5630 11 151 324 0 80 0 20 0 80
    3 0 0 12 1115 1009 3122 19 109 480 0 40 0 38 0 62
    CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
    0 0 0 0 374 105 5210 15 150 378 0 61 0 31 0 69
    1 0 0 0 130 3 4112 19 146 366 0 31 0 35 0 65
    2 0 0 6 108 6 4456 17 134 394 0 23 0 33 0 66
    3 0 0 0 1085 986 3627 13 110 375 0 39 0 30 0 70


    iostat

    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    925.1 16.6 506420.5 105.5 0.0 5.3 0.0 5.6 3 98 c0t5001517BB28A615Fd0
    937.7 13.0 506288.6 53.2 0.0 5.2 0.0 5.5 3 98 c0t5001517BB28A6238d0


    r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    925.5 18.4 503771.3 63.3 0.0 5.3 0.0 5.6 3 97 c0t5001517BB28A615Fd0
    906.7 15.6 503916.6 99.3 0.0 5.3 0.0 5.8 3 97 c0t5001517BB28A6238d0


    Clearly, we are disk-capped here at about 1GB/s. Notice, however, that we're well under 50% processor utilization for the checksum processing.
  • 11. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    1006599 Newbie
    Cindy,

    I wouldn't worry about that particular SSD TRIM bug in this case. I think I've more than proved the point that checksum processing in ZFS seems to peter out at around 50% processor utilization in S11.1, whereas it can go all the way to 100% in S11.0. I added the SSDs as a separate striped pool for testing, destroying and recreating the pool entirely whenever I changed checksums.

    The atime property was also off for both tests; otherwise the pools were completely vanilla (no compression, no dedup, etc.). I can't file a report or a ticket because we're at a pre-sales stage, and as I previously explained, I can't pitch this through with such a glaring defect.
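
    For completeness, this is the sort of one-liner that shows the test pool really is vanilla apart from the checksum under test:

    zfs get checksum,compression,dedup,atime,sync super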

    I've been waiting on Brad Woolfe and Chris Curry since April 17th for an explanation of this, and haven't received a single call or email. I realize Chris had a pretty serious injury from softball (as Tim Festa, my previous account rep, told me), but I still have yet to receive any contact from them.

    So I'm posting here to try and get some traction on the issue, and to see if anyone else has run into this.

    Edited by: Jason Keller on May 8, 2013 4:46 PM
  • 12. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    cindys Pro
    I like to rule out all known issues. For people with SSDs, the performance drop-off after migrating to S11.1 was noticeable.

    I don't have the experience to look at your data and say it's x, y, or z, but some of the experts in this forum might be able to recognize the issue.
    I'm pretty sure Darren would be aware of any known crypto issues, so my sense is that something else is causing the crypto performance to drop off.

    Thanks, Cindy
  • 13. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    TimCreswick Newbie

    Hi - I was wondering whether any of you have made progress with this issue.

    At the risk of cross-posting, I have started the following new discussion which I think might be a related issue: Poor performance on SSD-backed ZFS pools in mirror configuration only

  • 14. Re: Solaris 11.1 ZFS Throughput vs Solaris 11.0
    cindys Pro

    At this point, I think it would be helpful to disconnect or unconfigure the SSDs to see if the performance is still poor for non-SSD devices. Build the same pools with regular devices and test that performance.

    Thanks, Cindy
