I couldn't find anyone here asking about it, so I'll throw this out there. I've noticed in my testing that ZFS throughput falls off pretty sharply with Solaris 11.1, largely due to processor usage. For some reason, it appears that Solaris 11.1 refuses to let ZFS use more than 50% of the processor (verified in mpstat and top, on 2 different bare-metal boxes and a VMware installation). The bare-metal installations were on a Dell PowerEdge 2950 (yeah I know, old) and a Supermicro (Xeon E3-1240); the VMware installation was using 6 vCPUs under ESXi 5.1, also on a Supermicro server (Xeon E5-2670). All exhibit the same strange behavior on Solaris 11.1; none of them have this problem on Solaris 11.0.
The long and the short of it is that if you look at mpstat, you'll see either half the cores in the box pegged or all the cores hovering around 50%. Being able to toggle Hyperthreading readily on the E3 box bore this out pretty well: the difference is between 650MB/s and 320MB/s. With Hyperthreading on, I see 4 cores pegged; with it disabled, I see the four physical cores hovering at 50%.
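For anyone comparing the two mpstat patterns, a minimal Python sketch shows why they amount to the same cap. The per-CPU figures below are hypothetical round numbers chosen to match the description, not real mpstat output, and `aggregate_busy` is just an illustrative helper name.

```python
def aggregate_busy(per_cpu_busy):
    """Return the mean busy percentage across all logical CPUs."""
    return sum(per_cpu_busy) / len(per_cpu_busy)

# Hypothetical samples shaped like the two patterns described above.
ht_on = [100, 100, 100, 100, 0, 0, 0, 0]  # 8 logical CPUs, 4 pegged
ht_off = [50, 50, 50, 50]                 # 4 physical cores at ~50%

print(aggregate_busy(ht_on))   # 50.0
print(aggregate_busy(ht_off))  # 50.0
```

Either way the box as a whole sits at roughly half of its available CPU, which is consistent with the throughput roughly halving.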
Is this a bug or a "feature"?
If the former, does anyone know if it's fixed in an SRU? I have a potential storage project going on with sales that is being jeopardized by this, and I need to know either that it's a bug that's already been fixed or that a fix is being worked on.
If it's the latter, is it tunable? If so, how?
Answers would be greatly appreciated, and if you've run into this as well that would be good to know.
For my tests, I'm using a variety of hosts as clients (Windows [XP, 7, 8], Mac OS X, ESXi 5.1). I am also using large dd commands locally for pool throughput testing (since I'm testing with 16GB of RAM, I have been writing anywhere from 30 to 200GB per run).
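For readers who want to reproduce a local sequential-write test without dd, here is a rough Python equivalent. It is only a sketch: the function name, block size, and any path you pass in are assumptions for illustration, and writing zeros is only meaningful when compression is off on the dataset.

```python
import os
import time

def write_zeros(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes of zeros to path sequentially and return MB/s."""
    block = bytes(block_size)  # a reusable buffer of zeros
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = min(block_size, total_bytes - written)
            f.write(block[:chunk])
            written += chunk
        f.flush()
        os.fsync(f.fileno())  # include the flush to disk in the timing
    elapsed = time.perf_counter() - start
    return written / (1024 * 1024) / max(elapsed, 1e-9)
```

Something like `write_zeros("/tank/test/zeros.bin", 64 * 1024**3)` (hypothetical pool path) would write 64GB, which is well beyond 16GB of RAM so caching should not flatter the result.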
I have run Solaris 11.1 on bare metal and within virtual hardware (as noted above). For my tests using ESXi as the client, I used the Dell PowerEdge 2950 as a bare-metal installed host, directly connected over 1GbE using NFS as the connection protocol. I have disabled sync on the pool to prevent sync writes from skewing my tests, focusing on throughput and raw CPU usage. I noted the seemingly capped CPU usage at 50% for ZFS, and confirmed that this behavior didn't exist in Solaris 11.0. On my Xeon E3-1240, I used to be able to push more than 1.2GB/s of throughput with SHA256. Now, I'm sitting at about 600MB/s.
As I said, it's pretty concerning when I'm trying to get an Oracle Server or SRS for an approved box to do pure ZFS storage duties. I made a post about this to Darren Moffat's blog, and he responded this morning saying that new code is being worked on to optimize SHA256 and AES on Intel processors. While I am extremely happy to hear this (and wait eagerly for it - Solaris already seems to have about a 25% more efficient SHA256 implementation vs FreeBSD), if I can only use half of the CPU power of the box, then it's all for naught.
I'm simply trying to either get confirmation that it's a bug that's been addressed (or being worked on and will be fixed soon) or that it's a tunable that I'm not seeing. I'm not going to spend thousands of dollars on Solaris just to have egg on my face by only being able to use 50% of the CPU. This is more due diligence than anything else.
I'm hardly a performance expert...the only S11.1 ZFS performance issue I'm aware of is with SSDs (bug 15804599, has a workaround).
Someone on comp.unix.solaris mentioned that S11.1 ZFS storage performance was slower (but seemingly unrelated to 15804599).
I can't find this posting now.
Are you saying that mpstat is only showing 50% of the CPU available in the VM? Or when run on bare metal? The mpstat man page mentions zones as a virtual environment, where mpstat only displays data for the processor sets bound to the zone.
I would gather more data on the bare metal performance by using other tools, like iostat, lockstat, and fmdump just to be sure.
Do you have compression enabled?
Mount the same ZFS file system at several different NFS mount points. Do a dd test using all these mounts simultaneously. Do you see better CPU utilization on the ZFS server when doing it that way compared to just using a single NFS mount? Of course, keep sync=disabled.
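If juggling several dd processes by hand is a nuisance, the simultaneous-writer test can be sketched in Python along these lines. The function names are made up for illustration, and the paths you pass in would be files under your own mount points.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK = bytes(1024 * 1024)  # 1 MiB of zeros, similar to dd if=/dev/zero bs=1M

def write_file(path, mib):
    """Write mib MiB of zeros to path and return the number of bytes written."""
    with open(path, "wb") as f:
        for _ in range(mib):
            f.write(BLOCK)
        f.flush()
        os.fsync(f.fileno())
    return mib * len(BLOCK)

def parallel_write(paths, mib_each):
    """Write to every path at the same time and return aggregate MB/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        total = sum(pool.map(lambda p: write_file(p, mib_each), paths))
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / max(elapsed, 1e-9)
```

Run it once with a single path and once with one path per mount point, and watch mpstat on the server both times to see whether more cores get busy.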
I can tell you what I saw with iSCSI when upgrading from 11.0 to 11.1 which was very similar to what OP is observing but I don't know if that's directly relevant since he seems to be running NFS.
Basically, after upgrading to 11.1 my iSCSI IOPS dropped by almost half compared to 11.0. What I discovered is that my client systems could not push the storage server's CPU utilization up either. mpstat showed some of the cores pegged while other cores sat completely idle. The number of cores utilized was directly proportional to the number of iSCSI connections used, so with two iSCSI connections I could never utilize more than two cores on the storage server. What I did was set up more iSCSI connections over the same initiator/target from the clients using iSCSI MPIO, and that solved the problem by allowing more storage server cores to be utilized in 11.1. I know that 11.0 did not need this.
So to me it appears that there definitely has been some sort of change made in 11.1 compared to 11.0 with regard to the network stack. I don't think it's related to ZFS.
I do not have compression enabled on any datasets. I first noticed this when I did a test scrub on the PowerEdge 2950, and the scrub took twice as long to complete vs 11.0 (I was using SHA256 on the pool). I noticed that in mpstat each of the four CPU cores in this box was at about 50% utilization. No zones on this box, just raw ZFS. I also tested with dd, and came up with the same results.
I'm trying to avoid involving the network stack at all, since even scrubs seem affected by this behavior. This happens both virtualized and bare metal.
As an example...
SHA256, Solaris 11.0: ~600MB/s, CPU cores pegged
Fletcher4, Solaris 11.0: ~900MB/s, CPU ~25% per core
SHA256, Solaris 11.1: ~300MB/s, CPU ~50% per core
Fletcher4, Solaris 11.1: ~900MB/s, CPU ~25% per core
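To make the comparison explicit, a few lines of Python work out the 11.1-to-11.0 ratio per checksum from the approximate figures above. The numbers are the rounded ones from this post, so treat the result as a ballpark only.

```python
# Approximate throughput figures from the post above, in MB/s.
throughput = {
    "SHA256": {"11.0": 600, "11.1": 300},
    "Fletcher4": {"11.0": 900, "11.1": 900},
}

def ratio(checksum):
    """Solaris 11.1 throughput as a fraction of Solaris 11.0 throughput."""
    figures = throughput[checksum]
    return figures["11.1"] / figures["11.0"]

for name in throughput:
    print(f"{name}: 11.1 runs at {ratio(name):.0%} of 11.0")
```

SHA256 comes out at about half, while Fletcher4 is unchanged, which is why the CPU-heavy checksum looks like the affected path here.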
Looking at Alex's post related to threads, I will whip up a quick dd test using many threads (probably 4, then 8 if necessary). I am loading two Intel SSDs into the bare-metal PowerEdge 2950 installation right now (these can do 500MB/s apiece). I'm using an LSI 9201-16e with the mpt_sas driver on this machine.
I will post back with results of that as soon as possible. I will also try capturing the suggested iostat and lockstat output in addition to the obligatory zpool iostat and mpstat (I've never used fmdump before, but I will try).
Pardon the mess...I cleaned up the iostat output, then gave up trying to make the rest pretty. One thing that did change when I used four dd writers (and it's revealed here in mpstat) is that there is more processor usage now, except it's being used by the additional dd writers reading from /dev/zero instead of by ZFS. ZFS throughput stays flat at around 320MB/s (about 50% CPU). So if I had to guess, the checksum processing is what's impacted by this. I will post back with my Fletcher4 results.
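As a rough sanity check on whether SHA-256 hashing itself scales with cores on a given box, here is a userland Python sketch. To be clear about the assumptions: this does not exercise the ZFS kernel checksum path at all, the function names and buffer sizes are arbitrary, and it relies on CPython's hashlib releasing the GIL while hashing large buffers.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

def hash_buffers(buf, rounds):
    """Hash buf `rounds` times with SHA-256 and return the bytes processed."""
    for _ in range(rounds):
        hashlib.sha256(buf).digest()
    return len(buf) * rounds

def sha256_rate(threads, buf_mib=8, rounds=16):
    """Aggregate userland SHA-256 throughput in MB/s across `threads` workers."""
    buf = bytes(buf_mib * 1024 * 1024)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        total = sum(pool.map(lambda _: hash_buffers(buf, rounds), range(threads)))
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / max(elapsed, 1e-9)
```

Comparing `sha256_rate(1)` against `sha256_rate(4)` while watching mpstat gives a baseline for how far plain hashing can push the cores outside of ZFS.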
Look at the nice, even distribution we're getting now for CPU load vs SHA256. We're bursting a bit above 50% CPU on some runs (again due to the 4x dd writers), but the disks are already slammed for writes. Let's see what a scrub does...
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
I wouldn't worry about that particular SSD TRIM bug in this case. I think I've more than proved the point that checksum processing in ZFS seems to be petering out at around 50% processor utilization in S11.1, whereas it can go all the way to 100% in S11.0. I added the SSDs as a separate striped pool for testing, destroying and recreating the pool entirely each time I changed checksums.
The atime property is also off for both tests; otherwise the pools were completely vanilla (no compression, no dedup, etc). I can't file a report or a ticket because we're in a pre-sales stage, and as I previously explained, I can't pitch this through with such a glaring defect.
I've been waiting on Brad Woolfe and Chris Curry since April 17th for an explanation on this, and haven't received a single call or email. I realize Chris had a pretty serious injury from softball (as Tim Festa, my previous account rep, told me), but I have yet to receive any contact from them.
So I'm posting here to try and get some traction on the issue, and to see if anyone else has run into this.
I like to rule out all known issues. For people with SSDs, the performance drop-off after migrating to S11.1 was noticeable.
I don't have the experience to look at your data and say it's x, y, or z, but some of the experts in this forum might be able to recognize some issue. I'm pretty sure Darren would be aware of any known crypto issues, so my sense is that something else is causing the crypto perf to drop off.