I have a T5240 running S10u9 that serves as a zone server. The zones are LDAP servers and web servers (WebLogic), and all zone roots live on a separate zpool built from a two-way mirror of local SAS drives (146 GB each).
We have several zone servers like this, but one in particular is having problems. Watching the zpool-&lt;zpoolname&gt; process with iotop, it is doing heavy writes 24x7 at roughly 30 MB/s to each mirror disk. iostat shows an average of 400 IOPS around the clock (>80% writes) on each disk, which is extreme for a SAS drive. No scrub and no resilvering is going on.
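For reference, the ">80% writes" figure comes straight out of iostat's per-disk r/s and w/s columns. A minimal sketch of the arithmetic, using hypothetical numbers picked to match the ~400 IOPS observed here (the device name and exact values are made up, not from the real system):

```shell
# One sample line from `iostat -xn 5` for a mirror disk
# (hypothetical values: 60 reads/s + 340 writes/s = 400 IOPS):
#    r/s   w/s   kr/s    kw/s ...                        device
line="  60.0 340.0 1200.0 30720.0 0.0 4.0 0.0 10.0 0 80 c0t1d0"
# Writes as a percentage of total IOPS: w/s / (r/s + w/s) * 100
pct=$(echo "$line" | awk '{ printf "%.0f", $2 / ($1 + $2) * 100 }')
echo "${pct}%"   # 85% of the 400 IOPS are writes
```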
What could be causing this behavior???
OK, we opened a case with Oracle, and the response was that the zpool was approaching full (it was at 89%) and ZFS had switched from performance mode into space-saving mode, with the ARC cache flushing causing the excessive writes. Oooookaaaay... so we freed up some space and voilà! Disk performance for the users went back to normal. However, it appears the process is still chewing up gobs of I/O.
Further investigation revealed that pretty much all writes to disk are performed by the zpool-&lt;poolname&gt; process. I ran a bunch of compress jobs, and with iotop I could see the compress process read data but never saw it write. In fact, I never saw any process write data except the zpool-&lt;poolname&gt; process. I'm guessing that's because all the writes go through the cache?
So now I am left to wonder: are the writes by zpool-&lt;poolname&gt; actual user-process writes, or is there still an issue with the zpool? Should I set primarycache= and secondarycache= to none? And what is this magic zpool capacity number that causes ZFS to switch modes? I'm getting less enamored with ZFS every day...
The zpool-&lt;poolname&gt; process threads are performing the pool I/O in the ZIO pipeline. It is a new process that simply exposes the pool's I/O workload so you can track its CPU usage and so on.
A full pool can cause several different behavior issues, not just ARC flushing. You only set the secondarycache property when you have an L2ARC (that is, a secondary cache device).
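If you do want to inspect the caching behavior before changing anything, the relevant properties look like this (a sketch only, using the rkpool name from this thread; note that disabling primarycache sends every read to disk and is generally a performance loss, not a fix):

```shell
# Show the current ARC / L2ARC caching policy for the pool's datasets:
zfs get primarycache,secondarycache rkpool

# primarycache=none would bypass the ARC entirely for this dataset --
# usually harmful, so it is shown commented out here:
# zfs set primarycache=none rkpool

# secondarycache only has an effect if an L2ARC device exists;
# check for a "cache" section in the pool layout:
zpool status rkpool
```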
Is this pool busy? If you are seeing a specific performance problem, then that is what needs to be investigated.
For performance problems, I always start by reviewing all device statistics to rule out a slow or failing disk, but deeper analysis really isn't my area.
You can review our basic best practices, including the full pool issue, here:
Thanks for the reply. Oddly enough, after freeing up space in the pool, the process churned for a while and then magically settled down to almost nothing, like on the other systems.
Guess we'll never know what the true cause was...
The zpool-&lt;poolname&gt; process threads are busy processing I/O. If the pool is busy and also full, then those threads will take more time writing the I/O. Your rkpool is 90% full:
rkpool 199G 178G 21G 90% /opt
I would use zpool list over df -h to review pool capacity.
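The distinction matters because df -h reports filesystem-level usage, while zpool list reports raw pool capacity; the two can disagree once snapshots, reservations, or metadata overhead come into play. A minimal sketch of pulling the CAP column out of zpool list output (the sample line below is hypothetical, mirroring the sizes seen in this thread):

```shell
# Hypothetical `zpool list rkpool` output line:
# NAME   SIZE  ALLOC  FREE  CAP  HEALTH  ALTROOT
zline="rkpool 199G 178G 21G 89% ONLINE -"
# CAP is the 5th column; strip the trailing % sign:
cap=$(echo "$zline" | awk '{ sub(/%/, "", $5); print $5 }')
echo "$cap"
```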
If you can bring this pool's capacity down to 80% or less, it should perform better. Let us know if it doesn't.
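As a rough worked example of how much would need to go: with the 199G pool at 178G used (the figures shown above), reaching 80% capacity means freeing about 19G:

```shell
# Pool size and allocation in GB (from the rkpool figures above),
# with a target capacity of 80%:
size=199; used=178; target=80
# Space to free = current usage minus the 80% ceiling (159.2G here):
to_free=$(awk -v s="$size" -v u="$used" -v t="$target" \
    'BEGIN { printf "%.1f", u - s * t / 100 }')
echo "${to_free}G"   # 18.8G must be freed to get down to 80%
```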
See the ZFS best practices, here: