This discussion is archived
7 Replies · Latest reply: Nov 13, 2012 12:40 PM by cindys

zpool-<zpoolname> process going crazy

bobthesungeek76036 Pro
I have a T5240 running S10u9 that serves as a zone server. The zones are LDAP servers and web servers (WebLogic), and all zone roots are on a separate zpool made up of a two-way mirror of local SAS drives (146 GB each).

We have several zone servers like this, but one in particular is experiencing issues. Looking at the zpool-<zpoolname> process with iotop, it is writing heavily around the clock, at roughly 30 MB/s to each mirror disk. iostat shows an average of 400 IOPS (>80% writes) on each disk, 24x7, which is very high for a single SAS drive. No scrub and no resilver is in progress.

What could be causing this behavior?
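
For reference, the numbers above came from roughly this kind of monitoring (a sketch; iotop here is the DTraceToolkit script, and the intervals are just what I happened to use):

<pre>
# per-device IOPS and throughput, 5-second samples (~400 IOPS, >80% writes per disk)
iostat -xnz 5

# per-process I/O via the DTraceToolkit iotop (attributes the ~30 MB/s of writes to zpool-<zpoolname>)
./iotop -C 5
</pre>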
  • 1. Re: zpool-<zpoolname> process going crazy
    bobthesungeek76036 Pro
    Update: I ran a scrub on the zpool (sketched below, after the trace) and it didn't help. The process is still writing heavily to the disks. iosnoop shows this:

    <pre>
    ....
    0 101 W 268082948 9216 zpool-zdata <none>
    0 101 W 268313914 7680 zpool-zdata <none>
    0 101 W 270585972 4608 zpool-zdata <none>
    0 101 W 270545401 4608 zpool-zdata <none>
    0 101 W 270602661 10240 zpool-zdata <none>
    0 101 W 270548763 4608 zpool-zdata <none>
    0 101 W 270605008 5120 zpool-zdata <none>
    0 101 W 270620234 9216 zpool-zdata <none>
    0 101 W 270585972 4608 zpool-zdata <none>
    0 101 W 270635915 5120 zpool-zdata <none>
    0 101 W 270602661 10240 zpool-zdata <none>
    0 101 W 270641113 9728 zpool-zdata <none>
    0 101 W 270657677 5120 zpool-zdata <none>
    0 101 W 270675710 9216 zpool-zdata <none>
    0 101 W 270605008 5120 zpool-zdata <none>
    0 101 W 270730152 4608 zpool-zdata <none>
    0 101 W 270736961 4608 zpool-zdata <none>
    0 101 W 270620234 9216 zpool-zdata <none>
    0 101 W 270742543 9216 zpool-zdata <none>
    0 101 W 270635915 5120 zpool-zdata <none>
    0 101 W 270772381 5120 zpool-zdata <none>
    0 101 W 270772526 4608 zpool-zdata <none>
    0 101 W 270641113 9728 zpool-zdata <none>
    0 101 W 270657677 5120 zpool-zdata <none>
    0 101 W 270778681 5120 zpool-zdata <none>
    0 101 W 270675710 9216 zpool-zdata <none>
    0 101 W 270815605 9216 zpool-zdata <none>
    0 101 W 270824506 5120 zpool-zdata <none>
    0 101 W 270730152 4608 zpool-zdata <none>
    0 101 W 270900251 9728 zpool-zdata <none>
    0 101 W 270736961 4608 zpool-zdata <none>
    0 101 W 270916789 5120 zpool-zdata <none>
    0 101 W 270923309 4608 zpool-zdata <none>
    0 101 W 270742543 9216 zpool-zdata <none>
    0 101 W 270928522 5120 zpool-zdata <none>
    0 101 W 270772381 5120 zpool-zdata <none>
    0 101 W 270930510 4608 zpool-zdata <none>
    0 101 W 270772526 4608 zpool-zdata <none>
    0 101 W 270778681 5120 zpool-zdata <none>
    0 101 W 270815605 9216 zpool-zdata <none>
    0 101 W 270824506 5120 zpool-zdata <none>
    0 101 W 270945477 4608 zpool-zdata <none>
    0 101 W 270900251 9728 zpool-zdata <none>
    0 101 W 270945510 4608 zpool-zdata <none>
    0 101 W 270948276 5120 zpool-zdata <none>
    0 101 W 270916789 5120 zpool-zdata <none>
    0 101 W 270923309 4608 zpool-zdata <none>
    0 101 W 270928522 5120 zpool-zdata <none>
    0 101 W 270930510 4608 zpool-zdata <none>
    0 101 W 270945477 4608 zpool-zdata <none>
    0 101 W 270948666 5120 zpool-zdata <none>
    0 101 W 270945510 4608 zpool-zdata <none>
    0 101 W 270948276 5120 zpool-zdata <none>
    0 101 W 270948666 5120 zpool-zdata <none>
    0 101 W 270961023 4608 zpool-zdata <none>
    0 101 W 270961023 4608 zpool-zdata <none>
    0 101 W 270989195 5120 zpool-zdata <none>
    0 101 W 270989195 5120 zpool-zdata <none>
    0 101 W 270989620 5120 zpool-zdata <none>
    0 101 W 270992439 4608 zpool-zdata <none>
    0 101 W 270989620 5120 zpool-zdata <none>
    0 101 W 271025998 4608 zpool-zdata <none>
    0 101 W 270992439 4608 zpool-zdata <none>
    0 101 W 271048072 5120 zpool-zdata <none>
    0 101 W 271058176 5120 zpool-zdata <none>
    0 101 W 271025998 4608 zpool-zdata <none>
    0 101 W 271072499 4608 zpool-zdata <none>
    0 101 W 271078573 4608 zpool-zdata <none>
    0 101 W 271048072 5120 zpool-zdata <none>
    0 101 W 271058176 5120 zpool-zdata <none>
    ....
    </pre>
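
    For the record, the scrub above was kicked off and verified roughly like this (a sketch; zdata is the pool name):

    <pre>
    # start the scrub
    zpool scrub zdata

    # confirm it completed and that no scrub or resilver is still running
    zpool status -v zdata
    </pre>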
  • 2. Re: zpool-<zpoolname> process going crazy
    bobthesungeek76036 Pro
    Here's what "zpool iostat" looks like:

    <pre>
    $ zpool iostat -T d -v zdata 5 5

                   capacity     operations    bandwidth
    pool        alloc   free   read  write   read  write
    ----------  -----  -----  -----  -----  -----  -----
    zdata        122G  14.1G    135    481   549K  1.69M
      mirror     122G  14.1G    135    481   549K  1.69M
        c1t2d0      -      -     62    308   380K  1.69M
        c1t3d0      -      -     62    309   379K  1.69M
    ----------  -----  -----  -----  -----  -----  -----

                   capacity     operations    bandwidth
    pool        alloc   free   read  write   read  write
    ----------  -----  -----  -----  -----  -----  -----
    zdata        122G  14.1G     19    592  14.1K  2.26M
      mirror     122G  14.1G     19    592  14.1K  2.26M
        c1t2d0      -      -      8    427  6.10K  2.28M
        c1t3d0      -      -     10    426  8.00K  2.27M
    ----------  -----  -----  -----  -----  -----  -----

                   capacity     operations    bandwidth
    pool        alloc   free   read  write   read  write
    ----------  -----  -----  -----  -----  -----  -----
    zdata        122G  14.1G    151    358   182K  1.15M
      mirror     122G  14.1G    151    358   182K  1.15M
        c1t2d0      -      -     59    226   143K  1.14M
        c1t3d0      -      -     62    229  92.0K  1.15M
    ----------  -----  -----  -----  -----  -----  -----

                   capacity     operations    bandwidth
    pool        alloc   free   read  write   read  write
    ----------  -----  -----  -----  -----  -----  -----
    zdata        122G  14.1G     36    400  50.8K  1.59M
      mirror     122G  14.1G     36    400  50.8K  1.59M
        c1t2d0      -      -     11    358  20.2K  1.59M
        c1t3d0      -      -     21    359  32.6K  1.59M
    ----------  -----  -----  -----  -----  -----  -----

                   capacity     operations    bandwidth
    pool        alloc   free   read  write   read  write
    ----------  -----  -----  -----  -----  -----  -----
    zdata        122G  14.1G     19    586  37.5K  1.96M
      mirror     122G  14.1G     19    586  37.5K  1.96M
        c1t2d0      -      -      6    399  7.90K  1.96M
        c1t3d0      -      -     10    399  29.6K  1.96M
    ----------  -----  -----  -----  -----  -----  -----
    $</pre>
  • 3. Re: zpool-<zpoolname> process going crazy
    bobthesungeek76036 Pro
    OK, we opened a case with Oracle, and the response was that the zpool was approaching full (it was at 89%), so ZFS switched from performance mode into space-saving mode and the ARC cache was flushing, causing the excessive writes. OooooooKaaaaaaay, so we cleared up some space and voila! Disk performance for the users went back to normal. However, it appears the process is still chewing up gobs of I/O.

    Further investigation revealed that pretty much all writes to the disk seem to be performed by the zpool-<poolname> process. I ran a bunch of compresses, and with iotop I could see the compress process read data but never saw it write. In fact, I never saw any process write data except the zpool-<poolname> process. I'm guessing that's because all the writes go through the cache?

    So now I am left to wonder: are the writes by zpool-<poolname> actual user-process writes, or is there still an issue with the zpool? Should I set primarycache= and secondarycache= to none? And what is this magic zpool capacity number that causes ZFS to switch modes? I'm getting less enamored with ZFS every day...
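
    For reference, the capacity and caching settings in question can be checked like this (a sketch; zdata is our pool, and these commands only read the properties, they don't change anything):

    <pre>
    # pool capacity -- the CAP column is the percentage Oracle was referring to
    zpool list zdata

    # current ARC/L2ARC caching policy on the pool's datasets
    zfs get -r primarycache,secondarycache zdata
    </pre>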
  • 4. Re: zpool-<zpoolname> process going crazy
    cindys Pro
    Hi Bob,

    The zpool-<poolname> process threads are performing the pool I/O in the ZIO pipeline. This is a new process
    that just exposes the I/O workload so you can track CPU usage and so on.
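
    For example, something like this shows the per-thread CPU usage of that process (a sketch, using the zdata pool from this thread):

    <pre>
    # per-thread microstate accounting for the zpool-<poolname> process
    prstat -mL -p `pgrep -f zpool-zdata`
    </pre>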

    Full pools can have different behavioral issues, not just ARC flushing. You only set the secondarycache
    property when you have an L2ARC (a separate cache device).

    Is this pool busy? If you are seeing a specific performance problem then that is what needs to be
    investigated.

    For performance problems, I always start with reviewing all device statistics to rule out a slow or
    failing disk, but further analysis really isn't my area.
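
    For example, a first pass over the device statistics could look like this (a sketch; watch for one disk with much higher service times or non-zero error counts):

    <pre>
    # per-device service times and %busy -- a slow or failing disk stands out here
    iostat -xnz 5

    # soft/hard/transport error counters per device
    iostat -En

    # any pools with known problems
    zpool status -x
    </pre>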

    You can review our basic best practices, including the full pool issue, here:

    http://docs.oracle.com/cd/E23823_01/html/819-5461/practice-1.html#scrolltoc

    Thanks, Cindy
  • 5. Re: zpool-<zpoolname> process going crazy
    bobthesungeek76036 Pro
    Thanks for the reply. Oddly enough, after freeing up space in the pool, the process churned for a while and then magically settled down to almost nothing, just like on the other systems.

    Guess we'll never know what the true cause was...
  • 6. Re: zpool-<zpoolname> process going crazy
    KirkMcNeilFJ Newbie
    I am having very slow performance on my M4000 server attached to a Hitachi SAN, with the same symptoms. The affected filesystem is /opt (pool rkpool). I will try to clean up some space on this filesystem and post the results. I hope this works for me.

    <pre>
    0 89 W 361423399 6656 zpool-rkpool <none>
    0 89 W 361423399 6656 zpool-rkpool <none>
    0 89 W 361428007 3072 zpool-rkpool <none>
    0 89 W 361428007 3072 zpool-rkpool <none>
    0 89 W 361442228 6656 zpool-rkpool <none>
    0 89 W 361442228 6656 zpool-rkpool <none>
    0 89 W 361444472 3072 zpool-rkpool <none>
    0 89 W 361444472 3072 zpool-rkpool <none>
    0 89 W 361445669 6656 zpool-rkpool <none>
    0 89 W 361445669 6656 zpool-rkpool <none>
    0 89 W 361461758 3072 zpool-rkpool <none>
    0 89 W 361484477 6656 zpool-rkpool <none>
    0 89 W 361484477 6656 zpool-rkpool <none>
    0 89 W 361488223 3072 zpool-rkpool <none>
    0 89 W 361488223 3072 zpool-rkpool <none>
    0 89 W 361500067 3072 zpool-rkpool <none>
    0 89 W 361500067 3072 zpool-rkpool <none>
    0 89 W 361520587 3072 zpool-rkpool <none>
    0 89 W 361520587 3072 zpool-rkpool <none>
    0 89 W 361529431 3072 zpool-rkpool <none>
    0 89 W 361529431 3072 zpool-rkpool <none>
    0 89 W 361536829 3072 zpool-rkpool <none>
    </pre>

    <pre>
    # zpool iostat -T d -v rkpool 5 3

                                capacity     operations    bandwidth
    pool                     alloc   free   read  write   read  write
    -----------------------  -----  -----  -----  -----  -----  -----
    rkpool                    178G  23.9G     18    119   815K   779K
      c3t50060E8005730440d0   178G  23.9G     18    119   815K   779K
    -----------------------  -----  -----  -----  -----  -----  -----

                                capacity     operations    bandwidth
    pool                     alloc   free   read  write   read  write
    -----------------------  -----  -----  -----  -----  -----  -----
    rkpool                    178G  23.9G      0      1      0   181K
      c3t50060E8005730440d0   178G  23.9G      0      1      0   181K
    -----------------------  -----  -----  -----  -----  -----  -----

                                capacity     operations    bandwidth
    pool                     alloc   free   read  write   read  write
    -----------------------  -----  -----  -----  -----  -----  -----
    rkpool                    178G  23.8G      0    198    101  1.17M
      c3t50060E8005730440d0   178G  23.8G      0    198    101  1.17M
    -----------------------  -----  -----  -----  -----  -----  -----
    </pre>

    <pre>
    # df -h
    Filesystem             size   used  avail capacity  Mounted on
    rkinvoice1             392G   307G    85G    79%    /invoices
    rkpool                 199G   178G    21G    90%    /opt
    rpool                  134G    97K    66G     1%    /rpool
    rkdbpool1              392G   251G   141G    65%    /vol1
    rkdbpool2              392G   148G   244G    38%    /vol2
    rkdbpool3              392G   132G   260G    34%    /vol3
    </pre>
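
    Before cleaning up, I plan to check where the space in rkpool is actually going, roughly like this (a sketch; rkpool is our pool, and this assumes a ZFS version that supports the -o space shortcut):

    <pre>
    # space breakdown per dataset, including snapshot usage
    zfs list -o space -r rkpool

    # snapshots are a common reason a pool stays full after deleting files
    zfs list -t snapshot -r rkpool
    </pre>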
  • 7. Re: zpool-<zpoolname> process going crazy
    cindys Pro
    The zpool-pool-name process threads are busy processing I/O. If the pool is busy and also full,
    then these processes will take more time writing the I/O. Your rkpool is 90% full:

    rkpool 199G 178G 21G 90% /opt

    I would use zpool list over df -h to review pool capacity.
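
    For example (a sketch; the CAP column is the pool percentage that matters here):

    <pre>
    # pool-level capacity -- compare the CAP column against the ~80% guideline
    zpool list rkpool
    </pre>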

    If you can bring this pool's capacity down to 80% or less, it should perform better. Let us know if it doesn't.

    See the ZFS best practices, here:

    http://docs.oracle.com/cd/E23823_01/html/819-5461/practice-1.html#scrolltoc

    Thanks, Cindy
