This discussion is archived
5 Replies Latest reply: Jan 14, 2010 4:10 PM by 807559 RSS

iopending oddness

gleng Expert
Currently Being Moderated
I've been using various Dtrace scripts to try and shed some light on a performance problem with a 6140, a couple of V490s and Oracle. This is the first time I've used Dtrace so maybe I'm missing something obvious ...

The script iopending seems to work for awhile and then as the server gets busy it starts showing only >=32. Here's some resent output:
2010 Jan 12 16:50:42,  load: 3.42,  disk_r:    392 KB,  disk_w:   1690 KB

           value  ------------- Distribution ------------- count
              28 |                                         0
              29 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     35618
              30 |@@@@                                     4157
              31 |                                         86
           >= 32 |                                         139

2010 Jan 12 16:50:47,  load: 3.39,  disk_r:    320 KB,  disk_w:   1443 KB

           value  ------------- Distribution ------------- count
              28 |                                         0
              29 |@@@@@@@@@@@@@@@                          14958
              30 |@@@@@@@@@@@@@@@@@@@@@@@@                 24003
              31 |@                                        699
           >= 32 |                                         340

2010 Jan 12 16:50:52,  load: 3.36,  disk_r:    280 KB,  disk_w:   5052 KB

           value  ------------- Distribution ------------- count
              29 |                                         0
              30 |@                                        1387
              31 |@@@@@@@@@@@@@@@@@@@@@@                   22333
           >= 32 |@@@@@@@@@@@@@@@@                         16280

2010 Jan 12 16:50:57,  load: 3.37,  disk_r:    208 KB,  disk_w:   2077 KB

           value  ------------- Distribution ------------- count
              31 |                                         0
           >= 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 40000

2010 Jan 12 16:51:02,  load: 3.38,  disk_r:    112 KB,  disk_w:   1364 KB

           value  ------------- Distribution ------------- count
              31 |                                         0
           >= 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 40000

2010 Jan 12 16:51:07,  load: 3.34,  disk_r:   1424 KB,  disk_w:   2531 KB

           value  ------------- Distribution ------------- count
              31 |                                         0
           >= 32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 40000
If I stop the script (cntl-c) and rerun it works for awhile.


Is there some basic limitation with Dtrace?

I'm I expecting too much by thinking Dtrace scripts should run indefinitely.

I've run into a similar problem with one called process_io_top.



Thanks for any insight
Glen
  • 1. Re: iopending oddness
    807559 Newbie
    Currently Being Moderated
    What value are you lquantizing? A look at the aggregation you're using would be helpful.
  • 2. Re: iopending oddness
    gleng Expert
    Currently Being Moderated
    Here's the lquantize statement:
     @out = lquantize(pending, 0, MAX_PENDING, 1);
    Thanks,
    Glen

    I see from the form header
    IMPORTANT CHANGE: To get your questions answered and continue discussion on Solaris Dynamic Tracing Framework, please visit the OpenSolaris Forums
    that the Dtrace community should move to the OpenSolaris discussion area.

    Do you follow any of the OpenSolaris discussions?

    Edited by: sysglen on Jan 13, 2010 9:39 AM
  • 3. Re: iopending oddness
    807559 Newbie
    Currently Being Moderated
    Well, that's a peculiar thing.

    Ok, so you create a linear distribution from pending to MAX_PENDING. What you are plotting along that chart appears to be the number of probe hits for which a given pending value applies. It seems at some point that you exceed whatever MAX_PENDING is, and so your distribution literally goes off the chart.

    Can you show the whole probe clause, including the value of MAX_PENDING?

    The OpenSolaris DTrace discussion group isn't terribly active, but I do subscribe to it. Lots of high-powered DTrace talent listening to. You can get to it using this link:

    http://www.opensolaris.org/jive/forum.jspa?forumID=7

    Regards,

    Michael
  • 4. Re: iopending oddness
    gleng Expert
    Currently Being Moderated
    The script is from the example collection [DTraceToolkit-0.99.tar.gz|http://hub.opensolaris.org/bin/view/Community+Group+dtrace/dtracetoolkit] .


    MAX_PENDING is defined and referenced once:

    :g/MAX_PENDING/nu
    134 inline int MAX_PENDING = 32; /* max pending value */
    212 @out = lquantize(pending, 0, MAX_PENDING, 1);

    This is the parts that update pending:
     io:genunix::start
     /this->ok/
     {
            /* track bytes */
            disk_r += args[0]->b_flags & B_READ ? args[0]->b_bcount : 0;
            disk_w += args[0]->b_flags & B_READ ? 0 : args[0]->b_bcount;
    
            /* increase event pending count */
            pending++;
     }
    
     /*
      * Process and Print completion
      */
     io:genunix::done
     /this->ok/
     {
            /* decrease event pending count */
            pending--;
     }
    
     /*
      * Prevent pending from underflowing
      * this can happen if this program is started during disk events.
      */
     io:genunix::done
     /pending < 0/
     {
            pending = 0;
     }
    Thanks,
    Glen
  • 5. Re: iopending oddness
    807559 Newbie
    Currently Being Moderated
    Hm, interesting. So, the script increments the variable pending when io:genunix::start fires, and decrements when io:genunix::done fires.

    As that number is bounced by I/O events, the script samples over time. In your script, it appears you sample 40k times over 5-second interval. The quantization then distributes pending over the range 0-31. That is, if the number of pending events at 4 sampling points is 12, you'll get a

    value ------------------Distribution---------------- count
    ...
    12 | 4
    ...

    in one of your graphs. You won't get an "@" for hitting .0001 of total though.

    Using this script and a "find / -type f" command, I was able to exceed MAX_PENDING several times over the life of the run. So MAX_PENDING is not a hard limit. The variable pending is set to zero when the script starts, so that explains why restarting "clears" the symptom you're seeing.

    So it looks rather like an "I/O leak," if you will. Over time, the value of pending gets above 32 and -- I would guess -- keeps rising. Something that increments your count isn't getting decremented.

    I have some thoughts on what's causing that; unfortunately, no time to investigate. But it would be a pretty good question to pose on the OpenSolaris discussion group. I wouldn't be surprised if one of the heavyweights chimed in on the problem in less than a few hours.