This discussion is archived
4 Replies Latest reply: Feb 1, 2011 11:18 AM by Centinul RSS

STATSPACK Performance Question / Discrepancy

Centinul Guru
Currently Being Moderated
I'm trying to troubleshoot a performance issue and I'm having trouble interpreting the STATSPACK report. It seems like the STATSPACK report is missing information that I expect to be there. I'll explain below.

Header
STATSPACK report for

Database    DB Id    Instance     Inst Num  Startup Time   Release     RAC
~~~~~~~~ ----------- ------------ -------- --------------- ----------- ---
          2636235846 testdb              1 30-Jan-11 16:10 11.2.0.2.0  NO

Host Name             Platform                CPUs Cores Sockets   Memory (G)
~~~~ ---------------- ---------------------- ----- ----- ------- ------------
     TEST             Microsoft Windows IA (     4     2       0          3.4

Snapshot       Snap Id     Snap Time      Sessions Curs/Sess Comment
~~~~~~~~    ---------- ------------------ -------- --------- ------------------
Begin Snap:       3427 01-Feb-11 06:40:00       65       4.4
  End Snap:       3428 01-Feb-11 07:00:00       66       4.1
   Elapsed:      20.00 (mins) Av Act Sess:       7.3
   DB time:     146.39 (mins)      DB CPU:       8.27 (mins)

Cache Sizes            Begin        End
~~~~~~~~~~~       ---------- ----------
    Buffer Cache:       192M       176M   Std Block Size:         8K
     Shared Pool:       396M       412M       Log Buffer:    10,848K

Load Profile              Per Second    Per Transaction    Per Exec    Per Call
~~~~~~~~~~~~      ------------------  ----------------- ----------- -----------
      DB time(s):                7.3                2.0        0.06        0.04
       DB CPU(s):                0.4                0.1        0.00        0.00
       Redo size:            6,366.0            1,722.1
   Logical reads:            1,114.6              301.5
   Block changes:               35.8                9.7
  Physical reads:               44.9               12.1
 Physical writes:                1.5                0.4
      User calls:              192.2               52.0
          Parses:              101.5               27.5
     Hard parses:                3.6                1.0
W/A MB processed:                0.1                0.0
          Logons:                0.1                0.0
        Executes:              115.1               31.1
       Rollbacks:                0.0                0.0
    Transactions:                3.7
As you can see a significant amount of time was spent in database calls (DB Time) with relatively little time on CPU (DB CPU). Initially that made me think there were some significant wait events.
Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
log file sequential read                        48,166         681     14    7.9
CPU time                                                       484           5.6
db file sequential read                         35,357         205      6    2.4
control file sequential read                    50,747          23      0     .3
Disk file operations I/O                        16,518          18      1     .2
          -------------------------------------------------------------
However, looking at the Top 5 Timed Events I don't see anything out of the ordinary given my normal operations. the log file sequential read may be a little slow but it doesn't make up a significant portion of the execution time.

Based on an Excel/VB spreadsheet I wrote, which converts STATSPACK data to graphical form, I suspected that there was a wait event not listed here. So I decided to query the data directly. Here is the query and result.
SQL> SELECT wait_class
  2       , event
  3       , delta/POWER(10,6) AS delta_sec
  4  FROM
  5  (
  6          SELECT syev.snap_id
  7               , evna.wait_class
  8               , syev.event
  9               , syev.time_waited_micro
 10               , syev.time_waited_micro - LAG(syev.time_waited_micro) OVER (PARTITION BY syev.event ORDER BY syev.snap_id) AS delta
 11          FROM   perfstat.stats$system_event syev
 12          JOIN   v$event_name                evna  ON  evna.name     = syev.event
 13          WHERE  syev.snap_id IN (3427,3428)
 14  )
 15  WHERE delta > 0
 16  ORDER BY delta DESC
 17  ;
?WAIT_CLASS               EVENT                                                                        DELTA_SEC
------------------------- --------------------------------------------------------------------------- ----------
Idle                      SQL*Net message from client                                                  21169.742
Idle                      rdbms ipc message                                                            19708.390
Application               enq: TM - contention                                                       7199.819
Idle                      Space Manager: slave idle wait                                             3001.719
Idle                      DIAG idle wait                                                             2382.943
Idle                      jobq slave wait                                                            1258.829
Idle                      smon timer                                                                 1220.902
Idle                      Streams AQ: qmn coordinator idle wait                                      1204.648
Idle                      Streams AQ: qmn slave idle wait                                            1204.637
Idle                      pmon timer                                                                 1197.898
Idle                      Streams AQ: waiting for messages in the queue                              1197.484
Idle                      Streams AQ: waiting for time management or cleanup tasks                    791.803
System I/O                log file sequential read                                                    681.444
User I/O                  db file sequential read                                                     204.721
System I/O                control file sequential read                                                 23.168
User I/O                  Disk file operations I/O                                                     17.737
User I/O                  db file parallel read                                                        14.536
System I/O                log file parallel write                                                       7.618
Commit                    log file sync                                                                 7.150
User I/O                  db file scattered read                                                        3.488
Idle                      SGA: MMAN sleep for component shrink                                          2.461
User I/O                  direct path read                                                              1.621
Other                     process diagnostic dump                                                       1.418
... snip ...
So based on the above it looks like there was a significant amount of time spent in enq: TM - contention

Question 1

Why does this wait event not show up in the Top 5 Timed Events section? Note that this wait event is also not listed in any of the other wait events sections either.

Moving on, I decided to look at the Time Model Statistics
Time Model System Stats  DB/Inst: testdb  /testdb    Snaps: 3427-3428
-> Ordered by % of DB time desc, Statistic name

Statistic                                       Time (s) % DB time
----------------------------------- -------------------- ---------
sql execute elapsed time                         8,731.0      99.4
PL/SQL execution elapsed time                    1,201.1      13.7
DB CPU                                             496.3       5.7
parse time elapsed                                  26.4        .3
hard parse elapsed time                             21.1        .2
PL/SQL compilation elapsed time                      2.8        .0
connection management call elapsed                   0.6        .0
hard parse (bind mismatch) elapsed                   0.5        .0
hard parse (sharing criteria) elaps                  0.5        .0
failed parse elapsed time                            0.0        .0
repeated bind elapsed time                           0.0        .0
sequence load elapsed time                           0.0        .0
DB time                                          8,783.2
background elapsed time                             87.1
background cpu time                                  2.4
Great, so it looks like I spent >99% of DB Time in SQL calls. I decided to scroll to the SQL ordered by Elapsed time section. The header information surprised me.
SQL ordered by Elapsed time for DB: testdb    Instance: testdb    Snaps: 3427 -3
-> Total DB Time (s):           8,783
-> Captured SQL accounts for    4.1% of Total DB Time
-> SQL reported below exceeded  1.0% of Total DB Time
If I'm spending > 99% of my time in SQL, I would have expected the captured % to be higher.

Question 2

Am I correct in assuming that a long running SQL that started before the first snap and is still running at the end of the second snap would not display in this section?

Question 3

Would that answer my wait event question above? Ala, are wait events not reported until the action that is waiting (execution of a SQL statement for example) is complete?

So I looked a few snaps past what I have posted here. I still haven't determined why the enq: TM - contention wait is not displayed anywhere in the STATSPACK reports. I did end up finding an interesting PL/SQL block that may have been causing the issues. Here is the SQL ordered by Elapsed time for a snapshot that was taken an hour after the one I posted.
SQL ordered by Elapsed time for DB: testdb    Instance: testdb    Snaps: 3431 -3
-> Total DB Time (s):           1,088
-> Captured SQL accounts for ######% of Total DB Time
-> SQL reported below exceeded  1.0% of Total DB Time

  Elapsed                Elap per            CPU                        Old
  Time (s)   Executions  Exec (s)  %Total   Time (s)  Physical Reads Hash Value
---------- ------------ ---------- ------ ---------- --------------- ----------
  26492.65           29     913.54 ######    1539.34             480 1013630726
Module: OEM.CacheModeWaitPool
BEGIN EMDW_LOG.set_context(MGMT_JOB_ENGINE.MODULE_NAME, :1); BEG
IN MGMT_JOB_ENGINE.process_wait_step(:2);END; EMDW_LOG.set_conte
xt; END;
I'm still not sure if this is the problem child or not.

I just wanted to post this to get your thoughts on how I correctly/incorrectly attacked this problem and to see if you can fill in any gaps in my understanding.

Thanks!
  • 1. Re: STATSPACK Performance Question / Discrepancy
    sybrand_b Guru
    Currently Being Moderated
    Centinul,

    A few remarks

    1 The top 5 section suppresses 'Idle events'. What qualifies as idle events in a separate statspack table
    2 The statspack code exclusively consists of inner joins. This means: if the SQL is not in snapshot 1, is in snapshot 2 and 3, and you report on snapshot 1 and 3, it will NOT be listed.
    3 Also the SQL is tracked based on thresholds. If the values in V$sql do not exceed the threshold, the SQL is not in the snapshot.

    -----------
    Sybrand Bakker
    Senior Oracle DBA
  • 2. Re: STATSPACK Performance Question / Discrepancy
    Centinul Guru
    Currently Being Moderated
    I was able to find the right set of snapshots to encompass the wait events.
  • 3. Re: STATSPACK Performance Question / Discrepancy
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    Centinul wrote:

    I'm still not sure if this is the problem child or not.

    I just wanted to post this to get your thoughts on how I correctly/incorrectly attacked this problem and to see if you can fill in any gaps in my understanding.
    I think you've attacked the problem well.
    It has prompted me to take a little look at what's going on, running 11.1.0.6 in my case, and something IS broken.

    The key predicate in statspack for reporting top 5 is:
                      and e.total_waits         > nvl(b.total_waits,0)
    In other words, an event gets reported if total_waits increased across the period.
    So I've been taking snapshots of v$system_event and looking at 10046 trace files at level 8. The basic test was as simple as:
    <ul>
    Session 1: lock table t1 in exclusive mode
    Session 2: lock table t1 in exclusive mode
    </ul>


    About three seconds after session 2 started to wait, v$system_event incremented total_waits (for the "enq: TM - contention" event). When I committed in session 1 the total_waits figure did not change.

    Now do this after waiting across a snapshot:
    We start to wait, after three seconds we record a wait, a few minutes later perfstat takes a snapshot.
    30 minutes later "session 1" commits and our wait ends, but we do not increment total_waits, but we record 30+ minutes wait time.
    30 minutes later perfstat takes another snapshot
    The total_waits has not changed between the start and end snapshot even though we have added 30 minutes to the "enq: TM - contention" in the interim.

    The statspack report loses our 30 minutes from the Top N.

    It's a bug - raise an SR.

    Edit: The AWR will have the same problem, of course.

    Regards
    Jonathan Lewis

    Edited by: Jonathan Lewis on Feb 1, 2011 7:07 PM
  • 4. Re: STATSPACK Performance Question / Discrepancy
    Centinul Guru
    Currently Being Moderated
    Jonathan Lewis wrote:
    I think you've attacked the problem well.
    Thanks I appreciate that!
    The statspack report loses our 30 minutes from the Top N.

    It's a bug - raise an SR.
    I'm glad I was on the right track about information missing from the STATSPACK report. I'll try and put a simple test case together (thanks for your scenario, I'll use that as my base) and raise an SR.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points