11 Replies Latest reply: Apr 1, 2010 5:22 PM by Charles Hooper RSS

    Bad performance - Comparing stats pack information from a good time period:

    dmaddox
      Users are complaining about a spike in performance. I pulled AWR reports 1 hour before and then during the issue.

      I pulled 1 hour in length reports.

      In comparing the trouble hour with the good hour I found the following:

      1. Massive Redo. 270K/sec vs. 35K/sec
      2. dbfile sequential reads higher
      3. Higher hard parse/parse percentages
      4. Higher db time
      5. SQL had 32 elap per execs vs. nearly 0 (not sure what this means).
      6. massive undo - transactions were 180K vs. 18K the hour previous.
      7. Specifically at 12:14 there were 86k ransactions with 20K undo blocks (the spike time).
      8. SQL area in the library cache stats showed a 52% miss ratio.

      Any ideas where I can go with this data? What do I point the finger at? I am guessing there was some massing ad-hoc sql being run? Sorry I am new to P/T.

      thanks for any insight.

      -dallas
        • 1. Re: Bad performance - Comparing stats pack information from a good time period:
          sb92075
          I am guessing there was some massing ad-hoc sql being run?
          SELECT does not generate redo/undo
          • 2. Re: Bad performance - Comparing stats pack information from a good time period:
            dmaddox
            So what would be your guess? What information in the AWR reports would or could I look into for a RCA. At this point I'm just asking to see where you guys would start digging. If DML-ish type stuff is not going to create what appears to be a redo issue and DDL would - do I look into the sql executions? The highest (time-wise) sql is a select statement (900 seconds, taking 17% of the dbtime). /shrug

            -dallas
            • 3. Re: Bad performance - Comparing stats pack information from a good time period:
              sb92075
              Post Operating System (OS) name & version for DB server system.
              Post results of
              SELECT * from v$version

              In my opinion, AWR is next to useless to identifying performance issue.
              • 4. Re: Bad performance - Comparing stats pack information from a good time period:
                sybrand_b
                1 hour intervals are fully useless.

                --------
                Sybrand Bakker
                Senior Oracle DBA
                • 5. Re: Bad performance - Comparing stats pack information from a good time period:
                  Timur Akhmadeev
                  sb92075 wrote:
                  SELECT does not generate redo/undo
                  SQL> select n.name, s.value from v$sesstat s, v$statname n
                    2  where s.statistic#=n.statistic# and s.sid=userenv('sid') and (n.name like 'undo%' or n.name = 'redo size');
                  
                  NAME                                                                  VALUE
                  ---------------------------------------------------------------- ----------
                  redo size                                                                 0
                  undo change vector size                                                   0
                  undo segment header was pinned                                            0
                  
                  SQL> with t as (select null from v$statname)
                    2  select count(*)
                    3    from t t1, t t2;
                  
                    COUNT(*)
                  ----------
                      132496
                  
                  SQL> select n.name, s.value from v$sesstat s, v$statname n
                    2  where s.statistic#=n.statistic# and s.sid=userenv('sid') and (n.name like 'undo%' or n.name = 'redo size');
                  
                  NAME                                                                  VALUE
                  ---------------------------------------------------------------- ----------
                  redo size                                                               736
                  undo change vector size                                                 168
                  undo segment header was pinned                                            0
                  There is delayed block cleanout too.
                  • 6. Re: Bad performance - Comparing stats pack information from a good time period:
                    sybrand_b
                    You look in the TOP-5 events first

                    -------
                    Sybrand Bakker
                    Senior Oracle DBA
                    • 7. Re: Bad performance - Comparing stats pack information from a good time period:
                      dmaddox
                      10.2.0.4.0 64bit on 64bit linux

                      Top 5 Timed Events

                      Event_____________|__Waits_|_Time(s)_|_Avg Wait(ms)_|_% Total Call Time_|_Wait Class
                      CPU time_______________________1,538__________________________34.0
                      db file sequential read___181,367_____902_____________5____________20.0____User I/O
                      gc buffer busy__________40,632_____640____________16____________14.2_____Cluster
                      gc current block busy___13,575______405____________30_____________9.0_____Cluster
                      log file parallel write_____95,390______293_____________3_____________6.5__System I/O
                      • 8. Re: Bad performance - Comparing stats pack information from a good time period:
                        dmaddox
                        IS or HOW would disk contention on index reads be related to high REDO?
                        • 9. Re: Bad performance - Comparing stats pack information from a good time period:
                          avramits
                          there is clear indication that some DML statements were running on database. RUN ASH report and check for insert, update and delete statements executed in that period.
                          • 10. Re: Bad performance - Comparing stats pack information from a good time period:
                            sb92075
                            IS or HOW would disk contention on index reads be related to high REDO?
                            High REDO results from DML.
                            Index reads could result from using indexes to decide which rows get changed.

                            It would help if you could identify the SQL producing the high redo rates.
                            • 11. Re: Bad performance - Comparing stats pack information from a good time period:
                              Charles Hooper
                              Top 5 Timed Events 
                               
                              Event                       Waits | Time(s) | Avg Wait(ms) | % Total Call Time | Wait Class
                              CPU time                              1,538                              34.0 
                              db file sequential read   181,367       902              5               20.0    User I/O 
                              gc buffer busy             40,632       640             16               14.2     Cluster 
                              gc current block busy      13,575       405             30                9.0     Cluster 
                              log file parallel write    95,390       293              3                6.5  System I/O
                              I agree with Sybrand in this case. The average waits do not show much of a problem for this time period, and this instance in the RAC configuration. For this one hour time period, if the server only had a single CPU, it would only be about 43% busy - you can use that statistic to get a handle on the average impact of the other true wait events that are listed in the top 5. There is no log file sync wait i(foreground) n the top 5, which is somewhat interesting given that there were 95,390 waits on log file parallel write (the background wait). The average wait time for the global cache wait events seems to be a little high.

                              Suggestions:
                              1. Check the documentation: http://download.oracle.com/docs/cd/B19306_01/rac.102/b28759/racmon.htm
                              2. Since you are looking at an AWR report, I assume that you have the appropriate license for that feature - if so that means that you should be able to view an ADDM report.
                              3. Take a look at the activity of the other nodes in the RAC configuration, generate an AWR report for those nodes.
                              4. Try to generate an AWR report for a 10 to 15 minute time period during the period of poor performance.
                              5. Find a session that is exhibiting a performance problem and enable a 10046 trace at level 8 or 12 for that session.
                              6. When posting information where spacing is critical, use a { code } tag (without spaces) before and after that section of your post.

                              Charles Hooper
                              Co-author of "Expert Oracle Practices: Oracle Database Administration from the Oak Table"
                              http://hoopercharles.wordpress.com/
                              IT Manager/Oracle DBA
                              K&M Machine-Fabricating, Inc.