2 Replies Latest reply: Feb 25, 2013 2:21 AM by PhHein RSS

    Buffer busy waits on UNDO data in Active Dataguard

    Yasu
      Oracle Version: 11.1.0.7.0
      Active Dataguard
      Statspack has been configured for Active Dataguard on Primary database.

      We got an spike of Buffer busy waits for about 5 min in Active Dataguard, this was causing worse Application SQL's response time during this 5 min window.

      Below is what i got from statspack report for one hour
      Snapshot       Snap Id     Snap Time      Sessions Curs/Sess Comment
      ~~~~~~~~    ---------- ------------------ -------- --------- -------------------
      Begin Snap:      18611 21-Feb-13 22:00:02      236       2.2
        End Snap:      18613 21-Feb-13 23:00:02      237       2.1
         Elapsed:               60.00 (mins)
      
      
      Top 5 Timed Events                                                    Avg %Total
      ~~~~~~~~~~~~~~~~~~                                                   wait   Call
      Event                                            Waits    Time (s)   (ms)   Time
      ----------------------------------------- ------------ ----------- ------ ------
      buffer busy waits                                2,359       2,133    904   76.2
      reliable message                                 7,210         179     25    6.4
      parallel recovery control message reply          8,831         109     12    3.9
      CPU time                                                       100           3.6
      latch free                                          13          85   6574    3.1
                -------------------------------------------------------------
      Host CPU  (CPUs: 16)
      ~~~~~~~~              Load Average
                            Begin     End      User  System    Idle     WIO     WCPU
                          ------- -------   ------- ------- ------- ------- --------
                             1.07    0.82      0.68    0.39   98.88    0.00
      Since this is 11g version I was able to drill down on the segment on which buffer busy waits were occurring by using v$active_session_history on Active Dataguard.
      SQL> select count(*),p1,p2 from v$active_session_history where event='buffer busy waits' and sample_time between to_date('21-FEB-2013 21:55:00','DD-MON-YYYY HH24:MI:SS') and to_date('21-FEB-2013 22:09:00','DD-MON-YYYY HH24:MI:SS') group by p1,p2
      
        COUNT(*)         P1         P2
      ---------- ---------- ----------
            2136          3      99405
              17          3       7293
      
      SQL> select owner,segment_name,segment_type from dba_extents where file_id = 3 and 99405 between block_id AND block_id + blocks - 1 
      OWNER                          SEGMENT_NAME              SEGMENT_TYPE
      ------------------------------ ------------------------- --------------------
      SYS                            _SYSSMU14_1303827994$     TYPE2 UNDO
      
      SQL> select owner,segment_name,segment_type from dba_extents where file_id = 3 and 7293 between block_id AND block_id + blocks - 1;
      OWNER                          SEGMENT_NAME              SEGMENT_TYPE
      ------------------------------ ------------------------- --------------------
      SYS                            _SYSSMU11_1303827994$     TYPE2 UNDO
      Thought to check the SQL_ID which were waiting on this buffer busy waits.
      SQL> select count(*),sql_id,session_state from v$active_session_history where event='buffer busy waits' and sample_time between to_date('21-FEB-2013 21:55:00','DD-MON-YYYY HH24:MI:SS') and to_date('21-FEB-2013 22:09:00','DD-MON-YYYY HH24:MI:SS') group by sql_id,session_state order by 1;
      
        COUNT(*) SQL_ID        SESSION_STATE
      ---------- ------------- ----------------------------
               1 cvypjyh0mm56x WAITING
               1 02dtz82as4y42 WAITING
               1 80gz2r4hx1wrj WAITING
               2 6tfk1t4mwt7hu WAITING
               9 0q63qhsbqmpf0 WAITING
              12 0jgnx96ur0bmb WAITING
              12 7pguapqcc6372 WAITING
              14 4t6hqk5r2zbqs WAITING
              18 1qwt0qkd59xj3 WAITING
              23 5phgg8btvhh6p WAITING
              23 banp2v6yttym7 WAITING
              30 a1kdmb1x084yh WAITING
              30 8hxuagk22f8jz WAITING
              30 9r0nysyp360hn WAITING
              31 cackx62yu477k WAITING
              32 40zxqg1qrdvuh WAITING
              32 0jqrd56ds1rbm WAITING
              32 7009zmuhvac54 WAITING
              38 1jb37ryn1c871 WAITING
              60 aum74caa623rs WAITING
              63 cr8mv0wawhak9 WAITING
              63 3xgk3vsh3nm08 WAITING
              86 3k9cq3jv0c3rg WAITING
              95 0sy9vjuutgwqu WAITING
             122 bhn2kk76wpg12 WAITING
             134 4pkfqgyt7rh34 WAITING
             139 1sbzsw7y88c7t WAITING
             146 92y0ha2nqd6zj WAITING
             163 djjqcp1sg2twb WAITING
             173 arxq6au12zazw WAITING
             256 fa0gzxmgyyxj2 WAITING
             282 2f17qywcgu751 WAITING
      So top 10 sql_id's were on tables TAB1 and TAB2 under schemas SCHEMA1 to SCHEMA8.

      Checked DML's occurred on Primary using dba_tab_modifications view since last stats job ran on these was about 10 hours ago from when the issue occurred on Active Dataguard.
      SQL> select TABLE_OWNER,TABLE_NAME,INSERTS,UPDATES,DELETES from dba_tab_modifications where TABLE_NAME='TAB1' order by 3;
      
      TABLE_OWNER               TABLE_NAME              INSERTS    UPDATES    DELETES 
      ------------------------------ --------------------- ---------- ---------- ---------- 
      SCHEMA1                    TAB1                     4448          0       3728 
      SCHEMA2                    TAB1                     4547          0       4022 
      SCHEMA3                    TAB1                     4612          0       4152 
      SCHEMA4                    TAB1                     4628          0       3940 
      SCHEMA5                    TAB1                     4719          0       4258 
      SCHEMA6                    TAB1                     4809          0       4292 
      SCHEMA7                    TAB1                     4853          0       4356 
      SCHEMA8                    TAB1                     5049          0       4536 
      
      
      SQL> select TABLE_OWNER,TABLE_NAME,INSERTS,UPDATES,DELETES from dba_tab_modifications where TABLE_NAME='TAB2' order by 3;
      
      TABLE_OWNER               TABLE_NAME                  INSERTS    UPDATES    DELETES 
      --------------------- ------------------------- ---------- ---------- ---------- 
      SCHEMA1                    TAB2                        25546          0      26360 
      SCHEMA2                    TAB2                        26728          0      27565 
      SCHEMA3                    TAB2                        27403          0      27763 
      SCHEMA4                    TAB2                        27500          0      28149 
      SCHEMA5                    TAB2                        28408          0      30440 
      SCHEMA6                    TAB2                        30453          0      31906 
      SCHEMA7                    TAB2                        31469          0      31988 
      SCHEMA8                    TAB2                        32875          0      34670 
      But confused about Why there could sudden spike of demand on UNDO data in Active Data Guard ? Could any one please shed some lights on finding the reason for this issue ?
        • 1. Re: Buffer busy waits on UNDO data in Active Dataguard
          CKPT
          But confused about Why there could sudden spike of demand on UNDO data in Active Data Guard ? Could any one please shed some lights on finding the reason for this issue ?
          It's been interesting, The job runs only on ADG?
          Even it is only reporting purposes, Which runs only select statements. Are you sure that issue is only because of this job?
          Moreover am interested to know, How you able to monitor at the same time? Using EM?
          What all are the jobs ran on primary at the same time?
          Then, is it possible to run the job on primary and see whats the response?
          I suggest you to run the same job again on standby and see the ET of the job and also gather the statspack to check whether you got same buffer busy waits or not.
          What storage you are using for primary and standby? In terms of I/O and performance all are same?
          You got chance to take statspack even on primary database?
          What are the parameters differ in primary and standby?

          Also check this note, Heresome work around provided.
          *Resolving Intense and "Random" Buffer Busy Wait Performance Problems [ID 155971.1]*
          • 2. Re: Buffer busy waits on UNDO data in Active Dataguard
            PhHein
            Yasu, please don't multipost!

            I'm locking this thread.