7 Replies Latest reply on Oct 1, 2013 2:41 AM by sb92075

    Understanding an application outage showing high free buffer waits

    Dropbear67

      Oracle 11.2.0.3 SP3 on HP-UX

      Web Logic Middleware


      We had an interesting application outage yesterday.  The application was basically non responsive due to a high number of active sessions experiencing "free buffer waits".


      We could see that a number of users had been resubmitting the same queries a number of times (same clieint_info, sql_id etc). The bulk of the activity on the system is query

      related and I do not believe (although I have no stats to prove this) that the cause of the free buffer waits was due to DBWR not being able to clear dirty blocks fast enough. I

      have a suspicion that the cause of the waits may be due to the buffers being 'pinned' in the cache whilst being accessed, and hence the processes not being able to find a free

      buffer within the allowable threshhold.


      Q1. what sort of queries can I perform to determine the cause of the free buffer waits, while in the middle of this sort of situation. I can get cumulative counts of

      free buffer waits from v$buffer_pool_statistics, but that's not really telling me much.


      Q2. Looking at  x$kcbwds I found the following which I thought interesting;


      [code]

      SQL> select set_id, fbwait, wcwait, bbwait from x$kcbwds;


          SET_ID     FBWAIT     WCWAIT     BBWAIT

      ---------- ---------- ---------- ----------

               1          0          0          0

               2          0          0          0

      ..

              33          0         60    6495339

       

       

          SET_ID     FBWAIT     WCWAIT     BBWAIT

      ---------- ---------- ---------- ----------

              34          0          3    6469842

              35          0          1    6490057

              36          0          1    6491714

              37          0         15    6479095

              38          0         25    6486739

              39          0          0    6481095

              40          0          0    6501595

              41          0         29    6487908

              42          0          3    6483047

              43          0         34    6487499

              44          0          7    6493323

       

       

          SET_ID     FBWAIT     WCWAIT     BBWAIT

      ---------- ---------- ---------- ----------

              45    1054670       1155    6481360

              46     343269       2109    6486831

              47    3992728      10843    6465402

              48    1977952      10875    6457829

      ..

      [/code]

       

      also querying v$buffer_pool gives

       

      [code]

      SQL> select lo_setid, hi_setid, set_count from v$buffer_pool;

       

       

        LO_SETID   HI_SETID  SET_COUNT

      ---------- ---------- ----------

              33         48         16

      [/code]

       

      Which corresponds to the set_id in  x$kcbwds; containing the FBWAIT (free buffer wait)

       

      Q2. So why are only a small subset of "sets" being used?

        • 1. Re: Understanding an application outage showing high free buffer waits
          sb92075

          Free buffer waits (%)

           

          At what value do the free buffer waits go from acceptable to being "high"?

           

          IMO, you have reported an observation; which may or may not have anything relating the concurrent non-responsive application.

          A few year ago, I had an application "hang" due to one admin. assistant go to lunch while having SELECT FOR UPDATE waiting for COMMIT or ROLLBACK

          She did often enough that I was able to eventually track it done instead of restarting the DB.

           

          what clues exist within AWR report or STATSPACK report

          • 2. Re: Understanding an application outage showing high free buffer waits
            Dropbear67

            "At what value do the free buffer waits go from acceptable to being "high"?"

             

            When we have 75% of "active" sessions waiting on "Free Buffer Waits" - I take that as being unacceptable. The users of the application were reporting it as unresponsive.

             

            The nature of Middle-tier applications with large number of spawned but inactive processes can often mean when these inactive processes suddenly spring to life, spawning multiples of the same query, the overall responsiveness of the application drops to near zero.

             

            Some clues from the AWR report..

            Duration: 60 mins, DB Time: 4000 mins


            Top 5 Foreground Events: #1 is Free Buffer Waits - % DB Time = 50%

            Foreground Wait Class: #1 Configuration - %DB Time = 59%  (very high in my experience)

            Wait Event Histogram: Free Buffer Waits (500,000) for the 60 min duration

            latch: cache buffers chains: 130,000

            • 3. Re: Understanding an application outage showing high free buffer waits
              sb92075

              >Duration: 60 mins, DB Time: 4000 mins

               

              how many cores or CPUs exist in this system?

               

              are the "Free Buffer Waits" the cause of the unresponsiveness or symptom of unknown cause?

               

              I suspect that the root cause is high DML activity.

              When no DML occurs, then low or no "Free Buffer Waits" occur.

              Do you are agree with above?

               

              >Top 5 Foreground Events: #1 is Free Buffer Waits - % DB Time = 50%

              >Foreground Wait Class: #1 Configuration - %DB Time = 59%  (very high in my experience)

               

              how do you report more than 100 %DB Time (50+59)??????????????????????????????????????????

              • 4. Re: Understanding an application outage showing high free buffer waits
                Dropbear67

                4 cpus / 8 cores ....  (

                 

                "are the "Free Buffer Waits" the cause of the unresponsiveness or symptom of unknown cause?"

                 

                IMO they are a symptom of a high number of concurrent sessions executing the same SQL.statements with the same bind parameters

                 

                "

                I suspect that the root cause is high DML activity.

                When no DML occurs, then low or no "Free Buffer Waits" occur.

                Do you are agree with above?"

                 

                Normally,I would say yes, but I *believe* (although I could be wrong) that we can get Free Buffer Waits when our busy buffers are pinned in the cache by being hit by a relatively large number of processes, all trying to get to the same block. So it's not a matter of dirty blocks , rather pinned blocks..  Hence I was after a query which would help me ascertain if I had a high number of pinned vs dirty blocks (not sure if such a query exists).

                • 5. Re: Understanding an application outage showing high free buffer waits
                  sb92075

                  Free buffer waits (%)

                  Description

                  This event occurs mainly when a server process is trying to read a new buffer into the buffer cache but too many buffers are either pinned or dirty and thus unavailable for reuse. The session posts to DBWR then waits for DBWR to create free buffers by writing out dirty buffers to disk.

                  DBWR may not be keeping up with writing dirty buffers in the following situations:

                  • The I/O system is slow.
                  • There are resources it is waiting for, such as latches.
                  • The buffer cache is so small that DBWR spends most of it's time cleaning out buffers for server processes.
                  • The buffer cache is so big that one DBWR process is not enough to free enough buffers in the cache to satisfy requests.
                  • 6. Re: Understanding an application outage showing high free buffer waits
                    Dropbear67

                    "but too many buffers are either pinned or dirty"  - This is what I am trying to determine .. how many pinned vs how many dirty..

                     

                    There are resources it is waiting for, such as latches. - this may also be the case, with a relatively high contention rate on "latch: cache buffers chains"..

                    • 7. Re: Understanding an application outage showing high free buffer waits
                      sb92075

                      Dropbear67 wrote:

                       

                      "but too many buffers are either pinned or dirty"  - This is what I am trying to determine .. how many pinned vs how many dirty..

                       

                      There are resources it is waiting for, such as latches. - this may also be the case, with a relatively high contention rate on "latch: cache buffers chains"..

                      Why does it matter?

                      What action will you take based upon having a number or percentage of pinned buffers?

                       

                      How can you change the number of either dirty or the number of pinned buffers?

                       

                      The only "solution" either is do less DML or increase the DBW throughput.