This discussion is archived
1 2 3 Previous Next 34 Replies Latest reply: May 24, 2013 7:22 AM by Bobby Durrett RSS

free buffer waits

Bobby Durrett Explorer
Currently Being Moderated
We are seeing some free buffer waits contention and I wanted to get some input from the forum participants if you have time to address it.

It is on an HP-UX 11.31 Itanium system running 11.2.0.3 of Oracle. This is a datawarehouse staging database and there are about 60 merge statements running in parallel (themselves also doing parallel dml) doing several million updates against 60 different tables with a total rowcount around 2 billion.

The dev team is putting in a more efficient update statement so that may in itself resolve these issues but still I'd like to see if the number of db writer processes we have makes sense. As far as I know we do not have asych io configured on our OS due to some bugs we have seen in the past. The server has 14 cpus and 95 gig of memory. Here are the top 5 events from our AWR report:
Top 5 Timed Foreground Events

Event                   Waits      Time(s)  Avg wait (ms) % DB time Wait Class 
free buffer waits       319,324    261,188            818     46.08 Configuration 
db file parallel read   134,710    62,404             463     11.01 User I/O 
DB CPU                             60,818                     10.73   
db file sequential read 11,783,603 26,032               2      4.59 User I/O 
write complete waits    4,015      13,828            3444      2.44 Configuration 
Does it make sense that I should increase the number of db writers?

db_writer_processes=4 in our system. With 14 cpus this is supposed to be enough as I understand it. But we have 60 dedicated server processes doing updates and only four db writer processes doing writes so it kind of makes sense to increase the number of writers.

I'm researching this on my own but I would appreciate any input you have on this issue.

Thanks,
Bobby
  • 1. Re: free buffer waits
    sb92075 Guru
    Currently Being Moderated
    Bobby Durrett wrote:

    Does it make sense that I should increase the number of db writers?
    No increase is warranted, IMO.
    db_writer_processes=4 in our system. With 14 cpus this is supposed to be enough as I understand it. But we have 60 dedicated server processes doing updates and only four db writer processes doing writes so it kind of makes sense to increase the number of writers.
    The bottleneck will be in disk subsystem; either at the controller or physical spindles.
    I am willing to give you odds that if you do increase the number of DB Writers,
    that there will be NO measurable increase in performance or reduced run times.
  • 2. Re: free buffer waits
    Mark D Powell Guru
    Currently Being Moderated
    Bobby, what kind of memory managment scheme is in use? How big is the database buffer cache?
    - -
    Whenever PQO and parallel DML is in use one thing that concerns me is what is the maximum degree of parallelism allowed. Over allocation of process can bog the system as a whole down. How is parallelism assigned (manaual vs system)? How many parallel sessions are being allocated for any one task.

    You said you have 14 cpu's so I would question the advisability of allowing any process to exceed 14. The other limiting factor is as SB mentioned: disk io capacity. How many channels to the disk are there, how many physical disk drives are available, and what is the total IO capacity of the disk system?

    If the parallel SQL is executed concurrently then again the number of slave sessions any one task can have may need to be limited. Such a statement might need to be limited to 2, 4, or 6 slaves.

    HTH -- Mark D Powell --
  • 3. Re: free buffer waits
    JohnWatson Guru
    Currently Being Moderated
    You say that you do not have asynch IO enabled. In that case, I tihnk you need to try a different approach: set db_writer_processes=1 and dbwr_io_slaves=32 (32 is a guess, try it and see hpow it goes).
    --
    John Watson
    Oracle Certified Master DBA
    http://skillbuilders.com
  • 4. Re: free buffer waits
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    Bobby Durrett wrote:
    The server has 14 cpus and 95 gig of memory. Here are the top 5 events from our AWR report:
    Top 5 Timed Foreground Events
    
    Event                   Waits      Time(s)  Avg wait (ms) % DB time Wait Class 
    free buffer waits       319,324    261,188            818     46.08 Configuration 
    db file parallel read   134,710    62,404             463     11.01 User I/O 
    DB CPU                             60,818                     10.73   
    db file sequential read 11,783,603 26,032               2      4.59 User I/O 
    write complete waits    4,015      13,828            3444      2.44 Configuration 
    What's the duration of this snapshot ?

    With 14 CPUs it must be longer than 1 hour.

    The db file sequential reads are fast - which suggests you don't have an I/O problem unless it's a locking problem between reads and writes at the file system level, so I would check the filesystem_io_options (Oracle) and the HP-UX filesystem options - I think HP-UX has a "concurrent I/O" or "direct I/O" option that bypasses the filesystem cache.

    Apart from that I'd look at ASH to get some idea of which objects are involved in the free buffer waits and write complete waits, and I'd also check that the problem wasn't related (at least in part) to having a recycle buffer pool ( http://jonathanlewis.wordpress.com/2006/11/21/free-buffer-waits/ ) I'd also cross-check against the timing for db file parallel writes, and look at the SQL that was doing the largest volume of reads in case there were a few easy options for reducing the read load.


    Regards
    Jonathan Lewis
  • 5. Re: free buffer waits
    Bobby Durrett Explorer
    Currently Being Moderated
    Jonathan

    It is a two hour time span. I read your article yesterday and checked that we don't have a recycle or keep pool defined. I really need to check on the filesytem options like you suggest. Years ago I tried to get our unix team to move the datafile filesystems to direct io to bypass the filesystem buffer cache without any luck so I think we are using it but I'll double check.

    - Bobby
  • 6. Re: free buffer waits
    Bobby Durrett Explorer
    Currently Being Moderated
    We have automatic memory management and the buffer cache is about 11 gig.

    Parallelism is set manually on the tables. I'm sure the system is maxed out. I doubt that all 60 of the merge statements are really getting parallel slaves.

    I'm trying to get the I/O capacity info from my unix/san teams and to go back to my os monitoring tools to see if we are maxed out. it is a pretty nice SAN - HP XP24000 and the box this is on has been running for a few years now with a lot of heavy updating so it seems specific to this new bunch of merge statements - a new application with a quarterly peak in activity.

    I'm with you on limiting the parallelism. I recommended replacing the merge statements with updates that run serially. If you have 60 running at once you don't need parallel DML on each one.

    - Bobby
  • 7. Re: free buffer waits
    Bobby Durrett Explorer
    Currently Being Moderated
    Here is a link to a single ASH sample during one of the times of peak free buffer waits:

    http://www.bobbydurrettdba.com/uploads/ashupload.txt

    It shows the db writer processes banging away on the disk and a bunch of user processes waiting on free buffer waits.

    - Bobby
  • 8. Re: free buffer waits
    Bobby Durrett Explorer
    Currently Being Moderated
    It does look like we have the async device installed:

    $ ls -al /dev/asy*
    crw-rw-rw- 1 bin bin 101 0x000000 May 17 2007 /dev/async
    crw-rw-rw- 1 bin bin 101 0x000000 May 17 2007 /dev/asyncdsk

    But the filesystems aren't mounted with mincache=direct,convosync=direct as is recommended in Oracle document 457063.1.

    Here are the mount options:

    ioerror=mwdisable,largefiles,delaylog

    The following parameters seem relevent:

    disk_asynch_io boolean FALSE
    filesystemio_options string asynch
    dbwr_io_slaves integer 0
    db_writer_processes integer 4

    - Bobby
  • 9. Re: free buffer waits
    Bobby Durrett Explorer
    Currently Being Moderated
    Well, I think I'm back to increasing the number of db writers. Here is part of Oracle note 139272.1: HP-UX: Asynchronous i/o

    "On HP-UX, when the Oracle datafiles reside on a filesystem, then the DBWR
    process(es) make synchronous writes to these datafiles. This means that each
    write must return with a 'succesful completion' before the next write is
    issued. This serial operation can lead to a i/o bottleneck. There are two ways
    to counteract this:
    a. configure multiple DBWR processes
    b. use asynchronous i/o

    Before deciding on one of these two options, it should be noted that on HP-UX,
    aio is *only* possible on a raw device. Put in another way, aio *cannot* be used
    on a filesystem.

    Multiple DBWRs can be used on a filesystem."

    So, I think on an HP-UX filesystem the conventional wisdom on the number of db_writers goes out the window. Anyway, the only question is whether it is worth my effort to test it and now I'm convinced that it is.

    I read an interesting post by Kevin Closson on how you don't need as many db writers as you think but it looks like it comes down to async i/o. In HP-UX on a regular filesystem the writes are synchronous so surely doubling the number of db writers would double the output. Anyway, I need to do a test now I just need to find a system to test it on.

    - Bobby
  • 10. Re: free buffer waits
    Bobby Durrett Explorer
    Currently Being Moderated
    I'm going to mark this as answered even though I need to test out what I think the real answer is. Thanks for everyone's help. If anyone has additional follow up comments that would be great.
  • 11. Re: free buffer waits
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    Bobby Durrett wrote:
    Well, I think I'm back to increasing the number of db writers. Here is part of Oracle note 139272.1: HP-UX: Asynchronous i/o
    It may help - but I'm doubtful.

    If nothing else, the comments in the note look as if they're at least 10 years old. (Note, in particular, that one of the references at the end of the note refers to a document about Oracle 8.1.7; I'm also slightly surprised by the lack of reference to I/O Slaves which were introduced to emulate async I/O for platforms that didn't support it; (update) and then there's a reference to an HP-UX patch is for 11.00, and descriptions of what to do for Oracle versions 7.3 and earlier!)
    This serial operation can lead to a i/o bottleneck. There are two ways to counteract this:
    a. configure multiple DBWR processes
    b. use asynchronous i/o
    You haven't told us anything about the db file parallel write times yet. If they're slow then increasing the number of db writers is likely to make them slower.

    On the information you've given so far, I'd investigate setting filesystem_io_options to setall, in combination with mounting the file system with convosync={whatever it was}, as I suspect locking at the file level within the filesystem cache. (I'm not sure whether you need to set mincache=direct as well to see the write benefit - the problem is that your 11M single block reads might slow down because they would be bypassing the filesystem cache.)

    It's at this point on a client site that I tend to get into detailed discussion with the System Admin to understand exactly what filesystem and O/S options are currently available and exactly what they do. These things are too variable (and often badly understood) between operating systems and versions.


    Regards
    Jonathan Lewis

    Edited by: Jonathan Lewis on Apr 24, 2013 9:38 AM
  • 12. Re: free buffer waits
    Bobby Durrett Explorer
    Currently Being Moderated
    Jonathan,

    Thanks for your reply. This is the top two items from the background wait events section of the awr report I referred to earlier:
    Event                     Waits %Time-outs Total Wait Time (s) Avg wait (ms) Waits /txn % bg time 
    db file parallel write    7,108          0              22,171          3119       0.01     85.60 
    log file parallel write 138,918          0                 851             6       0.23      3.29 
    It looks like the db file parallel write waits are averaging 3 seconds. I'm not sure what that means. I know that Oracle has 3 second timeouts that apply in certain cases.

    139272.1 is an older note, but it looks like it was updated in 2011 so that made me think it might still be applicable. So, when you talk about locking at the filesystem level and the convosync option are you saying that as blocks are written out to the filesystem cache and then to disk there is some sort of locking that can limit the amount of parallel writes that can occur? I'm not familiar with that. I can search for it but are you aware of any blog posts or papers out there that describe this effect?

    Thanks again for your feedback and that of the other posters.

    - Bobby
  • 13. Re: free buffer waits
    Bobby Durrett Explorer
    Currently Being Moderated
    Jonathan,

    This is some parallel write wait infromation from ASH for a peak free buffer waits time:
    EVENT                  P1TEXT           P1
    ---------------------- -------- ----------
    db file parallel write requests       3917
    db file parallel write requests       3931
    db file parallel write requests       3897
    db file parallel write requests       3849
    It appears that each free buffer wait event has about 3900 blocks requested. So, if we are taking 3000 milliseconds per db file parallel write wait and each parallel write has a pile of 3900 blocks that sounds like roughly 1 ms per block which isn't bad.

    The full ASH query and its log are at http://www.bobbydurrettdba.com/uploads/ashupload.txt

    Note that this ASH output is from a different time period than the AWR report we have been reviewing but with the same profile and same database and batch jobs.

    - Bobby
  • 14. Re: free buffer waits
    Bobby Durrett Explorer
    Currently Being Moderated
    Interestingly enough I did a test and setting db_writer_processes=1 and dbwr_io_slaves=32 made a big improvement while increasing the number of db writers did nothing.

    15 concurrent updates updating 100,000 rows each

    db_writer_processes=4 and dbwr_io_slaves=0 averaged 93 elapsed seconds
    db_writer_processes=1 and dbwr_io_slaves=32 averaged 23 elapsed seconds
    db_writer_processes=36 and dbwr_io_slaves=0 averaged 99 elapsed seconds

    So, it looks like increasing the io slaves could make a big improvement.

    - Bobby
1 2 3 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points