This content has been marked as final. Show 19 replies
here it is: MAX(0.5M, (128K * number of cpus))
32M seems to me way too much.
Well, the ADDM recomendation is good since that tool analyzes a set of parameters and statistics to get that figure. If you want the rationale, then you should take a look to the Oracle Wait Interface views or Statspack reports.
These are (some) of the events related to this structure
log buffer space
log file switch completion
log file sync
Yes, that formula is good for first estimation... then you have to tune it checking for wait events related to log processing.
Here is my report from AWR
%Time Total Wait wait Waits
Event Waits -outs Time (s) (ms) /txn
---------------------------- -------------- ------ ----------- ------- ---------
log file sync 82,341 1.6 3,899 47 1.2
log file switch completion 286 40.6 159 556 0.0
log buffer space 10 50.0 6 625 0.0
My redo size is of 750 MB each for total size 1.5 TB database
Well, one of the wait events is related to the log buffer: the one that takes most time.
The other: 'log file switch completion' may require some IO tunning. I would recommend metalink Note:223117.1
1. Your 2M log buffer is a bit small compared with data volumes these days. ADDM recommendation seems reasonable. I've seen customers with hundreds of megabytes of lob buffer with no side effect.
2. Your top events mean followings(If you're not familiar...)
- log file sync: You "commit" the transaction and wait for the redo write to be completed
- log file switch completion: You want to write to redo log but log file switch is not completed yet.
- log buffer space: You want to generate redo data to log buffer, but your log buffer is full. You wait till the specific region of log buffer space is released by other transactions.
3. These whole things mean following possiblities:
- You're generating lots of redo data
- You're committing very frequently
- You're redo log size is somewhat small that log file switch happens too frequently
- And other many possibilities like slow I/O performance...
More investigation will show which one is main cause. For instance, there are couple of ways to alleviate massive redo generation.
4. If your application is load-oriented, redo contention is natural and inevitable. You need to tell apart whether there is a "real" performance problem in your app or you're just seeing what Oracle is doing.
Reading your explanation, doesn't it look like the bottleneck is the redo logs size?
Meaning that ... the "log buffer space" wait is due to the "log file switch completion" wait that can be fixed by increasing the size of the redo logs?
I mean my impression is that while increasing the redolog size we might fix both waits, increasing the log buffer alone will definitely not fix the "log file switch completion" wait.
My reluctancy to believe that the log buffer should be increased was also due to this statement at the link I posted above:
"On most systems, sizing the log buffer larger than 1M does not provide any performance benefit. Increasing the log buffer size does not have any negative implications on performance or recoverability. It merely uses extra memory."
Anyway i understand that this system may not fall in this main category.
Message was edited by:
Reading your explanation, doesn't it look like theI think that I was poor at expressing my opinion. I wasn't telling that the redo buffer size should be increased. Just wanted to say "You need to analyze your app in more detail and determine whether to tune the app to decrease redo data or tune the Oracle parameters/configurations or just let it be. And fortunately, there are couple of ways for you to solve the problem."
bottleneck is the redo logs size?
Meaning that ... the "log buffer space" wait is due
to the "log file switch completion" wait that can be
fixed by increasing the size of the redo logs?
I mean my impression is that while increasing the
redolog size we might fix both waits, increasing
the log buffer alone will definitely not fix the "log
file switch completion" wait.
The wait events in original post are definitely caused by log file switch which was also caused by massive redo generation and small redo log(only when compared with redo volumes)
My reluctancy to believe that the log buffer shouldAbove statement is absolutely true for typical OLTP systems. But these days, no "real" OLTP system exists. Most of them are actually hybrid. Part of OLTP, part of DSS. In these systems, 1M redo buffer "can"(not always) cause redo buffer contetntion, especially when multiple concurrent sessions generate massive redo data. If this is the case(althought it's not very common), we need to increase redo buffer size.
be increased was also due to this statement at the
link I posted above:
"On most systems, sizing the log buffer larger
than 1M does not provide any performance benefit.
Increasing the log buffer size does not have any
negative implications on performance or
recoverability. It merely uses extra memory."
32M is really very tiny in the scheme of things.
As long as it doesn't "hurt" performance to increase it, it would be much quicker to just do it.
Which version of 10g are you running ?
Is this snapshot the standard 1 hour AWR ?
How many CPUs on the system, and what was the CPU usage over the period ?
What were the figures for waits for log file parallel writes ?
For comparative purposes what were the figures for waits for db file parallel writes and db file sequential reads.
Can you post the stats for anything to do with "redo " from the system stats session of the report. Please use the 'code' tag (see FAQ at top right of page) to get the output here in fixed font.
According to these figures, you don't seem to have a problem with the size of the log buffer - you have had a total of 10 waits for buffer space in what I assume is an hour. The fact that 50% of them have timed out (and 40% of the log file switch completion have timed out) suggest that the odd times are the result of an odd glitch on the log switch(es).
Avg %Time Total Wait wait Waits Event Waits -outs Time (s) (ms) /txn ---------------------------- -------------- ------ ----------- ------- --------- log file sync 82,341 1.6 3,899 47 1.2 log file switch completion 286 40.6 159 556 0.0 log buffer space 10 50.0 6 625 0.0
The apparent persistently slow time for log file syncs is more likely to be indicative of a real problem - and one of the possible causes of log file sync waits is actually having a log buffer that is too big (though, from your comment to date I doubt if that is the issue in this case).
If you take out the effect of the log file sync timeouts, your average log file sync time is about 3 c/s. (Timeout is 1 second, 1.6% of 82,341 is 1,300 so the non-timout time is 2,599 seconds for the remaining 81,000 waits - which is 31 milliseconds - which could just be slow disc response time). The timeouts could be part of the same glitches that appear in the other two stats.
If your log file sync average is much higher than the log file parallel write average then it is possible that the log file sync time is due to excessive CPU usage and delays in returning to the top of the run queue. If the averages are similar, you probably have a problem with slow write times.
In any case, we need more statistics to cross-check before we can come to any sensible conclusion about what your figures mean.
P.S. it would be useful to look at v$event_histogram for these events to see if it shows an odd scatter pattern - it may be that many of the waits for a particular event were very short, but a few were very long, distorting the average. Unfortunately AWR does not capture this information so, unless you are also running statspack, you only have the figures since instance startup.
Could you please substantiate this claim by a recommendation from Tom Kyte on http://asktom.oracle.com, or else withdraw this advice?
Senior Oracle DBA
You can find out for yourself by doing a test-case.
I already know from experience your advice is ridiculous.
By your refusal to substantiate it, I assume you can't substantiate it.
However, there is enough advice to the contrary.
Quoting from asktom
so, tell me how a larger redo log buffer REDUCES the amount of redo written. Article says:
.. Larger LOG_BUFFER values reduce log file I/O ...
maybe, in some extreme cases, larger log buffer values can reduce the NUMBER OF TIMES lgwr writes to the log files, but it won't reduce the AGGREGATE AMOUNT of log data written really.
Senior Oracle DBA
Doesn't disprove anything I've written. Never claimed it reduces the amount of log data written.
You are confusing writes to log buffer with log file I/O.