This discussion is archived
1 2 3 Previous Next 38 Replies Latest reply: Jul 18, 2007 1:56 AM by 16279 RSS

log_buffer

user511621 Newbie
Currently Being Moderated
hi

in 9i we can modify shared pool size and db cache size

but can we alter log_buffer parameter... anyway???
  • 1. Re: log_buffer
    The Human Fly Oracle ACE Director
    Currently Being Moderated
    You can't modify the log_buffer parameter online unlike the other two you mentioned. Any change on this required immediate shutdown and startup.

    If you are using spfile, do the following:


    alter system set log_buffer=<size> scope=spfile;
    shutdown immediate
    startup

    If you are using pfile:

    Change the value of log_buffer in the pfile.
    shutdown immediate
    startup

    Jaffar
  • 2. Re: log_buffer
    108476 Journeyer
    Currently Being Moderated
    Hi,

    Also, note the rules for log_buffer sizing:

    http://www.dba-oracle.com/t_log_buffer_optimal_size.htm
  • 3. Re: log_buffer
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    Burleson,

    I note that you've posted a URL to that article on your website twice in the last few days. May I suggest that you don't publish the URL in this forum again until you have read and corrected the article.

    In paragraph one, you claim:

    Oracle documentation recommends that the maximum size for the log_buffer parameter be either 500k, or (128K * cpu_count) whichever is greater.

    You are simply restating the formula for the default value for this parameter as given in the 8.1 reference manual. From the 8.1.6. Database Reference, under parameter log_buffer:

    Default value: Operating system specific. Maximum: 500K or 128K * CPU_COUNT, whichever is greater

    The fact that Oracle has a default does not make that default the "maximum recommended".

    In the same paragraph you also state

    The number of CPUs is important to the value of log_buffer, because multiple log writer (LGWR) processes may be spawned by Oracle to asynchronously offload the redo information.

    This is incorrect, there is only ever one LGWR process. Here's some output from a 9.2.0.8 instance, started on a machine with 64 CPUs, with log_parallelism set to 4, and no setting for log_buffer:
    select  name, value
    from    v$parameter
    where   name in ('cpu_count','log_parallelism','log_buffer')
    ;
    NAME            VALUE
    --------------- ---------------
    cpu_count       64
    log_buffer      8388608
    log_parallelism 4
    select name, description 
    from V$bgprocess
    where paddr !='00'
    ;
    NAME            DESCRIPTION
    --------------- ------------------------
    PMON            process cleanup
    DBW0            db writer process 0
    DBW1            db writer process 1
    DBW2            db writer process 2
    DBW3            db writer process 3
    DBW4            db writer process 4
    DBW5            db writer process 5
    DBW6            db writer process 6
    DBW7            db writer process 7
    ARC0            Archival Process 0
    ARC1            Archival Process 1
    LGWR            Redo etc.
    CKPT            checkpoint
    SMON            System Monitor Process
    RECO            distributed recovery
    Note - only one log writer, despite invoking explicit parallelism in logging, and despite the significant number of CPUs.

    In paragraph 4 you state: "Even though Oracle does not recommend a log_buffer greater than one meg, ...". This is despite the fact that the previous two paragraphs reference and quote a Metalink Document that (a) points out that 10MB is a reasonable value for the log_buffer (... for Oracle Applications 11i) and (b) explains why this is a reasonable suggestion.

    Paragraph 5 says: If you have waits associated to log_buffer size “db log sync wait”, try increasing log_buffer to a value over 1 megabyte.. But there is no such wait; the closest is 'log file sync', and waits for this event may indicate that your log buffer is too big. The wait that suggests you need to increase the log buffer is "log buffer space".

    A few lines further on you state that it is a 'common mistake' that: "The log_buffer is not a multiple of the db_block_size. In Oracle9i with multiple block sizes, the log_buffer should be a multiple of 2048 bytes." This is incorrect - the log buffer has to be a multiple of the redo log block size - typically 512 bytes, though some O/S may be configured to 1KB or 2KB unit sizes: and the redo block size follows the unit I/O size of the O/S. Here's some output from a 9i database using multiple (data)block sizes. The value for the log_buffer size here is not a multiple of 2KB
    SQL> show parameters log_buffer
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- --------
    log_buffer                           integer     524800
    You have a section of an AWR report (from a 10.1 database, by the look of it) which you say shows a database with an undersized log buffer where the dba did not set the log_buffer parameter. The figures quoted are as follows:
                                                                       Avg
                                                         Total Wait   wait    Waits
    Event                               Waits   Timeouts   Time (s)   (ms)     /txn
    ---------------------------- ------------ ---------- ---------- ------ --------
    log file sequential read            4,275          0        229     54      0.0
    log buffer space                       12          0          3    235      0.0
     
    Top 5 Timed Events
    ~~~~~~~~~~~~~~~~~~                                                     % Total
    Event                                               Waits    Time (s) Ela Time
    -------------------------------------------- ------------ ----------- --------
    CPU time                                                      163,182    88.23
    db file sequential read                         1,541,854       8,551     4.62
    log file sync                                   1,824,469       8,402     4.54
    log file parallel write                         1,810,628       2,413     1.30
    SQL*Net more data to client                    15,421,202         687      .37
    The time lost on log buffer space waits is 3 seconds in a total of 12 waits. Compare this with 1.8Million log file parallel write waits, and log file sync waits. Note - A single transaction (prior to 10.2) has to create a minimum of two redo records, which means a minimum of two log buffer space allocations - so we're seeing 12 waits out of 3.6M allocations.

    The most remarkable thing about this database is that the CPU usage is such a high fraction of the total database time - perhaps that's the result of very poor SQL being protected by throwing excess memory at the problem. If you want to make this database go faster don't worry about the log buffer, find out where the CPU is going.

    While talking about sample AWR reports - let's take a look at the stats in the other 'log buffer size' article that you reference in this article. The second article states:

    Today, most large database use a log buffer between 5 meg to 10 meg. Here is an example of an Oracle 10g database [ed: 10.1 again] with an undersized log buffer, in this example 512k [ed: which means 1 to 4 CPUs]:
    Top 5 Timed Events
    ~~~~~~~~~~~~~~~~~~                                        % Total
    Event                                 Waits    Time (s)   DB Time     Wait Class
    ------------------------------ ------------ ----------- --------- -------------
    log file parallel write               9,670         291     55.67 System I/O
    log file sync                         9,293         278     53.12 Commit
    CPU time                                            225     43.12
    db file parallel write                4,922         201     38.53 System I/O
    control file parallel write           1,282          65     12.42 System I/O
    But there are no log buffer space waits in this list at all. There are log file parallel write waits, and log file sync waits - but they aren't a symptom of an undersized log buffer.

    If you want to know the problem with this database, work out the average wait times - the log writes are 30 milliseconds, the db file writes are 40 milliseconds. the control file writes are 50 milliseconds.

    If you have a performance problem with this system it's probably in the slow I/O subsystem - not in the log buffer. (It's odd that we see no time lost on reads - but perhaps this is another system where a lot of memory has been thrown at the db cache to hide the real problem).


    Coming back to the original article, you have a section on "log buffer related parameters". You claim that one of these is the transactions_per_rollback_segment. Apart from the fact that this is irrelevant from Oracle 9i onwards if you are using automatic undo, it's irrelevant anyway because it has nothing to do with redo ... "rollback" is undo.

    In passing, if you re-read the article you've quoted you will find that that the comments it makes about the maximum value for this parameter were in relation to an 8.1.6 database using 2KB blocks. That's not particularly helpful for some one asking about a 9i database in the year 2007.

    The final, and valuable, bit of your article is the bit you have copied from Steve Adams' website, along with the associated URL.

    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk
  • 4. Re: log_buffer
    The Human Fly Oracle ACE Director
    Currently Being Moderated
    >>
    The number of CPUs is important to the value of log_buffer, because multiple log writer (LGWR) processes may be spawned by Oracle to asynchronously offload the redo information.

    “db log sync wait”, try increasing log_buffer to a value over 1 megabyte..
    >>

    There are really blunders.

    I am sure people (who ask questions here) seeks the quality of answers to prevent production issues or overcome from their production database agony, and being an experts, one should really double check the contents before replies publicly.

    Jaffar
  • 5. Re: log_buffer
    108476 Journeyer
    Currently Being Moderated
    Hi Jaffar,
    There are really blunders.
    Yes, I mis-typed “db log sync wait”, please forgive me!

    As to the multiple log writers, that's from Metalink, not me:

    Metalink note 109582.1 says that log I/O slave processes started way-back in Oracle8, and that multiple LGWR processes will only appear as DML activity increases:

    “Starting with Oracle8, I/O slaves are provided.  These slaves can perform asynchronous I/O even if the underlying OS does not support Asynchronous I/O. These slaves can be deployed by DBWR, LGWR, ARCH and the process doing Backup. . . 

    In Oracle8i, the DBWR_IO_SLAVES parameter determines the number of IO slaves for LGWR and ARCH. . .

    As there may not be substantial log writing taking place, only one LGWR IO slave has been started initially.  This may change when the activity increases.”
  • 6. Re: log_buffer
    108476 Journeyer
    Currently Being Moderated
    Lewis,
    May I suggest that you don't publish the URL in this forum again until you have read and corrected the article.
    May I suggest that you stop publishing “incomplete” proofs?

    In this case, you offer-up a questionable proof, never mentioning the all-important lgwrio_slaves parameter, nor the Metalink statement that multiple LGWR processes only appear as activity increases.

    Jonathan, your “incomplete” test “proofs” can be deceptive, especially to beginners, and I’ve documented just a few of them here:

    http://www.dba-oracle.com/t_biased_test_cases.htm
    The final, and valuable, bit of your article is the bit you have copied from Steve Adams' website, along with the associated URL
    No Jonathan, it’s the Metalink references that are authoritative, not Steve (although he has great info, it’s getting a tad out-of-date).

    Oh Jonathan, I like your snarky word-trick, saying the “bit that you have copied” instead of “citing a reference”, like everyone else does.

    Lewis, you have “copied” loads of content onto your domain from my web sites, but, unlike me, you don’t bother documenting them with hyperlinks.

    Why is that?

    **************************************

    PLEASE NOTE THAT IN THIS RESPONSE I CITE ONLY JONATHAN LEWIS AND METALINK.

    If anybody wants to hurl gratuitous insults, direct them at the cited sources, not me!

    OK, let’s start with your most glaring inconsistency.

    MYTH? Experts can suggest the root cause of a performance problem with only a short description and the top-5 timed events
    ************************************************

    Lewis claims to be able to diagnose a problem from one of my clients, using only the data from the top-5 timed events: He stated:

    “If you have a performance problem with this system it's probably in the slow I/O subsystem - not in the log buffer.”

    Unless Lewis is the charter member of the “Oracle Psychic Friends Network, it’s irresponsible to make such a bold assertion with such little data. (in the instance example, his “guess” is completely incorrect, but that’s not surprising, given the tiny amount of data available to him).

    But what’s confusion is the fact that Lewis admits that it’s “not sensible” to draw conclusions using only the top-5 timed events (unless, of course, it’s to try to insult me!). But don’t take my word for it, see Lewis’ own words here:

    http://www.jlcomp.demon.co.uk/statspack_01.html

    “Even though the [top-5 timed events] numbers look unusual, the first thing you should notice is that I haven’t supplied enough information for you to make a sensible decision. .

    As this example shows, not only is it possible for the top (or even top 2) of the Top 5 report to be totally misleading; sometimes you may even have to think about that’s implied by the summary, but not in the summary. Be careful when you use Statspack – it’s very easy to look at the headline numbers and be totally fooled.”

    OK, let’s continue with your contradictions to Metalink about multiple LGWR processes.

    MYTH? Multiple LGWR processes
    *************************************************

    Lewis claims: “Note - only one log writer, despite invoking explicit parallelism in logging, and despite the significant number of CPUs.”

    Here we go again, another incomplete proof that contradicts both the Oracle documentation and Metalink. You failed to mention the hidden parameter lgwrio_slaves and the Metalink note that clearly states that multiple LGWR processes will only appear under high activity. The Oracle docs are very clear on this:

    “Prior to Oracle8i you could configure multiple log writers using the LGWR_IO_SLAVES parameter.”

    In Oracle10g it becomes a hidden parameter (_lgwr_io_slaves). Metalink note 109582.1 says:

    https://metalink.oracle.com/metalink/plsql/f?p=130:14:1634364682833660660::::p14_database_id,p14_docid,p14_show_header,p14_show_help,p14_black_frame,p14_font:NOT,109582.1,1,1,1,helvetica

    “Starting with Oracle8, I/O slaves are provided. These slaves can perform asynchronous I/O even if the underlying OS does not support Asynchronous I/O. These slaves can be deployed by DBWR, LGWR, ARCH and the process doing Backup. . .

    In Oracle8i, the DBWR_IO_SLAVES parameter determines the number of IO slaves for LGWR and ARCH. . .

    As there may not be substantial log writing taking place, only one LGWR IO slave has been started initially. This may change when the activity increases.”

    Lewis, is Metalink wrong? If so, please clear-up this confusion!

    MYTH?: There is no relationship between log_buffer size and CPU count.
    *********************************************************************************

    Lewis wrote: “You are simply restating the formula for the default value for this parameter as given in the 8.1 reference manual.”

    The obsolete Oracle8 docs note that the value for the parameter log_simultaneous_copies is dependent on the number of CPU’s on the server:

    “On multiple-CPU computers, multiple redo copy latches allow multiple processes to copy entries to the redo log buffer concurrently. The default value of LOG_SIMULTANEOUS_COPIES is the number of CPUs available to your Oracle instance”

    Starting in Oracle8i, it’s a hidden parameter (_log_simultaneous_copies). From Metalink note 147471.1, we see that the default is set to cpu_count * 2.

    Also, Metalink note 147471.1 “Tuning the Redo log Buffer Cache and Resolving Redo Latch Contention”, notes that multiple redo allocation latches become possible by setting the parm logparallelism, and that the log buffer is split in multiple LOG_PARALLELISM areas that each have a size of init.ora LOG_BUFFER. Further, it shows the relationship to the number of CPU’s:

    https://metalink.oracle.com/metalink/plsql/f?p=130:14:1634364682833660660::::p14_database_id,p14_docid,p14_show_header,p14_show_help,p14_black_frame,p14_font:NOT,147471.1,1,1,1,helvetica

    “The number of redo allocation latches is determined by init.ora LOG_PARALLELISM. The redo allocation latch allocates space in the log buffer cache for each transaction entry. If transactions are small, or if there is only one CPU on the server, then the redo allocation latch also copies the transaction data into the log buffer cache.”

    We also see that log file parallel writes are related to the number of CPU’s. Metalink note 34583.1 “WAITEVENT: "log file parallel write" Reference Note”, shows that the log_buffer size is related to parallel writes (i.e. the number of CPU’s), and discusses how LGWR must wait until all parallel writes are complete. It notes that solutions to high “log file parallel write” waits are directly related to I/O speed, recommending that redo log members be on high-speed disk, and that redo logs be segregated onto

    “on disks with little/no IO activity from other sources.
    (including low activity from other sources against the same disk controller)”.

    This is a strong argument for using super-fast solid-state disk.


    MYTH? – The log_buffer should remain small.
    ***********************************************************

    This is perpetuated with several Metalink notes that have become somewhat obsolete:

    “In a busy system, a value 65536 or higher is reasonable [for log_buffer].

    “It has been noted previously that values larger than 5M may not make a difference.”

    Metalink notes that in 10gr2, we see a case where a customer cannot reduce the log_buffer size from 16 meg:

    https://metalink.oracle.com/metalink/plsql/f?p=130:15:1634364682833660660::::p15_database_id,p15_docid,p15_show_header,p15_show_help,p15_black_frame,p15_font:BUG,4930608,1,1,1,helvetica

    “In 10G R2, Oracle combines fixed SGA area and redo buffer [log buffer] together. If there is a free space after Oracle puts the combined buffers into a granule, that space is added to the redo buffer. Thus you see redo buffer has more space as expected. This is an expected behavior.. .

    In 10.2 the log buffer is rounded up to use the rest of the granule. The granule size can be found from the hidden parameter "_ksmg_granule_size" and in your case is probably 16Mb. The calculation for the granule size is a little convoluted but it depends on the number of datafiles”

    MYTH? – Disk I/O speed is not the primary concern for tuning the log_buffer
    **************************************************************************************
    Below, Jonathan Lewis confirms that disk I/O speed is critical to redo throughput:

    http://www.jlcomp.demon.co.uk/not_proof.html

    http://www.jlcomp.demon.co.uk/ops_01.html

    “Your main worry is how fast the I/O subsystem might be, and what happens with the log buffer and log files.”

    “Each instance has only one redo log buffer, and the ability to generate redo can ultimately be what defines the maximum speed of your database.”

    TO THOSE WHO LIKE TO HURL INSULTS, PLEASE NOTE THAT I HAVE CONTRIBUTED NOTHING PERSONAL, CITING ONLY JONATHAN LEWIS, THE ORACLE DOCUMENTATION AND METALINK!
  • 7. Re: log_buffer
    The Human Fly Oracle ACE Director
    Currently Being Moderated
    >>
    Yes, I mis-typed “db log sync wait”, please forgive me!
    As to the multiple log writers, that's from Metalink, not me:
    >>
    Burleson, I didn't mean that.

    Many of DBA, including myself, here expect great and quality of information from the experts.

    However, thanks for your clarification.
  • 8. Re: log_buffer
    108476 Journeyer
    Currently Being Moderated
    Hi Jaffar,
    Many of DBA, including myself, here expect great and quality of information from the experts.
    That's great, but the core issue is that Oracle technology changes so fast that what was true yesterday will likely be un-true tomorrow!

    For example, correct information from ancient OTN content comes-up on Google, which was perfectly fine when it was published, but it's now incorrect.

    You cannot nuke it, because there are still Oracle shops using Oracle7, where it's good advice!

    How can anyone offer-up advise that's of perfect quality, when there are so many "running" releases . . .

    It's a formidable challenge for anyone, expert or not . . .
  • 9. Re: log_buffer
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    Burleson,.

    Was "db log sync wait" a mis-type for "log buffer space wait" or "log file sync wait" ? There's a significant difference in name, cause, and treatment.

    Your quote from Metalink 109582.1 in response to The Human Fly says: "multiple LGWR [i]processes will only appear as DML activity increases".

    The text you seem to be paraphrasing appears in the Metalink note as: "As there may not be substantial log writing taking place, only one LGWR [i]IO slave has been started initially. This may change when the activity increases."

    Note that that's "IO Slaves" - not "processes", there's a very big difference.


    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk
  • 10. Re: log_buffer
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    Burleson - extracts from your posting are in italic script:

    "Never mentioning the all-important parameter _lgwr_io_slaves nor the Meatlink note ..."

    I don't mention that parameter because (a) it's hidden, and has been hidden since 8i (long before 10g as you seem to suggest later in your note) and (b) even when you set it to a non-zero value you still don't get multiple LGWR processes.

    The Metalink note doesn't say that you get multiple LGWR processes, it says: "Starting with Oracle8, I/O slaves are provided. These slaves can perform asynchronous I/O even if the underlying OS does not support Asynchronous I/O. These slaves can be deployed by DBWR, LGWR, ARCH and the process doing Backup"

    So the I/O slaves are emulating async I/O, they are not assuming the functions of LGWR (or DBWR etc.). You still get just one process responsible for destaging the log buffer and reporting log write completion back to the user sessions; but now LGWR is handing off to I/O slaves and waiting for them to complete rather than handing off to underlying disk system - there's a very important distinction there. Moreover, it is probably fairly common knowledge that the order of preference for configuring the options for database writer mechanisms is:
    * async I/O at machine level
    * multiple db writers
    * IO slaves
    which probably means there aren't many systems using IO slaves. And, just for the record, you can't mix multiple db writers with IO slaves.


    By the way, your article says: "The number of CPUs is important to the value of log_buffer, because multiple log writer (LGWR) processes may be spawned by Oracle to asynchronously offload the redo information."

    How does that use of "because" work? Paraphrasing the Metalink note: if you set dbwr_io_slaves to a non-zero value, then lgwrio_slave defaults to 4, which allows up to 4 extra I/O slaves to be started for LGWR to hand off its i/o load. That statement has no logical connection to the fact that the number of CPUs affects the value of log_buffer.


    "I like your snarky word trick"
    You mean the trick of making a true statement - but expressing it in a way that might make people think it means more than it does ? I learned that one from someone on the Internet - I can't think who it was right now, but I'm sure it will come back to me.


    "The Metalink references"
    So what do you do when you find two Metalink references that contradict each other ?

    you have "copied" loads of content onto your domain from my web sites
    What's the difference between "copied" with and without quotes ? I've certainly copied a few bits from your website from time to time - but only to point out the mistakes. I didn't think the volume had got up to "loads" yet. Since you've clearly been tracking the volume better than I have, give me a list and I'll add the appropriate URL - and include an apology for failing to include it in the first place.


    "it’s irresponsible to make such a bold assertion with such little data. (in the instance example, his “guess” is completely incorrect, but that’s not surprising, given the tiny amount of data available to him)."

    But you presented the data with the lead in: "Here is an an example of an Oracle 10g database with an undersized log buffer, in this example 512k:" in one case and "Here is a AWR report showing a database with an undersized log_buffer, in this case where the DBA did not set the log_buffer parameter in their init.ora file:" in the other.
    Apparently you thought the data you published was sufficient to demonstrate to the rest of the world the symptoms of a log buffer that was too small. Are you now saying that the data wasn't intended to show people what the symptoms of a small log buffer would show up as in their 'Top 5 timed events' ? If so, why print the data at all ? If they're not truely symptomatic of anything why not introduce them with a comment like: "Here is a completely irrelevant set of figures from a database which (I promise you, despite apparent indications to the contrary) has a log buffer which is too small ?"


    But what’s confusing is the fact that Lewis admits that it’s “not sensible” to draw conclusions using only the top-5 timed events (unless, of course, it’s to try to insult me!). But don’t take my word for it, see Lewis’ own words here:

    http://www.jlcomp.demon.co.uk/statspack_01.html

    “Even though the [top-5 timed events] numbers look unusual, the first thing you should notice is that I haven’t supplied enough information for you to make a sensible decision. .


    The error in the argument here is known as "arguing from the particular to the general". For example, if I say: "It's a lovely sunny day here and the sky is blue", you would note be wise to report this as "Lewis says it never rains in the UK".

    By the way, in one case you are ignoring my use of the word "probably": “If you have a performance problem with this system it's probably in the slow I/O subsystem - not in the log buffer.”. It's almost as if you're trying to put together a few true statements in a way that is intended to lead to the worst possible interpretation. I can argue the case that slow writes by LGWR can lead to log buffer space waits - not to mention log file syncs - and you have log writes of 30 milliseconds. It's a pity that you only included the log buffer space waits in one of the data sets and didn't publish some "before and after" figures for the two cases, ; I'd be interested to see what improvements in I/O times and reduction in use of CPU you got from increasing the log buffer size

    "Here we go again, another incomplete proof that contradicts..."
    You're repeating yourself - you made this claim further up the page. It's the same
    counter-argument to the same demonstration. Repeating something doesn't make it true.


    "MYTH?: There is no relationship between log_buffer size and CPU count."
    It took me a moment to understand why you wanted to include this as a heading when I had quoted the Oracle manual for the default value Operating system specific. Maximum: 500K or 128K * CPU_COUNT, whichever is greater

    But of course you're trying to avoid the discussion of multiple LGWR processes by employing a logical fallacy. Let me demonstrate this by analogy. Your published statement starts. "The number of CPU is important to the log_buffer size, because ..." compare this with the following statement: "The pope is a catholic because it always rains in the UK."
    * The pope is a catholic - true
    * It always rains in the UK - not true
    * The pope is a catholic because it always rains in the UK - not true.
    At this point in your posting, you are trying to prove that you must be in the right by arguing about the pope and ignoring the weather.

    "The obsolete Oracle8 docs note that the value for the parameter log_simultaneous_copies is dependent on the number of CPU’s on the server"

    An attempt to direct attention away from the basic criticism

    Why does the value of log_simultaneous_copies have any relevance to the argument, the discussion is about multiple LGWR processes, that parameter is about the number of redo allocation latches used to allow processes to acquire space in the log buffer.


    "Metalink note 34583.1 “WAITEVENT: "log file parallel write" Reference Note”, shows that the log_buffer size is related to parallel writes (i.e. the number of CPU’s)"

    I don't see anything in that note about the log buffer size, or the number of CPUs. Moreover, the behaviour of "log file parallel writes" does not depend on the number of CPUs, it depends on the capacity for some form of asynchronous I/O. And what has "super-fast solid-state disk" got to do with how many LGWR processes exist ? Again this is just obfuscation to direct attention from the basic questions.


    "MYTH? – The log_buffer should remain small."
    Another attempt to direct attention away from the basic criticism. The default size for the ... no, wait, I've already said that - it's still true - I don't need to say it again. But note that the default size for the log buffer isn't "small".

    "This is perpetuated with several Metalink notes that have become somewhat obsolete"
    But you said that the Metalink references are authoritative - so why are you
    allowed to decided which ones are really authoritative and which ones are obsolete ?


    "[Quoting from Metalink] In 10.2 the log buffer is rounded up to use the rest of the granule. The granule size can be found from the hidden parameter "_ksmg_granule_size" and in your case is probably 16Mb"
    I know that Metalink is "authoritative" when it suits you, but the author of this one doesn't seem to know (or, perhaps, doesn't want to say) much about In Memory Undo, private redo threads, and the possibility of multiple shared redo threads. There's a lot more to it than just the granule size. But, as I said in my original note: That's not particularly helpful for some one asking about a 9i database. And what's it got to do with the basic criticisms anyway ?


    "MYTH? – Disk I/O speed is not the primary concern for tuning the log_buffer"
    Yet another attempt to direct attention away from the basic criticism. But what a strange creation. (Does it make you think, by the way, that my comment about the slow I/Os from one of the AWR reports is on the mark ?)


    And finally:
    Don't you have anything to say about my comments on the bit you copied (I said "quoted" last time - so I thought I'd take the opportunity to say "copied" this time) from Osamu Kobayashi about transactions_per_rollback_segment. And what about my demonstration that the log buffer does not have to be a multiple of 2048 if you are using multiple block sizes in 9i ?


    So let's summarize:
    * Oracle doesn't recommend a maximum of 500KB or 128KB * CPU - that's just the default
    * Oracle has recommended fairly large log buffers in the past - but even the 10g reference manual makes an inane comment about "In a busy system, a value 65536 or higher is reasonable."
    * You don't get multiple LGWR processes
    * You can get IO slave processes to assist LGWR, but only if you have configured your system to run with dbwr_io_slaves, or fiddle with a hidden parameter.
    * transaction_per_rollback_segment is irrelevant as far as redo is concerned.
    * the log buffer prior to 10.2 had to be a multiple of the redo block size, which was based on the O/S unit I/O size.
    * Metalink is not "authoritative" - and if it contradicts itself it's a good idea to test its claims; and even when it hasn't contradicted itself it may be incomplete or wrong.

    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk


    Footnote: In the past, other members of the Oracle community have taken my technical criticisms of material written by Don Burleson as an excuse to pass rude, personal, and non-technical remarks about him. Accurate technical criticism is something to be encouraged. Abuse should not be tolerated.
  • 11. Re: log_buffer
    Jonathan Lewis Oracle ACE Director
    Currently Being Moderated
    >
    How can anyone offer-up advise that's of perfect
    quality, when there are so many "running" releases .
    Burleson,

    Datestamp everything you write.
    Try to mention the version you are talking about
    Don't tell people to read it until you have glanced through it to see that it is appropriate and likely to be relevaent - especially if the datestamp is a couple of years old or the version is incompatible.

    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk
  • 12. Re: log_buffer
    108476 Journeyer
    Currently Being Moderated
    Hi Jonathan,
    Accurate technical criticism is something to be encouraged. Abuse should not be tolerated.
    Thanks, Dude, I appreciate that . . . .
    Datestamp everything you write.
    Good idea, I need to find a way to automate that! But it still does not completely address the issue of people running older releases . . .
    Try to mention the version you are talking about
    Yes, that a good idea. But what about explaining concepts to raw beginners? Nothing confuses a beginner like exception-after-exception, and it often clouds the general concepts!
    Don't tell people to read it until you have glanced through it to see that it is appropriate and likely to be relevaent - especially if the datestamp is a couple of years old or the version is incompatible.
    Ah, therein lies the rub! I don't know about the UK, but I still see people using Oracle7 in the USA (they say "If it ain't broke, don't fix it", but I suspect that the real reason is cheap shops trying to save on maintenance costs).

    It's only now that many shops here are FINALLY upgrading from Oracle8i. . . .
    Metalink is not authoritative
    Well, yes, but Oracle Corporation may not agree, and I'd have a whale of a time trying to convince an IT manager who spends $40k/year on metal-level support.

    Is there a way to petition to improve and clarify their content?

    Also, I want to commend you on the super-job you did on your STATSPACK case study (STATSPACK distractions), showing that you often have to look-past the top-5 timed events. It would be great to see more real-world case studies from you, great stuff:

    http://jonathanlewis.wordpress.com/statspack-distractions/

    As to your big response, please give me 24 hours to digest it.

    Message was edited by:
    burleson
  • 13. Re: log_buffer
    108476 Journeyer
    Currently Being Moderated
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">OK, let’s start
    with your Metalink assertion:</span></p>
    <p class="MsoNormal">
    <span style="font-size: 10.0pt; font-family: Arial; color: black; font-style: italic">
    &gt;&gt; The Metalink note doesn't say that you get multiple LGWR processes</span></p>
    <p class="MsoNormal"><font size="2"><span style="font-family: Arial">OK, educate
    me.  If they are not multiple LGW(n) processes, then how do the</span></font><span style="font-size:10.0pt;font-family:Arial;color:black">
    “I/O slaves for LGWR” appear in a &quot;ps -ef&quot; command?</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">
    <br>
    Metalink note 109582.1 (dated 2002), says: “These [slave] processes are
    allocated whenever the foreground process(DBWR, LGWR) request them. The slaves
    for ARCH and Backup exit automatically after an idle time of 60 seconds. I/O
    slaves for DBWR and LGWR do not exit on their own.”</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Also, the Oracle
    documentation says: ““Prior to Oracle8i you could configure multiple log writers
    using the LGWR_IO_SLAVES parameter.”</span></p>
    <p class="MsoNormal">
    <span style="font-size: 10.0pt; font-family: Arial; color: black; font-style: italic">
    &gt;&gt; if you set dbwr_io_slaves to a non-zero value, then lgwrio_slave defaults
    to 4, which allows up to 4 extra I/O slaves to be started for LGWR to hand off
    its i/o load. That statement has no logical connection to the fact that the
    number of CPUs affects the value of log_buffer.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Isn’t it the same
    reason that OPQ performance is related to cpu_count?  Any multiple processes
    will benefit from SMP servers . . .</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"><i>&gt;&gt; And, just for
    the record, you can't mix multiple db writers with IO slaves</i></span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Can you elaborate
    on this, please?</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"><i>&gt;&gt; I learned
    that one from someone on the Internet - I can't think who it was right now, but
    I'm sure it will come back to me.</i><br>
    <br>
    Touché!  . . . </span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"><i>&gt;&gt; and include
    an apology for failing to include it in the first place.</i><br>
    <br>
    Thanks, but will you use the nofollow tag?  Cmon, we both know why you did not
    hyperlink to my quotes.  An apology is for when you make a mistake, and this was
    no mistake!  You chose not to link to my quotes because you don’t want the
    search engines to increase my search rank.</span></p>
    <p class="MsoNormal">
    <span style="font-size: 10.0pt; font-family: Arial; color: black; font-style: italic">
    &gt;&gt; Apparently you thought the data you published was sufficient to demonstrate
    to the rest of the world the symptoms of a log buffer that was too small.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Stop putting words
    in my mouth.  I said no such thing.  It’s “apparent” only that I shared my
    real-world observations from databases where increasing the log_buffer improved
    performance, nothing more.</span></p>
    <p class="MsoNormal">
    <span style="font-size: 10.0pt; font-family: Arial; color: black; font-style: italic">
    &gt;&gt; Are you now saying that the data wasn't intended to show people what the
    symptoms of a small log buffer would show up as in their 'Top 5 timed events' ?</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Not at all.  I’m an
    empiricist.  When I see a clear-cut case of a too-small log_buffer, I try to
    publish the symptoms.  Are they conclusive?  No.  I never said that they were. 
    But that does not mean that empirical observations are totally irrelevant,
    either!</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"><i>&gt;&gt; So what do
    you do when you find two Metalink references that contradict each other? </i>
    <br>
    <br>
    For a “real” database problem, I log an SR and get a definitive answer from one
    of the Oracle software engineers who wrote the code.  Why guess?  Why doodle
    with test scripts when you can talk to the folks who wrote the source code?</span></p>
    <p class="MsoNormal">
    <span style="font-size: 10.0pt; font-family: Arial; color: black; font-style: italic">
    &gt;&gt; The error in the argument here is known as &quot;arguing from the particular to
    the general&quot;.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Ah, but isn’t that
    want you attempted in your test case?  It only proved that multiple LGWR
    processes didn’t exist for that particular release, for that particular
    configuration, under that specific user load.  It does not generalize to the
    universe of possible scenarios, right?</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"> </span></p>
    <p class="MsoNormal">
    <span style="font-size: 10.0pt; font-family: Arial; color: black; font-style: italic">
    &gt;&gt; By the way, in one case you are ignoring my use of the word &quot;probably&quot;: <b>
    “If you have a performance problem with this system it's probably in the slow
    I/O subsystem - not in the log buffer.”</b>.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">No, I only point it
    out because it’s the same type of “speculation” that you have sharply criticized
    others for making.  As I’ve watched your knowledge of tuning mature, you are
    starting to rely on your “intuition”, which is a good thing.  Like you, I’ve
    tuned over a thousand databases, and while no human can remember the exact
    nature of every problem, we get “hunches”, unverifiable feelings, which are
    usually right-on.  Human intuition manifests itself as “I’ve just got a feeling
    that this is what’s going on”, but it’s not really just a felling.  In
    actuality, it’s a long-forgotten system buried in your subconscious that you
    cannot fully remember.  Speculating is a sign or a true expert, and it will only
    get better as you get more experience!</span></p>
    <p class="MsoNormal">
    <span style="font-size: 10.0pt; font-family: Arial; color: black; font-style: italic">
    &gt;&gt; But of course you're trying to avoid the discussion of multiple LGWR
    processes by employing a logical fallacy.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">No, let’s discuss
    it!  Prove to me that Metalink is wrong.  The problem is that test cases are
    wonderful for proving that something is true, but not so hot at proving that
    something does not exist!</span></p>
    <p class="MsoNormal">
    <span style="font-size: 10.0pt; font-family: Arial; color: black; font-style: italic">
    &gt;&gt; Moreover, the behaviour [sic] of &quot;log file parallel writes&quot; does not depend
    on the number of CPUs, it depends on the capacity for <b>some form of</b>
    asynchronous I/O.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Which, in turn, is
    facilitated by multiple processors.  You could make the exact same argument
    about OPQ . . .</span></p>
    <p class="MsoNormal">
    <span style="font-size: 10.0pt; font-family: Arial; color: black; font-style: italic">
    &gt;&gt; no, wait, I've already said that - it's still true - I don't need to say it
    again.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">There is nothing
    wrong with repeating yourself for emphasis, sometimes it helps . . . </span></p>
    <p class="MsoNormal">
    <span style="font-size: 10.0pt; font-family: Arial; color: black; font-style: italic">
    &gt;&gt; I know that Metalink is &quot;authoritative&quot; when it suits you, but the author of
    this one doesn't seem to know (or, perhaps, doesn't want to say) much about In
    Memory Undo.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">OK, let’s address
    the Metalink credibility issue.  Traditionally, Metalink is the last-word on
    Oracle, often written by the software engineers who wrote the source code. 
    Metalink says what patches we must apply, what configurations are unsupported,
    and what we may, or may not do.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Does Metalink have
    uniform quality?  No.  But it’s a whole lot easier to learn how Oracle works
    from the people who wrote it, than speculating with test cases, IMHO.</span></p>
    <p class="MsoNormal">
    <span style="font-size: 10.0pt; font-family: Arial; color: black; font-style: italic">
    &gt;&gt; And what about my demonstration that the log buffer does not have to be a
    multiple of 2048 if you are using multiple block sizes in 9i?</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Do you feel
    comfortable generalizing that your “demonstration” applies to all Oracle
    databases?</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black; font-weight:700">SUMMARY</span></p>
    <p class="MsoNormal">
    <font size="2"><span style="font-family: Arial; font-weight: 700">************</span></font></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"><i>&gt;&gt; * Oracle <b>
    doesn't</b> recommend a maximum of 500KB or 128KB * CPU - that's just the
    default</i>.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Yes, but the
    default values are often set according to rules of optimal performance.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"><br>
    <i>&gt;&gt; * Oracle <b>has recommended</b> fairly large log buffers in the past - but
    even the 10g reference manual makes an inane comment about &quot;In a busy system, a
    value 65536 or higher is reasonable.&quot;</i></span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Yes, and that
    contradicts the Oracle Apps recommendation of 10 meg.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"><br>
    <i>&gt;&gt; * You don't get multiple LGWR processe<b>s</b></i></span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">This strikes me as
    a tad nitpicky.  You admit that LGWR has factotums, and whether no not these
    slave processes appear as multiple LGWR processes was never my point.  Educate
    me, Jonathan.  The people who wrote Oracle say “As there may not be substantial
    log writing taking place, only one LGWR IO slave has been started initially. 
    This may change when the activity increases.”.  In a non-thread OS model (Linux,
    UNIX), where would I see these processes in a “ps –ef|grep ORA|grep –v grep”? 
    If they are not called LGW(n), then what are the LGWR slaved called?</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"><br>
    <i>&gt;&gt; * You can get IO slave processes to assist LGWR, but only if you have
    configured your system to run with dbwr_io_slaves, or fiddle with a hidden
    parameter.</i></span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Agreed.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"><br>
    <i>&gt;&gt; * transaction_per_rollback_segment is irrelevant as far as redo is
    concerned.</i></span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">You mean
    transactions_per_rollback_segment, right? Agreed.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"><br>
    <i>&gt;&gt; * the log buffer prior to 10.2 had to be a multiple of the redo block
    size, which was based on the O/S unit I/O size.</i></span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Agreed.  Whatever,
    that’s a nit.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"><br>
    <i>&gt;&gt; * Metalink is not &quot;authoritative&quot; - and if it contradicts itself it's a
    good idea to test its claims; and even when it hasn't contradicted itself it may
    be incomplete or wrong</i></span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">I must strongly
    disagree.  Failure to follow the instructions on Metalink can leave a whole shop
    unsupported, and despite their errors and contradictions, Metalink is the final
    word, directly from the software engineers who wrote the source code.  That’s
    way better than trying to figure-out Oracle’s behavior with a zillion test
    cases, IMHO.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">For example, to
    “prove” that multiple LGWR processes don’t exist you would have to run thousands
    of test cases, on all platforms and configurations, and, since Oracle says that
    multiple LGWR processes only appear with load, you would have a formidable
    challenge conclusively proving this assertion.</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">Let me add one more
    assertion:</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">* After the Oracle
    software and disk I/O sub-system are optimized, there are very limited options
    for improving throughout:</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">- Use faster disks
    (</span><span style="font-size:10.0pt;
    font-family:Arial;color:black">SSD</span><span style="font-size:10.0pt;font-family:Arial;color:black">)</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black">- Turn-off
    ARCHIVELOG mode (using triple mirrored disks and hardware-level incremental
    backups)</span></p>
    <p class="MsoNormal">
    <span style="font-size:10.0pt;font-family:Arial;color:black"> </span></p>
  • 14. Re: log_buffer
    108476 Journeyer
    Currently Being Moderated
    <p>Regarding the multiple log writer processes, I got this, from Steve Karam
    (OCM):<br>
    <br>
    According to Metalink Note 109582.1, you will see multiple log writer slaves as
    ixnn processes. The note is for Oracle 8, but it’s a good description,
    confirming that multiple log writer processes exist. <br>
    <br>
    The representation of I/O Slaves is as follows: ora_ixnn_SID where <br>
    <br>
    i= slave, <br>
    x= adaptor number and <br>
    nn is the slave number within the adaptor.<br>
    <br>
    For example:<br>
    <br>
    ora_i105_mul is a background I/O slave process.<br>
    <br>
    Here:<br>
    <br>
    i stands for a slave<br>
    1 stands for the adaptor number for this slave<br>
    05 is the slave number within the adaptor.<br>
    <br>
    The valid ranges are:<br>
    <br>
    x (adaptor number) -&gt; 1 to 9 and A through F <br>
    nn (Slave number) -&gt; 01..0z, 10..zz (Once slave number 0z is spawned, next <br>
    slaves are 10,11 and so on.)<br>
    <br>
    An adaptor is a pool of memory allocated to a thread and the pool is allocated
    <br>
    a handle to be identified. The adaptor number is the handle returned by the <br>
    internal Oracle code.<br>
    <br>
    There are different classes of adaptors and the pools by these classes are<br>
    totally exclusive to each other. For example, the IO pool for data buffers and
    <br>
    log buffers are different and hence these have separate adaptor numbers.<br>
    <br>
    (Note: It is NOT necessary for the same process to use the same adaptor number
    always. What this implies is that the adaptor numbers are not 'reserved' for
    processes.)<br>
    <br>
    Some of the classes for which adaptors exist are:<br>
    <br>
    <font size="2" face="Courier New">DBWR<br>
    LGWR<br>
    ARCH<br>
    Backup_tape_io<br>
    backup_disk_Io<br>
    <br>
    ...</font><br>
    <br>
    For the other adaptors in 8i:<br>
    -----------------------------<br>
    <br>
    These start on demand. <br>
    <br>
    Thus a typical Unix ps -ef listing after instance startup may look like :<br>
    <br>
    <font face="Courier New" size="1">oracle 27582 1 0.0 11:16:40 ?? 0:44.03
    ora_pmon_mul<br>
    oracle 27584 1 0.0 11:16:40 ?? 0:28.44 ora_ckpt_mul<br>
    oracle 27586 1 0.0 11:16:40 ?? 0:58.65 ora_dbw0_mul<br>
    oracle 27588 1 0.0 11:16:40 ?? 0:05.12 ora_lgwr_mul<br>
    oracle 27593 1 0.0 11:16:41 ?? 0:32.35 ora_smon_mul<br>
    oracle 27594 1 0.0 11:16:41 ?? 0:11.59 ora_arc0_mul<br>
    oracle 27596 1 0.0 11:16:41 ?? 0:00.35 ora_reco_mul<br>
    oracle 27598 1 0.0 11:16:48 ?? 0:17.42 ora_i201_mul<br>
    oracle 28408 1 0.0 11:35:25 ?? 0:06.47 ora_i102_mul<br>
    oracle 28411 1 0.0 11:35:25 ?? 0:06.59 ora_i101_mul<br>
    oracle 28412 1 0.0 11:35:25 ?? 0:06.10 ora_i103_mul<br>
    oracle 28414 1 0.0 11:35:25 ?? 0:06.03 ora_i104_mul<br>
    oracle 28416 1 0.0 11:35:25 ?? 0:06.14 ora_i105_mul<br>
    oracle 28420 1 0.0 11:35:25 ?? 0:06.01 ora_i106_mul</font><br>
    <br>
    Here DBWR_IO_SLAVES=6.<br>
    <br>
    The processes ora_i101 to ora_i106 are DBWR IO slaves and ora_i201 <br>
    is the I/O slave for LGWR. <br>
    <br>
    Note: <br>
    <br>
    1. As there may not be substantial log writing taking place, only one LGWR IO<br>
    slave has been started initially. This may change when the activity increases.<br>
    <br>
    2. The DBWR IO slaves may also not be immediately started with the instance, as
    is clear from the timestamps of the processes.<br>
     </p>
1 2 3 Previous Next