1 2 Previous Next 28 Replies Latest reply: May 6, 2008 2:41 AM by Jonathan Lewis RSS

    system statistics not being used

    user92272
      I collected fresh system statistics for a new database load. I see its not being used by the database even though aux_stats shows stats.
        • 1. Re: system statistics not being used
          user92272
          cpu in mhz : 1043
          single block readtime in ms : 7.457
          multiblock readtime in ms : 19.566
          average multiblock readcount: 45
          • 2. Re: system statistics not being used
            108476
            Hi Sasiva,
            I see its not being used by the database
            How can you tell? What are you looking at?

            Do you have a big difference in I/O times for scattered and sequential reads?

            http://www.dba-oracle.com/t_dbms_stats_gather_system_stats.htm

            Prior to Oracle 10g, adjusting powerful optimizer parameters (i.e. optimizer_index_cost_adj) was the only way to compensate for sample size issues with dbms_stats and system stats issues. But as of Oracle 10g, improvements in system statistics collection using dbms_stats.gather_system_stats (to measure sequential vs. scattered disk I/O speed) plus improved sampling within dbms_stats had made adjustments to the optimizer parameters a "worst practice" exercise in most cases. Ceteris Parabus, always adjust CBO statistics before adjusting optimizer parms.

            Sometimes, system stats are not enough, and in rare cases, it is still necessary to adjust optimizer_index_cost_adj, even in 10g:

            http://www.dba-oracle.com/oracle11g/oracle_11g_dbms_stats_enhancements.htm

            --------------------

            Hope this helps. . .

            Donald K. Burleson
            Oracle Press author
            Author of "Oracle Tuning: The Definitive Reference":
            http://www.dba-oracle.com/bp/s_oracle_tuning_book.htm
            • 3. Re: system statistics not being used
              user92272
              Hi Donald,

              Thank you for your response. I am not seeing scattered or sequential reads but heavy load on the Database server with runq at 100%. I gathered system stats two times on May 01 and May 02. On may 01, load on the database came down drastically after I gathered system stats but on the may 02, it not the same case.

              Wait event is Latch: Cache buffer chains.

              Hope this answers your question.

              Regards,
              Siva Devulapalli
              • 4. Re: system statistics not being used
                108476
                Hi Siva,
                Database server with runq at 100%
                The runqueue is an integer, not a percent! Are you using vmstat?

                http://www.dba-oracle.com/t_tuning_cpu_usage_vmstat.htm

                BTW, it's normal to see CPU at 100%, it's designed that way:

                http://www.dba-oracle.com/t_high_cpu.htm

                Is CPU a bottleneck? If so, have you looked at optimizercost_model='cpu'?

                http://www.dba-oracle.com/oracle_tips_optimizer_cost_model.htm

                **********************************************************
                On may 01, load on the database came down drastically after I gathered system stats but on the may 02, it not the same case.
                Ah good clue! Are you running STATSPACK or AWR? If so, you can analyze the historical data and see exactly what changed:

                http://www.dba-oracle.com/art_Statspack_Trend.htm

                *****************************************************************
                Wait event is Latch: Cache buffer chains.
                http://www.dba-oracle.com/t_high_cache_buffer_chain_waits_contention.htm

                The "cache buffer chain" latch wait is normal, but high values are associated with high simultaneous buffer access, similar to a freelist shortage on an index or table segment header.



                MetaLink also suggests using the dbblock_hash_buckets and
                dbblock_hash_latches undocumented parameters have been suggested as a remedy, but always check with MetaLink before using any undocumented parameters. To see the "cache buffer chain" waits:

                select
                count(*) child_count,
                sum(gets) sum_gets,
                sum(misses) sum_misses,
                sum(sleeps) sum_sleeps
                from
                v$latch_children
                where
                name = 'cache buffers chains';


                *************************************************************

                Can you get a STATSPACK report and paste it into Statspackanalyzer at http://www.statspackanalyzer.com ?

                That should help greatly . . . .


                --------------------

                Hope this helps. . .

                Donald K. Burleson
                Oracle Press author
                Author of "Oracle Tuning: The Definitive Reference":
                http://www.dba-oracle.com/bp/s_oracle_tuning_book.htm
                • 5. Re: system statistics not being used
                  user92272
                  1. No. I did not use vmstat.
                  2. optimizer cost model is not touched
                  3. AWR. Only change is appliation started using the database
                  4. I opened a SR for Oracle advice on
                  db_block_hash_buckets and
                  dbblock_hash_latches
                  5. Values from latch children
                  65536 1.4401E+11 3127583862 569219878
                  6. From AWR
                  1. Host CPU was a bottleneck and the instance was consuming 86% of the host CPU.
                  All wait times will be inflated by wait for CPU.
                  2.Contention on buffer cache latches was consuming significant database time.

                  Hope this helps.
                  • 6. Re: system statistics not being used
                    Charles Hooper
                    Hi Siva,
                    Database server with runq at 100%
                    BTW, it's normal to see CPU at 100%, it's designed
                    that way:

                    http://www.dba-oracle.com/t_high_cpu.htm
                    Hello Mr. Burleson,

                    I have seen a couple articles and books that suggest CPU utilization at 100% is not normal. Some of the articles suggest that when CPU utilization exceeds 90%, odd things such as excessive latching, excessive time spend in the LOG FILE SYNC wait event, Oracle processes volunteering to yield their CPU time slice and then having to wait a significant number of CPU cycles to be run on the CPU again, and various other problems which may arise. If I remember correctly, the book "Forecasting Oracle Performance" suggests that the tipping point may be closer to 75% CPU utilization.

                    If saisiva is seeing excessive CPU utilization, it might be a good idea to examine the number of logical IOs and determine whether or not a large number of logical IOs is a sign that SQL statements need to be tuned. Saisiva may also want to determine if excessive parsing (even excessive soft parsing) is a problem and whether or not session_cached_cursors is set to an appropriate value. Examination of recursive CPU time, and the actual recursive calls may also be helpful.

                    Saisiva, you might find that even though a couple rows are retrieved by the query to aux_stats$, that does not mean that system statistics have been collected. System statistics should be collected when the system is under a normal to heavy (typical) load to avoid the collection of inappropriate statistics that then adversely affect cost based plan calculations.
                    You might want to take a look at this article, or check the Oracle documentation:
                    http://www.oracle.com/technology/pub/articles/lewis_cbo.html

                    Charles Hooper
                    IT Manager/Oracle DBA
                    K&M Machine-Fabricating, Inc.
                    • 7. Re: system statistics not being used
                      user92272
                      System stats were collected when the database is at its peak load
                      • 8. Re: system statistics not being used
                        ben23
                        I gathered system
                        stats two times on May 01 and May 02. On may 01, load
                        on the database came down drastically after I
                        gathered system stats but on the may 02, it not the
                        same case.
                        Did you store the stats for the two runs?

                        If so, did you compare them and try reverting back to the 'good' May 01?

                        If not, this is a good reason to start storing them.
                        • 9. Re: system statistics not being used
                          Jonathan Lewis
                          Sivakumar,

                          You haven't answered Burleson's question - how do you know that you are not using the system statistics ? The fact that you don't like the performance isn't proof, the fact that you are running at 100% CPU isn't proof.

                          All you have to do is something like:
                          explain plan for
                          select * from some_table;

                          select * from table(dbms_xplan.display);
                          You can check two things: if your execution plan includes two columns about (%CPU) and Time, then you are using system statistics, e.g:
                          -------------------------
                          | Cost (%CPU)| Time     |
                          -------------------------
                          |     2   (0)| 00:00:01 |
                          |     2   (0)| 00:00:01 |
                          -------------------------
                          If you are not using system statistics, then you won't get these two columns, and the execution plan will be followed by the comment:
                          Note
                          -----
                             - cpu costing is off (consider enabling it)
                          Ignore comments to the contrary - on a multi-user system, running constantly at 100% CPU is a bad thing. If your bottleneck is CPU then any wait problems (such as latch waits) get worse because a waiting session probably has to spend more time waiting to get to the top of the run queue - which is the point made by the ADDM report you've quoted.

                          This means your next task is to identify some CPU intensive tasks and try to make them more CPU efficient. Perhaps the best starting point is to run the AWR report for the interval that gave you that ADDM report, and look at the top two or three SQL stataments in the section titled "SQL ordered by CPU", and see if there execution plan is reasonable. Any SQL reported in that section of the report will also have it's execution plan(s) captured, and you can list them by running the awrsqrpt.sql script in your $ORACLE_HOME/rdbms/admin directory.

                          Regards
                          Jonathan Lewis
                          http://jonathanlewis.wordpress.com
                          http://www.jlcomp.demon.co.uk

                          Post Script: I suppose I could have mentioned that there's an introductory article about CPU costing / system statistics that I wrote a few years ago here: http://www.oracle.com/technology/pub/articles/lewis_cbo.html - it might have some relevance.

                          Message was edited by: Jonathan Lewis, 6th May 2008
                          Missing "not" added, "comments" changed to "columns"
                          • 10. Re: system statistics not being used
                            Jonathan Lewis
                            Hi Siva,

                            BTW, it's normal to see CPU at 100%, it's designed that way:

                            http://www.dba-oracle.com/t_high_cpu.htm
                            This is the third time you've made this claim in the last few weeks. But it's not "normal" to drive CPUs to 100%. Except for extremely exotic circumstances (and that excludes database processing) it means you've overloading the system and wasting resources.

                            Your comments in that article about the _optimizer_cost_model paramter are also incorrect. The parameter is not there for databases that are CPU bound. See Re: performance issuse


                            Jonathan Lewis
                            http://jonathanlewis.wordpress.com
                            http://www.jlcomp.demon.co.uk

                            Message was edited by: Jonathan Lewis
                            to insert missing word
                            • 11. Re: system statistics not being used
                              Jonathan Lewis
                              >
                              MetaLink also suggests using the
                              dbblock_hash_buckets and
                              dbblock_hash_latches undocumented parameters have
                              been suggested as a remedy, but always check with
                              MetaLink before using any undocumented parameters.
                              Have you got a metalink document ID for that note. Apart from benchmark specials it's a suggestion that should have been forgotten when 8i was released

                              Regards
                              Jonathan Lewis
                              http://jonathanlewis.wordpress.com
                              http://www.jlcomp.demon.co.uk
                              • 12. Re: system statistics not being used
                                108476
                                Oh brother . . . .
                                it means you've overloading the system and wasting resources.
                                No, it means that you are using the system.

                                All SMP architectures are designed to throttle-up the CPU quickly, and a 100% utilization DOES NOT mean an overload. It's straight from Algorithms 101 . . . .


                                ************************************************************
                                Your comments in that article about the optimizercost_model paramter are also incorrect.
                                No, you are wrong again:

                                http://www.dbforums.com/archive/index.php/t-1306175.html

                                "We experimented with the following other parameters so far:

                                db_writer_processes (set to 4 in combination with cpu_count = 4)
                                fast_start_parallel_rollback = high
                                _optimizer_cost_model = cpu


                                Although the last parameter is undocumented it has brought the
                                greatest improvement so far: dropping >2 min execution time down to 40
                                seconds (therefore still much higher than 10 sec with cpu_count = 1)

                                Sven Bombach"
                                • 13. Re: system statistics not being used
                                  Jonathan Lewis
                                  >
                                  All SMP architectures are designed to throttle-up the
                                  CPU quickly, and a 100% utilization DOES NOT mean an
                                  overload. It's straight from Algorithms 101 . . . .
                                  You may wish to believe that, but please don't tell anyone else.

                                  Consider the simple case of 8 queries running on 8 CPUs. They will be competing for the same cache buffers chains latches - which means that seven processes could be spinning on the same latch while the eighth is holding it. None of the processes ever need wait, but most of them could be wasting CPU most of the time.

                                  >
                                  Your comments in that article about the optimizercost_model paramter are also incorrect.
                                  No, you are wrong again:
                                  http://www.dbforums.com/archive/index.php/t-1306175.html
                                  That thread also says:
                                  "unfortunately our trial and error approach to find the proper settings (we searched Oracle Tech Net (OTN) and google groups without success) did not help so far."
                                  So they didn't know what they were doing, or why it might help.

                                  >
                                  "We experimented with the following other parameters so far:
                                  db_writer_processes (set to 4 in combination with cpu_count = 4)
                                  Read <a href="http://kevinclosson.wordpress.com/2007/08/10/learn-how-to-obliterate-processor-caches-configure-lots-and-lots-of-dbwr-processes/">Kevin Closson's blog</a> to find out why that's not such a great idea.
                                  fast_start_parallel_rollback = high
                                  And that was to solve their problem with a pl/sql procedure not running fast enough, apparently.
                                  _optimizer_cost_model = cpu
                                  Although the last parameter is undocumented it has brought the
                                  greatest improvement so far: dropping >2 min execution time down to 40
                                  seconds (therefore still much higher than 10 sec with > cpu_count = 1)
                                  They were using 9i so the default value for this parameter would have been 'choose' - which means CPU costing would have been used if they had collected system statistics.

                                  Since changing this parameter to 'cpu' apparently made a difference we can infer that they didn't have any system stats set - which means no CPU information gathered, which means the optimizer couldn't have been doing anything to address any issue of the database being "CPU-bound".

                                  But, apart from anything else - where is there any comment in that thread about how much CPU they were using before and after the change - all they mention is elapsed time.

                                  Jonathan Lewis
                                  http://jonathanlewis.wordpress.com
                                  http://www.jlcomp.demon.co.uk
                                  • 14. Re: system statistics not being used
                                    108476
                                    They will be competing for the same cache buffers chains latches - which means that seven processes could be spinning on the same latch while the eighth is holding it.
                                    Whoa! I'm talking about server-side CPU consumption, outside Oracle.

                                    Just to make sure that you are not operating under "assumptions" here, I'm talking about server-side CPU consumption, on an SMP server running lots of concurrent tasks. The references to 100% CPU are as they display in standard OS monitors like lparstat, watch, sar and vmstat.

                                    Also, don't assume that all OS tasks have the same dispatching priority. In a server-side 100% CPU situation, some tasks may have enqueues, while other do not. That's what "nice" is for.
                                    You may wish to believe that, but please don't tell anyone else.
                                    We ARE NOT talking about an opinion here . . . . .

                                    - CPU metrics are expressed as percentages. Hence, all of the CPU values (us+sy+id+wa) will always sum to 100.

                                    - In the UNIX/LINUX environment, the runqueue is used to display the number of active tasks that are currently waiting for CPU resources.

                                    - That's the measure of a CPU bottleneck: where runqueue > cpu_count.

                                    **********************************************************************

                                    This from IBM, who built their server and the OS, the final word:

                                    http://www.ibm.com/developerworks/aix/library/au-aixoptimization/index.html

                                    "In a system that is CPU-bound, all the processors are 100 percent busy and some jobs might be waiting for CPU time in the run queue."

                                    "Fixing one bottleneck might actually cause a CPU bottleneck, because your system is now allowing the CPU to perform to its optimum capacity and it might not have the capacity to handle the increased amount of resources given to it. "


                                    *****************************************************

                                    Here is a reference from Harvard, a good school, I'm told:

                                    http://maltman.hmdc.harvard.edu/unix/systune.pdf

                                    "CPU User+Sys activity near 100%, no CPU idle, no CPU wait on I/O,(sar -u) low context switch (pswch in sar -p) activity .

                                    Check the run-queue (sar -q) to see how many processes are waiting to run, a high run-queue can indicate a CPU bottleneck, but you can have a CPU without a high run-queue (if only one large process is running.) If CPU Sysactivity is high, suspect inefficient use of system calls, or borderline memory or i/o bottlenecks"
                                    all they mention is elapsed time.
                                    That's all that counts to most tuning experts . . . . . . RESPONSE TIME!
                                    1 2 Previous Next