20 Replies - Latest reply on Oct 13, 2017 6:50 PM by TGrave

    Database Performance Trending Ideas

    TGrave

      Oracle 12.1.0.2 Enterprise (with July patch applied) CDB/PDB setup on RHEL 6 servers.

       

      I have been asked to start providing some sort of daily reporting on DB performance, partly to appease higher management but ultimately to track trends in the DB and show system degradation and/or failures. Stuff like DB Time, CPU Time, latch contention, procedure times, etc. Since a baseline performance test was never really done before the database reached the 6TB size it is at now, I basically have to start monitoring now, so that a month or a year from now we can look back at this period and see whether we're actually experiencing system degradation or still chugging along fine.

      The background on this is recent panic over longer and longer ETL loads, daily statistics refreshes, daily materialized view refreshes, end-user report generation, and end-user report failures. The panic (from the app dev team side) stems from last year, when they started to see similar symptoms and in December had a complete system failure that lasted almost two months. The Sys Admin team has stated that what we're seeing currently is not the same as it was then: CPU usage is spiking and dropping normally, whereas in December it spiked and never came back down. We also have more overall (DB, mid-tier, and web server) monitoring now than we did then, so we would be aware of issues sooner.

      I'm looking for ideas on how to present the numbers, times, and/or percentages to management in a way that is useful to me but also easily understood by non-DBA types. I am free to build whatever objects I need to do this, so I can call up trends when necessary. I've never had to do this type of trend reporting, so I'm open to all ideas. I'm fairly certain what I need is already in the DB; I just need a way to present it in a useful manner. I've started by running a daily AWR, but that is a bit overwhelming for the uninitiated. I appreciate any feedback and figure I can't be the first to ask this type of question.

        • 1. Re: Database Performance Trending Ideas
          John Thorton

          It is easier to ask the questions than to provide meaningful answers.

           

          Which metrics are the Key Performance Indicators (KPIs)?

          What are the corresponding units of measure for each KPI?

          Who establishes a meaningful baseline value for each KPI, and how?

          How much variation from the baseline value is acceptable?

           

          >I'm fairly certain what I need is already in the DB but need a way to present it in a useful manner.

          If the above is true, then examples should be widespread, common, extensive, & ubiquitous on the Internet.

          • 2. Re: Database Performance Trending Ideas
            TGrave

            Hence my question for ideas. I have been asked to provide some way to track system performance over time. They would not be able to tell me which KPIs they need, so they've asked me to propose something whose merits can then be reviewed and debated. I am aware of what I think would be important: DB Time, CPU Time, latch contention, active users per day, requests per day, slow-performing jobs, etc. A baseline was never established, so we've essentially agreed that we will have to track the system as-is from now going forward. In this manner, spikes or irregularities will be (more or less) ignored, and a downward trend from where we are now will take precedence. I'm posting to get ideas on what others have been asked for by IT managers/higher-ups that proved useful to whoever it was provided to.

             

            Believe me, I'm not averse to searching the internet for what I need. I just figured I'd pose the question here while also looking elsewhere for ideas.

            • 3. Re: Database Performance Trending Ideas
              John Thorton

              DB Time, CPU Time -> will never exceed 100%

              Latch contention -> It is normal & built into the DB to control access to resources & objects. How do you quantify & differentiate between acceptable and a problem?

              active users per day -> Quantify? Is the application 3-tier? Is connection pooling utilized?

              requests per day -> Oracle does not know what a "request" is. Please clarify & quantify.

              slow performing jobs -> How do you differentiate between slow & acceptable jobs? Where is the slow time being spent?

               

              Which metric at what value is used to decide if/when CPU is system bottleneck?

              Which metric at what value is used to decide if/when RAM is system bottleneck?

              Which metric at what value is used to decide if/when I/O is system bottleneck?

              Which metric at what value is used to decide if/when Network is system bottleneck?

               

              IMO, this task is non-trivial with few, if any, unambiguous decision points.

              • 4. Re: Database Performance Trending Ideas
                rp0428

                I have been asked to start providing some sort of daily reporting for DB performance to appease higher management but ultimately to track trends in the DB to show system degradation and/or failures.

                Sounds like your IT dept is missing a few key components: like a problem reporting system where users create trouble tickets when they run into problems.

                 

                Not having 'degradation' data I can understand, but your org should certainly already have adequate logs and maintenance records for failures. If not, your IT manager should start asking the DBAs why they aren't keeping records/logs of the work they are doing.

                Stuff like DB Time, CPU Time, latch contention, procedure times, etc.

                Huh? Why would 'upper management' give a RA about ANY of those? No one on any board I ever talked to EVER MENTIONED, or even understood, what 'DB Time', 'CPU Time', or 'latch contention' even was.

                The background on this is recent panic from longer and longer ETL loads, daily statistic refreshes, daily Materialized View refreshes, end-user report generations, and end-user report failures.

                Now you're talking. Those are legitimate concerns and you definitely need to be in a position to answer questions, present data, and advise on possible actions for ALL of those.

                 

                Except that 'Stuff like DB Time, CPU Time, latch contention' part of what you said generally has NOTHING to do with those issues.

                 

                Each of those issues is essentially a batch process and in a well-run shop it is MANDATORY for those processes to be instrumented. That means, for each process, collecting and logging performance metrics such as: start time, end time, number of rows affected. For code you can also capture the actual parameters being used.
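                For illustration only, here is a minimal sketch of that kind of logging in PL/SQL. Every object name below is invented for the example; adapt it to your own standards. The autonomous transactions keep the logging commits out of the batch job's own transaction.

                  CREATE TABLE batch_run_log (
                    run_id     NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
                    process    VARCHAR2(100) NOT NULL,
                    params     VARCHAR2(4000),      -- actual parameter values, if any
                    start_time TIMESTAMP NOT NULL,
                    end_time   TIMESTAMP,
                    rows_done  NUMBER
                  );

                  CREATE OR REPLACE FUNCTION log_run_start (
                    p_process IN VARCHAR2,
                    p_params  IN VARCHAR2 DEFAULT NULL
                  ) RETURN NUMBER AS
                    PRAGMA AUTONOMOUS_TRANSACTION;
                    l_id NUMBER;
                  BEGIN
                    INSERT INTO batch_run_log (process, params, start_time)
                    VALUES (p_process, p_params, SYSTIMESTAMP)
                    RETURNING run_id INTO l_id;
                    COMMIT;
                    RETURN l_id;
                  END;
                  /

                  CREATE OR REPLACE PROCEDURE log_run_end (
                    p_run_id IN NUMBER,
                    p_rows   IN NUMBER DEFAULT NULL
                  ) AS
                    PRAGMA AUTONOMOUS_TRANSACTION;
                  BEGIN
                    UPDATE batch_run_log
                       SET end_time = SYSTIMESTAMP, rows_done = p_rows
                     WHERE run_id = p_run_id;
                    COMMIT;
                  END;
                  /

                Each batch step then calls log_run_start on entry and log_run_end (passing SQL%ROWCOUNT) on exit, and the table becomes your trend history.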

                 

                In addition your version control system should have the execution plans for each of your important queries and batch process steps.

                 

                The logs can be used to produce reports about the frequency of execution of those processes and the volume of data processed.

                 

                The execution plans have the 'baseline' info needed to compare to a current plan to see if things have changed.
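                For example, if you know the sql_id of a query you care about, either of these standard DBMS_XPLAN calls gives you a plan in a form you can check into version control (the 'ALLSTATS LAST' format only shows row-level statistics when the statement ran with STATISTICS_LEVEL=ALL or the gather_plan_statistics hint):

                  -- current plan from the cursor cache
                  SELECT *
                    FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'ALLSTATS LAST'));

                  -- plan(s) retained in AWR for the same statement
                  SELECT *
                    FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('&sql_id'));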

                I basically have to start monitoring now so we can look back at this time and see whether a month or year from now we can see that we're actually seeing system degradation or if we're still chugging along fine.

                That is what the performance logs and execution plans will let you do.

                 

                Periodic AWR reports should provide enough of the 'in the moment' data to keep an eye on actual resource usage.

                The panic (from the app dev team side) is from last year when they started to see these similar things and in December they had a complete system failure that lasted almost 2 months.

                Panic from the app dev team? LOL! Oh - sorry - I thought you were joking.

                 

                That 'panic' should have been from your IT and business managers. And both of them should have been jumping all over the operations and app dev teams to find out what was going on and fix it.

                The Sys Admin team has stated what we're seeing currently is not the same as it was then as CPU usage is spiking and dropping normally where in December it spiked and never came back down.

                And the OBVIOUS question is - what is that opinion based on? Were they guessing? Or do they have some hard info?

                I'm looking for ideas on how to provide the numbers, times, and/or percentages to management in a way that is useful to me but also easily understood by non-DBA types as well.

                You DON'T HAVE ANY numbers/data to provide to anyone let alone 'easily understood' data - at least not based on what you have posted.

                I am free to built whatever objects I need to complete this action so I can then call up trends when necessary.

                Great - implement the logging I suggested earlier and begin collecting the actual execution plans for each of the key queries.

                 

                Call up trends? Let me repeat what I just said above: You DON'T HAVE ANY numbers/data to provide to anyone. You can't do 'trends' until you have the underlying data.

                 

                I'm fairly certain what I need is already in the DB but need a way to present it in a useful manner.

                You may be certain but I'd bet money you do NOT have what is needed. Shops that I consult with that have even minimal logging or have saved baseline execution plans are in the minority.

                 

                Focus on collecting the data. Worry about the best way to present the data AFTER you actually have some.

                1 person found this helpful
                • 5. Re: Database Performance Trending Ideas
                  JohnWatson2

                  You could send out a statspack report (or AWR report if you have the licence) each morning covering the previous 24 hours. That should tell people what they want to know.
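                  As a sketch, the report itself can come from a scheduled job rather than being run by hand. This assumes hourly snapshots with no gaps and a single instance; adjust the snapshot arithmetic for your setup and spool the output to a file your mail job picks up:

                    SET SERVEROUTPUT ON
                    DECLARE
                      l_dbid NUMBER;
                      l_inst NUMBER;
                      l_end  NUMBER;
                    BEGIN
                      SELECT dbid INTO l_dbid FROM v$database;
                      SELECT instance_number INTO l_inst FROM v$instance;
                      SELECT MAX(snap_id) INTO l_end
                        FROM dba_hist_snapshot WHERE dbid = l_dbid;
                      -- 24 hourly snapshots back = the previous day
                      FOR r IN (SELECT output
                                  FROM TABLE(DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_TEXT(
                                               l_dbid, l_inst, l_end - 24, l_end))) LOOP
                        DBMS_OUTPUT.PUT_LINE(r.output);
                      END LOOP;
                    END;
                    /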

                   

                  Incidentally, you might want to think twice about using phrases like "appease higher management" on a public forum...

                  • 6. Re: Database Performance Trending Ideas
                    Igoroshka

                    Did the VPs define the aim of such reports? "Track trends in the DB to show system degradation and/or failures" looks too general to me. How do you weigh one big spike versus 100 small spikes? Should it be separated by application, server, and type of load?

                    Do you have EM configured? If so, why not define baseline(s) and give the VPs limited access to EM, along with explanations?

                    • 7. Re: Database Performance Trending Ideas
                      TGrave

                      Currently, I have automated a daily AWR report for the previous day. The issue is that the people looking at these are faced with all of this information, and I would like to provide something a little more "dumbed down" for them, so I am not having to explain it daily. Something that, if need be, can be shown as line graphs or in some way the Business Objects/BI folks can understand. I can tell them that things look "normal" for us right now based on what I'm seeing through AWRs, TOAD, TOAD Spotlight, and htop in PuTTY. I have extended the AWR retention time from the default 8 days to 45 days (for now), so I will be able to show them a little more.

                      Since their issue last year happened prior to my employment, they just want to feel safe that these occasional spikes are not building up to that again this year. After speaking with the SysAdmins who were here last year, the features that were turned off or fixed to get back to normal are still set that way, so from their end they are confident we're not seeing the same things as last year. But management still needs to "feel" safe by seeing numbers with their own eyes. We know that with daily growth, more users, and more reporting requirements there will be system slowdown, but we'd like to be able to "see" it. Plus, from my perspective, it will help when it comes time to talk about more funding for hardware/system improvements.

                       

                      I've never been asked to do something like this before, so basically my question to the community is: what kinds of things are you tracking and reporting for overall database performance or health? And how are you rendering this information from the database or OS? Is it traceable over time, or do you have more point-in-time/day-to-day reporting?

                       

                      Thank you JohnWatson2 for the heads up.

                      • 8. Re: Database Performance Trending Ideas
                        Andrew Sayer

                        TGrave wrote:

                         

                        Currently, I have automated a daily AWR report for the previous day. [...] My question to the community is: what kinds of things are you tracking and reporting for overall database performance or health? And how are you rendering this information from the database or OS? Is it traceable over time, or do you have more point-in-time/day-to-day reporting?

                        Maybe the most relevant information to give out would be the actual end-to-end elapsed times for actual business processes.

                        e.g.

                        It takes 0.04 seconds on average for a user to load a details page; 1% of these actions took over 0.6 seconds.

                         

                        How do you measure that? You measure it from the application side.

                         

                        By all means collect AWR reports; they're useful for investigating problems once you know they exist, but always make sure you use the correct scope. There's far too much going on in a typical database that has zero impact on real end users.

                        Get hold of execution plans and extended SQL trace files for actual business processes (a sketch of enabling the trace follows). Then, when something goes wrong, trace them again and see what's changed; this makes finding the problem easy. Get hold of Cary Millsap's book "Optimizing Oracle Performance: A Practitioner's Guide to Optimizing Response Time" and share it with your management - the first few chapters are especially relevant to them.
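                        For the extended SQL trace part, DBMS_MONITOR is the supported interface. A minimal sketch, with the target session's sid and serial# looked up in v$session first:

                          -- start an extended (waits + binds) SQL trace on one session
                          BEGIN
                            DBMS_MONITOR.SESSION_TRACE_ENABLE(
                              session_id => :sid,
                              serial_num => :serial,
                              waits      => TRUE,
                              binds      => TRUE);
                          END;
                          /

                          -- ...let the business process run, then switch it off
                          BEGIN
                            DBMS_MONITOR.SESSION_TRACE_DISABLE(
                              session_id => :sid,
                              serial_num => :serial);
                          END;
                          /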

                        1 person found this helpful
                        • 9. Re: Database Performance Trending Ideas
                          The Real Rob the Relic

                          The fact that you're licensed for AWR and have extended the retention period can give you the confidence that you're gathering the necessary information, and it buys you time for the next step. That next step, I would say, is to get an understanding of the nature of the app, and from there identify the areas of database/system resources most likely to come under strain. It sounds like it may be an in-house developed app; if so, you're better placed for this. Does it use predefined queries, generated queries, ad-hoc queries, or a mix? Might it benefit from resource groups to limit the resource usage of ad-hoc queries during peak periods? Are bind variables used, or is query parsing a potential issue as the user community grows (see the v$sql check below)? Once you have a feel for the way the database is used, you'll be in a better position to identify what's required in the 'high level' report(s) to provide a comfort factor for management.
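                          On the bind variable question, v$sql offers a quick sanity check: statements that would share a cursor if binds were used end up with the same FORCE_MATCHING_SIGNATURE, so heavy repetition there points at literal-laden SQL. A sketch (the threshold of 50 is arbitrary):

                            SELECT force_matching_signature,
                                   COUNT(*)    AS cursor_count,
                                   MIN(sql_id) AS sample_sql_id
                              FROM v$sql
                             WHERE force_matching_signature <> 0
                             GROUP BY force_matching_signature
                            HAVING COUNT(*) > 50
                             ORDER BY cursor_count DESC;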

                          • 10. Re: Database Performance Trending Ideas
                            TGrave

                            "rp0428 wrote:

                            Sounds like your IT dept is missing a few key components: like a problem reporting system where users create trouble tickets when they run into problems.

                             

                            Not having 'degradation' data I can understand but your org should certainly already have adequate logs and maintenance record for failures. If not your IT manager should start asking the DBAs why they aren't keeping records/logs about the work they are doing."

                             

                            We actually have a ticket system that the end-users are using. This is also what is feeding the need to track database health, as there has been an uptick in tickets about reports that now don't complete or take so long that the application timeout kills their connections. We have discussed extending that timeout, but the question still comes around: "why is it now taking longer?" Which then leads to them noting that this is what started happening last year before our shutdown. Really, for now, it is due to the end of the fiscal year: more users are having to report their department activities, and over longer periods. So that is easily explained for now.

                             

                            The DBAs aren't doing any work in the DB unless it is tracked through the ticket system after being voted on by a control board. Tablespace creation is built and pushed from the application, so there isn't a team of DBA developers. This is a source of contention for me, as I've never worked in this type of environment. With the app handling this, I have to learn the app side better to see whether what is being pushed to the DB is garbage and whether we can tune it. My assumption is that this will be the case.

                             

                            "Huh? Why would 'upper management' give a RA about ANY of those? No one of any board I ever talked to EVER MENTIONED, or even understood what 'DB Time', 'CPU Time', or 'latch contention' even was."

                            Thank you for that info, but what ARE some of the things boards have asked to see?

                             

                             

                            "Now you're talking. Those are legitimate concerns and you definitely need to be in a position to answer questions, present data, and advise on possible actions for ALL of those.

                             

                            Except that 'Stuff like DB Time, CPU Time, latch contention' part of what you said generally has NOTHING to do with those issues.

                             

                            Each of those issues is essentially a batch process and in a well-run shop it is MANDATORY for those processes to be instrumented. That means, for each process, collecting and logging performance metrics such as: start time, end time, number of rows affected. For code you can also capture the actual parameters being used.

                             

                            In addition your version control system should have the execution plans for each of your important queries and batch process steps.

                             

                            The logs can be used to produce reports about the frequency of execution of those processes and the volume of data processed.

                             

                            The execution plans have the 'baseline' info needed to compare to a current plan to see if things have changed."

                             

                            The DBA team that performs the ETL jobs does track job name, start, and end times. I do not believe they track the execution plans, though. I do like your suggestion for collecting those metrics; thank you for that idea. They have noted lately that the jobs are taking longer to complete: where the entire process used to be done by 7:30, it is now completing at 8:30. This can be attributed to several factors, so the metrics you mention will come in handy for determining whether it is merely that the volume of data has grown.

                             

                            The problem with tracking execution plans, though, is that we do not use bind variables at all. SAP BO does not like them, and they have caused serious issues with the DB. So without bind variables, every DB query can potentially get a different execution plan depending on user inputs. This may not apply to the ETL jobs, though; I've never looked at them.

                             

                             

                            "That is what the performance logs and execution plans will let you do.

                            Periodic AWR reports should provide enough of the 'in the moment' data to keep an eye on actual resource usage."

                             

                            That's what I'm currently doing with the AWR, but until now we could only look 8 days back. I changed that once I noticed it, so we will be able to go 45 days back as time goes along.

                             

                             

                            "Panic from the app dev team? LOL! Oh - sorry - I thought you were joking.

                            That 'panic' should have been from your IT and business managers. And both of them should have been jumping all over the operations and app dev teams to find out what was going on and fix it."

                             

                            The development team drives the train since we are a reporting shop. I say app development team, but they are really report designers within our application.

                             

                            "The Sys Admin team has stated what we're seeing currently is not the same as it was then as CPU usage is spiking and dropping normally where in December it spiked and never came back down.

                            And the OBVIOUS question is - what is that opinion based on? Were they guessing? Or do they have some hard info?"

                             

                            They're basing that on the fact that when a report request has completed, the CPUs drop back down to normal. We have a monitoring capability that sends out notifications when CPU usage stays at 80% or more for longer than 10 minutes. A follow-up is then sent out when it falls back below 80%.

                             

                            "You DON'T HAVE ANY numbers/data to provide to anyone let alone 'easily understood' data - at least not based on what you have posted."

                             

                            Hence the reason I am here. I have AWR reports, but I want to provide something meaningful and more streamlined for them.

                             

                             

                            "Great - implement the logging I suggested earlier and begin collecting the actual execution plans for each of the key queries.

                            Call up trends? Let me repeat what I just said above: You DON'T HAVE ANY numbers/data to provide to anyone. You can't do 'trends' until you have the underlying data."

                             

                            Let me repeat MYSELF: that is why I am asking for information from the community. What are some of the things you all have been asked for, and how have you provided it?

                             

                             

                            "You may be certain but I'd bet money you do NOT have what is needed. Shops that I consult with that have even minimal logging or have saved baseline execution plans are in the minority.

                            Focus on collecting the data. Worry about the best way to present the data AFTER you actually have some."

                             

                            I'm guessing you and Mr. Thorton are cut from the same mold, where I have to be literal about everything. What I meant by "I'm fairly certain what I need is already in the DB" is that I probably don't need to build too much, or use other applications (i.e. Spotlight), to get that data. I may have to build a table or two to hold the daily culling of my data, but nothing too crazy. Or maybe it is already tracked somewhere I can query. I'm just looking for ideas on what everyone else may already be doing.

                            • 11. Re: Database Performance Trending Ideas
                              rp0428

                              We actually have a ticket system that the end-users are using. This is also what is feeding the need to track database health, as there has been an uptick in tickets about reports that now don't complete or take so long that the application timeout kills their connections. [...] So that is easily explained for now.

                              Good - a ticket system is important and can help provide info about the type and severity of issues from the business users' perspective.

                               

                              On the downside, some of the info reported is general and won't necessarily tie to a specific user action at a specific time, which is what would let you correlate it with other activity going on at the same time. You need to know WHAT is 'taking longer' - a specific canned report? An ad-hoc query?

                               

                              You are likely using that info as leads: tracking down the users reporting the problem to get more detailed info about exactly what they were doing and when they were doing it. Even then, most info about 'taking longer' will be pretty subjective.

                              The DBA's aren't doing any work in the DB unless tracked through the ticket system after being voted on by a control board.

                              Yes - that has been the trend for many years now. DBAs are responsible for the overall health and safety of the DB but they can't be experts in each of the apps. They rely on app-domain experts to do the initial gathering of info about app-specific issues and any initial analysis of that info.

                               

                              But that is why several of us are questioning some of your initial statements to get more info

                              Stuff like DB Time, CPU Time, latch contention, procedure times,

                              The first three of those are GLOBAL attributes. They aren't necessarily tied to any specific application or procedure. Those three are what a DBA would look into, relying on the app-domain experts to help identify whether their apps/procedures might be contributing to any numbers being investigated.

                               

                              That is why it is so important to have application-level metrics, so you can determine quickly if any particular step/process of the applications YOU are responsible for has suddenly changed its metrics.

                              With the app handling this, I have to learn the app side better to see if what is being pushed to the DB is garbage and we can tune that.

                              Except, as I think you know, that puts the cart before the horse.

                               

                              Until you identify a SPECIFIC problem/issue there is NOTHING to 'tune'.

                               

                              And performance is ALL ABOUT 'comparison' - comparing NOW to BEFORE. If you don't have the BEFORE info there is nothing to compare to.

                               

                              If someone says something is taking 'longer' you need to know how long it took BEFORE, under similar conditions for similar amounts of data, in order to know whether the time it takes NOW is 'longer' than the time it took BEFORE.

                              My assumption is that this will be the case.

                              Just a suggestion: try to get in the habit of using the word 'hypothesis' instead of 'assumption'. It may sound trivial, but there are important differences in connotation between them. A hypothesis is something you use as a basis for further testing and confirmation. Too often assumptions are accepted as TRUTH and not followed by verification.

                              The DBA team that performs the ETL jobs do track job, start, and end times.

                              They have noted lately that the jobs are taking longer to complete. Where the entire process was done by 7:30 it is now completing at 8:30. This can be attested to several factors so the metrics you mention will come in handy if it merely just the volume of data has grown.

                              Again - just to be precise (as I assume you know this) the end time alone isn't sufficient for comparison purposes. You also need to know the start time and, as you add at the end, the amount of work being done in each case.

                               

                              It should be simple to create a standard report based on those metrics that highlights any metric that has changed by some 'designated' percent. The larger the percent change, the more attention that process should get.
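                              As a sketch against the hypothetical batch_run_log table from my earlier post, this flags any process whose most recent elapsed time is 20% or more over its 30-day average (the 20% being the 'designated' percent):

                                WITH runs AS (
                                  SELECT process, start_time,
                                         EXTRACT(DAY    FROM (end_time - start_time)) * 86400
                                       + EXTRACT(HOUR   FROM (end_time - start_time)) * 3600
                                       + EXTRACT(MINUTE FROM (end_time - start_time)) * 60
                                       + EXTRACT(SECOND FROM (end_time - start_time)) AS elapsed_s
                                    FROM batch_run_log
                                   WHERE end_time IS NOT NULL
                                     AND start_time > SYSTIMESTAMP - INTERVAL '30' DAY
                                )
                                SELECT process,
                                       ROUND(AVG(elapsed_s)) AS avg_elapsed_s,
                                       ROUND(MAX(elapsed_s) KEEP (
                                             DENSE_RANK LAST ORDER BY start_time)) AS latest_elapsed_s
                                  FROM runs
                                 GROUP BY process
                                HAVING MAX(elapsed_s) KEEP (DENSE_RANK LAST ORDER BY start_time)
                                       > 1.2 * AVG(elapsed_s);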

                              The problem with tracking execution plans though is we do not use bind variables at all. SAP BO does not like them and has caused serious issues with the DB.

                              That can certainly cause serious issues. Not only does it make it harder to 'tune' those queries but the users doing them often don't/can't remember what parameters they really used at any given time.

                              So without the bind variables every DB query has the potential to be a different execution plan depending on user inputs.

                              Yep! At some shops we have had to resort to modifying the front end to capture and log parameter values prior to submission of the query and, in hard core cases, have had to re-architect processes to use stored procedures that return REF CURSORs so that the procedure can collect the necessary metrics. Pain in the neck but it can get the job done.
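                              A sketch of that REF CURSOR approach, reusing the invented log_run_start/log_run_end helpers from my earlier post (the procedure and view names are made up; the fetching happens in the client, so what gets logged here is everything up to and including the OPEN):

                                CREATE OR REPLACE PROCEDURE get_dept_report (
                                  p_dept_id IN  NUMBER,
                                  p_result  OUT SYS_REFCURSOR
                                ) AS
                                  l_run_id NUMBER;
                                BEGIN
                                  l_run_id := log_run_start('get_dept_report',
                                                            'dept_id=' || p_dept_id);
                                  OPEN p_result FOR
                                    SELECT *               -- the real report query goes here
                                      FROM dept_report_view
                                     WHERE dept_id = p_dept_id;
                                  log_run_end(l_run_id);   -- rows are fetched by the client
                                END;
                                /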

                              The development team drives the train since we are a reporting shop. I say app development team, but they are really report designers within our application.

                              Most modern reporting tools can base their reports on stored procedures. That allows you to do what I just said above. In tools like Brio, Cognos, Oracle Reports and the old Crystal Reports it was also pretty easy to change the report datasource from a query/view to a stored procedure.

                              They're basing that off of the fact that when a report request has completed the CPU's drop back down to normal. We have a monitoring capability that sends out notifications when the CPU usage reaches 80% or more for longer than 10 minutes. Then a follow on is sent out when it falls back below 80%.

                              Sure - but NONE of the above, in and of itself, indicates any problem. When a report executes it uses resources so naturally the % of resources being used will increase and then go back down.

                               

                              And that 80% is, as already mentioned, only meaningful in context. Does it always reach 80% for longer than 10 minutes from 2-3pm on the last day of every month? If so, then maybe it means nothing at all.

                              Let me repeat MYSELF- that is why I am asking for information from the community. What are some of the things you all have been asked for and how have you provided it.

                              . . .

                              I'm guessing you and Mr. Thornton are cut from the same mold where I have to be literal about everything. What I meant by "I'm fairly certain what I need is already in the DB" is that I probably don't really need to build too much or use other applications (i.e.- Spotlight) to get that data.

                              Ok - but that is why I, and others, have been repeating ourselves. There is no 'mold'. But the basic steps for troubleshooting have been the same for decades:

                               

                              Step #1 - collect the data

                              Step #2 - analyze the data

                              Step #3 - take action, if necessary, based on the data

                               

                              What we have been 'repeating' is that you need to start at step #1. I, for one, have tried to tell you how we have provided it: metrics, metrics, metrics. All code is instrumented, all reports and queries are instrumented, and internal IT reports are developed that show significant changes in metrics.

                               

                              Start collecting more raw data about metrics for your code and reports just like you have started doing with the AWR reports.

                              1 person found this helpful
                              • 12. Re: Database Performance Trending Ideas
                                John Brady - UK

                                My reply will overlap a lot with what others are saying. 

                                 

                                From a database perspective, you can use AWR if licensed for it (which you say you are).  AWR lets you create "baseline snapshots" which are kept forever, and which you can then compare future snapshots to.  Look into DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE.  Of course you have to know what "normal" is when creating such baselines - the captured period should have a "normal and representative" workload going on at the time.
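                                For example (the snapshot IDs are placeholders; pick a pair that brackets a known-normal period):

                                  BEGIN
                                    DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(
                                      start_snap_id => 1234,
                                      end_snap_id   => 1258,
                                      baseline_name => 'normal_weekday',
                                      expiration    => NULL);   -- NULL = never expires
                                  END;
                                  /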

                                 

                                As others have said, users never experience slow databases - they experience slow applications.  Any such "performance measurements" MUST be made across the whole stack, from the application through the middle tiers down to the database.  Looking at the database in isolation from everything else is stupid - maybe the problem is in the database, maybe it isn't.  You must measure and record the level of application activity - it is the application that submits the work to the database.  Are more people using the application?  Is it submitting more requests for work to the database per hour?  That way you can track and correlate changes in database activity with changes in application usage activity.

                                 

                                What is the difference between an inefficient database wasting system resources and an efficient database maximising the use of system resources?  If you don't know, then how can you tell a "good" system workload from a "bad" one?  Using 8 CPUs to execute 1 SQL statement might be the most efficient choice for that SQL (shortest execution time), or it might not (wasting CPU cycles and slowing down all the other SQL statements executing at the same time).  All such value judgements are context specific.  You can minimise the system resources used on average and end up with a system that is more than 50% idle, i.e. doing nothing most of the time.  Or you can try to maximise the use of those resources, but run the risk of overloading the system (anything above 80% utilisation leads to serious queues forming and increases in response times).  Is the glass half empty or half full?  Is using more CPU a good or a bad thing?  It depends on many things, including the spare, unused CPU capacity available and the other application requests being submitted to the system at the same time.

                                 

                                This is why ultimately your only "real measure of success" is the application workload (work in) and response time (results out).  How the database achieves this is internal to the database, and most application users don't care about it at all.

                                 

                                If you believe that the individual SQL statements being executed by these reports are the cause of the increased workload on the system, then you can also enable SQL Trace and log every SQL executed and the resources it used (but this will lead to a lot of files using up a lot of disk space), and/or investigate the use of ASH (Active Session History) if licensed for it (it is part of the same pack as AWR, so you should be).  You can write reports against the ASH data to see which sessions were recently active, the SQL they were executing, and the waits they were experiencing.  Whereas SQL Trace gives you everything that happened during a SQL statement's execution, ASH uses a sampling technique to give you a subset of that data.  If single SQL statements are running for a long time and using up system resources, then ASH can help show you this.  Be aware that ASH data is only held in memory for a few hours before being flushed to a DBA_HIST table in the database.
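                                A typical starter query against the in-memory ASH buffer (Diagnostics Pack required) is the top SQL over the last 15 minutes by sampled session activity:

                                  SELECT sql_id,
                                         session_state,
                                         NVL(event, 'ON CPU') AS event,
                                         COUNT(*)             AS samples
                                    FROM v$active_session_history
                                   WHERE sample_time > SYSTIMESTAMP - INTERVAL '15' MINUTE
                                   GROUP BY sql_id, session_state, NVL(event, 'ON CPU')
                                   ORDER BY samples DESC
                                   FETCH FIRST 10 ROWS ONLY;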

                                 

                                I'm trying to say that when database workload increases suddenly, ASH is one tool available for identifying, in near real time, which SQL is taking longer and using more system resources.  AWR can also be used after the event, but it uses far coarser and less frequent snapshots of the whole database system - every hour by default.  It can sometimes be difficult to isolate the cause of a 15-minute burst of activity and slowdown from a full hour's worth of activity data that includes a lot of other things as well.

                                1 person found this helpful
                                • 13. Re: Database Performance Trending Ideas
                                  TGrave

                                  John Brady - UK, I will look into creating a baseline snapshot once I've ascertained what a "normal" day/time/date is for us.

                                   

                                  This monitoring is a shop effort. I am only able to speak from a DB perspective. A Business Objects Admin is also formulating tracking from the application level. There is also a Linux guru monitoring the OS level and folks monitoring the mid-tiers. Once all areas have some usable information it will all be tracked for an overall stack picture.

                                   

                                  The inefficient vs. efficient information is greatly appreciated, and already understood; the trick is figuring out how to show it to non-Oracle folks. That information (once derived) will drive the performance tuning if needed. But I do not know how to pull that info from the DB in a format that is useful to management. I can pull AWRs, ADDMs, and ASH reports all day long, but that will not present it visually for them. Coincidentally, I have automated a daily full-day AWR, a daily split ADDM (one for the nightly ETL processes and one for daily analytics use), and daily SQL scripts (that show DB information, tablespace sizes and free space, controlfile validity, datafile sizes, logfile validity, basic active user session info, tablespace and index max_extents, top 20 latches, etc.). I am still shaping and instituting this method as I go along.
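                                  For example, one trend extract I am sketching for the line graphs pulls DB time and DB CPU per snapshot out of the AWR history (partitioning by startup_time avoids bogus deltas across instance restarts):

                                    SELECT s.begin_interval_time,
                                           stm.stat_name,
                                           ROUND((stm.value - LAG(stm.value) OVER (
                                                    PARTITION BY stm.stat_name, s.startup_time
                                                    ORDER BY s.snap_id)) / 1e6) AS seconds
                                      FROM dba_hist_sys_time_model stm
                                      JOIN dba_hist_snapshot s
                                        ON  s.snap_id         = stm.snap_id
                                        AND s.dbid            = stm.dbid
                                        AND s.instance_number = stm.instance_number
                                     WHERE stm.stat_name IN ('DB time', 'DB CPU')
                                     ORDER BY s.begin_interval_time, stm.stat_name;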

                                   

                                  I also have ASH reporting at the ready if something pops up during the day. It has shown, on the two occasions so far, that CPU and CPU wait time are a large part of report generation time. This is due to parallelism on the tables involved and a lack of CPU at the time. Our biggest time waster is buffer gets, which cause high IO; some cases show one execution performing millions of buffer gets.

                                   

                                  My thought in starting this thread was that I can't be the only person to ever be in a situation where they were asked to provide trending analysis of the DB and not given much else to go on. It doesn't help that they didn't have a resident DBA on hand to establish some sort of baseline to go off of. I was just looking for "what have you done when asked for this?", not a specific checklist. I can figure out what works for us and what doesn't from the ideas.

                                  • 14. Re: Database Performance Trending Ideas
                                    Andrew Sayer

                                    "Our biggest time waste is buffer gets which is causing high IO. Some cases show 1 execution performing millions of buffer gets."


                                    A buffer get is logical IO. It sounds like you have some SQL that does more work than it really should. Did you look at what the SQL was? Did you look at why it was doing so much work?
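                                    A standard top-SQL-by-logical-IO query against the cursor cache will show where those gets are coming from, e.g.:

                                      SELECT sql_id,
                                             executions,
                                             buffer_gets,
                                             ROUND(buffer_gets / NULLIF(executions, 0)) AS gets_per_exec,
                                             SUBSTR(sql_text, 1, 60)                    AS sql_text
                                        FROM v$sql
                                       ORDER BY buffer_gets DESC
                                       FETCH FIRST 10 ROWS ONLY;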
