Advanced Database Troubleshooting Tools, Pt. 2

Version 2

    Advanced Database Troubleshooting Tools, Part 2

     

    by Przemyslaw Piotrowski

     

    The first part of this series introduced a selection of tools for managing the database stack across the enterprise using a comprehensive approach for preventing, examining, and researching transient database problems at scale. This installment dives into troubleshooting problems at the database, instance, and SQL statement level using a tailored, accurate response when either performance or availability is at stake.

     

    See Also:

    Part 1: Strategy for approaching troubleshooting

    Part 3: Tools for tackling symptoms surfacing through the wait Interface, cluster, operating system, and client layers.

     

    Introduction

     

    Oracle Database runs on dozens of different platforms and offers an extensive feature set for a number of industries. Such flexibility comes at the cost of increased complexity that would not have been required otherwise. Troubleshooting exercises should always include such variables as workload patterns and throughput trends because, frequently, most of the tuning can be done by introducing change into the process rather than just by altering the configuration.

     

    Many of the database troubleshooting tools introduced in this article depend on Oracle Diagnostics Pack, which is licensed separately. While some of the tools listed in Figure 1 don't impose such a dependency, there are no solid reasons to neglect using this powerful functionality. When the requirement for such a license is needed, it will be indicated in the article.

     

    f1.gif

    Figure 1. Database performance troubleshooting framework.

     

    Self-Tuning Redefined

     

    An often overlooked yet very useful tool that doesn't get enough recognition is Automatic Database Diagnostic Monitor (ADDM). After the Oracle Database 11g release added Oracle Real Application Clusters (Oracle RAC) support, there is no reason not to make analysis of the ADDM report a first step when investigating a system, provided that the Oracle Diagnostic Pack license is on hand. One important thing to note is that ADDM always works in the context of database time, so all displayed recommendations refer to a fraction of total database time. These recommendations also indicate the weighted impact, which makes primary problems stand out clearly.

     

    Introduced with Oracle Database 12c is Spot ADDM, which kicks in when overall performance deteriorates to the point when the CPU or I/O operations top out. Sometimes marketed as "performance tuning at the expert level," ADDM performs a comprehensive analysis of a database and comes back with a report with the following sections that enumerates each performance item (see Figure 2):

     

    • Finding: A detailed description of the identified problem such as expensive SQL commands with their wait classes, I/O performance metrics, connection management problems, and so on. Each finding comes with a percentage impact indicator translated into seconds of database time (of the snapshot width) that could potentially be saved when the issue is removed from the picture entirely.
    • Recommendation: The finding is broken down into one or more recommendations, reiterating the expected impact to overall database time. This section always specifies a class of analysis, and each recommendation always has advice for exactly one action and its rationale.
    • Action: A description of the specific activity attributed to the given recommendation, which is sometimes a ready-to-go solution and sometimes a follow-up item.
    • Rationale: An explanation of why the given recommendation was produced.

     

    An analysis is performed on a pair of AWR snapshots and, by default, is generated every time a new snapshot is being captured. Oracle Enterprise Manager 12c introduced an entirely new perspective to ADDM by adding real-time capability to the diagnostic analysis. While that might not sound like a big deal, what's really important is the ability to connect using a diagnostic connection mode, which can greatly enhance the experience of troubleshooting "hang" situations when regular connections are not possible.

     

    f2.gif

    Figure 2. Real-time ADDM.

     

    There are a number of ways to access ADDM:

     

    • Through the Oracle Enterprise Manger 12c performance pages
    • Using SQL*Plus or SQL Developer reports: ORACLE_HOME/dbms/admin/addmrpt.sql and addmrpti.sql
    • Using the DBMS_ADDM package with ANALYZE_DB, ANALYZE_INSTANCE, or ANALYZE_PARTIAL (subset of instances)
    • Using the DBMS_ADVISOR package (deprecated in favor of the dedicated DBMS_ADDM package)
    • Through system views DBA_ADIVSOR_FINDINGS, DBA_ADDM_FINDINGS, DBA_ADVISOR_FINDING_NAMES, DBA_ADVISOR_RECOMMENDATIONS, and DBA_ADVISOR_TASKS

     

    Even for a first-class tuning citizen, several assumptions have to be made regarding ADDM to make it plug flawlessly into a tuning workflow. Comprehensive coverage of the database dimensions does not make the analysis bulletproof, and if some problems didn't make it into the ADDM engine yet, the report would return a skewed recommendation and a diagnosis that is amiss. It is important to recognize these shortcomings and use ADDM to steer in the direction of the real problem in contrast to blindly relying on the ADDM report.

     

    Workload Repository

     

    Not many terms come up before Automatic Workload Repository (AWR) whenever database performance is discussed. AWR is probably one of the very few database acronyms that can and effectively is articulated by anyone inside an IT organization—from system admins to architects, from project managers to business analysts, and from DBAs to CTOs. The ability to read and understand an AWR report is among the top skills a DBA can acquire, because it reveals how things are orchestrated inside the database engine. What's AWR then?

     

    Already mentioned was the heavy instrumentation of the Oracle Database system. At every given moment, a set of time models, wait statistics, Active Session History (ASH) samples, and system statistics are updated and exposed through dynamic performance v-dollar views (V$ views). Because these statistics are cumulative and measured from instance startup, little can be deduced from these numbers until a snapshot of this data is made and compared against another point in time. This is the sole purpose of AWR, which at regular interval captures the contents of these views and stores them in the SYSAUX tablespace.

     

    By comparing the differences between two snapshots, it is possible to determine which statistics changed and to draw certain conclusions. These differences are then provided as a report. Despite how simple it sounds, Oracle Database comes with hundreds of parameters, statistics, metrics, and events, so being able to put them together into organized lists makes the troubleshooting process much more plausible.

     

    There are a number of different ways to access AWR data in various representations:

     

    • Using Oracle Enterprise Manager to get a graphical view on past performance, including drill-downs into segment, SQL, or system snapshots
    • Using SQL*Plus or SQL Developer and scripts bundled inside ORACLE_HOME/rdbms/admin:
      awrrpt.sql for a single-instance report
      awrddrpt.sql for a single-instance difference report
      awrgrpt.sql for an Oracle RAC report
      awrgdrpt.sql for an Oracle RAC difference report
      awrinfo.sql for a diagnostic report, which is a best way to health-check the current AWR configuration/deployment
      spawrio.sql for a report of I/O intensity over time
    • Using one of the pipelined functions inside the DBMS_WORKLOAD_REPOSITORY package
    • Using one of the DBA_HIST family of system views

     

    In cases when no Oracle Diagnostic Pack license is available on the system, AWR's free predecessor, Statspack, can be used for achieving comparable functionality of snapshotting performance views and running reports. This will also be the option for Oracle Database Express Edition 11g installations and Oracle Active Data Guard deployments where AWR is not available. Documentation for Statspack and Standby Statspack can be found under the ORACLE_HOME/rdbms/admin directory in spdoc.txt and sbdoc.txt, respectively. And to clarify, Statspack is not deprecated and continues to receive improvements between releases.

     

    When a busy production system cannot handle extra traffic from possibly complex AWR queries or reports, the whole investigation can be offloaded to a test or development site through the AWR Extract and Load utility, which uses Oracle Data Pump behind the scenes to transport repository data to a different location. Two scripts are involved in the transportation process inside ORACLE_HOME/rdbms/admin:

     

    • awrextr.sql for extraction
    • awrload.sql for uploading

     

    Offloading AWR brings another interesting feature introduced with Oracle Enterprise Manager 12c—AWR Warehouse—which is a new way to centralize AWR repositories from chosen monitored targets into a single location, and is aimed at facilitating capacity planning of database systems. For individual offloads, AWR Extract and Load will be the natural choice, whereas AWR Warehouse is an all-purpose repository that can streamline entire capacity planning, resource utilization, and information lifecycle management processes. Data is pumped into a central warehouse repository through Oracle Enterprise Manager jobs that pull AWR data at regular intervals from target locations.

     

    Average Active Sessions and Their History

     

    The single most-representative metric for determining current database load is Average Active Sessions (AAS)—a cumulative number of sessions performing work at a given point in time. It represents the same metric that is being used to calculate overall database time over a time window. AAS includes foreground sessions waiting on I/O, CPU, or other non-idle events and, generally, must remain below a system's CPU count to avoid bottlenecks and excessive waits. The "Average" part of "Average Active Sessions" refers to the metric's aggregated nature, whereas a ratio of database time to time slot length can be used to determine average load over time ranges.

     

    By default, ASH samples active sessions every second into the ASH Buffer, which can then be queried using V$ACTIVE_SESSION_HISTORY views. That data is later further stored inside AWR view DBA_HIST_ACTIVE_SESS_HISTORY, but only 10 percent of the original information is retained, by default, and 90 percent gets lost. This is a clear indicator that certain attention must be paid to the result of ASH and AWR ASH analysis, because some of the events that occurred on a system will simply be missing. Still, as long as there is an Oracle Diagnostic Pack license, there is no better way to check system activity than through ASH and AAS.

     

    Many ways exist for accessing ASH data:

     

    • Using Oracle Enterprise Manager performance pages
    • Using SQL*Plus or SQL Developer to run the built-in script ashrpt.sql for single-instance databases or ashrpti.sql for Oracle RAC
    • Using the DBMS_WORKLOAD_REPOSITORY package and the ASH_GLOBAL_REPORT_HTML/TEXT, ASH_REPORT_HTML, and ASH_REPORT_TEXT pipelined functions
    • Using the V$ACTIVE_SESSION_HISTORY and AWR DBA_HIST_ACTIVE_SESS_HISTORY views
    • Using ASHDUMP for dumping ASH data from memory to comma-separated format (also known as "offline ASH"; see My Oracle Support Note 243132.1)

     

    The close relationship Oracle Enterprise Manager 12c has with the Oracle Database is, again, represented through embedding extra features on top of Oracle Database through Oracle Enterprise Manager's graphical interface. From the ASH standpoint, this is ASH 3-D analytics (see Figure 3), a new way to drill down into session history data through attributes organized into dimensions.

     

    f3.gif

    Figure 3. ASH 3-D analytics with SQL_ID and User session breakdown.

     

    Advisors and Assistants

     

    Those with limited troubleshooting experience will definitely appreciate a series of tools developed to assist in guided issue resolution and provide instant recommendations for transient problems at each level. Oracle Database 9i introduced the concept of "advisor," which was greatly enhanced in the 10g, 11g, and 12c releases into a mature framework of capabilities often dubbed as "self-healing" (see Figure 4).

     

    Because the list of advisors might be a little overwhelming at first glance, only a subset of them requires manual invocation. Most of them are already running behind the scenes, so their recommendations are immediately available.

     

    f4.gif

    Figure 4. Oracle Database advisory framework.

     

    SQL advisors:

     

    • SQL Tuning Advisor, a part of the separately licensed Oracle Tuning Pack, will take underperforming statements as input and perform a complex analysis to come up with scored recommendations that will potentially improve performance. SQL Tuning Advisor will analyze dependent objects for stale or missing statistics, profile queries by executing them partially or in full extent, examine access paths and recommend indexes or suggest running SQL Access Advisor, and perform transposition of execution plan steps or suggest structural changes to SQL that might result in huge gains in response time. Usually the suggestions come in SQL Profile form that can be created automatically (not a default). SQL Tuning Advisor is invoked automatically for heavy SQL statements during autotask maintenance windows.
    • SQL Access Advisor, another component of Oracle Tuning Pack, determines if access to data structures can be optimized through new indexes, materialized views, or partitioning (via the Partition Advisor subfeature). SQL Access Advisor can operate in manual or automatic mode.
    • SQL Repair Advisor will act upon statements throwing errors and determine the reason they failed. Frequently it will address the problem with SQL Patch, which will alter the execution plan in a way that can work around the problem.

     

    Memory advisors:

     

    • Shared Pool Advisor is the advisory framework behind properly sizing the shared pool component of the system global area (SGA). Based on current workload characteristics, it will help estimate the potential impact of increasing or decreasing the shared pool size.
    • Buffer Cache Advisor determines if the current workload can benefit from the caching of table data by increasing the buffer cache, and it can effectively predict if downsizing this cache will have negative impact.
    • PGA Advisor helps to properly scale the PGA_AGGREGATE_TARGET parameter by observing current session activity in program global area (PGA). Having sufficient memory for session operations is crucial for heavy connection saturation.

     

    Maintenance and recovery advisors:

     

    • MTTR Advisor is indispensable in sizing the Mean-Time-To-Recover ratio, one of the most important recovery parameters inside the database that affects dirty block checkouts required for instance recovery at given point in time. Information provided by this advisor through the V$INSTANCE_RECOVERY view will require finding out upper and lower bounds on a specific workload and choosing a number that fits into the recovery point objective (RPO) and recovery time objective (RTO) strategy.
    • Segment Advisor assists in determining segments' high watermark for shrinking, checks tables for chained rows, and, in automatic mode, suggests objects that would benefit from advanced compression.
    • Undo Advisor makes sizing of undo tablespaces a lot easier in lieu of manual sizing, which usually ends up with automatically extended tablespaces that can run amok and fill all free space.
    • Upgrade Advisor is a My Oracle Support resource providing full coverage of upgrade process between certain database releases. It is a complete checklist of every stage of a real upgrade project with templates and references to further documentation. For Oracle Database releases 9.2 to 11.2, see My Oracle Support Note 264.1, for releases 10.2 to 11.2, see My Oracle Support Note 251.1, and for releases 11.2 to 12.1, see My Oracle Support Note 1600704.1.
    • Patching and Maintenance Advisor is another My Oracle Support reference guide for building a patching strategy around Oracle Database environments (see My Oracle Support Note 331.1). It is project management–ready, so it can be immediately incorporated into a corporate maintenance strategy.
    • Data Recovery Advisor is an entirely new Oracle Recovery Manager (Oracle RMAN) feature introduced with the Oracle Database 11g release that can analyze the database state and not only provide recommendations but handle full recovery with a few simple steps: VALIDATE DATABASE, ADVISE FAILURE, and REPAIR FAILURE. This framework will always provide a detailed impact of every recommendation so that decisions can be made on how to handle all sorts of media failures.

     

    Feature advisors:

     

    • Streams Performance Advisor is a central point for troubleshooting Streams performance, including the recent XStream API. It combines the DBMS_STREAMS_AVISOR_ADM package with few a DBA_STREAMS dynamic performance views to provide instant insight into throughput rates, latencies, queues, and read/write components.
    • Advanced Compression Advisor prior to Oracle Database 11g Release 1 was a separate download, but later it was incorporated into a regular release and has been available ever since through the DBMS_COMPRESSION package. It allows estimating potential space savings when different compression algorithms are applied to segments.
    • Application Express Advisor is a cross-check for applications and workspaces deployed within an Oracle Application Express environment. These checks include validation against common errors, security, and performance and are aimed at enhancing the overall experience of running and using Oracle Application Express applications. The user interface for Application Express Advisor is within the Oracle Application Express administrative interface (Application Builder > Utilities > Advisor).

     

    Architecture advisors:

     

    • Maximum Availability Architecture Advisor, part of Oracle Enterprise Management Advisory, is a top-down utility for diagnosing the complete database stack for compliance with the Oracle Maximum Availability Architecture blueprint, which includes the setup of Oracle RAC and Data Guard Standby, among other things. These checks are on par with the Oracle Maximum Availability Architecture Scorecard provided through ORAchk, so either one can be used to validate configuration.
    • Oracle RAC Load Balancing Advisory (CLB) is a seamless mechanism for distributing a workload evenly between Oracle RAC nodes by providing feedback and optimizing for response time (OLTP) or throughput (DWH). The choice for the CLB mode is made at the clusterware service level through the srvctl -clbgoal parameter.

     

    Advisor webcasts are not real advisors but live streaming presentations from Oracle experts on particular product features, which are available to all My Oracle Support subscribers (see My Oracle Support Note 1456176.1). All streamed webcasts are later archived and made available with their accompanying slide decks.

     

    My Oracle Support is constantly enhanced with new tools for guided problem resolution to assist in solving common problems that share a similar solution path and to deliver further self-service benefits. While there are too many assistants to cover them all here, some significant ones include the following.

     

    General assistants:

     

     

    Performance-related assistants:

     

     

    Problems with SQL Statements

     

    The first step in tuning SQL statements would always be pulling the execution plan (see Figure 5), which is a tree representation of the Cost Based Optimizer (CBO) steps required to perform the execute phase of cursor processing. Today, very few refrain from using other means of generating plans, such as going to the DBMS_XPLAN package that can receive SQL_ID as input for digging into the SQL area of the library cache (using the DISPLAY_CURSOR pipelined function) or the AWR repository (with the DISPLAY_AWR function or the ORACLE_HOME/rdbms/admin/awrsqrpt.sql report).

     

    f5.gif

    Figure 5. SQL statements troubleshooting workflow with guided resolution and external support.

     

    Oracle Database 11g introduced some breakthrough functionality in terms of watching execution plans live as they run: through active reports generated from SQL Monitor (see Figure 6), which is available under the Oracle Tuning Pack license (see My Oracle Support Note 1229904.1). SQL Monitor can display the live execution of expensive statements and provide visual feedback on the progress of each plan's step.

     

    Another useful tool for capturing single SQL metrics is the SQL Details Active Report, which can assist in troubleshooting statement problems by capturing their current and past state. This report relies on the availability of both the Oracle Diagnostic Pack and the Oracle Tuning Pack, but in return, it offers an extensive package of SQL-isolated data for analysis. Both active reports allow for offline viewing, and required resources are pulled directly from Oracle Technology Network.

     

    The SQL Monitor and SQL Details Active Report are invoked from Oracle Enterprise Manager performance pages or through the DBMS_SQLTUNE package through the REPORT_SQL_MONITOR, REPORT_SQL_MONITOR_LIST, and REPORT_SQL_DETAIL functions. To make sure a query makes its way into SQL Monitor, it can be supplemented with the /*+ MONITOR */ hint, or to prevent it from being captured there, use the /*+ NO_MONITOR */ hint.

     

    f6.gif

    Figure 6. SQL Monitor Active Report within Oracle Enterprise Manager.

     

    When an issue isn't obvious by this point, many tools are available for assisting in troubleshooting SQL statements:

     

    • SQL Query Performance Overview is the central location on My Oracle Support for troubleshooting possible query performance issues; see My Oracle Support Note 199083.1.
    • The SQL Health Check (SQLHC) script is used for non-invasive investigation into a query's performance with a solid base of built-in recommendations and reports. It can operate under any license, but having the Oracle Diagnostics Pack and Oracle Tuning Pack will yield more information; see My Oracle Support Note 1366133.1.
    • SQLTXPLAIN (or SQLT) predates SQLHC from which it was spun, and its reports are frequently requested by Oracle Support to troubleshoot underperforming statements. SQLT runs with a number of different methods for generating diagnostic information for SQL statements, regardless of whether the statements are executing currently or they were executed in the past. It is useful when SQLHC falls short of providing clues, although it requires the installation of its own schema on the target database. A central point for reference on SQLT is in My Oracle Support Note 215187.1 and usage guidelines are available in My Oracle Support Note 1614107.1.
    • Parallel Execution Health-Checks and Diagnostic Reports (PXHCDR) assists in the diagnosis of parallel execution (PX) by going through the degree of all objects and the current parallel server configuration. It comes back with reports that include recommendations as well as details on objects and configured parameters; see My Oracle Support Note 1460440.1. Recent releases of SQLHC/SQLT run PXHCDR diagnostics as part of a routine check.
    • AWR SQL helps to determine if historical plans were on par with the currently executing one by using the same information as the DBMS_XPLAN.DISPLAY_AWR function but in a report form. It's generated using the ORACLE_HOME/rdbms/admin/awrsqrpt.sql script or through the DBMS_WORKLOAD_REPOSITORY.AWR_SQL_REPORT_HTML/TEXT function by providing the SQL_ID and snapshot range.

     

    In the era of Oracle Database 12c, it is rather unlikely that neither of mentioned tools will return without a recommendation but that is just one, high-level side of troubleshooting. The other is more invasive—affecting load—and more thorough, including a breakdown of waits, and it is a more challenging approach, requiring extra processing for massive amounts of instrumentation data dumped for analysis, called SQL Trace. Numerous tools were developed to assist in obtaining, processing, and examining that information. Internally, SQL Trace invokes event 10046 and often it will be referred to as such.

     

    Phase 1—Obtaining trace data:

     

    • On the client level through the DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE procedure
    • On the session level through the DBMS_SESSION.SET_SQL_TRACE procedure
    • On another session through the DBMS_SYSTEM.START_TRACE_IN_SESSION procedure (only when requested by Oracle Support)
    • On the module and action level through the DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE procedure
    • On the service level through the DBMS_MONITOR.SESSION_TRACE_ENABLE procedure
    • On the database level through the DBMS_MONITOR.DATABASE_TRACE_ENABLE procedure

     

    Phase 2—Processing trace data:

     

    • TRCSESS is used to merge trace data from multiple individual files filtered by session, client, service, action, or module into single trace file for later processing.
    • TKPROF, which for each input trace file outputs SQL Trace statistics, library cache information, the execution plan, plus wait and/or bind data if it was selected on Phase 1. TKPROF outputs SQL execution information in highly compact, readable format where statements can be sorted by multiple metrics such as CPU or physical reads, aggregated, and explained. It used to be one of the most common ways to demonstrate a particular SQL command's performance, but it has been superseded with plan statistics, SQL Monitoring, and the SQL Details Active Report, which are more accessible sources for query performance.

     

    Phase 3—Examining trace data: Trace Analyzer (TRCANLZR or TRCA) is a dedicated tool for producing a diagnostic report from raw trace files and will perform statistical analysis to come up with set of observations and recommendations regarding the execution and performance of captured statements. It requires prior installation in a separate schema, but analysis can be offloaded to a development database so that only production trace data has to be pulled for screening.

     

    Optimizer, which was historically called Cost-Based Optimizer (CBO) to differentiate it from the Oracle Database 9i legacy Rule-Based Optimizer (RBO), is responsible for producing query execution plans for actual data retrieval from disk using a sophisticated algorithm that uses statistical information to compute optimal access paths. Beyond obvious situations such as missing indexes, stale statistics, or messy predicates, it is often difficult to understand a particular access path from the explain plan alone. And when verification of this plan through the GATHER_PLAN_STATISTICS hint shows that there is no distortion from estimates in the actual execution steps, the only way to dig into the Optimizer logic is by dumping the Optimizer trace, which uses internal event 10053. Before Oracle Database 11g, this tracing always required prior instrumentation on the session level and then manually invoking the SQL statement of interest.

     

    The new tracing infrastructure that shipped with Oracle Database 11g changed the way the Optimizer trace is obtained so that it can be pulled after the SQL command has been executed. This makes it possible to trace statements issued from live applications and PL/SQL packages by just providing the SQL_ID as an input to the DBMS_SQLDIAG.DUMP_TRACE procedure. Dumped traces can be later located through the V$DIAG_INFO view, and it is the session that invoked the diagnostic event not the one actually running the SQL command.

     

    The Big Picture

     

    A recent addition to the tools family is ORATOP (see Figure 7), a command-line interface modeled after the popular "top" utility for displaying the top sessions and wait events in a highly readable form.

     

    f7.gifInsert f7.gif here and then delete this line

    Figure 7. ORATOP running on an Oracle Database 12c single-instance database with heavy concurrency.

     

    A growing number of features packed inside Oracle Database over the years has spurred a parallel increment in the quantity of troubleshooting tools around them. That increment also includes a boost in the quality and maturity of former utensils to further improve the overall functionality coverage. Developing a strategy and systematization around that toolset is fundamental for ensuring high database service levels with the least effort.

     

    About the Author

     

    Przemyslaw Piotrowski is a principal database administrator with more than 10 years of experience developing, architecting, and running mission-critical databases for a number of industries. He is an Oracle Database Certified Master and an Oracle Certified Expert for Oracle RAC, Performance Tuning, and SQL specializing in Oracle Maximum Availability Architecture and real-world performance patterns. As regular Oracle Technology Network columnist, he publishes information on databases, tools, and programming languages.