Oracle® Database 2 Day + Performance Tuning Guide 10g Release 2 (10.2) Part Number B28051-01 |
|
|
View PDF |
Transient performance problems are short-lived and do not appear in the Automatic Database Diagnostics Monitor (ADDM) analysis. ADDM tries to report the most significant performance problems during an analysis period in terms of their impact on DB time. If a particular problem lasts for a very short duration, its severity might be averaged out or minimized by other performance problems in the entire analysis period; therefore, the problem may not appear in the ADDM findings. Whether or not a performance problem is captured by ADDM depends on its duration compared to the interval between the Automatic Workload Repository (AWR) snapshots.
If a performance problem lasts for a significant portion of the time between snapshots, it will be captured by ADDM. For example, if the snapshot interval is set to one hour, a performance problem that lasts for 30 minutes should not be considered as a transient performance problem because its duration represents a significant portion of the snapshot interval and will likely be captured by ADDM.
On the other hand, a performance problem that lasts for only 2 minutes could be a transient performance problem because its duration represents a small portion of the snapshot interval and will likely not show up in the ADDM findings. For example, if the user notifies you that the system was slow between 10:00 p.m. and 10:10 p.m., but the ADDM analysis for the time period between 10:00 p.m. and 11:00 p.m. does not show a performance problem, it is likely that a transient performance problem occurred that lasted for only a few minutes of the 10-minute interval reported by the user. This chapter describes how to resolve these types of transient performance problems with Oracle Database.
This chapter contains the following sections:
In order to capture a detailed history of database activity, Oracle Database samples active sessions each second using the Active Session History (ASH) sampler. The sampled data is collected into memory and written to persistent storage by the Automatic Workload Repository (AWR) snapshot processing. ASH is an integral part of the Oracle Database self-management framework and is extremely useful for diagnosing performance problems.
Unlike the instance-level statistics gathered by the AWR, sampled data is gathered at the session level by ASH. By capturing statistics for only active sessions, a manageable set of data is represented, with the size being directly related to the work being performed rather than the entire instance.
Sampled data captured by ASH can be rolled up based on the various dimensions that it captures, including:
SQL identifier of a SQL statement
Object number, file number, and block number
Wait event identifier and parameters
Session identifier and session serial number
Module and action name
Client identifier of the session
Service hash identifier
You can run Active Session History (ASH) reports to analyze transient performance problems with the database that only occur during specific times. This is especially useful when trying to:
Resolve transient performance problems that may last for only a short period of time, such as why a particular job or session is not responding when the rest of the instance is performing normally
Perform scoped or targeted performance analysis by various dimensions or their combinations, such as time, session, module, action, or SQL identifier
See Also:
This section describes how to run ASH reports.
On the Database Performance page, under Average Active Sessions, click Run ASH Report.
The Run ASH Report page appears.
Enter the start date and time, and the end date and time, of the time period when the transient performance problem occurred.
In this example, database activity spiked between 10:00 p.m. and 10:10 p.m., so an ASH report needs to be created for that time period.
Click Generate Report.
The Processing: View Report page appears while the report is being generated.
After the report is generated, the ASH report appears under Report Results on the Run ASH Report page, as shown in Figure 7-1. To save the report in HTML for future analysis, click Save to File.
To use an ASH report to analyze transient performance problems, you need to review the contents of the ASH report to identify the problem.
The contents of the ASH report are divided into the following sections:
The Top Events section describes the top wait events of the sampled session activity categorized by user, background, and priority. Use the information in this section to identify the wait events that may be the cause of the transient performance problem.
The Top Events section contains the following subsections:
The Top User Events subsection lists the top wait events from user processes that accounted for the highest percentages of sampled session activity.
The example in Figure 7-2 shows that a high percentage of database activity (73 percent) is consumed by the CPU + Wait for CPU
event. In this example, the Load Profiles section should be examined next to determine the type of activity that is causing this wait event.
The Top Background Events subsection lists the top wait events from backgrounds that accounted for the highest percentages of sampled session activity.
The Top Event P1/P2/P3 subsection lists the wait event parameter values of the top wait events that accounted for the highest percentages of sampled session activity, ordered by the percentage of total wait time (% Event). For each wait event, values in the P1 Value, P2 Value, P3 Value column correspond to wait event parameters displayed in the Parameter 1, Parameter 2, and Parameter 3 columns.
For example, in Figure 7-3, db file sequential read
is the top wait event, consuming 13.26 percent of total wait time while the session waits for a sequential read from the database to be performed.
The wait event has the following parameters: file#
(P1), block#
(P2), and blocks
(P3). The corresponding values for these wait event parameters are displayed in the P1 Value, P2 Value, P3 Value column:
file#
= "1"
block#
= "1801"
block
= "1"
Using the parameter values in this example, it can be determined that 1801 is the block number of the single block for which the session is waiting.
See Also:
Oracle Database Reference for information about common wait event parameters
The Load Profile section describes the load analyzed in the sampled session activity. Use the information in this section to identify the service, client, or SQL command type that may be the cause of the transient performance problem.
The Load Profile section contains the following subsections:
The Top Service/Module subsection lists the services and modules that accounted for the highest percentages of sampled session activity.
The example in Figure 7-4 shows that the majority of database activity (65 percent) is consumed by the SYS$USERS
service running the SQL*Plus module. In this example, it appears that the user is running a high-load SQL statement that is causing the performance problem. The Top SQL Command Types subsection should be analyzed next to determine if a particular type of SQL statement makes up the load.
See Also:
"Monitoring Top Services" for information about services
"Monitoring Top Modules" for information about modules
"Monitoring Top Actions" for information about actions
The Top Client IDs subsection lists the clients that accounted for the highest percentages of sampled session activity based on their client ID, which is the application-specific identifier of the database session.
The Top SQL Command Types subsection lists the SQL command types (such as SELECT
, UPDATE
, INSERT
, and DELETE
) that accounted for the highest percentages of sampled session activity.
The example in Figure 7-5 shows that the majority of database activity (68 percent) is used by the SQL command type SELECT
. From this information, it appears that SQL statements causing the performance problem are SELECT
statements. The Top SQL section should be analyzed next to identify the high-load SQL statement.
The Top SQL section describes the top SQL statements of the sampled session activity. Use this information to identify high-load SQL statements that may be the cause of the transient performance problem.
The Top SQL section contains the following subsections:
The Top SQL Statements subsection lists the SQL statements that accounted for the highest percentages of sampled session activity. To view the text of the Top SQL statements displayed, click the link in the SQL ID column of the SQL statement that you want to view.
The example in Figure 7-6 shows that the majority of database activity (65 percent) is used by a particular SQL SELECT
statement. Based on this information, it appears that this is the high-load SQL statement causing the performance problem. The Top Sessions section should be analyzed next to identify the session running this SQL statement.
See Also:
The Top SQL Statements subsection lists the SQL statements using literals that accounted for the highest percentages of sampled session activity.
The Top Sessions section describes the sessions that were waiting for a particular wait event. Use this information to identify the sessions that accounted for the highest percentages of sampled session activity, which may be the cause of the transient performance problem.
The Top Sessions section contains the following subsections:
The Top Session subsection lists the sessions that were waiting for a particular wait event that accounted for the highest percentages of sampled session activity.
The example in Figure 7-7 shows that the majority of database activity (65 percent) is used by the user ICHAN
with the session ID of 135. From this information, it appears that this is the user running the high-load SQL statement identified earlier. You should investigate this session to determine whether it is performing a legitimate operation and tune the SQL statement if possible. If tuning the SQL statement is not possible and the session is causing an unacceptable performance impact on the system, you may want to consider terminating the session.
See Also:
"Monitoring Top Sessions" for information about sessions
Chapter 10, "Tuning SQL Statements" for information about tuning SQL statements
The Top Blocking Sessions subsection lists the blocking sessions that accounted for the highest percentages of sampled session activity.
The Top Objects/Files/Latches section provides additional information about the most commonly-used database resources and contains the following subsections:
The Top DB Objects subsection lists the database objects (such as tables and indexes) that accounted for the highest percentages of sampled session activity.
The Top DB Files subsection lists the database files that accounted for the highest percentages of sampled session activity.
The Top Latches subsection lists the latches that accounted for the highest percentages of sampled session activity.
Latches are simple, low-level serialization mechanisms to protect shared data structures in the system global area (SGA). For example, latches protect the list of users currently accessing the database and the data structures describing the blocks in the buffer cache. A server or background process acquires a latch for a very short time while manipulating or looking at one of these structures. The implementation of latches is operating system dependent, particularly in regard to whether and how long a process will wait for a latch.
See Also:
Oracle Database Performance Tuning Guide for information about latches
The Activity Over Time section is one of the most informative sections of the ASH report. This section is particularly useful for longer time periods because it provides in-depth details about activities and workload profiles during the analysis period. The Activity Over Time section is divided into multiple time slots, as shown in Figure 7-8.
Each of the time slots contains information regarding that particular time slot, as described in Table 7-1.
Table 7-1 Activity Over Time
Column | Description |
---|---|
Slot Time (Duration) |
Duration of the slot |
Slot Count |
Number of sampled sessions in the slot |
Event |
Top three wait events in the slot |
Event Count |
Number of ASH samples waiting for the wait event |
% Event |
Percentage of ASH samples waiting for wait events in the entire analysis period |
The first and last slots are usually odd-sized. All inner slots and are 1 minute each, and can be compared to each other.
When comparing the inner slots, perform a skew analysis by identifying spikes in the Event Count and Slot Count columns. A spike in the Event Count column indicates an increase in the number of sampled sessions waiting for a particular event. A spike in the Slot Count column indicates an increase in active sessions, because ASH data is sampled from active sessions only and a relative increase in database workload. Typically, when the number of active session samples and the number of sessions associated with a wait event increases, the slot may be the cause of the transient performance problem.