Using the Performance Analyzer in the Oracle Solaris Studio IDE
Profiling applications is critical when improving application performance. Without profiling, it is very easy to "guess" where time is being spent, and then expend effort optimizing code that is not on the critical path. The Performance Analyzer makes it easy to profile an application, but it can be tricky to know what to do next in order to make an application run faster.
Oracle Solaris Studio 12.4 includes an Overview screen designed to focus users on the metrics of interest, and to ensure that they don't miss any metrics that might indicate an optimization opportunity.
The Overview screen
When an experiment is loaded, the Overview screen, shown in Figure 1, is displayed.
Figure 1. Overview screen
The Overview screen has three sections. The first section contains information about the experiment, the second section shows the metrics that the experiment contains, and the third section shows a preview of the selected metrics.
Information About the Experiment
When examining an old experiment, it can be useful to be reminded of when it was collected, what machine it was gathered on, what command was profiled, and what the command line was. All this information is shown in the Experiment(s) section, as shown in Figure 2.
Figure 2. Experiment summary
Examining the Available Metrics
The Metrics section of the Overview screen is the largest and most important area. This area fulfills two purposes. First, it shows the user what metrics are available in the experiment. Second, it allows the user to select what metrics are shown.
The metrics that an experiment might show depend on what was gathered. Most profiling experiments will show time spent in various states: user time, system time, and so on. Some profiling experiments will show hardware counter information, such as instruction count or cycle count. Other experiment types might show data such as OpenMP parallel regions or memory allocations.
Figure 3 shows some of the metrics that are available in an experiment. It is worth discussing this section in some considerable detail.
Figure 3. Metrics available in an experiment
The section shows the experiment duration, which tells the user for how long data was collected. The other metric to look at is the Total Thread Time, which is the number of seconds of data that are available for all the threads in the experiment.
In this experiment, about 1.4 million seconds of data were gathered in about 1,000 seconds of elapsed time. This suggests that about 1,400 threads were present, on average, over the duration of the experiment, as if each thread had contributed about 1,000 seconds. The threads need not last for the entire duration of the experiment; the interesting fact here is that we are dealing with a heavily multithreaded experiment, not one with just a few threads.
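The estimate above is a single division. The following minimal sketch reproduces it, assuming the two values are read off the Overview screen by hand; the function name and variables are hypothetical and are not part of the Performance Analyzer.

```python
def average_thread_count(total_thread_seconds: float, duration_seconds: float) -> float:
    """Estimate the average number of live threads during an experiment.

    Hypothetical helper: divides Total Thread Time by the experiment
    duration. Threads that start late or exit early lower the average,
    so the result is an estimate, not an exact thread count.
    """
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return total_thread_seconds / duration_seconds


# Approximate values from the experiment discussed in the text.
total_thread_time = 1_400_000.0  # seconds of data across all threads
duration = 1_000.0               # seconds of elapsed time

print(round(average_thread_count(total_thread_time, duration)))  # prints 1400
```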
The Total Thread Time breaks down into multiple states. The easiest way to think about this is that a thread is either running on a CPU or it is not.
There is a metric called Total CPU Time, which captures the amount of time that a thread is running on a CPU. The rest of the time the thread is idle; usually, this means it is waiting for some event to happen. In the data shown, the time spent on a CPU is about 6 percent of the total thread time, which is unsurprising. In situations where there are many threads, it is quite usual to have a large number of those threads waiting for something to happen—and while they are waiting, the threads will contribute no on-CPU time.
When a thread is on the CPU, it can be in one of three states: User mode, System mode, or Trap mode. Most applications should spend the majority of their time in User mode.
An important thing to realize is that the data reported here is different from the data reported by tools, such as prstat, that report system-wide activity. An experiment like the one shown may report that it has only 6 percent CPU time, and 94 percent of the time is spent in other states. However, that does not mean that the system will report being 94 percent idle.
For example, imagine that there are 100 threads. All the threads have work to do, but the system only has eight CPUs. So only 8 of the 100 threads can be on a CPU at any instant in time. The system would report that it is 100 percent busy, but the Performance Analyzer experiment would show an experiment with only 8 percent on-CPU time and 92 percent of the time spent off CPU. In this instance, the threads that were not given a CPU, but had work to do, would be shown as being in the "Wait CPU" state—they are waiting to be assigned a CPU to run on.
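The two views in this example can be put side by side in a short sketch. It assumes an idealized scheduler in which every thread always has work to do and the CPUs are the only limit; the function name is hypothetical and does not correspond to anything in prstat or the Performance Analyzer.

```python
def scheduling_views(runnable_threads: int, cpus: int) -> tuple[float, float, float]:
    """Return (system utilization, on-CPU fraction, Wait CPU fraction).

    Idealized model (an assumption): all threads are always runnable,
    so at any instant min(threads, cpus) of them are on a CPU and the
    rest are waiting for one.
    """
    if runnable_threads <= 0 or cpus <= 0:
        raise ValueError("thread and CPU counts must be positive")
    running = min(runnable_threads, cpus)
    system_utilization = running / cpus            # what a system-wide tool sees
    on_cpu_fraction = running / runnable_threads   # share of total thread time on CPU
    wait_cpu_fraction = (runnable_threads - running) / runnable_threads
    return system_utilization, on_cpu_fraction, wait_cpu_fraction


system, on_cpu, wait_cpu = scheduling_views(runnable_threads=100, cpus=8)
print(f"system busy: {system:.0%}")       # prints "system busy: 100%"
print(f"thread time on CPU: {on_cpu:.0%}")  # prints "thread time on CPU: 8%"
print(f"thread time in Wait CPU: {wait_cpu:.0%}")  # prints "thread time in Wait CPU: 92%"
```

The same machine is fully busy from the system's point of view while the experiment shows most thread time off CPU, because the two numbers use different denominators: CPUs in one case, threads in the other.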
Controlling Metric Selection
To the right of the metrics is a series of checkboxes, as shown in Figure 4. These control whether a metric is shown or not. Some metrics are highlighted with an asterisk. These are metrics that the tool thinks might be worth investigating. There is a button that will select all the metrics that are highlighted with asterisks.
Figure 4. Selecting metrics of interest
There are multiple ways a single metric can be presented. Some metrics are purely time, so there is a checkbox that can be selected to show those metrics as time. Metrics can also be selected as percentages of the total—so a hot function might take 20 percent of the total user time.
Some metrics can be presented as raw counts, and there is a way for the tool to convert this into a time value. A good example of this is cycle counts. A processor might be running at 3.6 GHz, meaning 3.6 billion cycles per second. Profiling on cycles would count the number of cycles where a thread was on-CPU. The raw count for cycles might be 7 billion, but the tool knows that there are 3.6 billion cycles per second, so it can convert this into about two seconds.
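The conversion described above is a division of the raw count by the clock rate. A minimal sketch, assuming a fixed clock frequency for the whole run (real processors may change frequency, which this ignores); the function name is hypothetical.

```python
def cycles_to_seconds(cycle_count: float, clock_hz: float) -> float:
    """Convert a raw cycle count into seconds at a fixed clock rate."""
    if clock_hz <= 0:
        raise ValueError("clock rate must be positive")
    return cycle_count / clock_hz


raw_cycles = 7e9   # 7 billion cycles counted while on CPU
clock_rate = 3.6e9  # 3.6 GHz, that is, 3.6 billion cycles per second

print(f"{cycles_to_seconds(raw_cycles, clock_rate):.2f} seconds")  # prints "1.94 seconds"
```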
There is one further classification of metrics into inclusive time and exclusive time. This classification can be rather confusing, but it is important to understand it. Exclusive time is time spent in a function. So suppose I have a program that runs for five minutes, and it spends all its time calling sin(). The entire runtime would be spent in sin(), so the exclusive time for sin() would be 300 seconds.
Inclusive time is time spent both in a routine and in any routines that it calls. So going back to the example, if sin() is called directly from main(), then main() would have nearly zero exclusive time, but it would have 300 seconds of inclusive time. The inclusive time reflects the fact that main() spent all of its time calling sin().
In most instances, a user will be interested in exclusive time, but there are some situations where it is useful to explore inclusive time.
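The relationship between the two kinds of time can be sketched with a tiny call tree. This is an illustration only, not how the Performance Analyzer stores its data: the Frame class and the inclusive function are hypothetical, and the exclusive time for main() is set to exactly zero for simplicity where the text says "nearly zero".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Frame:
    """One function in a call tree, with the time spent in its own code."""
    name: str
    exclusive: float                      # seconds spent in this function itself
    callees: list[Frame] = field(default_factory=list)


def inclusive(frame: Frame) -> float:
    """Inclusive time: the function's exclusive time plus that of everything it calls."""
    return frame.exclusive + sum(inclusive(callee) for callee in frame.callees)


# The five-minute example: main() does nothing except call sin().
sin_frame = Frame("sin", exclusive=300.0)
main_frame = Frame("main", exclusive=0.0, callees=[sin_frame])

for f in (main_frame, sin_frame):
    print(f"{f.name}: exclusive={f.exclusive:.0f}s inclusive={inclusive(f):.0f}s")
```

Sorting by exclusive time points at sin() as the place where time is actually spent, while inclusive time shows that main() is the route by which the program gets there.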
Previewing the Selected Metrics
The third section of the Overview screen shows the selected metrics together with their values. An example of this is shown in Figure 5.
Figure 5. Preview of selected metrics
The preview is updated whenever metrics are selected or deselected, and it allows a user to quickly identify whether a metric is sufficiently large for it to be worth further investigation.
The new Performance Analyzer Overview screen helps a user make rapid decisions about how to analyze an experiment. First, it shows all the available metrics, clearly highlighting those that the tool considers worth at least a cursory examination. Second, it shows values for the selected metrics, enabling the user to quickly determine whether to explore those metrics further. An advantage of this approach is that it is much easier to identify where the application is spending significant time, and a user can focus on converting that chunk of time into productive work.
Revision 1.0, 12/18/2014