Today I decided it was past time to reorganize the slides in the performance testing section of the course. I added a number of new diagrams and charts to help organize the materials, and then I went back and started to review some of the definitions I was using. First up was the question: what is performance testing, and how is it related to load testing?

I must confess that I derived my own definitions for a number of terms simply because I wasn’t satisfied with the definitions that were out there. I couldn’t really find fault with many of them. It was just that they were either not useful, in that they didn’t suggest a course of action when they should have, or they weren’t in alignment with my experiences. That said, I do want to try to conform, even if I find it difficult to do so on occasion. So, I went out on the web to see if I could normalize my definitions of load and performance testing to what others were using.

While it’s clear that there is a difference between load and performance testing, that difference isn’t readily apparent until you define the two. According to Wikipedia, performance testing is “testing performed to determine how a system performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate, measure, validate or verify other quality attributes of the system such as scalability, reliability and resource usage.”

Quite a mouthful indeed, and we can quibble about the quality of a definition that uses the words it’s trying to define as part of the definition (is that like being autophagic?). What I can say is that for all the words the definition uses, there are still a few missing that are needed if it is to be used to define a load test. For consistency, let’s return to Wikipedia to see what it says about load testing. Its definition of load testing is “the process of putting demand on a system or device and measuring its response. Load testing is performed to determine a system’s behavior under both normal and anticipated peak load conditions”. The definitions sound very similar, but now the missing words have surfaced. The load testing definition constrains the phrase “particular workload” found in performance testing to “normal and anticipated peak load conditions”, whatever that means.

The differences in the definitions make the relationship between load, stress, endurance and performance testing, while subtle, apparent. The former terms are slightly different specializations of the latter. In fact, the difference between them is sometimes so slight that it’s not surprising there are so many different definitions out there. But the difference is important, as each type of performance testing serves a different purpose and requires different treatment.

I mentioned the idea of using Wordle as an execution profiler while presenting the profiling section of my performance tuning course in Paris last December. The idea was seeded by a presentation Neal Ford did a few years ago in which he used Wordle to expose the vocabulary of a Java application. Neal took a pile of source code and fed it into Wordle. The result was a word cloud that visualized the dominant words in the source code. What I was interested in was exploring how I could use Wordle to visualize an application’s dominant activity (i.e., the bottleneck).

In profiling we set up a mechanism to repeatedly ask, “What are you doing now? What are you doing now? What are you doing now?” Profilers aggregate and count the answers to this question. It is the treatment of the answers that allows us to identify hotspots (dominant activities) in our application. Wordle aggregates and counts words, and it uses those counts to build a word cloud. So it seems as though it should work, as long as we feed it the right set of words. In this case, the words should come from the answers to the repeated question, “What are you doing now?” To answer that question, we can use thread dumps.
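To make the question concrete, here is a minimal sketch, using only the standard java.lang.management API, of asking every live thread what it is doing right now. This isn’t any particular tool’s code, just an illustration: a profiler is essentially this call repeated over and over, with the resulting frames counted.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class WhatAreYouDoingNow {

    // Ask every live thread "what are you doing now?" and print its current stack.
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            System.out.println("\"" + info.getThreadName() + "\" " + info.getThreadState());
            for (StackTraceElement frame : info.getStackTrace()) {
                System.out.println("    at " + frame);
            }
            System.out.println();
        }
    }
}
```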

At this point you should be thinking, “Thread dumps to profile? Are you daft? The impact on performance will be horrendous!” I completely agree: done haphazardly, continuously taking thread dumps from a running application would inflict a huge performance penalty on that application. But what if we didn’t need all that data to figure out what was going on? Instead, what if we switched to “slow profiling”, where we took a thread dump every 5 minutes, minimizing its impact on performance? I recently used this idea to create a Java agent called ConditionalThreadDumper, which I then launched into a production system. The impact on performance was horrendously unnoticeable.

The ConditionalThreadDumper polled the system on a regular basis, looking to see if it was in a particular (bad) condition. For example, we might look at CPU utilization and another performance metric to see if they have exceeded some level of utilization. If so, it would turn on a mechanism to create a thread dump every x minutes. If the system worked its way out of the condition, the thread dumper would be turned off. Launching this agent into production resulted in several thousand data points over a period of a couple of days. Not a lot when you consider an execution profiler can collect that many samples in a few minutes. However, this was running in production and it was only running when the system was in trouble. So while there wasn’t a lot of data, it was significantly more relevant to the problem at hand, as in aggregate these thread dumps tell the story of what the dominant activity was in the badly behaving application.
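The agent itself isn’t published here, but the idea is simple enough to sketch. The following is a rough, hypothetical approximation rather than the actual ConditionalThreadDumper: a premain that polls a condition (here, whole-system CPU load via the HotSpot-specific com.sun.management.OperatingSystemMXBean) and, while the condition holds, writes a thread dump every few minutes. The threshold, intervals, and output handling are all my own assumptions.

```java
import java.lang.instrument.Instrumentation;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ConditionalThreadDumperSketch {

    private static final double CPU_THRESHOLD = 0.80;  // assumed "bad condition" trigger
    private static final long POLL_SECONDS = 60;        // how often the condition is checked
    private static final long DUMP_INTERVAL_MS = TimeUnit.MINUTES.toMillis(5);

    private static volatile long lastDump = 0L;

    public static void premain(String agentArgs, Instrumentation inst) {
        // HotSpot-specific bean that exposes whole-system CPU load.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "conditional-thread-dumper");
            t.setDaemon(true);
            return t;
        });

        scheduler.scheduleAtFixedRate(() -> {
            if (os.getSystemCpuLoad() < CPU_THRESHOLD) {
                return;                                  // condition cleared: dumper stays off
            }
            long now = System.currentTimeMillis();
            if (now - lastDump < DUMP_INTERVAL_MS) {
                return;                                  // "slow profiling": at most one dump per interval
            }
            lastDump = now;
            for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
                System.out.println("\"" + info.getThreadName() + "\" " + info.getThreadState());
                for (StackTraceElement frame : info.getStackTrace()) {
                    System.out.println("    at " + frame);
                }
                System.out.println();
            }
        }, POLL_SECONDS, POLL_SECONDS, TimeUnit.SECONDS);
    }
}
```

Packaged in a jar with a Premain-Class manifest entry and attached with -javaagent, something like this rides along with the application and costs essentially nothing while the condition isn’t met.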

Blasting thousands of raw thread dumps into Wordle simply won’t work, so I hacked together a scanner that harvested stack traces of interest from the thread dumps. What I was looking for were application threads (in this case Tomcat threads, which are easily identifiable). After collecting and cleaning all of the stack trace entries for the Tomcat threads, I ended up with more than 350,000 lines of data. The user interface for Wordle made inserting all of this data impractical, so I modified the code to aggregate and count identical entries. This reduced the dataset to just under 90,000 lines.
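The scanner isn’t published with this post, but a rough equivalent is easy to imagine. The sketch below is my own reconstruction, with the dump layout and thread-name patterns being assumptions: it walks a jstack-style text dump, keeps only frames belonging to Tomcat worker threads, and aggregates identical frames into the frame:count form that shows up in Figure 1.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class StackFrameHarvester {

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        boolean inWorkerThread = false;

        // Assumes a jstack-style text dump; thread headers start with a quoted thread name.
        for (String line : Files.readAllLines(Paths.get(args[0]))) {
            String trimmed = line.trim();
            if (trimmed.startsWith("\"")) {
                // Keep only the application (Tomcat worker) threads; the name patterns are a guess.
                inWorkerThread = trimmed.contains("TP-Processor") || trimmed.contains("http-");
            } else if (inWorkerThread && trimmed.startsWith("at ")) {
                // Drop "at " and the (File.java:123) suffix so identical frames aggregate.
                String frame = trimmed.substring(3).replaceAll("\\(.*\\)$", "");
                counts.merge(frame, 1, Integer::sum);
            }
        }

        // Emit the most frequent entries in the frame:count form used in Figure 1.
        counts.entrySet().stream()
              .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
              .limit(100)
              .forEach(e -> System.out.println(e.getKey() + ":" + e.getValue()));
    }
}
```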

I fed the top 100 or so items into Wordle and it came back with a clear but useless answer. The dominant activity was java.lang.Object.wait. Since I already knew that the thread pool was oversized, it wasn’t a surprise to see that the dominant activity was threads blocked on a lock (see Figure 1).

  • java.lang.Object.wait:91295
  • java.lang.Thread.run:46578
  • org.apache.tomcat.util.net.AprEndpoint$Worker.run:45174
  • org.apache.tomcat.util.net.AprEndpoint$Worker.await:44589
  • waiting on (a org.apache.tomcat.util.net.AprEndpoint$Worker):44586
  • javax.servlet.http.HttpServlet.service:1137

 

Figure 1. Dominant Activities

 

I filtered java.lang.Object.wait and five other related and uninteresting entries from the data and then fed the top 100 or so back into Wordle. Not surprisingly, the output from Wordle demonstrated what we knew to be true from initial work with "real" performance tooling. The dominant activity was interactions with the JDO product, Kodo (kodo.kernel.BrokerImpl.find). Not far behind was a proxy using reflection (java.lang.reflect.Method.invoke). This is quite visible in Figure 2.
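The filtering itself was nothing fancy. Something along these lines is enough to strip the wait/park plumbing before handing the remainder back to Wordle; the specific noise entries below are taken from Figure 1, and the rest is hypothetical.

```java
import java.util.List;
import java.util.stream.Stream;

public class NoiseFilter {

    // Entries judged uninteresting for this data set (drawn from Figure 1; adjust to taste).
    private static final List<String> NOISE = List.of(
            "java.lang.Object.wait",
            "java.lang.Thread.run",
            "org.apache.tomcat.util.net.AprEndpoint$Worker.run",
            "org.apache.tomcat.util.net.AprEndpoint$Worker.await",
            "waiting on");

    // Removes the noise from the aggregated frame:count entries before they go to Wordle.
    public static Stream<String> filter(Stream<String> countedFrames) {
        return countedFrames.filter(entry -> NOISE.stream().noneMatch(entry::startsWith));
    }
}
```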

 

 

 

 

[Image: word cloud from thread dumps]

 

Figure 2. Wordle output showing dominant activities

 

So there you have it: how to use Wordle as a wall-clock-time, method-exclusive execution profiler.

 

If you are interested in learning more about performance tuning using more traditional tooling and techniques, please join me in one of my open enrollment performance tuning seminars. I have a few scheduled for both Europe and North America. I'm also very excited to announce that I'm working with Exceed in Australia. I'll be there for two weeks. The first week I'll be presenting the workshop as an in-house offering. The second week, starting Feb 6th, is my first full open enrollment performance tuning workshop down under. This will also be a rare opportunity for me to meet up with people in Oz.