
RMI Unplugged Blog

Posted by kcpeppe-JavaNet Jan 27, 2015
I've just finished tuning a client's application where one of the items on the table was to find the source of calls to System.gc(). Using Censum made easy work of understanding the source of the calls. The team I was working with not only missed that these calls to System.gc() were creating havoc with their end users' experience, they didn't even realize that something, somewhere was messing things up! That's not to say that developers are completely oblivious to the issue. Developers have gotten better about not placing calls to System.gc() in their code, but they often still don't see the call being made in their GC logs. One of the confusing causes of unwanted calls to System.gc() comes from Java (typically Java EE) applications that have remote objects, and therefore use RMI. In the early days these calls came into applications at the rate of once a minute, and as you can imagine, it caused a lot of consternation for those analyzing the performance of Java EE applications making use of Session or Entity beans. After much discussion, the call rate was reduced to the current rate of once an hour. Better, but it can still be as disruptive as it is useful. Let's expose a little bit of how the JVM works by digging into how RMI plays with the garbage collectors. The Remote Procedure Call (RPC) protocol (originally written in C) allowed one process to call an exposed function in another process. The function in the other process was expected to manage all of it
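For reference, that hourly interval is controlled by a pair of system properties (values in milliseconds), and if you're running CMS, -XX:+ExplicitGCInvokesConcurrent softens the blow by turning the explicit calls into concurrent cycles. A sketch of a launch line; the one-day value and MyApp are just placeholders:

java -Dsun.rmi.dgc.client.gcInterval=86400000 \
     -Dsun.rmi.dgc.server.gcInterval=86400000 \
     -XX:+ExplicitGCInvokesConcurrent \
     MyApp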
This is a fun story rather than a deep technical discussion of some performance issue, measure, or tool. My memory was jogged by a story (http://www.folklore.org/StoryView.py?story=Negative_2000_Lines_Of_Code.txt) in which Bill Atkinson reported producing -2000 lines of code on his weekly productivity report, and by Kevlin Henney retweeting one of my one-liners: features are the asset, code is a liability. Needless to say, management wasn't impressed by having Bill report that he produced -2000 lines of code. A coder's main purpose is to code, and if they are producing code you should be able to count how many lines of code they've produced and consequently measure productivity. The key point is that in the process of eliminating code, Bill was making the software better and faster. His story reminded me of a situation that I was in in 1996. I sat down to have lunch with my supervisor and he hit me with the question: how would you measure reuse? I jokingly answered with 3 questions; what number do you need, who
Every once in a while I run into someone whose JVM is running back-to-back collections and yet the heap is still almost full after each attempt! When they discover that their problem is related to the JVM not having enough memory, they often ask the question: why didn't the JVM throw an OutOfMemoryError? After all, my application is not making any forward progress, and the reason is that the Java heap is almost completely exhausted. First things first: if you're lucky, you may never realize that your JVM is running a Ponzi scheme on you. It
Just recently I was engaged to assist with an application that wasn  

Fun with Lambdas Blog

Posted by kcpeppe-JavaNet Nov 10, 2013
I just finished delivering a talk at Oredev 2013 on better concurrency in Java 8. With Lambda  
A couple of years ago I brought down www.lufthansa.com. Ok ok... maybe I didn  
More than a few days ago a friend pinged me complaining that a recent Eclipse release was quite sluggish. Since she had taken my performance tuning seminar she knew exactly how to get and read a GC log, but as we all do, she was looking for a second opinion. After looking at the log for a minute it became quite apparent that the default configuration left the IDE starved for memory. After a bit more analysis we came to the conclusion that it is very likely that the 4.2 version of Eclipse is suffering from a memory leak. I thought I

While I don't normally blog about the exciting world of JVM command line options, this new 7.0 option, which allows you to rotate to a new GC log file, was just so exciting... err... not so widely published that I decided to share. Of course these options only apply when you've used the -Xloggc: option. The settings are:

UseGCLogFileRotation, default value: false
NumberOfGCLogFiles, default value: 0 means do not rotate
GCLogFileSize, default value: 0 bytes means do not rotate
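A sketch of a launch line using these options, keeping at most ten 10MB log files (MyApp and the sizes are just placeholders):

java -Xloggc:gc.log -XX:+UseGCLogFileRotation \
     -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M MyApp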

Today I decided it was past time to reorganize the slides in the performance testing section of the course. I added a number of new diagrams and charts to help organize the materials, and then I went back and started to review some of the definitions that I was using. First up was the question: what is performance testing, and how is it related to load testing?

I must confess that I derived a number of my own definitions for a number of terms simply because I wasn’t satisfied with the definitions that were out there. I couldn’t really find fault with many of the definitions. It was just that they were either not useful, in that they didn’t suggest a course of action when they should have, or not in alignment with my experiences. That said, I do want to try to conform, even if I find it difficult to do so on occasion. So I went out on the web to try to see if I could normalize my definitions of load and performance testing to what others were using.

While it’s clear that there is a difference between load and performance testing, that difference isn’t readily apparent until you define the two. According to Wikipedia, performance testing is “testing performed to determine how a system performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate, measure, validate or verify other quality attributes of the system such as scalability, reliability and resource usage.”

Quite a mouthful indeed, and we can quibble about the quality of a definition that uses the words it’s trying to define as part of the definition (is that like autophagic?). What I can say is that with all the words the definition uses, there are still a few missing that are needed if this is to be used to define a load test. For consistency, let’s return to Wikipedia to see what it says about load testing. Its definition for load testing is “the process of putting demand on a system or device and measuring its response. Load testing is performed to determine a system’s behavior under both normal and anticipated peak load conditions”. The definitions sound very similar, but now the missing words have surfaced. The load testing definition constrains the phrase “particular workload” found in performance testing to “normal and anticipated peak load conditions”, whatever that means.

The differences in the definitions make the relationship between load, stress, endurance, and performance testing, while subtle, apparent. The former terms are slightly different specializations of the latter. In fact, the difference between them is sometimes so slight that it’s not surprising that there are so many different definitions out there. But the difference is important, as each type of performance testing serves a different purpose and requires different treatment.

I mentioned the idea of using Wordle as an execution profiler while presenting the profiling section of my performance tuning course in Paris last December. The idea was seeded by a presentation that Neal Ford did a few years ago in which he used Wordle to expose the vocabulary of a Java application. Neal took a pile of source code and fed it into Wordle. The result was a word cloud that visualized the dominant words in the source code. What I was interested in was exploring how I could use Wordle to visualize an application's dominant activity (i.e., the bottleneck).

In profiling we set up a mechanism to repeatedly ask, “What are you doing now? What are you doing now? What are you doing now?” Profilers aggregate and count the answers to this question. It is the treatment of the answers that allows us to identify hotspots (dominant activities) in our application. Wordle aggregates and counts words. It uses those counts to build a word cloud. So it seems as though it should work, as long as we feed it the right set of words. In this case, the words should come from the answers to the repeated question, “What are you doing now?” To answer that question, we can use thread dumps.

At this point you should be thinking, “Thread dumps to profile? Are you daft? The impact on performance will be horrendous!” I completely agree: done haphazardly, continuously taking thread dumps from a running application would inflict a huge performance penalty on that application. But what if we didn’t need all that data to figure out what was going on? Instead, what if we switched to “slow profiling” where we took a thread dump every 5 minutes, minimizing its impact on performance? I recently used this idea to create a Java agent called ConditionalThreadDumper, which I then launched into a production system. The impact on performance was horrendously unnoticeable.

The ConditionalThreadDumper polled the system on a regular basis, looking to see if it was in a particular (bad) condition. For example, we might look at CPU utilization and another performance metric to see if they have exceeded some level of utilization. If so, it would turn on a mechanism to create a thread dump every x minutes. If the system worked its way out of the condition, the thread dumper would be turned off. Launching this agent into production resulted in several thousand data points over a period of a couple of days. Not a lot when you consider an execution profiler can collect that many samples in a few minutes. However, this was running in production and it was only running when the system was in trouble. So while there wasn’t a lot of data, it was significantly more relevant to the problem at hand, as in aggregate these thread dumps tell a story of what the dominant activity was in the badly behaving application.
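I haven't published the agent, but here is a minimal sketch of the idea, assuming load average normalized by core count as the trigger metric (the threshold, intervals, and class name are invented for illustration; the real agent used different metrics):

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical reconstruction of the ConditionalThreadDumper idea, not the original agent.
public class ConditionalThreadDumper implements Runnable {

    private static final double LOAD_THRESHOLD = 0.9;            // invented trigger level
    private static final long POLL_INTERVAL_MS = 5L * 60 * 1000; // poll/dump every 5 minutes

    public void run() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        while (true) {
            try { Thread.sleep(POLL_INTERVAL_MS); } catch (InterruptedException e) { return; }
            // crude "system in trouble" signal: load average normalized by core count
            double load = os.getSystemLoadAverage() / os.getAvailableProcessors();
            if (load > LOAD_THRESHOLD) {
                for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                    System.out.print(info); // toString() includes a (truncated) stack trace
                }
            }
        }
    }

    public static void main(String[] args) {
        Thread dumper = new Thread(new ConditionalThreadDumper(), "conditional-thread-dumper");
        dumper.setDaemon(true);
        dumper.start();
        // ... the application's real work would go here ...
    }
}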

Blasting thousands of raw thread dumps into Wordle simply won’t work, so I hacked together a scanner that harvested stack traces of interest from the thread dumps. What I was looking for were application threads (in this case Tomcat threads, which are easily identifiable). After collecting and cleaning all of the stack trace entries for the Tomcat threads, I ended up with more than 350,000 lines of data. The user interface for Wordle made inserting all of this data impractical, so I modified the code to aggregate and count identical entries. This reduced the dataset to just under 90,000 lines.
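The scanner itself isn't published either, but the aggregation step amounts to something like this sketch (the dump file name comes in as an argument; output matches the entry:count form shown in the list below):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Aggregate identical stack-trace frames so the dataset fits into Wordle.
public class StackFrameCounter {
    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (!line.startsWith("at ")) continue; // keep only stack frames
            // strip the "at " prefix and the "(File.java:123)" suffix
            String frame = line.substring(3);
            int paren = frame.indexOf('(');
            if (paren > 0) frame = frame.substring(0, paren);
            Integer n = counts.get(frame);
            counts.put(frame, n == null ? 1 : n + 1);
        }
        in.close();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}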

I fed the top 100 or so items into Wordle and it came back with a clear but useless answer. The dominant activity was java.lang.Object.wait. Since I already knew that the thread pool was oversized, it wasn’t a surprise to see that the dominant activity was threads blocked on a lock (see Figure 1).

  • java.lang.Object.wait:91295
  • java.lang.Thread.run:46578
  • org.apache.tomcat.util.net.AprEndpoint$Worker.run:45174
  • org.apache.tomcat.util.net.AprEndpoint$Worker.await:44589
  • waiting on (a org.apache.tomcat.util.net.AprEndpoint$Worker):44586
  • javax.servlet.http.HttpServlet.service:1137

 

 Figure 1. Dominating Activities

 

I filtered java.lang.Object.wait and five other related and uninteresting entries from the data and then fed the top 100 or so back into Wordle. Not surprisingly, the output from Wordle demonstrated what we knew to be true from the initial work with "real" performance tooling. The dominant activity was interactions with the JDO product, Kodo (kodo.kernel.BrokerImpl.find). Not far behind was a proxy using reflection (java.lang.reflect.Method.invoke). This is quite visible in Figure 2.

[Image: word cloud from thread dumps]

Figure 2. Wordle output showing dominant activities

So there you have it: how to use Wordle as a wall-clock-time, method-exclusive execution profiler.

 

If you are interested in learning more about performance tuning using more traditional tooling and techniques, please join me in one of my open enrollment performance tuning seminars. I have a few scheduled for both Europe and North America. I'm also very excited to announce that I'm working with Exceed in Australia. I'll be there for two weeks. The first week I will be presenting the workshop at an in-house offering. The second week, Feb 6th, is for my first full open enrollment performance tuning workshop down under. This will also be a rare opportunity for me to meet up with people in Oz.

One of the old bits of tuning advice, given when Java memory management was not as mature as it is today, was to set max heap to min heap. After all, we don't really want the JVM messing around with memory when it should really be getting on with things. Fast forward a few years and the adaptive memory management picture has matured considerably. So much so that setting -Xmx equal to -Xms would now be considered bad practice. Unfortunately, those of us who are rallying against the tide created by those who practice "tuning by folklore" have had our voices drowned out by the countless admin manuals that say "setup, step 12, in the console window set max heap to 2048m and min heap to 2048m", and by the overwhelming noise in the blogosphere parroting this advice. As I've written before in this blog, setting max heap to min heap is a way to turn off the JVM's ability to adapt to changing conditions in your Java application's runtime. But maybe no more.

Those working on the long-awaited G1 have just published a change request stating:

"The idea is to give G1 some room to allow it to use its heuristics for calculating the young gen size even when the heap size is fixed.

CR:
7113021 G1: automatically enable young gen size auto-tuning when -Xms==-Xmx"

This is what I like about the JVM.. if we can't change people's ideas about how to tune the JVM, we'll just change the JVM to match people's ideas of.. how to tune the JVM.

As far as I know, this will apply only to the G1 collector. So the best advice currently available for those needing/wanting adaptive GC is to use a parallel collector where -Xms is set to the maximum amount of heap you'd like to use and -Xmx is set higher to provide a wee bit of headroom, just in case business picks up.
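For example, assuming normal load fits comfortably into 2GB (the sizes and MyApp are purely illustrative):

java -XX:+UseParallelGC -Xms2048m -Xmx2560m MyApp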

Just wrapped up my last performance tuning course for this year and, for the second time running, some members of my Parisian group had the opportunity to run the exercises on virtualized hardware. Granted, the underlying hardware wasn't quite top shelf, but then this hardware had been un-virtualized in past offerings. And the point is to learn how to diagnose Java performance bottlenecks; in that context, OS/hardware isn't so important.

I say that some of the group used virtualized hardware because the rest of the group (wisely) chose to use their own laptops. The mix of hardware that has been used for the exercises contains all of the usual suspects: MacBook Pro, Dell, Sony, HP, Lenovo, Siemens, and a bunch of other brands. Operating systems included XP, Windows 7, OSX, and Linux. I've also run the benchmarks on Solaris (just to round out the list). At the end of the exercise, the tradition is to share results.

The first benchmark is single threaded and, as it turns out, it surprisingly stresses the thread scheduler while trying to use the RTC. The results of the bench are very predictable in that XP completes the 1 second unit of work in 1.8-1.9 seconds, while OSX and Linux run in about 1.1-1.2 seconds. Out of the box, Solaris runs in about 10.100 seconds. Set the clock resolution kernel parameter to 1ms and the unit of work runs in about 1.150 seconds. Windows 7 runs in just over 1.000-1.010 seconds. Note that results are entirely dependent upon the OS. Reboot the same hardware into a different OS and the benchmark results change accordingly.
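This isn't the course's actual benchmark, but a minimal sketch showing why timer resolution dominates the result: a nominal 1 second unit of work built from a thousand 1ms sleeps stretches on any OS with a coarse clock tick.

// Minimal sketch of a timer-resolution-sensitive unit of work.
public class SleepResolution {
    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            Thread.sleep(1); // nominally 1ms, but at the mercy of the OS timer resolution
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("1000 x 1ms sleeps took %.3f seconds%n", seconds);
    }
}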

The second benchmark tries to measure the multi-threaded throughput of various Map implementations. As you can imagine, the benchmark is sensitive to the number of cores along with the choice of OS. Again, XP scores poorly, whereas Linux and Mac OSX run neck and neck, with Windows 7 leading the way for the synchronized collections. Linux, OSX, and Windows 7 run neck and neck when benching the java.util.concurrent or non-blocking (Cliff Click's NonBlockingHashMap) maps.
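The course benchmark isn't reproduced here, but the shape of it is roughly this sketch (the duration, key range, and thread count are invented; NonBlockingHashMap lives in Cliff Click's high-scale-lib and is omitted to keep the sketch self-contained):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Rough sketch of a multi-threaded Map throughput bench.
public class MapThroughputSketch {

    // Hammer the map from threadCount threads for ~5 seconds; return total ops.
    static long bench(final Map<Integer, Integer> map, int threadCount) throws InterruptedException {
        final AtomicLong ops = new AtomicLong();
        final long end = System.nanoTime() + 5_000_000_000L;
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    int key = 0;
                    while (System.nanoTime() < end) {
                        map.put(key & 1023, key);    // writes confined to a small key range
                        map.get((key + 511) & 1023); // mix in reads
                        ops.addAndGet(2);
                        key++;
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) worker.join();
        return ops.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("synchronized HashMap: "
                + bench(Collections.synchronizedMap(new HashMap<Integer, Integer>()), threads) + " ops");
        System.out.println("ConcurrentHashMap   : "
                + bench(new ConcurrentHashMap<Integer, Integer>(), threads) + " ops");
    }
}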

The fun comes when any of these benches are run virtualized on any of these operating systems: the change in results is astounding. The 1.200 second run time balloons to 2.3-2.8 seconds. Throughput on the other benchmark drops from tens of millions of operations per second to a couple of million for the synchronized collections. The results are not as devastating for the non-blocking implementation.

My guess is that virtualization, in the first bench, is adding additional stress to the already overworked thread scheduler. I've not thought through why the second bench is so adversely affected. Bottom line: running these benches in a virtualized environment doesn't paint a very pretty performance picture. After running the benchmarks, the group started trading their virtualization war stories. My response: you can't virtualize yourself into more hardware, but you can certainly virtualize yourself out of it. That said, I still like virtualization.

In 1996, a group that I was working with devised a development process which we called Defect Driven Design, otherwise known as D3. We were a bit disappointed but not surprised that D3 never really caught on, but just recently I saw a glimmer of hope for a revival. It was quite a surprise when, about a year ago, Cameron Purdy started telling me about this great methodology called Defect Driven Design. Geertjan Wielenga happened to be there and he immediately wanted to interview me about it. So very early in the morning, in a bar in Stockholm, he filmed me while I explained D3. It was fantastic to believe that, almost 15 years after its inception, it might happen. D3’s time had come to be the industry’s true silver bullet.

 

Like most things in life, things that don’t make sense to developers often do make great sense to managers and the bean counters. This is clearly the case with D3. Let me explain by starting with the bean counter’s point of view before addressing why managers prefer this style of development.

 

D3 makes the most sense when you consider total cost of ownership through inception to product end of life. Of all the costs, maintenance is by far the largest. Unfortunately, it is the cost that cannot be avoided. However, we can avoid the costs of every other phase by simply eliminating them from the process. Cost of development? 0! Cost of the requirements phase? 0! Cost of all the various forms of testing? 0! This is why bean counters love D3. The only phase that costs anything is maintenance, but by eliminating the other phases, we have reduced total cost of ownership.

 

From the manager’s perspective, D3 makes most sense when you consider the painful task of scheduling. Let’s face it, trying to estimate when developers will finish any software project is like trying to predict Saturday evening’s lottery numbers. It’s like a stupidity tax: the dumber you are, the more you think you can do it, and the bigger the tax you pay. So smart managers immediately get D3, because they recognize that by getting rid of all of the phases, they’re going to be right far more often. How long does it take to collect requirements? No time at all! How about time to develop and deploy the application? Zero! The application is deployed immediately after the users request it. Testing? That rolls right into the maintenance phase. Users are happy, managers hit schedules.

 

So how is D3 able to provide all these benefits? This is best explained in terms of an example. Let’s suppose a user comes up to your manager and says, “Scotty, we really need this mission critical application as soon as possible”. Instead of your manager saying, “our development team cannot take it, we can’t do any more”, he can answer, “done!” If the user asks when it will be delivered, the answer is: it’s been delivered!

 

Requirements collected in no time, delivery immediately taken care of. No need for long meetings with bean counters as you try to get funding to develop this application. That all melts away when you use D3. That said, it’s not quite the panacea that it’s been made out to be. The user will go to his desktop, look about for the application, and not finding an icon will immediately report something like, “Where’s the icon to start the application? I can’t find the icon.” This is the project’s first defect: no icon to start the application. No problem, generate that icon, fix the defect, and report back that all is well.

Second step: the user clicks on the icon and nothing happens. No problem, this is the second defect, which will be reported and which the maintenance programmers will work to get fixed. Third defect, fourth, and so on.

 

By now you should see how D3 can solve the problems faced by managers and bean counters. But what about developers? I think we can all agree that the most tedious thing a developer can work on is a brand new project. Having to go through design phases, making choices about which technology to use, figuring out how to get everything up and running. It can give one a headache just thinking about all of the technical challenges that come with starting a new project. However, with maintenance programming, all you have to do is look at the list of defects, pick one, and fix it. Gone is all the tedium of trying to start a new project. No more headaches, long hours, and loss of sleep trying to meet insane schedules set by your manager, given impossible budgets by his bean counters. No more arguments about how we’ve got to rewrite this application from the ground up because it was poorly designed. With no design phase, there is no possibility of that phase producing a bad design. No messy code from the development phase to complain about. It’s really a win all round.

Over the last few weeks I've been working Java regular expressions into a couple of applications. After a couple of rounds of trying to sort out regular expression syntax, I hacked together this handy little GUI. If anyone makes any improvements do let me know and I'll be happy to make use of them.

 
package com.kodewerk.util;
 
import javax.swing.*;
import java.awt.*;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.awt.event.WindowAdapter;
import java.awt.event.WindowEvent;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
 
/**
 * @author kirk
 * @since 5:54:30 AM, Sep 10, 2011
 */
public class TestPattern extends JFrame {
 
    private final JTextArea testString = new JTextArea();
    private final JTextArea testPattern = new JTextArea();
    private final JTextArea output = new JTextArea();
    private JButton findButton;
    private JButton groupButton;
    private JButton matchButton;
 
    public TestPattern() {
        super.setTitle("Java Regular Expression Evaluator");
        super.setLayout(new BorderLayout());
        JPanel listPane = new JPanel();
        listPane.setLayout(new BoxLayout(listPane, BoxLayout.PAGE_AXIS));
        //listPane.setBorder(BorderFactory.createEmptyBorder(30, 70, 10, 70));
        listPane.add(buildTestStringPanel());
        listPane.add(buildExpressionStringPanel());
        //listPane.add(buildSettingsPanel());
        listPane.add(buildResultsPanel());
        listPane.add(buildButtonsPanel());
        super.add(listPane, BorderLayout.CENTER);
    }
 
    private JPanel buildButtonsPanel() {
        JPanel buttons = new JPanel();
        buttons.setOpaque(false);
        findButton = new JButton("find");
        findButton.setBackground(Color.GREEN);
        buttons.add(findButton);
 
        groupButton = new JButton("group");
        groupButton.setBackground(Color.GREEN);
        buttons.add(groupButton);
 
        matchButton = new JButton("match");
        matchButton.setBackground(Color.RED);
        buttons.add(matchButton);
 
        JButton exitButton = new JButton("Exit");
        buttons.add(exitButton);
 
        findButton.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                executeFind();
            }
        });
 
        groupButton.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                executeGroup();
            }
        });
 
        matchButton.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                executeMatch();
            }
        });
 
        exitButton.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                System.exit(0);
            }
        });
 
        return buttons;
    }
 
    private JPanel buildResultsPanel() {
        JPanel panel = new JPanel();
        panel.setLayout(new BoxLayout(panel, BoxLayout.Y_AXIS));
        panel.add(new JLabel("Regular Expression Evaluation Result"));
        output.setRows(10);
        output.setEditable(false);
        panel.add(new JScrollPane(output));
        return panel;
    }
 
    private JPanel buildTestStringPanel() {
        JPanel panel = new JPanel();
        panel.setLayout(new BoxLayout(panel, BoxLayout.Y_AXIS));
        panel.add(new JLabel("Test String"));
        testString.setRows(1);
        testString.setEditable(true);
        panel.add(new JScrollPane(testString));
        return panel;
    }
 
    private JPanel buildExpressionStringPanel() {
        JPanel panel = new JPanel();
        panel.setLayout(new BoxLayout(panel, BoxLayout.Y_AXIS));
        panel.add(new JLabel("Regular Expression"));
        testPattern.setRows(1);
        testPattern.setEditable(true);
        panel.add(new JScrollPane(testPattern));
        return panel;
    }
 
    private JPanel buildSettingsPanel() {
        JLabel label;
        JPanel settingPanel = new JPanel(new GridLayout(0, 2, 10, 10));
 
        settingPanel.setBorder(BorderFactory.createTitledBorder("Settings:"));
        settingPanel.setOpaque(false);
 
        label = new JLabel("Test String:", JLabel.RIGHT);
        label.setToolTipText("String to match pattern against");
        settingPanel.add(label);
        settingPanel.add(testString);
 
        label = new JLabel("Regular Expression:", JLabel.RIGHT);
        label.setToolTipText("regular expression to match against test string");
        settingPanel.add(label);
        settingPanel.add(testPattern);
 
        return settingPanel;
    }
 
    // Report whether the pattern can be found anywhere in the test string.
    public void executeFind() {
        String pattern = testPattern.getText();
        String targetString = testString.getText();
        try {
            if ( Pattern.compile( pattern).matcher( targetString).find())
                output.setText( "pattern matched");
            else
                output.setText( "pattern failed to match");
        } catch(Throwable t) {
            StringBuilder text = new StringBuilder(t.getClass().getName());
            text.append('\n').append( t.getMessage());
            output.setText( text.toString());
        }
    }
 
    // Report the contents of each capture group for the first match found.
    public void executeGroup() {
        String pattern = testPattern.getText();
        String targetString = testString.getText();
        try {
            Pattern p = Pattern.compile( pattern);
            Matcher m = p.matcher( targetString);
            String group = "";
            if ( m.find()) {
                int groups = m.groupCount();
                StringBuilder results = new StringBuilder();
                for ( int i = 0; i <= groups; i++) {
                    results.append(i + ": `" + m.group(i) + "`\n" );
                }
                output.setText(results.toString());
            } else {
                output.setText( "pattern failed to match");
            }
        } catch(Throwable t) {
            StringBuilder text = new StringBuilder(t.getClass().getName());
            text.append('\n').append( t.getMessage());
            output.setText( text.toString());
        }
    }
 
    // Report whether the pattern matches the entire test string.
    public void executeMatch() {
        String pattern = testPattern.getText();
        String targetString = testString.getText();
        try {
            if ( Pattern.compile( pattern).matcher( targetString).matches())
                output.setText( "pattern matched");
            else
                output.setText( "pattern failed to match");
        } catch(Throwable t) {
            StringBuilder text = new StringBuilder(t.getClass().getName());
            text.append('\n').append( t.getMessage());
            output.setText( text.toString());
        }
    }
 
    public static void main(String[] args) {
        final TestPattern me = new TestPattern();
        me.setDefaultCloseOperation(JFrame.DO_NOTHING_ON_CLOSE);
        me.addWindowListener( new WindowAdapter() {
            public void windowClosing( WindowEvent evt) {
                int response = JOptionPane.showConfirmDialog( me, "Exit?\n",
                  "Exit?", JOptionPane.YES_NO_OPTION);
                if ( JOptionPane.YES_OPTION == response)
                    System.exit(0);
            }
        });
        me.pack();
        me.setVisible(true);
    }
}
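To try the evaluator, compile and launch it from the source root:

javac com/kodewerk/util/TestPattern.java
java -cp . com.kodewerk.util.TestPattern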
 
On an upbeat note, Ecetra and I have agreed to terms that will have them hosting my performance tuning course in Australia at the beginning of Feb (2012). Really looking forward to spending time in Sydney.

It’s after JavaONE and as promised, here is the answer to the performance puzzler with a stack trace. But first a word about JavaONE. The physical layout was the same as last year and as a result, JavaONE suffered from some of the same problems as last year. This year Oracle took a lot of advice from an advisory committee and the result was the conference made very good use of the poor physical layout it is unfortunately stuck with. That said, the conference had a lot more energy than in past years. Oracle’s delivery of Java 7 along with the release of JavaFX seems to have revived interest in Java. Better yet, Java on the iPad, Twitter joining OpenJDK, loads of good talks from IBM... all in all a very positive experience. Now to fold Google and Apache back into the conference and all will be good. I had fun presenting my GC log talk and it was fun to hear people shouting out the answers to the puzzlers. Instead of a slide drop, I’m thinking of releasing them here as more puzzlers. Just for fun.

Back to the puzzle at hand. I received quite a few emails with suggestions, guesses, and in a couple of cases the correct hypothesis and the required next step to confirm it. Let's work through the process to see how to arrive at the hypothesis, starting with the information provided.

The application was running on OS X Lion, burning 135% CPU when it should have been idle. The application had previously been running normally. The first thing to do with the thread dump is filter out all the threads that could not possibly be burning CPU. The first thread at the top of the thread dump is an “Attach Listener” and it’s “waiting on condition”. Threads “waiting on condition” are parked and cannot possibly be burning CPU. Based on this information we can filter it from our list of “threads of interest”. Digging deeper, we will find that there are a number of threads “waiting on condition”. Let’s get rid of those. In addition to threads parked for that reason, you will find threads parked in a TIMED_WAITING (on object monitor), sleeping, and some threads marked runnable that are actually blocked on the native method socketAccept. Let’s filter out all those and see what we’re left with.

"AWT-AppKit" daemon prio=5 tid=7f846325e800 nid=0x7fff70b3e960 runnable [00000000]
    java.lang.Thread.State: RUNNABLE

"Poller SunPKCS11-Darwin" daemon prio=1 tid=7f84638cf800 nid=0x1140a4000 runnable [1140a3000]
    java.lang.Thread.State: RUNNABLE
    at sun.security.pkcs11.wrapper.PKCS11.C_GetSlotInfo(Native Method)
    at sun.security.pkcs11.SunPKCS11.initToken(SunPKCS11.java:767)
    at sun.security.pkcs11.SunPKCS11.access$100(SunPKCS11.java:42)
    at sun.security.pkcs11.SunPKCS11$TokenPoller.run(SunPKCS11.java:700)
    at java.lang.Thread.run(Thread.java:680)

"Low Memory Detector" daemon prio=5 tid=7f8463094000 nid=0x113538000 runnable [00000000]
    java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=9 tid=7f8463092000 nid=0x11322f000 runnable [00000000
    java.lang.Thread.State: RUNNABLE???"VM Thread" prio=9 tid=7f8463806000 nid=0x112c62000 runnable

"Gang worker#0 (Parallel GC Threads)" prio=9 tid=7f8463002800 nid=0x10e44a000 runnable

"Gang worker#1 (Parallel GC Threads)" prio=9 tid=7f8463003000 nid=0x10e54d000 runnable

"Gang worker#2 (Parallel GC Threads)" prio=9 tid=7f8463003800 nid=0x10e650000 runnable

"Gang worker#3 (Parallel GC Threads)" prio=9 tid=7f8463004000 nid=0x10e753000 runnable

"Concurrent Mark-Sweep GC Thread" prio=9 tid=7f846307f800 nid=0x11249a000 runnable

"VM Periodic Task Thread" prio=10 tid=7f84630a5800 nid=0x11363b000 waiting on condition

"Exception Catcher Thread" prio=10 tid=7f8463001800 nid=0x10b272000 runnable

 

As is the case with all things performance, you tend to end up collecting lots of information but finding very little of it to be of any use. To further frustrate efforts, it's common that the useful data only provides some direction toward the answer and not the answer itself. In this case we've been able to whittle the data down to 12 threads, all in the service of the VM. This shorter list is much easier on the eye. Let's see if we can shorten the list even further. First, BlueDragon is a server application, so there’s nothing for the AWT-AppKit thread to be doing. Even so, it’s a single thread and as such can’t be responsible for more than 100%. The same can be said for the Poller SunPKCS11-Darwin, Signal Dispatcher, VM Thread, and the Exception Catcher Thread.

Whenever I’m offered a set of data that is conflicting or confusing, I tend to drop back to first principles and the things I can count on. First, the CPU burn was caused by the JVM. Second, no application or application server threads were responsible. Hence the only possible conclusion is that threads native to the JVM were responsible. While it was possible that the native threads that I ruled out were responsible, each thread by itself could not have been. That would suggest that at least two different problems in the new deployment environment were responsible. While not impossible, it's more likely that a single problem is responsible. The only single activity that is threaded in this case is GC. The next obvious step is to turn on GC logging to see what the collector is up to. In the process of setting the GC logging flags, the settings “-Xmx1024m -Xmn1024m” were discovered. Setting young gen size equal to max heap size caused GC to run continuously. Mystery solved!
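For those who'd like to reproduce that step, GC logging at the time was typically enabled with something like the following (gc.log and MyApp are placeholders):

java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log MyApp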
