In my previous blog post, I did some experimentation with simultaneous execution of multiple threads. Since the threads were all doing relatively large chunks of work, the overhead from thread creation and management was almost irrelevant. In this post, I take a look at the overhead that launching a series of threads can have on an application.

The starting point is a modified version of the application I developed for the last post. A thread is called to perform some number-crunching. After the number-crunching is complete, a method is called to return the last result. As I said in my last post, it's a pretty useless application. However, it does a good job of fully utilizing whatever number of processors I'd like to use. And, since there's no real I/O happening (no disk reads/writes, etc.), it works quite well as a means for analyzing what happens when you perform different experiments with threads.

For today's testing, I removed all print statements (once I'd verified that the application was doing what I wanted it to do), so that the processing consists exclusively (or, at least as close as I could get to that) of the computational processing performed by each thread instance, and thread "overhead" (creating, launching, joining).

Here's the main class:

class ThreadOverheadTest {
  public static void main(String args[]) {
    int nThrCalls;
    NewThread thr1;
    double result = 0.0;
    int nWork = 1000000;
    int jWork0;
    int jWork1;
    int jWorkIncr;

    if (args.length < 1) {
      nThrCalls = 1;
    } else {
      nThrCalls = Integer.parseInt(args[0]);
      if (nThrCalls < 1) nThrCalls = 1;
      if (nThrCalls > nWork) nThrCalls = nWork;
    }
    System.out.println("Performing " + nWork + " total units of work using " +
                       nThrCalls + " thread calls.");

    jWorkIncr = nWork / nThrCalls;
    jWork0 = 1;
    jWork1 = jWorkIncr;
    System.out.println("Each consecutive thread will perform " + jWorkIncr +
                       " units of work.");
    while (jWork0 <= nWork) {
      thr1 = new NewThread("thr1");
      thr1.SetWorkRange(jWork0, jWork1);
      thr1.t.start(); // start the thread

      try {
        thr1.t.join(); // wait for the thread to end
      } catch (InterruptedException e) {
        System.out.println("Main thread Interrupted");
      }

      result = thr1.GetLastValue();

      jWork0 += jWorkIncr;
      jWork1 += jWorkIncr;
      if (jWork1 > nWork) jWork1 = nWork;
    }
    System.out.println("Final Result: " + result);
  }
}

So, we're going to perform 1,000,000 units of work (nWork). The command-line argument defines how many consecutive threads will be launched to perform all the units of work; the default is to do all the work using a single thread.
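Because jWorkIncr is computed with integer division, the chunking deserves a closer look: when the requested thread count doesn't divide nWork evenly, the loop makes one extra, shorter call to mop up the remainder. Here's a small standalone sketch of the same range arithmetic, using hypothetical values (nWork = 10 split across 3 requested calls) rather than the numbers from the real runs:

```java
// Demonstrates the work-range arithmetic from main() on a small example.
// Hypothetical values: nWork = 10, nThrCalls = 3 (not from the actual tests).
public class WorkRangeDemo {
  public static void main(String[] args) {
    int nWork = 10;
    int nThrCalls = 3;
    int jWorkIncr = nWork / nThrCalls; // integer division: 3
    int jWork0 = 1;
    int jWork1 = jWorkIncr;
    while (jWork0 <= nWork) {
      System.out.println("range: " + jWork0 + "-" + jWork1);
      jWork0 += jWorkIncr;
      jWork1 += jWorkIncr;
      if (jWork1 > nWork) jWork1 = nWork; // clamp the last range
    }
  }
}
```

This prints four ranges (1-3, 4-6, 7-9, 10-10), i.e., one more thread call than requested; with the round numbers used in the actual tests the division is exact, so this doesn't affect the results reported below.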

Here are the results when I run this using a single thread on my CentOS 6.2 Linux machine:

$ time java ThreadOverheadTest 1
Performing 1000000 total units of work using 1 thread calls.
Each consecutive thread will perform 1000000 units of work.
Final Result: 14142.13562373095

real    0m14.130s
user    0m14.102s
sys     0m0.025s

Here the computation thread is called once, and told to do all 1,000,000 units of work. This, then, is the baseline timing, the amount of time required to complete the computations basically in the absence of any thread overhead.
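If you'd rather not depend on the shell's time command, the elapsed time can also be measured inside the program with System.nanoTime(). Here's a minimal sketch of that approach; the doWork method is a hypothetical stand-in for the number-crunching loop, not part of the actual test program:

```java
// Minimal in-program timing sketch using System.nanoTime().
// doWork() is a hypothetical stand-in for the computational loop.
public class TimingSketch {
  static double doWork(int units) {
    double last = 0.0;
    for (int j = 1; j <= units; j++) {
      for (int i = 1; i <= 200; i++) {
        last = Math.pow((double) i * j, 0.5); // same computation as NewThread.run()
      }
    }
    return last;
  }

  public static void main(String[] args) {
    long start = System.nanoTime();
    double result = doWork(1000000);
    long elapsedMs = (System.nanoTime() - start) / 1000000;
    System.out.println("Result " + result + " in " + elapsedMs + " ms");
  }
}
```

The shell's time command has the advantage of also reporting user and sys time separately, which is why I used it for the runs shown here.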

In case you're curious, here's the computational thread that performs the work:

import static java.lang.Math.pow;

// Thread that performs a range of the numeric work.
class NewThread implements Runnable {
  String name; // name of thread
  Thread t;
  int iVal0;
  int iVal1;
  double lastVal;

  void SetWorkRange(int i0, int i1) {
    iVal0 = i0;
    iVal1 = i1;
    //System.out.println(name + " work range: " + iVal0 + "-" + iVal1);
  }

  double GetLastValue() {
    return lastVal;
  }

  NewThread(String threadname) {
    name = threadname;
    t = new Thread(this, name);
    //System.out.println("New thread: " + t);
  }

  // This is the entry point for the thread.
  public void run() {
    //System.out.println(name + " starting, working on " + iVal0 + "-" + iVal1);
    try {
      for (int j = iVal0; j <= iVal1; j++) {
        for (int i = 1; i <= 200; i++) {
          double val0 = i;
          double val1 = j;
          double val2 = val0 * val1;
          double val3 = pow(val2, 0.5);
          lastVal = val3;
        }
      }
    } catch (Exception e) {
      System.out.println(name + " error: " + e);
    }
    //System.out.println(name + " exiting.");
  }
}
A "unit of work" is one pass through the inner i loop that does the numerical computation. Since 14.13 seconds were required to perform 1,000,000 units of work, each unit of work takes about 0.014 milliseconds (roughly 14 microseconds) to complete on my CentOS system.
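For completeness, here's that per-unit arithmetic spelled out, using the wall-clock time from the baseline run above:

```java
// Per-unit cost from the baseline run: 14.130 s for 1,000,000 units of work.
public class UnitCost {
  public static void main(String[] args) {
    double totalSeconds = 14.130;  // "real" time from the baseline run
    int units = 1000000;           // total units of work (nWork)
    double millisPerUnit = totalSeconds * 1000.0 / units;
    System.out.println("~" + millisPerUnit + " ms per unit of work");
  }
}
```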

Now watch what happens as the number of threads is increased:

[Table: units of work per thread, and time to complete, for increasing numbers of thread calls]
It took a lot of threads before there was much of a noticeable performance hit. But ultimately, by consecutively creating and running 1,000,000 threads, each performing a single unit of work, I was able to bog down my application's performance pretty severely.

You can't really say that all of the extra time represents thread overhead. For example, something as simple as flipping the i and j loops in the computational thread produces a somewhat different set of results. But, I think we can fairly safely state that creating and invoking 1,000,000 threads consecutively puts a significant burden on my system.
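One standard way to avoid paying the thread-creation cost for every chunk of work is to reuse a small pool of threads. This is not part of the test program above, just a sketch of the idea using java.util.concurrent.ExecutorService; the chunk count of 1000 is an arbitrary illustrative value:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: submit work ranges to a reused pool thread instead of
// creating a fresh Thread object for every chunk of work.
public class PooledOverheadTest {
  public static void main(String[] args) throws InterruptedException {
    int nWork = 1000000;
    int nChunks = 1000;          // arbitrary number of work submissions
    int chunk = nWork / nChunks; // units of work per submission
    ExecutorService pool = Executors.newFixedThreadPool(1);

    for (int jWork0 = 1; jWork0 <= nWork; jWork0 += chunk) {
      final int lo = jWork0;
      final int hi = Math.min(jWork0 + chunk - 1, nWork);
      pool.execute(new Runnable() {
        public void run() {
          for (int j = lo; j <= hi; j++) {
            for (int i = 1; i <= 200; i++) {
              double v = Math.pow((double) i * j, 0.5); // same unit of work
            }
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS); // wait for all chunks to finish
    System.out.println("done");
  }
}
```

Since each submission reuses the pool's single thread, the per-chunk cost is a queue operation rather than a full thread create/start/join cycle.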

So why, you may wonder, was I interested in taking the time to create and perform this experiment? Because it provides a baseline for similar experiments I plan to perform using the Java 7 Fork/Join Framework and other JVM concurrency options, eventually including Project Lambda.

Weblogs

Since my last blog post, Harold Carr has posted two new blogs:


Our current poll asks "Will you use JavaFX for development once it's fully ported to Mac and Linux platforms?" Voting will be open until this Friday, March 2.


Our latest article is Michael Bar-Sinai's PanelMatic 101.

Java News

Here are the stories we've recently featured in our Java news section: