One question I encounter frequently about the JMX Remote API is how to reduce the time taken to notice that a remote machine is dead when making a connection to it. The default timeout is typically a couple of minutes! Here's one way to do it.

Probably the cleanest technique for connection timeouts in general is to set a connection timeout on the socket. The idea is that instead of using...

    Socket s = new Socket(host, port);

you use...

    SocketAddress addr = new InetSocketAddress(host, port);
    Socket s = new Socket();
    s.connect(addr, timeoutInMilliSeconds);

The problem is that this is at a rather low level. If you're making connections with the JMX Remote API you usually don't see Socket objects at all. It's still possible to use this technique, but it requires a certain amount of fiddling, and the particular fiddling you need depends on which connector protocol you are using.

A lot of the time, a much simpler and more general technique is applicable. You simply create the connection in another thread, and you wait for that thread to complete. If it doesn't complete before your timeout, you just abandon it. It might still take two minutes to notice that the remote machine is dead, but in the meantime you can continue doing other things.

If you're making a lot of connections to a lot of machines, you might want to think twice about abandoning threads, because you might end up with a lot of them. But in the more typical case where you're just making one connection, this technique may well be for you.

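Assuming you're using at least Java SE 5, you'll certainly want to use java.util.concurrent to manage the thread creation and communication. There are a few ways of doing it, but the easiest is probably a single-thread executor, which is what the connectWithTimeout method developed below uses. That method takes the timeout as a number plus a unit such as TimeUnit.SECONDS.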

My first cut at the problem

In my first version of this entry, I proposed a solution with the following outline.

    JMXConnector connectWithTimeout(JMXServiceURL url, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<JMXConnector> future = executor.submit(new Callable<JMXConnector>() {
            public JMXConnector call() {
                return JMXConnectorFactory.connect(url);
            }
        });
        return future.get(timeout, unit);
    }

Half an hour after posting, I suddenly realised that this version is incorrect. It reminds me of the saying that for every complex problem there is a solution that is simple, obvious, and wrong.

This solution does the right thing when the connection succeeds within the time limit, and also in the case of the problem we are trying to solve, where it takes a very long time to fail. But if the connection succeeds after the time limit, future.get will already have thrown a TimeoutException and connectWithTimeout will have given up, so we'll have made a connection that nobody knows about!
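The leak is easy to reproduce without a JMX server. What follows is only a sketch: the class FirstCutLeakDemo, the method leaksWhenConnectIsLate, and the sleep that stands in for JMXConnectorFactory.connect are all made up for illustration. It runs the first-cut pattern against a fake connect that finishes after the caller has timed out, then checks that a "connection" was opened anyway.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class FirstCutLeakDemo {
    /** Counts fake connections that were opened; nothing in this demo ever closes them. */
    static final AtomicInteger openConnections = new AtomicInteger();

    /**
     * Runs the first-cut pattern against a fake connect that takes connectMillis.
     * Returns true if the caller timed out and a connection was opened all the same.
     */
    static boolean leaksWhenConnectIsLate(final long connectMillis, long timeoutMillis) {
        openConnections.set(0);
        final CountDownLatch connectFinished = new CountDownLatch(1);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(new Callable<String>() {
                public String call() throws Exception {
                    Thread.sleep(connectMillis);        // stand-in for the real connect call
                    openConnections.incrementAndGet();  // the "connection" now exists
                    connectFinished.countDown();
                    return "connection";
                }
            });
            boolean timedOut = false;
            try {
                future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                timedOut = true;                        // the caller gives up here
            }
            connectFinished.await();                    // demo only: wait to see what happens next
            return timedOut && openConnections.get() == 1;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("late connect leaked: " + leaksWhenConnectIsLate(1000, 50));
    }
}
```

In real code the caller would not wait on the latch, of course. It is only there so that the demo can observe the connection being made after the timeout.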

The second attempt

This is the outline of my second attempt, which I believe is correct. There are several refinements we'll need to apply before having a solution that actually works.

    // This is just an outline: the real code appears later
    JMXConnector connectWithTimeout(JMXServiceURL url, long timeout, TimeUnit unit) {
        final BlockingQueue<Object> mailbox = new ArrayBlockingQueue<Object>(1);
        final ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(new Runnable() {
            public void run() {
                JMXConnector connector = JMXConnectorFactory.connect(url);
                if (!mailbox.offer(connector))
                    connector.close();
            }
        });
        Object result = mailbox.poll(timeout, unit);
        if (result == null) {
            if (!mailbox.offer(""))
                result = mailbox.take();
        }
        return (JMXConnector) result;
    }

To understand how and why this works, notice that exactly one object always gets posted to the mailbox. There are three cases:

  • If the connection attempt finishes before the timeout, then the connector object will be posted to the mailbox and returned to the caller.
  • If the timeout happens, then the main thread will try to stuff the mailbox with an arbitrary object (here the empty string, but any object would do), so the connection thread will realise it has connected too late and close the newly-made connection.
  • If the timeout happens at exactly the same time as the connection is made, then the main thread may find that the mailbox is already full, in which case it again picks up the connector object and returns it.
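The first two cases can be exercised with a small simulation. This is a sketch under stated assumptions rather than the code from this entry: MailboxDemo, FakeConnector, Outcome and attempt are hypothetical names, a sleep stands in for the connection attempt, and a plain daemon thread replaces the executor to keep it short. Whichever way the race goes, the connector should end up either returned to the caller or closed, never both and never neither.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class MailboxDemo {
    /** Stand-in for a JMXConnector: it only records whether close() was called. */
    static class FakeConnector {
        volatile boolean closed;
        void close() { closed = true; }
    }

    /** What one attempt produced. */
    static class Outcome {
        FakeConnector returned;  // what the caller received, or null after a timeout
        FakeConnector made;      // the connector the worker thread created
    }

    static Outcome attempt(final long connectMillis, long timeoutMillis) {
        final BlockingQueue<Object> mailbox = new ArrayBlockingQueue<Object>(1);
        final FakeConnector made = new FakeConnector();
        final CountDownLatch workerDone = new CountDownLatch(1);
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(connectMillis);   // simulated connection delay
                    if (!mailbox.offer(made))      // mailbox already stuffed: too late
                        made.close();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    workerDone.countDown();
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
        try {
            Object result = mailbox.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (result == null) {
                if (!mailbox.offer(""))            // try to stuff the mailbox
                    result = mailbox.take();       // lost the race: the connector is there
            }
            workerDone.await();                    // demo only: let the worker finish first
            Outcome outcome = new Outcome();
            outcome.made = made;
            outcome.returned =
                (result instanceof FakeConnector) ? (FakeConnector) result : null;
            return outcome;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Outcome fast = attempt(10, 2000);
        System.out.println("fast: returned=" + (fast.returned != null)
                + " closed=" + fast.made.closed);
        Outcome slow = attempt(1000, 50);
        System.out.println("slow: returned=" + (slow.returned != null)
                + " closed=" + slow.made.closed);
    }
}
```

The third case, where the timeout and the connection coincide, is hard to provoke on demand, which is rather the point: the code has to be right by construction, not by testing.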

Making it work

The code above is just an outline, and leaves out some necessary details. We need to refine it in several ways to make it work.

The first refinement we'll need is exception handling. The result of the connection attempt could be an exception instead of a JMXConnector. This doesn't change the reasoning above, but it does complicate the code.

The main thread calls BlockingQueue.poll, which can throw InterruptedException, so we must handle that.

About half of the final version of connectWithTimeout involves footering about with exceptions. It's times like this that I'm inclined to join the checked-exception-haters.

The second refinement is to clean up the connect thread when we're finished with it. The outline code doesn't call shutdown() on the ExecutorService, so every time connectWithTimeout is called, a new single-thread executor is created, and therefore a new thread. If you're lucky, the garbage-collector will pick up your executors and their threads at some stage, but you don't want to depend on luck.

A more subtle point about threads is that the outline code will create non-daemon threads. Your application will not exit when the main thread exits if there are any non-daemon threads. So as written, if you have a thread stuck in a connection attempt and your application is otherwise finished, it will stay around until the connection attempt finally times out. That's pretty much exactly the sort of thing we're trying to avoid. So we'll need to arrange to create a daemon thread instead.
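To see the difference for yourself, here is a small sketch. The class DaemonExecutorDemo and its method workerIsDaemon are hypothetical names used only for this illustration. It asks the worker thread of a single-thread executor whether it is a daemon, once with the default thread factory and once with a factory that calls setDaemon(true).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class DaemonExecutorDemo {
    /** Wraps the default factory and marks every thread it creates as a daemon. */
    static final ThreadFactory DAEMON_FACTORY = new ThreadFactory() {
        public Thread newThread(Runnable r) {
            Thread t = Executors.defaultThreadFactory().newThread(r);
            t.setDaemon(true);
            return t;
        }
    };

    /**
     * Runs a trivial task on a single-thread executor and reports whether the
     * thread that ran it was a daemon. A null factory means "use the default".
     */
    static boolean workerIsDaemon(ThreadFactory factory) {
        ExecutorService executor = (factory == null)
                ? Executors.newSingleThreadExecutor()
                : Executors.newSingleThreadExecutor(factory);
        try {
            return executor.submit(new Callable<Boolean>() {
                public Boolean call() {
                    return Thread.currentThread().isDaemon();
                }
            }).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            executor.shutdown();   // always release the executor's thread
        }
    }

    public static void main(String[] args) {
        System.out.println("default factory daemon? " + workerIsDaemon(null));
        System.out.println("daemon factory daemon?  " + workerIsDaemon(DAEMON_FACTORY));
    }
}
```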

All right, so here's the real code.

    public static JMXConnector connectWithTimeout(
            final JMXServiceURL url, long timeout, TimeUnit unit)
            throws IOException {
        final BlockingQueue<Object> mailbox = new ArrayBlockingQueue<Object>(1);
        ExecutorService executor =
            Executors.newSingleThreadExecutor(daemonThreadFactory);
        executor.submit(new Runnable() {
            public void run() {
                try {
                    JMXConnector connector = JMXConnectorFactory.connect(url);
                    if (!mailbox.offer(connector))
                        connector.close();
                } catch (Throwable t) {
                    mailbox.offer(t);
                }
            }
        });
        Object result;
        try {
            result = mailbox.poll(timeout, unit);
            if (result == null) {
                if (!mailbox.offer(""))
                    result = mailbox.take();
            }
        } catch (InterruptedException e) {
            throw initCause(new InterruptedIOException(e.getMessage()), e);
        } finally {
            executor.shutdown();
        }
        if (result == null)
            throw new SocketTimeoutException("Connect timed out: " + url);
        if (result instanceof JMXConnector)
            return (JMXConnector) result;
        try {
            throw (Throwable) result;
        } catch (IOException e) {
            throw e;
        } catch (RuntimeException e) {
            throw e;
        } catch (Error e) {
            throw e;
        } catch (Throwable e) {
            // In principle this can't happen but we wrap it anyway
            throw new IOException(e.toString(), e);
        }
    }

    private static <T extends Throwable> T initCause(T wrapper, Throwable wrapped) {
        wrapper.initCause(wrapped);
        return wrapper;
    }

    private static class DaemonThreadFactory implements ThreadFactory {
        public Thread newThread(Runnable r) {
            Thread t = Executors.defaultThreadFactory().newThread(r);
            t.setDaemon(true);
            return t;
        }
    }

    private static final ThreadFactory daemonThreadFactory = new DaemonThreadFactory();

The initCause method is only used once but it's handy to have around for those troublesome exceptions that don't have a Throwable cause parameter.
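As a small illustration of what that helper buys you, here is a sketch in which the class InitCauseDemo and the method wrap are hypothetical names. InterruptedIOException has no constructor that takes a cause, so the helper lets you create the exception, attach the cause, and throw or return it in a single expression.

```java
import java.io.InterruptedIOException;

public class InitCauseDemo {
    /** Attaches a cause to an exception and returns that same exception. */
    static <T extends Throwable> T initCause(T wrapper, Throwable wrapped) {
        wrapper.initCause(wrapped);
        return wrapper;
    }

    /** Converts an InterruptedException into an InterruptedIOException that keeps its cause. */
    static InterruptedIOException wrap(InterruptedException e) {
        return initCause(new InterruptedIOException(e.getMessage()), e);
    }

    public static void main(String[] args) {
        InterruptedException original = new InterruptedException("poll interrupted");
        InterruptedIOException wrapped = wrap(original);
        System.out.println("message: " + wrapped.getMessage());
        System.out.println("cause preserved: " + (wrapped.getCause() == original));
    }
}
```

Because the method is generic in the wrapper type, the result keeps its static type, which is what allows it to appear directly after throw in a method declared to throw IOException.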

I think it would be awfully nice if java.util.concurrent supplied DaemonThreadFactory rather than everyone having to invent it all the time.

Shouldn't this be simpler?

I admit I'm a bit uncomfortable with the code here. I'd be happier if I didn't need to reason about it in order to convince myself that it's correct. But I don't see any simpler way of using the java.util.concurrent API to achieve the same effect. Uses of cancel or interrupt tend to lead to race conditions, where the task can be cancelled after it has already delivered its result, and again we can get a JMXConnector leak; or we might close a JMXConnector that the main thread is about to return. I'd be interested in suggestions for simplification.

Conclusion of the foregoing

This is a useful technique in many cases, subject to the caution above. It's not limited to the JMX Remote API, either; you might use it when accessing a remote web service or EJB or whatever, without having to figure out how to get hold of the underlying Socket so you can set its timeout.

My thanks to Sébastien Martin for the discussion that led to this entry.
