Where's the state? This is a small but useful question when deciding how a problem domain gets carved up into objects: What things have state? What things have values that can change? When and how can they change? Can the changes be observed? Who needs to observe changes in state?

These questions make a good start for figuring out how to carve up a problem domain into objects, if you observe the principlekeep all related state in one place. I would go so far as to say that the majority of the appeal of Model-View-Controller (MVC) architecture is that it encourages you to keep state in one place.

One big reason why statefulness matters is threading. In practice, it is common to see designs where threading was an afterthought. This is to be expected. Concurrency is a not natural way for human beings to think. So commonly a library is designed with no thought about threading; then a problem shows up which multi-threading can solve. Threading ends up being introduced into a codebase that was never designed for it. Retrofitting a threading model on something designed without one is very painful and sometimes impossible.

Some threading models can be simple. For example, any time you are writing a Swing application, and you have a method that can do long running I/O, the first line of that method should be

assert !EventQueue.isDispatchThread();
under all circumstances.

 

But keeping I/O off the event thread is the trivial case. Managing mutations to a model inside an application is a more complicated, less obvious problem.

IMG_2949.png
 

Any time you find yourself writing a setter (or more generally, mutator) method, it is worth asking yourself Who does this piece of state really belong to? Am I putting state in the right place? Blindly following the beans pattern (add a getter/setter to whatever type you happen to be editing) can result in state being distributed all over objects in an application. The result will be harder to maintain, and if it has any lifespan, less and less maintainable over time. Like entropy, or the no broken windows theory, the more state is distributed all over the place, the more that will masquerade as design and make it okay for a programmer to further diffuse state throughout objects in the application. Human beings can only hold so many things in their minds at a time. The more state is diffused, the more things someone must pay attention to simultaneously to solve a problem in that code.

Another question above was Who needs to observe changes in state? There are some corollaries: Where should those changes be published? and Does the thing observing the change need a reference to the thing that changed?

Let's return to the concept of having a model for a minute. A model gets all related state in one place quite well. It does nothing for threading issues — these still have to be handled somehow within the model, or appropriate use of the model in a multi-threaded environment must be documented. There is a thing you can do at this juncture that can make life much easier — and it is all about how the code that uses the model is designed. If you're creating an API, there is probably more to it than just a data model. There is also how the model is accessed.

Essentially what you can do is this: If you design the rest of your API so that the only way to get a reference to the model is to be passed one (i.e. “don't call us, we'll call you”), then you know the context (particularly, on what thread, while holding which locks) in which your code is going to be called.

I'll pick a class at random as a case in point: SwingWorker. While useful, it does represent poor design in a couple of ways:

  • It mixes the concerns of task state (progress, title) and performing a task and the outcome of that task in a single class.
  • It uses the beans pattern in something that is obviously not a JavaBean.


saltlake.png
 

I mentioned mixing concerns: Avoiding mixing concerns is not just an academic issue of idealized architectural purity. APIs are things human beings use. If you avoid mixing concerns in a public class, you can name things more clearly, and the result will be easier to understand and need less documentation (which most people won't read unless they have to anyway).

In a perfect world, a thing representing a task should probably be stateless (subclasses can have state if that makes sense for them - but no setters or getters or mutable state in the interface/superclass). If the code that does the work is passed a reference to a model object it can set task state on, that untangles the task-state concern from it.

It also removes a bit of psychological weirdness: Someone who is writing a SwingWorker subclass being is required to think to themselves “Now I have to tell myself what my state is.” I suspect many developers anthropomorphize classes they are writing — they think of the class they are editing, or the program counter, in the first person, as I, myself. I don't have empirical evidence for this, but I know I do it, and I have heard many other developers who were discussing code do it - if there's someone out there who emphatically does not do anything of this sort, I'd be interested to know about your thinking style when coding - please speak up.

Detangling the other concern is as simple as returning ajava.util.concurrent.Future from the method that enqueues a task. The result would look something like this:

public interface Task<T> {
    public T runInBackground (TaskStatus status);
    public void runInForeground (TaskStatus status, T backgroundResult);
}

public interface TaskStatus {
    public void setTitle (String title);
    public void setProgress (String msg, long progress, long min, long max);
    public void setProgress (String msg); //indeterminate mode
    public void done();
    public void failed (Exception e);
}
       

and then perhaps you would have a concrete TaskRunner class that runs tasks

public final class TaskRunner {
    public <T> java.util.concurrent.Future<T> launch (Task<T> task, TaskStatus statusUI) { ... }
}
       

or, perhaps preferably, Task is an abstract class with a static launch(Task) method (or instance method - this is a matter of taste) and the TaskRunner class is an implementation detail. (Further refinements would be to look up an injected implementation of TaskRunner, and/or a factory for TaskStatuses on the classpath, etc., not to mention making TaskStatus a final class that delegates, but that's getting out of scope...)

What's the difference between this and SwingWorker? First, we've detangled all of the concerns and made them explicit. While sometimes wadding up a bunch of related concerns may seem like it's doing someone a favor, consider that you really need to read the documentation to figure out what a SwingWorker does. While the above design is simpler, you could probably figure out how to use the above API without reading a line of documentation and get it right.

More importantly, by passing in a TaskStatusinstance, we've encouraged healthy client code — no code can get a reference to the passed TaskStatus until theTask is actually being run, and no code can get a reference to the passed TaskStatus unless the client actually wants it to. Contrast this with SwingWorker, which (!!!) allows foreign code to call firePropertyChange() and which can set its status to "done" or "started" inside its constructor! We haven'tguaranteed correct behavior (a client could pass a reference to a TaskStatus to some object that will try to do something evil with it on another thread, although implementations could defend against this). But we have strongly encouraged proper usage by designing in such a way that theTaskStatus is only made available within the scope where it is actually used. In addition, since we're the one passing in the reference to the TaskStatus instance, we can much more safely assume what thread will call theTaskStatus instance, and what locks are held when it does. While deadlocks will surely still be possible, we're doing as much as possible to discourage them on our side.

One of our questions above was Does the thing observing the change need a reference to the thing that changed? In this case, absolutely not. In fact, to make the Task object available could only do harm. There is no reason forTask's run*() methods to be exposed to code that are interested in the status of aTask, not the Task itself. It causes harm not only because status-reporting code could potentially invokerun*() methods, but because, to the programmer who is interested in the status of a Task, therun* methods are simply noise. Most developers learn APIs through code-completion in their IDE; mixing concerns just makes it harder to figure out what is and is not relevant.

You'll notice that we've completely removed use of the listener pattern here. The listener pattern is seriously overrated, but is often used by default because developers have seen it in the Java core libraries — we use what we know. The listener pattern has a number of problems. If you've ever had to debug a Swing application where you have a severe bug that only occurs after you switch Windows from Classic to XP appearance or vice versa, then you know this pain — I know I do. What is happening is:

  • Before the look and feel change, the look and feel's listeners on components get called before the application's listeners
  • After the change they get called after the application's listeners
Some code is unknowingly depending on a side effect of the look and feel's reaction to a change — in particular, it is depending on that side effect to happen before the application's listener is called, and exceptions get thrown. While rare, the listener pattern is prone to ordering bugs which are quite painful to reproduce and fix. Additionally, in my experience, listeners are the most potent source of memory leaks in Swing applications. So it is a pattern to avoid unless you know you really need it.

 

Especially when you have a situation where it is very unlikely ever to have more than one listener, the listener pattern is probably not a good choice. Listeners monitor changes in state. It is not good design to assume that multiple things will want to monitor the status of a Task without any evidence to back up that assumption If needed, it is easy enough to provide aProxyTaskStatus (and note here the "listeners" are provided as a constructor argument - any ordering issues are those of the caller)

public final class ProxyTaskStatus {
    private final TaskStatus[] statii;
    ProxyTaskStatus (TaskStatus... statii) {
        this.statii = statii;
    }
    public void setTitle (String title) {
        for (TaskStatus status : statii) { 
            status.setTitle (title); 
        }
    }
    //...
}
       

if that's really a need; or leave that up to someone implementing TaskStatus (it can always be added to the API later if there are sufficient use-cases).

What we have, instead of the listener pattern, is a situation where we're passing in the model object (in this case, the thing that holds state about status), as opposed to having the state belong to the task itself and having other code listen for changes in the state. Most likely that model object writes directly to a UI showing task progress. I believe I've read of this pattern being called don't call us, we'll call you somewhere (if anyone can point me to where I ran across that usage, let me know so I can give credit where it's due).

To get back to threading for a minute: There are three ways I typically see threading + statefulness dealt with in code:

  • Close your eyes and pray — i.e. do nothing with regards to threading and hope there is never a problem (or better, document that the code is intended to be single-threaded and the caller needs to enforce correct threading behavior because you don't)
  • Synchronize everything — make deadlocks (you hope) somebody else's problem — you pay a price with synchronization — even if not so much with performance, with maintainability
  • Enforce an actual threading model

This last item brings me to my main point here: The combination of keep all related state in one place plusdon't call us, we'll call you creates the possibility of encouraging and even actually enforcing a threading model over a complex data model, without (generally) having to document lots of rules about threading and pray that people obey them. A weakly enforced example of this from the JDK isDocument.render(), where you pass aRunnable that is guaranteed to block any other threads writes until the Runnable has exited (it is weakly enforced because you can read the document's content outside of this method — and there is much code out there that does so).

snail.png

Lastly, for anybody about to scream that I have misrepresented SwingWorker here (it has lots of complexity about batching up interim results and so forth, and the above is a simplification), I'll mention that

  1. I don't mean to ding the authors of it - they were being consistent with the library they were working in. Nonetheless, one could do all of those things with the above general design, and have a much simpler, easier-to-use result.
  2. I could have chosen many APIs to pick on here. I work for Sun. It seems more polite to call your own baby ugly than call someone else's baby ugly :-)
  3. Patterns are easier to abstract than anti-patterns. Anti-patterns tend only to become really visible after code containing them has been in production for a while. There's lots of things we know now about what not to do that simply weren't known a decade ago.