In my last blog post I introduced JavaFX 2.0 beta, describing an initial port of JavaFX Balls, also in beta stage at that time. Now I have finally finished JavaFX Balls 3.0.

Look ma, no design!

I don’t pretend to be a designer, and the consequence is that when I make a mashup of animation, video and web, that’s the result. Get the source code here.

Launch JavaFX Balls 3.0

I’ve added new layers of content – a WebView (press key ‘W’ to turn it on/off) and a MediaView with video playback (key ‘V’). The web view is live: you can click links, scroll, etc. It’s only challenging to actually click anything because, if any ball is under the cursor, it will swallow the mouse click event… I didn’t write any code for that; everything is a scene graph node, and JavaFX dispatches events to the "top" node under the cursor. And if you run the program, your sensory receptors may be further abused by sound effects played when balls collide with a wall or with each other, testing the low-latency AudioClip API (key ‘A’).

I worked to make the new version of JavaFX Balls a better tool to investigate the JavaFX runtime. Besides the web & media features, the program supports all options from previous versions (like changing the ball scale and speed), plus some new tricks; refer to for command-line arguments, on-screen help for keyboard controls. The following new options are important for benchmarking:

  • Choice of strategy to update the animated balls’ x / y positions (detailed in the Binding section).
  • Choice of hit-testing algorithm: 
    • -hit:old for the “old” algorithm, similar to other Bubblemark ports (whose complexity is O(N^2)) but improved. Its complexity is now O(N^2 / 2D), where D = 1..2, higher for denser scenes where any ball always collides with some other.
    • -hit:index for my optimized spatial indexing algorithm (the default). Its complexity is O(N^(2-C) / D), where D is as above and C is the cell-indexing factor, 0..1: 0 = all balls positioned in a single cell, 1 = all cells have at most one ball.
    • -hit:no for no hit testing (balls just fly over each other).
  • It’s possible to set what percentage of the balls will move (-move:<factor>), and what percentage of the moving balls will be hit-tested (-coll:<factor>).
  • The scene can have a different size (-scene:<multiplier> or -scene:<width>x<height>). I will refer to “Large” as -scene:2, i.e. 1,000 x 600 pixels.
  • You can also use -opacity:<value> to set the opacity of all balls, from 0 (transparent) to 1 (opaque, the default).

My “index” algorithm could still be improved: for example, I could choose cell sizes dynamically so that denser scenes (more balls) use a large number of small cells, and scenes with few balls use a small number of large cells. But the current algorithm is good enough; it no longer dominates CPU usage at high node counts. Also, because the collision bouncing limits overlapping, the performance of this algorithm is already close to O(N), so a more sophisticated solution wouldn’t improve much. For example, with 4,096 balls / large scene / tiny balls, “index” does ~8,500 tests/frame, “old” needs 3 million and the original Bubblemark code would need ~5.3 million. In the standard Bubblemark test (small screen, 512 full-size balls), both my “index” and “old” algorithms need ~8,500 tests/frame but the original Bubblemark needs ~38K/frame. I invite the authors of other Bubblemark ports to improve their collision code, at least to the level of my “old” algorithm, which is a 5-minute job but already good enough to fix the benchmark at least up to 512 balls.
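To make the difference between the strategies concrete, here is a minimal sketch that only counts candidate pair tests. It is not the JavaFX Balls source: the class name, the method names, the choice of bucketing balls by their center, and the cell size are all assumptions made for illustration. The grid version is only correct for collision detection if the cell size is at least one ball diameter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: counts candidate pair tests; not the JavaFX Balls code. */
final class CollisionSketch {

    /** Pairwise scan that visits each unordered pair once: N*(N-1)/2 tests. */
    static long pairwiseTests(int n) {
        long tests = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                tests++; // a real implementation would compare distances here
            }
        }
        return tests;
    }

    /** Index of the grid cell that contains coordinate v. */
    static long cellOf(double v, double cellSize) {
        return (long) Math.floor(v / cellSize);
    }

    /** Packs two cell coordinates (assumed to fit in 32 bits each) into one map key. */
    static long key(long cx, long cy) {
        return (cx << 32) | (cy & 0xffffffffL);
    }

    /**
     * Uniform grid: each ball is bucketed by the cell containing its center,
     * then tested only against balls in that cell and the 8 surrounding cells.
     */
    static long gridTests(double[] xs, double[] ys, double cellSize) {
        Map<Long, List<Integer>> cells = new HashMap<>();
        for (int i = 0; i < xs.length; i++) {
            long k = key(cellOf(xs[i], cellSize), cellOf(ys[i], cellSize));
            cells.computeIfAbsent(k, unused -> new ArrayList<>()).add(i);
        }
        long tests = 0;
        for (int i = 0; i < xs.length; i++) {
            long cx = cellOf(xs[i], cellSize);
            long cy = cellOf(ys[i], cellSize);
            for (long dx = -1; dx <= 1; dx++) {
                for (long dy = -1; dy <= 1; dy++) {
                    List<Integer> bucket = cells.get(key(cx + dx, cy + dy));
                    if (bucket == null) continue;
                    for (int j : bucket) {
                        if (j > i) tests++; // count each unordered pair once
                    }
                }
            }
        }
        return tests;
    }
}
```

When balls are spread over many cells, the grid count grows roughly in proportion to the number of balls; when every ball lands in the same cell, it degrades to the pairwise count.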


Below I report the scores, with JavaFX 2.0.1 / JDK 7u2-b10, for some interesting combinations of options. Factors like node scale, vector drawing and effects were well explored in my tests with JavaFX 1.3.1, and their performance hasn’t changed significantly since that version (with EA-quality Prism). The Web, Video and Audio options have little impact on performance. Screen size and node scale will also affect the scene’s density, which impacts collision costs; but this is obvious, and not the focus of my study.

[Table: scores by Bind strategy and FPS mode for JavaFX 2.0; values not recoverable]

The collision algorithm has a big impact on the scalability of the animation. Looking at the “standard real-world configuration” (HotSpot Client at 60 fps), the spatial indexing allows 4.5X more nodes to be animated. Even with HotSpot Server, whose superior JIT optimizer compensates for part of the inefficiency of the simpler algorithm, the spatial index can still move 2.7X more nodes, which is still a huge win.


The second interesting factor that’s new in JavaFX Balls 3.0 is the binding strategy. This is the -bind:fx option, i.e. the “recommended” way to program in JavaFX – the visual node has some properties bound to variables from application-level model objects (the bindView() method will only be invoked when the model class is created):

public class BallFX extends Ball {
    private final DoubleProperty x = new DoubleProperty(0); // +getter/setter, not shown
    private final DoubleProperty y = new DoubleProperty(0); // +getter/setter, not shown
    @Override protected void init () {

In the -bind:no option, I just set the nodes’ attributes manually every time the application model data changes:

public class BallOpt extends Ball {
    private double x; // +getter/setter, not shown
    private double y; // +getter/setter, not shown
    @Override protected void bind () {

Finally, for the -bind:node option, we have no redundancy between application-level objects and JavaFX nodes; the app just manipulates the nodes’ attributes directly. My getters and setters for x and y delegate to the view node.

public class BallNode extends Ball {
    // No attributes
    public double getX () { return view.getTranslateX(); }
    public double getY () { return view.getTranslateY(); }
    public void setX (double x) { view.setTranslateX(x); }
    public void setY (double y) { view.setTranslateY(y); }
}


In the benchmark scores for HotSpot Client / 60 fps, -bind:no is the most efficient strategy, followed by -bind:fx (16% worse) and -bind:node (19% worse). The overhead of JavaFX’s properties (in this case the node’s translateX / translateY) is big enough that it pays off to avoid it in performance-critical code; but it’s not too bad, so do that only when really necessary. At the very least, remember that property getters/setters are expensive, so take care to use local variables to avoid multiple calls in methods that would need a property value in several places or update it repeatedly.

All tests involve JavaFX properties, because they use node properties such as translateX / translateY; but the -bind:fx option increases this cost by allocating its own properties and using bind() on node properties, which forces the latter to “inflate” into DoubleProperty instances too; not to mention the costs of binding itself (registering a listener, then dispatching invalidation events at every update). That’s several thousand extra property objects for all balls, and thousands of invalidation events per frame – so the 16% hit in the score is not bad. JavaFX Script’s “compiled” properties/binding would certainly be more efficient… but that was the single performance advantage of that language; it was offset by other features that were less efficient than equivalent Java, like sequences and functions.
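The extra work done by a bound property can be illustrated without JavaFX at all. The sketch below uses a hypothetical ToyProperty class, a deliberately simplified stand-in and not the JavaFX implementation (real JavaFX bindings use lazy invalidation, while this toy pushes values eagerly). It only counts listener dispatches for a bound update path, loosely analogous to -bind:fx, versus a manual copy, loosely analogous to -bind:no. All names here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an observable property; NOT the JavaFX API. */
final class ToyProperty {
    private double value;
    private final List<Runnable> listeners = new ArrayList<>();
    long dispatches; // number of listener calls made by this property

    void addListener(Runnable listener) { listeners.add(listener); }

    double get() { return value; }

    void set(double v) {
        value = v;
        for (Runnable listener : listeners) { // every update notifies every listener
            dispatches++;
            listener.run();
        }
    }

    /** One-way binding: this property follows 'source' from now on. */
    void bindTo(ToyProperty source) {
        set(source.get());
        source.addListener(() -> set(source.get()));
    }
}

final class BindingCostSketch {
    /** Model property bound to a "node" property; returns total listener dispatches. */
    static long boundUpdates(int frames) {
        ToyProperty model = new ToyProperty();
        ToyProperty node = new ToyProperty();
        node.bindTo(model);
        for (int f = 0; f < frames; f++) {
            model.set(f); // each write triggers one dispatch to the bound node
        }
        return model.dispatches + node.dispatches;
    }

    /** Plain local value copied into the "node" by hand; no listeners are registered. */
    static long manualUpdates(int frames) {
        ToyProperty node = new ToyProperty();
        for (int f = 0; f < frames; f++) {
            double x = f;  // model state kept in a plain variable
            node.set(x);   // written directly, nothing to dispatch
        }
        return node.dispatches;
    }
}
```

In this toy, the bound path performs one dispatch per update and per ball, on top of the extra property object, while the manual path performs none; the real costs in JavaFX depend on its actual property and binding implementation.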

Scenegraph Improvements and Parallel Renderer

The major performance improvement in JavaFX 2.0 is difficult to measure precisely with a benchmark, so you’ll have to take my word for it or just run the app. In JavaFX 1.x, changing the number of balls in the scene – either manually with the left/right-arrow keys or automatically in the locked-fps modes – had a very high cost. In the extreme example, moving from 2,048 to 4,096 balls would freeze the animation for a ridiculous amount of time, maybe a full minute on an average CPU. This was caused by poor scalability of the scene graph’s internals. Now in JavaFX 2.0, this problem simply vanished; in the same test, the transition happens instantly, without freezing or slowing down for any user-perceivable amount of time.

Some time is spent adding 2,048 new ball nodes to the scene graph, but this seems to be fast enough not to be seen by the naked eye. I initially thought Prism’s concurrent architecture was masking this cost (because I’m running on a dual-core CPU), but bug RT-15195: Allow QuantumRenderer thread and FX Application thread to run in parallel means the rendering thread (Quantum) and application thread (the EDT) are not [fully?] parallel; they are different threads, but synchronized so rendering doesn’t execute in parallel with EDT work. Full parallelism will be enabled after some concurrency bugs are fixed.

Other Bubblemarks, other engines

Besides the new collision code, comparing JavaFX Balls 3.0 with other Bubblemarks is impacted by a regression: RT-13660: QuantumToolkit can schedule the pulse better to improve performance – this puts JavaFX 2.0 at a disadvantage against other engines (including JavaFX 1.3) that also support a “full-speed mode”, without any FPS capping. It seems this bug, planned to be fixed only in JavaFX 2.1, has some impact even in the standard execution mode (FPS rate capped by v-sync).

Looking further ahead, future enhancements like RT-5205: Scenegraph performance: Binary nodes promise even better performance, scalability or memory economy. But then we’re already in the realm of advanced optimizations that are mostly necessary for sophisticated 3D animations (JavaFX 2.0’s 3D features are still modest, but the framework is designed to be a foundation for full 3D support, which will come in future releases). Is JavaFX already good enough for “AAA”-class 2D games or other advanced 2D animations?

Ashley Gullen’s blog HTML5 2D gaming performance analysis tries to answer this question for the Web platform. That benchmark (let’s call it RenderPerf) is similar enough to Bubblemark that I could configure JavaFX Balls to match it: just replace the resource balls.png with RenderPerf’s 000.png, then launch with -screen:640x480 -bind:no -move:-1 -hit:no -opacity:0.02, and press ‘3’ to lock at 30fps. The visual output will be exactly the same as RenderPerf’s! Here are my scores (Intel Core i5-2430M / HD Graphics 3000 laptop, Windows 7, Chrome 16-beta):

JavaFX (Client): 14,700
JavaFX (Server): 15,183

JavaFX (HotSpot Client) is 70% faster than WebGL, a great performance, considering the competition is a dedicated game engine built on top of a low-level 3D API. The native program is still much better, 6X faster than WebGL and 3.5X faster than JavaFX. Part of this gap is due to the native vs. VM factor – both Java and JavaScript are severely disadvantaged, for this kind of work, by their type systems (no “lightweight objects” like structs), lack of fine control over memory layout, and the costs of the managed/native interface. Optimizations like binary nodes can still narrow the gap, but it’s not likely that any managed language will match native code on this. But if Ashley Gullen considers 1/6th of native good enough for “intense 2D gaming”, then 1/3rd is certainly excellent.

I think JavaFX can still move closer to native and farther from pure-browser technologies. Both still have room to improve, but the Java platform has less severe fundamental limitations. For example, while Java needs an ugly hack (Direct Buffers) to push arrays of data efficiently to native 3D libraries, JavaScript needs an ugly hack (Typed Arrays) to have reasonably efficient arrays at all. This is one of the motivations for alternatives like Google’s NaCl and Dart. On the other hand, if the standard Web surprises me and catches up, that’s great too, because JavaFX will eventually have a “web runtime” implemented in JavaScript and Canvas or WebGL.

As a final note, half of the work of writing benchmarks is finding new pitfalls and working around them. In my RenderPerf-style test of JavaFX Balls, I had to add the flag prism.dirtyopts=false, otherwise JavaFX wouldn’t render nodes that don’t move between frames and I got ridiculously high scores. The regular JavaFX Balls test already uses this flag for another reason – when most of the scene changes every frame, dirty-region tracking only adds overhead, so it’s not fair compared to other engines. Also, even with dirty rectangles off, JavaFX refuses to render frames if nothing changes at all (-move:0); so for RenderPerf I had to create the option -move:-1 that causes a single ball to move one pixel, right or left on even/odd frames, so the ball “vibrates” but stays in the same place.
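The “vibrating” ball trick can be sketched in a few lines. This is only an illustration of the idea described above, not the code behind -move:-1; the class and method names are hypothetical.

```java
/** Hypothetical sketch of the "vibrating ball" idea; names are illustrative only. */
final class VibrateSketch {
    /**
     * Returns the x position to use for a given frame: the base position on
     * even frames, one pixel to the right on odd frames. Consecutive frames
     * always differ, so the scene is never completely unchanged, yet the ball
     * never drifts away from its base position.
     */
    static double xForFrame(double baseX, long frame) {
        return (frame % 2 == 0) ? baseX : baseX + 1;
    }
}
```

Because the offset depends only on the frame parity, no extra state needs to be stored for the ball.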

These benchmarking problems also reveal design priorities of each engine; JavaFX supports conventional application UIs, so it’s highly optimized to handle scenes which are mostly static, with visual changes in short bursts in response to user input and usually localized to a small region like a single UI control. So it pays off to do the bookkeeping necessary to avoid redundant rendering, or render only the part of the scene that actually changed. Most dedicated game engines don’t bother doing that, because games typically update most or all of the screen in every frame. For this reason, you can often observe games that keep high CPU/GPU usage even in the rare static screens such as option menus – the renderer is happily redrawing the whole screen non-stop, even when not a single pixel changes. But such behavior would be completely unacceptable for JavaFX, when used to show “regular” UIs.

UPDATE: Some deployment fixes; I also got rid of the very small use of Java 7-specific syntax, so the new deployment only requires JDK 1.6.