
JavaFX Balls 3.0 Blog

Posted by opinali Nov 25, 2011

In my last blog I introduced JavaFX 2.0 beta, describing an initial port of JavaFX Balls, also in beta stage at that time. Now I’ve finally finished JavaFX Balls 3.0.

Look ma, no design!

I don’t pretend to be a designer, and the consequence is that when I make a mashup of animation, video and web, that’s the result. Get the source code here.

Launch JavaFX Balls 3.0

I’ve added new layers of content – a WebView showing javafx.com (press key ‘W’ to turn it on/off) and a MediaView with video playback (key ‘V’). The web view is live; you can click links, scroll etc. It’s only challenging to actually click anything because, if any ball is under the cursor, it will swallow the mouse click event… I didn’t write any code for that; everything is scene graph nodes, and JavaFX dispatches events to the “top” node under the cursor. And if you run the program, your sensory receptors may be further abused by sound effects played when balls collide with a wall or with each other, testing the low-latency AudioClip API (key ‘A’).

I worked to make the new version of JavaFX Balls a better tool to investigate the JavaFX runtime. Besides the web & media features, the program supports all options from previous versions (like changing the ball scale and speed), plus some new tricks; refer to Main.java for command-line arguments, and to the on-screen help for keyboard controls. The following new options are important for benchmarking:

  • Choice of strategy to update the animated balls’ x / y positions (detailed in the Binding section).
  • Choice of hit-testing algorithm: 
    • -hit:old for the “old” algorithm, similar to other Bubblemark ports (whose complexity is O(N^2)) but improved. Its complexity is now O(N^2 / 2D), where D = 1..2, higher for denser scenes where any ball always collides with some other.
    • -hit:index for my optimized spatial indexing algorithm (the default). Its speed will be O(N^(2-C) / D), where D is as above and C = cell-indexing factor, 0..1: 0 = all balls positioned in a single cell, 1 = all cells have at most one ball. 
    • -hit:no for no hit testing (balls just fly over each other).
  • It’s possible to set what percentage of the balls will move (-move:<factor>), and what percentage of the moving balls will be hit-tested (-coll:<factor>).
  • The scene can have a different size (-scene:<multiplier> or -scene:<width>x<height>). I will refer to “Large” as -scene:2, i.e. 1,000 x 600 pixels.
  • You can also use -opacity:<value> to set the opacity of all balls, from 0 (transparent) to 1 (opaque, the default).
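For reference, the improved pairwise loop of the “old” algorithm can be sketched like this (my own minimal sketch, not the actual JavaFX Balls source): each unordered pair is tested exactly once, which halves the naive Bubblemark loop that tests every ball against every other ball.

```java
// Sketch of the improved "old" hit test: each unordered pair (i, j) with
// j > i is visited exactly once, so the inner loop does N*(N-1)/2 tests
// instead of the naive N*N.
final class PairwiseHitTest {
    static int countCollisions(double[] x, double[] y, double radius) {
        double minDist2 = (2 * radius) * (2 * radius); // squared distance at contact
        int collisions = 0;
        int n = x.length;
        for (int i = 0; i < n - 1; i++) {
            for (int j = i + 1; j < n; j++) {          // each pair tested once
                double dx = x[j] - x[i], dy = y[j] - y[i];
                if (dx * dx + dy * dy < minDist2) {
                    collisions++;
                }
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Balls 0 and 1 overlap (distance 10 < diameter 16); ball 2 is far away.
        double[] x = {0, 10, 100};
        double[] y = {0, 0, 0};
        System.out.println(countCollisions(x, y, 8)); // prints 1
    }
}
```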

My “index” algorithm could still be improved: for example, I could choose cell sizes dynamically, so that denser scenes (more balls) get a large number of small cells and sparse scenes a small number of large cells. But the current algorithm is good enough; it no longer dominates CPU usage at high node counts. Also, because the collision bouncing limits overlapping, the performance of this algorithm is already close to O(N), so a more sophisticated solution wouldn’t improve much. For example, with 4,096 balls / large scene / tiny balls, “index” does ~8,500 tests/frame, “old” needs 3 million and the original Bubblemark code would need ~5.3 million. In the standard Bubblemark test (small screen, 512 full-size balls), both my “index” and “old” algorithms need ~8,500 tests/frame but the original Bubblemark needs ~38K/frame. I invite the authors of other Bubblemark ports to improve their collision code, at least to match my “old” algorithm; that’s a 5-minute job, and already good enough to fix the benchmark up to at least 512 balls.
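The spatial-indexing idea can be sketched as a uniform grid keyed by cell coordinates (a minimal sketch under my own assumptions about cell size and hashing; the real JavaFX Balls code differs in details): bucket every ball into a cell one ball-diameter wide, then test each ball only against balls in the same or adjacent cells, instead of against all N-1 others.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of spatial indexing with a uniform grid: since the cell side
// equals the ball diameter, any two colliding balls must lie in the same
// cell or in adjacent cells, so a 3x3 neighborhood scan suffices.
final class GridHitTest {
    static int countCollisions(double[] x, double[] y, double radius) {
        double cell = 2 * radius;            // cell side = ball diameter
        double minDist2 = cell * cell;       // squared distance at contact
        Map<Long, List<Integer>> grid = new HashMap<>();
        for (int i = 0; i < x.length; i++) { // bucket every ball into its cell
            grid.computeIfAbsent(key(cellOf(x[i], cell), cellOf(y[i], cell)),
                                 k -> new ArrayList<>()).add(i);
        }
        int collisions = 0;
        for (int i = 0; i < x.length; i++) {
            long cx = cellOf(x[i], cell), cy = cellOf(y[i], cell);
            for (long gx = cx - 1; gx <= cx + 1; gx++) {      // 3x3 neighborhood
                for (long gy = cy - 1; gy <= cy + 1; gy++) {
                    for (int j : grid.getOrDefault(key(gx, gy), Collections.emptyList())) {
                        if (j <= i) continue;                 // test each pair once
                        double dx = x[j] - x[i], dy = y[j] - y[i];
                        if (dx * dx + dy * dy < minDist2) collisions++;
                    }
                }
            }
        }
        return collisions;
    }

    private static long cellOf(double coord, double cell) {
        return (long) Math.floor(coord / cell);
    }

    private static long key(long gx, long gy) {
        return (gx << 32) ^ (gy & 0xffffffffL); // pack two cell indices into one map key
    }
}
```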

Speed

Below I report the scores, with JavaFX 2.0.1 / JDK 7u2-b10, for some interesting combinations of options. Factors like node scale, vector drawing and effects were well explored in my tests with JavaFX 1.3.1, and their performance hasn’t changed significantly since that version (with EA-quality Prism). The Web, Video and Audio options have little impact on performance. Screen size and node scale also affect the scene’s density, which impacts collision costs; but this is obvious, and not the focus of my study.

                                                                                           
Hit-Test Algorithm | Scene Size | Ball Scale | Bind | FPS | JavaFX 2.0 (Client) | JavaFX 2.0 (Server)
-------------------|------------|------------|------|-----|---------------------|--------------------
Old                | Large      | 1/4        | No   |  60 |               1,316 |               3,005
Old                | Large      | 1/4        | No   | 200 |                 580 |               1,127
Index              | Large      | 1/4        | No   |  60 |               5,932 |               8,095
Index              | Large      | 1/4        | No   | 200 |               1,361 |               2,011
Index              | Large      | 1/4        | FX   |  60 |               5,028 |               7,357
Index              | Large      | 1/4        | FX   | 200 |               1,258 |               1,914
Index              | Large      | 1/4        | Node |  60 |               4,787 |               7,092
Index              | Large      | 1/4        | Node | 200 |               1,326 |               1,887

The collision algorithm has a big impact on the scalability of the animation. Looking at the “standard real-world configuration” (HotSpot Client at 60 fps), the spatial indexing allows 4.5X more nodes to be animated. Even with HotSpot Server, whose superior JIT optimizer compensates for part of the inefficiency of the simpler algorithm, the spatial index can still move 2.7X more nodes, which is still a huge win.

Binding

The second interesting factor that’s new in JavaFX Balls 3.0 is the binding strategy. This is the -bind:fx option, i.e. the “recommended” way to program in JavaFX – the visual node has some properties bound to variables from application-level model objects (the init() method below is invoked only once, when the model object is created):


public class BallFX extends Ball {
    private final DoubleProperty x = new SimpleDoubleProperty(0); // +getter/setter, not shown
    private final DoubleProperty y = new SimpleDoubleProperty(0); // +getter/setter, not shown

    @Override protected void init () {
        view.translateXProperty().bind(x);
        view.translateYProperty().bind(y);
    }
}

In the -bind:no option, I just set the nodes’ attributes manually every time the application model data changes:


public class BallOpt extends Ball {
    private double x; // +getter/setter, not shown
    private double y; // +getter/setter, not shown

    @Override protected void bind () {
        view.setTranslateX(x);
        view.setTranslateY(y);
    }
}

Finally, for the -bind:node option, we have no redundancy between application-level objects and JavaFX nodes; the app just manipulates the nodes’ attributes directly. My getters and setters for x and y delegate to the view node.


public class BallNode extends Ball {
    // No attributes
    public double getX () { return view.getTranslateX(); }
    public double getY () { return view.getTranslateY(); }
    public void setX (double x) { view.setTranslateX(x); }
    public void setY (double y) { view.setTranslateY(y); }
}

In the benchmark scores for HotSpot Client / 60 fps, -bind:no is the most efficient strategy, followed by -bind:fx (16% worse) and -bind:node (19% worse). The overhead of JavaFX’s properties (in this case the node’s translateX / translateY) is big enough that it pays off to avoid it in performance-critical code; but it’s not too bad, so do that only when really necessary. At the very least, remember that property getters/setters are expensive, so use local variables to avoid multiple calls in methods that need a property value in several places or update it repeatedly.
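To make the local-variable advice concrete, here is a sketch using a hand-rolled, instrumented stand-in for a property class (the DoubleProp class and its access counter below are mine, for illustration only; real JavaFX code would use javafx.beans.property.DoubleProperty):

```java
// Demonstrates why caching a property value in a local variable pays off:
// every get()/set() on a property does extra work (listener checks, boxing,
// etc. in the real JavaFX classes), so fewer accessor calls means less overhead.
final class CachedAccessDemo {
    static final class DoubleProp {       // minimal instrumented stand-in
        private double value;
        static int accessCount = 0;       // counts accessor calls, for the demo
        double get() { accessCount++; return value; }
        void set(double v) { accessCount++; value = v; }
    }

    // Slow: three property accesses on the common path.
    static void moveSlow(DoubleProp x, double dx) {
        if (x.get() + dx > 0) x.set(x.get() + dx); else x.set(0);
    }

    // Fast: read once into a local, write once at the end.
    static void moveFast(DoubleProp x, double dx) {
        double v = x.get() + dx;
        x.set(v > 0 ? v : 0);
    }
}
```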

All tests involve JavaFX properties, because they use node properties such as translateX / translateY; but the -bind:fx option increases this cost by allocating its own properties and binding them to the node properties, which forces the latter to “inflate” into DoubleProperty instances too; not to mention the costs of binding itself (registering a listener, then dispatching invalidation events at every update). That’s several thousand extra property objects across all balls, and thousands of invalidation events per frame – so the 16% hit in the score is not bad. JavaFX Script’s “compiled” properties/binding would certainly be more efficient… but that was the single performance advantage of that language; it was offset by other features that were less efficient than equivalent Java, like sequences and functions.
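The cost structure of bind() can be pictured with a minimal hand-rolled model (the Prop and Listener types below are my own stand-ins, not the JavaFX implementation): binding registers a listener on the source, and every source update dispatches to all listeners, which is exactly the per-frame overhead described above.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of one-way binding: the bound property registers a
// listener on its source, so every source update costs one dispatch
// per listener plus a re-read of the value.
final class MiniBindDemo {
    interface Listener { void invalidated(); }

    static final class Prop {
        private double value;
        private final List<Listener> listeners = new ArrayList<>();

        double get() { return value; }

        void set(double v) {
            value = v;
            for (Listener l : listeners) l.invalidated(); // per-update dispatch
        }

        // One-way bind: from now on, this property mirrors 'source'.
        void bind(Prop source) {
            source.listeners.add(() -> this.value = source.get());
            this.value = source.get();   // initial synchronization
        }
    }
}
```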

Scenegraph Improvements and Parallel Renderer

The major performance improvement in JavaFX 2.0 is difficult to measure precisely with a benchmark, so you’ll have to take my word for it or just run the app. In JavaFX 1.x, changing the number of balls in the scene – either manually with the left/right-arrow keys or automatically in the locked-fps modes – had a very high cost. In the extreme example, moving from 2,048 to 4,096 balls would freeze the animation for a ridiculous amount of time, maybe a full minute on an average CPU. This was caused by poor scalability of the scene graph’s internals. Now in JavaFX 2.0, this problem simply vanished; in the same test, the transition happens instantly, without freezing or slowing down for any user-perceivable amount of time.

Some time is spent adding 2,048 new ball nodes to the scene graph, but this seems to be fast enough not to be noticed by the naked eye. I initially thought Prism’s concurrent architecture was masking this cost (because I’m running on a dual-core CPU), but bug RT-15195: Allow QuantumRenderer thread and FX Application thread to run in parallel means the rendering thread (Quantum) and application thread (the EDT) are not [fully?] parallel; they are different threads, but synchronized so rendering doesn’t execute in parallel with EDT work. Full parallelism will be enabled after some concurrency bugs are fixed.

Other Bubblemarks, other engines

Besides the new collision code, comparing JavaFX Balls 3.0 with other Bubblemarks is affected by a regression: RT-13660: QuantumToolkit can schedule the pulse better to improve performance. This puts JavaFX 2.0 at a disadvantage against other engines (including JavaFX 1.3) that also support a “full-speed mode”, without any FPS capping. It seems this bug, planned to be fixed only in JavaFX 2.1, has some impact even in the standard execution mode (FPS rate capped by v-sync).

Looking further forward, future enhancements like RT-5205: Scenegraph performance: Binary nodes promise even better performance, scalability and memory economy. But then we’re already in the realm of advanced optimizations that are mostly necessary for sophisticated 3D animations (JavaFX 2.0’s 3D features are still modest, but the framework is designed to be a foundation for full 3D support, which will come in future releases). Is JavaFX already good enough for “AAA”-class 2D games or other advanced 2D animations?

Ashley Gullen’s blog HTML5 2D gaming performance analysis tries to answer this question for the Web platform. That benchmark (let’s call it RenderPerf) is similar enough to Bubblemark that I could configure JavaFX Balls to match it: just replace the resource balls.png with RenderPerf’s 000.png, then launch with -scene:640x480 -bind:no -move:-1 -hit:no -opacity:0.02, and press ‘3’ to lock at 30fps. The visual output will be exactly the same as RenderPerf’s! Here are my scores (Intel Core i5-2430M / HD Graphics 3000 laptop, Windows 7, Chrome 16-beta):

                               
Test            | Score
----------------|-------
Canvas          |  4,312
WebGL           |  8,661
C++/Direct3D    | 51,910
JavaFX (Client) | 14,700
JavaFX (Server) | 15,183

JavaFX (HotSpot Client) is 70% faster than WebGL, a great performance considering the competition is a dedicated game engine built on top of a low-level 3D API. The native program is still much better: 6X faster than WebGL and 3.5X faster than JavaFX. Part of this gap is due to the native vs. VM factor; both Java and JavaScript are severely disadvantaged, for this kind of work, by their type systems (no “lightweight objects” like structs), lack of fine control over memory layout, and the costs of the managed/native interface. Optimizations like binary nodes can still close the gap, but it’s not likely that any managed language will match native code on this. But if Ashley Gullen considers 1/6th of native good enough for “intense 2D gaming”, then 1/3rd is certainly excellent.

I think JavaFX can still move closer to native and farther from pure-browser technologies. Both still have room to improve, but the Java platform has less severe fundamental limitations. For example, while Java needs an ugly hack (Direct Buffers) to push arrays of data efficiently to native 3D libraries, JavaScript needs an ugly hack (Typed Arrays) to have reasonably efficient arrays at all. This is one of the motivations for alternatives like Google’s NaCl and Dart. On the other hand, if the standard Web surprises me and catches up, that’s great too, because JavaFX will eventually have a “web runtime” implemented in JavaScript and Canvas or WebGL.

On a final note, half the work of writing benchmarks is finding new pitfalls and working around them. In my RenderPerf-style test of JavaFX Balls, I had to add the flag prism.dirtyopts=false, otherwise JavaFX wouldn’t render nodes that don’t move between frames and I got ridiculously high scores. The regular JavaFX Balls test already uses this flag for another reason: when most of the scene changes every frame, dirty-region tracking only adds overhead, so keeping it on wouldn’t be fair compared to other engines. Also, even with dirty rectangles off, JavaFX refuses to render frames if nothing changes at all (-move:0); so for RenderPerf I had to create the option -move:-1, which causes a single ball to move one pixel, right or left on even/odd frames, so the ball “vibrates” but stays in the same place.
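The -move:-1 trick can be sketched as follows (my own reconstruction, not the actual JavaFX Balls source): nudge one ball a pixel right on even frames and a pixel left on odd frames, so every frame dirties the scene while the ball’s net position never drifts.

```java
// Sketch of the "vibrating ball" workaround: alternating +1/-1 pixel moves
// guarantee something changes every frame, but over any two consecutive
// frames the ball returns to its original position.
final class VibrateDemo {
    static double step(double x, long frame) {
        return (frame % 2 == 0) ? x + 1 : x - 1;
    }
}
```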

These benchmarking problems also reveal design priorities of each engine; JavaFX supports conventional application UIs, so it’s highly optimized to handle scenes which are mostly static, with visual changes in short bursts in response to user input and usually localized to a small region like a single UI control. So it pays off to do the bookkeeping necessary to avoid redundant rendering, or render only the part of the scene that actually changed. Most dedicated game engines don’t bother doing that, because games typically update most or all of the screen in every frame. For this reason, you can often observe games that keep high CPU/GPU usage even in the rare static screens such as option menus – the renderer is happily redrawing the whole screen non-stop, even when not a single pixel changes. But such behavior would be completely unacceptable for JavaFX, when used to show “regular” UIs.

UPDATE: Some deployment fixes; also, I got rid of the very small amount of Java 7-specific syntax, so the new deployment only requires JDK 1.6.

JavaFX 2.0 is not multiplatform! It can’t do subpixel antialiasing!! … these were among the reactions to the first beta releases, which I’m not sure whether to attribute to trolling or simple laziness. These mysteries are usually solved with a simple look at JavaFX’s public JIRA issue tracking system. The current implementation is still a beta, not even a feature-complete beta, so there are many bugs that will be fixed and functionality that will be added or completed before the FCS.

Here’s a quick review of a few interesting bugs that are open (or very recently closed) at this time. This is just a snapshot at a specific date and time, but it’s a good exercise to get a feeling for the project’s progress.

Critical Bugs

Major Bugs

  • RT-14045: TableView perf when scrolling up and down – A more “down-to-Earth” performance bug; JavaFX’s controls can suffer high overheads from the layout and CSS subsystems. JavaFX flies on benchmarks like my JavaFX Balls, which heavily exercise graphics and animation but make zero use of layout or CSS; those subsystems need to be optimized so conventional UIs will also have great all-round performance.
  • RT-14038: Need public API for rendering to an image and converting to BufferedImage – this is basically what I have requested here and filed here; but it’s important also to enable printing support, right now an important feature gap (although you can do it with internal APIs).
  • RT-13007: Need to add any() and discard functionality to JSL language – This apparently esoteric, recently-fixed issue is related to Decora, the internal library used by JavaFX to code GPU shaders in a portable manner (Java Shading Language, which is converted to HLSL for DirectX pipeline, GLSL for the OpenGL pipeline, or Intel SSE if your OS/GPU won’t support any of the former).
  • RT-11283: Media rendering path needs to be revised for JavaFX 2.0 – JavaFX 2.0’s brand-new and much improved media stack is GStreamer, but the integration between that code and the rest of JavaFX is still a work in progress. This bug documents some apparently low-hanging optimization fruit. It also tells you that the WebView shares the same media components when it plays HTML5 <video> and <audio>, which makes a lot of sense; apparently the web component is well integrated into the JavaFX architecture, not just a cheap embedding of upstream WebKit into the toolkit.
  • RT-9466: Support Prism / Glass on Linux – One of multiple bugs telling us that the Linux port exists and will be eventually available… but not in time for 2.0-FCS; very likely only in the next minor update. I guess a beta-quality build for Linux will be released by the time 2.0 ships, but there’s no official position and no JIRA clues about this at this time.
  • RT-5205: Scenegraph performance: Binary nodes – Describes an optimization that could potentially result in orders-of-magnitude scalability improvement for the scene graph.
  • RT-5258: Optimize 3D picking – One of the many bugs covering the big-ticket feature of (initial) 3D support in JavaFX 2.0. The same pipeline supports both 2D and 3D elements, which creates some interesting challenges.

Medium Bugs

  • RT-12100: Swing components inside JavaFX – Notice the “medium” priority; and also the evaluation “We do not currently plan…”. The reverse support (embedding JavaFX components inside Swing) is apparently the only kind of integration that will be supported, with JFXPanel. (See Side Note.)
  • RT-11191: Consider supporting ES1 pipeline in Prism – Prism already supports OpenGL ES2 as back-end pipeline; that’s important for high-end mobile platforms, and also used in the OSX port (the reporter apparently works on OpenGL/OSX porting of JavaFX). OpenGL ES1 support would be good for lower-end devices.
  • RT-9983: Circles are rendered incorrectly – I thought I knew all about rendering circles when I studied Mike Abrash’s DDJ series long ago… but the recurrence of such bugs in circle/ellipse drawing, from the early days of Java2D and into JavaFX’s Prism, tells me this stuff is not that simple.
  • RT-7644: Prism: Math.floor and Math.ceil take up a lot of cpu time – Another déjà vu bug. This old Java performance pitfall persists even in current JDKs, where all “simple” Math methods – floor(), ceil(), abs(), min(), max(), signum() – are pure Java, without JNI overheads. But their implementations are still complex, due to corner cases like signed zeroes, NaNs and infinities, so the methods are bloated enough to prevent inlining and further JIT optimization. High-performance code that handles well-behaved floating-point numbers, such as graphics coordinates, must avoid these “simple” java.lang.Math APIs like the plague. Just write your own one-liner replacements.
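Such one-liner replacements could look like this (my own sketch, not the Prism code; valid only for “well-behaved” finite values within int range, which is the graphics-coordinates case the bug is about):

```java
// Fast floor/ceil for finite doubles within int range. These skip the
// NaN/infinity/signed-zero handling that bloats java.lang.Math, so the
// JIT can inline them.
final class FastMath {
    static double floor(double v) {
        int i = (int) v;                 // cast truncates toward zero
        return (v < i) ? i - 1 : i;      // adjust for negative non-integers
    }

    static double ceil(double v) {
        int i = (int) v;
        return (v > i) ? i + 1 : i;      // adjust for positive non-integers
    }
}
```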

Side Note: Swing Integration

JavaFX 2.0 supports embedding of JavaFX components into Swing applications. This allows the Swing faithful, or legacy Swing apps, to benefit from lots of JavaFX power, even though the integration is a bit coarse-grained and one-way. Full integration will never be possible; in the past (JavaFX 1.x blogs) I offered the rendering architecture as the main reason for that, but it’s now clear that threading and event processing are also radically different, and there’s only so much integration you can make through adapter layers without opening a whole new can of worms. So I was pleasantly surprised to know that JFXPanel is lightweight, which enables seamless visual integration and avoids the small number of lightweight / heavyweight issues that still persist. It’s even more exciting as JavaFX can do that with the GPU-accelerated pipelines, no need to fall back to Java2D. I guess JavaFX just renders its scene and then draws the resulting pixel buffer into Swing’s Java2D surface… but that’s probably easier said than done, and the extra buffer copy is certainly more than offset by JavaFX’s fully-accelerated rendering.

Another idea: why should Swing apps pay any tax to use components like the web engine and media support? These are mostly implemented by WebKit and GStreamer, both native and neutral to the JavaFX runtime. JavaFX does us the big favor of making a Java-friendly distribution of these components (e.g. with JNI interfaces), and bundling them in a runtime that will soon be widely distributed. (And unlike third-party libraries containing native code, e.g. JOGL, it won’t require full permissions to run from the web…) You still need a high-level Java layer to make those features available to Java applications, and while the public javafx.scene.media and javafx.scene.web are JavaFX-centric, these are pretty small, and even the larger packages they rely on don’t seem to be mostly FX-specific. For example, the lion’s share of com.sun.webpane (as I estimated from class names and counts – no sources available) deals with neutral issues like DOM, networking, cookies and authentication. The integration of both rendering and event handling should be a very small piece of this Java layer, and one that would be easy to replace with Swing-specific support. This could allow a Swing program to have a “web view” implemented as a pure JComponent, without any further intermediation by other parts of JavaFX, resulting in less overhead and fewer limitations. The same logic probably applies to the media stack too.

These are interesting opportunities for the future evolution of the combined platforms; even if the strategy is 100% JavaFX moving forward, Oracle could win back a lot of good will from the Swing community by putting in the (apparently small) effort of making first-class web and media support available “natively” to Swing, without caveats or limitations caused by an “alien” toolkit in the middle. That support could even become official JavaSE APIs in JDK 8+, and the migration of all shared code to the JRE would make JavaFX’s runtime look much smaller (although this may be irrelevant if both runtimes are bundled together, not to mention JDK 8’s Jigsaw). If Oracle has no intention to follow that route, or if the additional effort is bigger than I estimate, I bet some Swing enthusiasts would do the work, if JavaFX’s internal web and media libraries, or at least the native bindings, were documented. Open sourcing these libraries and bindings would be even better, but not really critical.

It's been a long time, well long in Internet-years, since my last blog on JavaFX. Now I'm approaching JavaFX 2.0 by porting the JavaFX 1.x programs that I had written and blogged about here. These new ports will allow me to evaluate the evolution of the platform. Has the wait been worth it?

Porting from JavaFX 1

For my first port I picked JavaFX Balls. (But this blog is not about benchmarking; I’ll do that in a follow-up.) The port cost me a couple hours of tedious but easy massaging of the original code. JavaFX Script is sufficiently close to Java that you can rename a .fx file to .java and edit it until it becomes Java. Heavy editing, but better than a rewrite from scratch. The core APIs are 99% identical. There’s not much legacy out there, but this is nice evidence of JavaFX’s maturity: the team didn’t need to use the opportunity of the breaking-change v2 release for a heavy API redesign. The major changes are all in features tied to special support from JavaFX Script’s syntax. But even in these cases, the new version is very familiar.

Functions

Java does not have an equivalent to JavaFX Script’s functions, but JDK 8 will have that, and its lambdas are tied to Single Abstract Method types. The JavaFX 2 API is carefully designed around SAM types; current application code will use loads of inner classes in the same places where JavaFX 1 code would use functions, but when JDK 8 comes we’ll be able to replace all those inner classes with lambdas – so losing JavaFX Script’s functions is a temporary setback.

             
JavaFX 1:

onKeyPressed: function (ke: KeyEvent): Void { … }

JavaFX 2, with JDK 8:

.onKeyPressed(#{ KeyEvent ke -> … });

JavaFX 2, with JDK 6..7:

.onKeyPressed(new EventHandler<KeyEvent>() {
  public void handle (KeyEvent ke) { … }
});

Initialization Blocks

I loved JavaFX Script’s syntax to initialize complex trees of objects, but that’s also gone, so I feared that my code would double in size. Not so much:

             
JavaFX 1:

public def fpsTimer = Timeline {
  repeatCount: Timeline.INDEFINITE
  keyFrames: KeyFrame {
    time: 1s
    action: function () {…}
}}

JavaFX 2:

private Timeline fpsTimer = new TimelineBuilder()
  .cycleCount(Timeline.INDEFINITE)
  .keyFrames(new KeyFrame(
    new Duration(1000L),
    new EventHandler<ActionEvent>() {…}
  )).build();

The trick is obvious: the Builder pattern. This keeps code almost as tight as in JavaFX Script. The javafx.builders package offers 257 builders, covering all the APIs. I’m not fond of the Builder pattern; I loathe the extra allocations and calls with the single purpose of making code a little more compact. But UI-tree initialization is a best-case application for that pattern. As a minimal rule, avoid builders in performance-critical code, and in any reusable library. I hope visual designers will have an option to choose between readable/compact style (builders), and optimized style (constructors and setters).

You can mix styles – there is a KeyFrameBuilder, but I didn’t use it above because KeyFrame offers several constructors, including one with all the properties I wanted to set. Avoiding the TimelineBuilder, though, would require setter calls, because Timeline doesn’t offer constructor initialization for all fields or even the most important ones. This is a trait of JavaFX 2’s API design: most classes offer constructor initialization only for properties that can only be set at construction time – a convention that replaces JavaFX Script’s public-init visibility. But there are exceptions for struct-like classes, such as those in the javafx.geometry package, that offer constructor initialization even for mutable properties.

Properties and Binding

Properties and binding are definitely the big loss, in syntax, from abandoning JavaFX Script. Compare:

             
JavaFX 1:

view = ImageView { image: image
  translateX: bind x + (view.scaleX - 1) * 25
  translateY: bind y + (view.scaleY - 1) * 25
};

JavaFX 2:

view = new ImageView(image);
view.translateXProperty().bind(x.add((view.getScaleX() - 1) * 25));
view.translateYProperty().bind(y.add((view.getScaleY() - 1) * 25));

Not all Java fields can be binding targets, only those with special types like DoubleProperty. These are similar to the classic Java wrappers like java.lang.Double, except they are observable, and optionally mutable. The bind() method’s single argument must be some observable type. Whenever the observable value changes, the property bound to it will also change. Properties can be bound directly to an observable value (bind(x) – “identity binding”), or to a more complex observable expression like the example: methods like DoubleProperty.add() don’t perform the actual named operation like addition, they build expression trees that can be evaluated later. In both cases, DoubleProperty.bind()’s parameter should be an ObservableNumberValue<?>; this is a supertype of both my DoubleProperty x (so I can do identity binding) and the expression tree produced by add(). To make this clear, this is the full hierarchy for DoubleProperty:

[class hierarchy diagram for DoubleProperty]

The support for properties and binding is extensive, spreading over three main packages (javafx.beans, javafx.binding and javafx.collections). The API has specialized classes, interfaces and methods for all base Java types, so you won’t pay the cost of extra boxing that exists with total reliance on generics. But don’t be scared by the size of the API; you’ll usually only need the *Property classes. All other types are there just for reuse, or for static typing in the binding and expression tree building APIs. The hierarchy is extensible (no final classes), so it’s useful to know it in more detail to create your own property, observable, or expression types. You don’t need to use *Property types all the time; instead of translateXProperty().get(), just call getTranslateX(). See Mike Heinrichs’ introductions to properties. The full power of JavaFX Script’s binding lives on… except the convenient syntax.

Notice also that in my JavaFX 2 code above I didn’t use an ImageViewBuilder; that’s because the builders’ chained setters only allow you to initialize properties with standard values, not with observable values. Supporting the latter would bloat the builder APIs, because Java’s base types are not extensible, so ObservableDoubleValue cannot be a subtype of double or even Double – higher-level languages for the JVM may not have this problem. Implicit sharing of observable values, over repeated uses of the same builder object (a nice optimization, especially inside loops), might also be confusing. You can still create the object with a builder; only the bound properties must be initialized after construction.

             
JavaFX 1:

public var nBalls:Integer on replace {
  N = nBalls;
  fpsTarget = 0;
}

JavaFX 2, with override:

IntegerProperty nBalls = new SimpleIntegerProperty(0) {
  @Override protected void invalidated () {
    Config.this.N = get();
    fpsTarget = 0;
}};

JavaFX 2, with explicit listener:

IntegerProperty nBalls = new SimpleIntegerProperty(0);
nBalls.addListener(new InvalidationListener() {
  public void invalidated (Observable observable) {
    Config.this.N = nBalls.get();
    fpsTarget = 0;
}});

Triggers are replaced by invalidation events. Besides declaring the *Property field, you must handle this event in one of two ways: either register an InvalidationListener, or define a subclass of the property overriding invalidated(). The latter is much more concise, closer to JavaFX 1’s triggers; and lighter-weight, because the property’s internal listener list won’t be allocated if no listener is installed. The event dispatch itself is basically just as fast in either case: different from most event listeners, InvalidationListener.invalidated()’s parameter is not a special event object, it’s simply the “self” reference to the invalidated property (here, nBalls). There’s no per-event allocation.

One important advantage over JavaFX 1 is the higher level of control. Explicit invalidation listeners are a bit bulkier even with lambdas, but they allow extra flexibility – register and deregister listeners at any time; multiple listeners for a single property, single listener for multiple properties (that’s why the “self” parameter), capturing variables from a different scope than the property initialization scope, etc.

Properties: -verbose?

Some people have complained that JavaFX 2’s properties are too verbose. Overall, it’s true that the property / binding system adds more weight to the already-bulky Java Beans; Java really needs property syntax. Maybe JDK 8 will bring that. You don’t have to wait, though; DSLs like Visage, or JavaFX bindings created with any language that offers either meta-programming or operator overloading, should be able to eliminate the boilerplate, even in complex binding expressions.

But it’s worth noticing that JavaFX properties can be programmed in several styles, from more heavyweight to more lightweight:

  1. A full-blown, “API-quality” scheme with two private attributes, allowing lazy allocation of the *Property object (at least for properties that are often not set by the user), as described by Mike Heinrichs. Definitely significant extra code compared with the traditional Java Beans convention.
  2. The “standard” approach: the private *Property attribute, and getX(), setX(…) and xProperty() accessors. That’s just one method more than Java Beans.
  3. Only the private attribute and the xProperty() method – I’d rather name it just x(). Users will have to invoke x().get() and x().set(…) to access the value. That’s one method less than Java Beans.
  4. Only a public attribute. Just make it final, so users can’t reassign the property object itself. That’s two methods less than Java Beans!

There are also other possible variations, notably using the ReadOnly*Property and ReadOnly*Wrapper APIs – these replace JavaFX Script’s public-read visibility. But looking at the list of options above, the amount of boilerplate code is proportional to functionality and design choices. Not all application beans have to be as thoroughly optimized, for either programmer convenience or performance, as the JavaFX APIs. Even the option of public attributes is not an absurd proposal – not even if you strictly adhere to Object-Oriented encapsulation dogma. Guess what, the *Property types allow you to keep encapsulation! Check:


// SimpleDoubleProperty is the concrete subclass; DoubleProperty itself is abstract.
public final DoubleProperty x = new SimpleDoubleProperty() {
    @Override public double get () { return super.get(); }
    @Override public void set (double v) { super.set(v); }
};

In the example (similar in structure to C# properties), I can add any code to the overridden get() and set() methods, just like in Java Beans accessors. Of course, for the common need of reacting to value changes in the setter, JavaFX’s properties have a specialized notification mechanism. This style just won’t go well with inheritance, but that doesn’t matter: I always declare all my Java Beans accessors final, both because overriding them is not good OO in my book, and because polymorphic self-calls are dangerous in constructors – yet constructors should initialize attributes through setters, so as not to risk skipping or duplicating the setter’s logic, like validation of illegal arguments.
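The constructor-through-setter rule is easy to illustrate with a generic sketch (a hypothetical class, not from JavaFX Balls): the constructor funnels through the final setter, so validation runs exactly once and can never be bypassed by an overriding subclass:

```java
// Hypothetical bean sketch: the constructor delegates to the final setter,
// so validation runs exactly once and a subclass can't break it with a
// polymorphic self-call during construction.
class Speed {
    private double value;

    Speed (double v) { setValue(v); }  // safe: the setter is final

    public final double getValue () { return value; }

    public final void setValue (double v) {
        if (v < 0) throw new IllegalArgumentException("speed must be >= 0");
        value = v;
    }
}
```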

The remaining issue is reflection, or other dynamic mechanisms to discover and handle properties; supporting multiple coding styles would be harder than supporting a single one, but not that much harder. The whole area needs more radical improvement anyway: Java Beans was already an afterthought, and the API part (java.beans) was never a great design – partially obsolete, and not a good foundation for higher-level features like validation, conversion or binding.

Whither Compiled Bind?

We lose JavaFX 1.2+’s compiled bind, still a work in progress but already quite good in v1.3.1 (JavaFX Balls was a benchmark of binding, among other things!). But compiled bind had a cost; javafxc had many advanced optimizations for high-level language features (and still a good backlog of desired optimizations), but those optimizations often had major space or code-size tradeoffs. The good news is, JavaFX 2’s binding and properties don’t need such compiler optimizations. Everything is pay-as-you-go: in x.add((view.getScaleX() - 1) * 25), notice the mix of methods like add() and operators like ‘-’ and ‘*’. JavaFX Script’s compiled bind was necessary because every variable was an observable value, so the compiler couldn’t trivially flatten sub-expressions to primitive operators. In JavaFX 2, the programmer does that optimization, by judiciously using tree-building methods and observable terms only where necessary.

Beware of visual designers that may allow you to write binding expressions but generate code that uses tree-building methods everywhere, or JavaFX DSLs that may reintroduce the weight of “every variable is observable”. Without compiled-bind-like optimizations, these tools may regress to the performance of JavaFX v1.0/1.1. At least for the Visage language, this is not a concern, as it inherits javafxc’s technology for properties and binding (although seamless integration with the completely new system of JavaFX 2 may create some difficulties or overheads).

I could probably write a microbenchmark that shows JavaFX 2.0 much slower than 1.3.1 under intense use of complex binding trees. But that wouldn’t be realistic; in practice, the vast majority of bindings tend to be identity bindings, or a trivial tree with one or two operators (“keep component X ten pixels below Y”). Then again, if you’re writing high-performance animation in a system-level language like Java, blind reliance on high-level features like binding is just wrong. My aggressive use of binding in JavaFX Balls was purposeful, to stress-test the platform. I could have avoided all that binding easily; indeed, JavaFX Balls 3.0 can optionally do that, so I can measure the performance impact of binding.

Internally, JavaFX’s runtime contains a ton of code to handle all combinations of operations × observable data types (OK, damn primitive types!…). This code is massively redundant; I’m sure it’s based on template files and macro-expanded for all types. And I wonder if, in a future version, most of it could be created dynamically – these days, runtime bytecode generation is just as fast as loading classfiles, and a dynamic system could bring back the Compiled Bind optimizations if at all necessary. JDK 7’s great new support for lightweight bytecode generation – method handles and anonymous classes – will enable a more efficient implementation of JavaFX 2, and of many other frameworks.

Even Better Beans Binding?

JavaFX’s property and binding packages can be used independently from the rest, and they are based on Java Beans, so you can benefit from this even if you’re not sold on the whole JavaFX platform. Higher-level features such as data validation and conversion are missing, but these could easily be built on top of the extensible property APIs. It’s just not a real beans binding framework, because the *Property types are required; call that “Java Beans 2.0” if you like. It seems possible to add support for standard property types through additional property / observable / expression classes, using reflection to manipulate common bean properties; this may be a good compromise for some scenarios.

Besides the obvious advantage of having a unified API for properties and binding, JavaFX’s is also more powerful, type-safe, and efficient than the existing options I know. JavaFX’s package makes no use of reflection and is fully statically typed. As for boilerplate code, again I don’t see a better alternative with the current Java language syntax; only support from other JVM languages may improve on this. But the current design is the perfect match for the JVM and the Java language’s own design and performance choices – the choices of a system-level library: features, performance and robustness first. You can easily wrap that in an Expression Language or some other, higher-level layer, if you want.

Sequences and Generators

In a final exhibition of how bitterly I miss JavaFX Script’s awesomeness, check this example:

             
JavaFX 1 – in the middle of an initialization tree:

content: [ Group {
  content: bind for (b in test.balls) { b.view } },
…]

JavaFX 2:

final Group ballsView = new Group();

test.balls.addListener(new ListChangeListener<Ball>() {
  public void onChanged (ListChangeListener.Change<? extends Ball> c) {
     final Node[] views = new Node[test.balls.size()];
     for (int i = 0; i < test.balls.size(); ++i) views[i] = test.balls.get(i).view;
     ballsView.getChildren().setAll(views);
}}
);

… then, in the middle of an initialization tree (with builder) …

.children(ballsView, …)…

This example mashes up everything. The new code suffers from the Java language’s lack of functions, binding, sequences and generators (or any special syntax for collections). But part of the problem is that JavaFX 2’s collections are still in flux. The EA builds had a Bindings.foreach() method that could save me some code, but this method was removed; and it alone wouldn’t completely solve the problem, because the type of Group.children is just an ObservableList, not a ListProperty – there’s no such class. There are alternatives like FilteredList and ObjectProperty&lt;T&gt;, but again these don’t completely fill the shoes of a ListProperty. The design space here is difficult, because JDK 8 will introduce a rich set of new lambda-based Collections methods, like filter(), map(), reduce() and forEach(), so it’s possible that JavaFX waits for that, to keep its collections aligned with Java’s.
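The body of that listener is just a list-to-list mapping; a generic helper in the spirit of the removed Bindings.foreach() could look like the sketch below (hypothetical code, not a JavaFX API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not a JavaFX API: rebuild a target list as a mapped
// view of a source list; a list-change listener would call this on every
// change, in the spirit of the removed Bindings.foreach().
final class Lists {
    static <S, T> List<T> map (List<? extends S> src, Function<? super S, ? extends T> f) {
        List<T> out = new ArrayList<>(src.size());
        for (S s : src) out.add(f.apply(s));
        return out;
    }
}
```

With something like this, the listener body would shrink to a single call that maps each Ball to its view node.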

JavaFX Script: Epilogue

On average, no big loss on the syntax side. Some pros: I no longer have to pester the JavaFX Script team ;-) about operators or null handling. I don’t miss Java features, like generics, that JavaFX Script left out due to its focus on simplicity. The new APIs are all revved up to use advanced Java constructs (advanced? In the old APIs, the absence of enums was irritating…). No half-baked concurrency; no missing basic features like a map type. We still need language improvements, but there’s progress; JavaFX 2.0’s FCS will actually happen after JDK 7’s, so the first batch of apps can already pick up the Coin features, and by then JDK 8 will be closer, with lambdas and other improvements.

The hidden overheads of many JavaFX Script features are gone too. Compiled code is now as tight and fast as possible on the JVM platform. Extra productivity indulgences will soon be provided by DSLs, libraries in extensible languages, or design tools. By the way, any language that makes your jar files several times bigger than a Java equivalent, and also carries a multi-megabyte runtime, is a hard sell for any target that’s sensitive to loading time and footprint.

Digression: Fixing an old performance bug

I’ve inherited the classic Bubblemark collision-detection code, which is very simple – each ball is hit-tested against every other ball after it in a list (a Bubble Sort-like pattern). The original benchmark was limited to 128 balls (16K hit tests), and I had raised that to 512 (260K tests), a bit taxing but still viable. JavaFX suffered more than other platforms due to property & binding overheads, but even with this handicap it led the competition, so I didn’t care to optimize. ;-) Now for JavaFX 2 I wanted to raise the max count again, but the program hit a brick wall: 4,096 balls needed 8M hit tests per frame, or half a billion per second for a 60fps goal – impossible. You can only go so far with an O(N²/2) algorithm.

There are many ways to skin this cat, and I decided on a simple spatial index. I also noticed that each ball could bounce many times in a single frame, useless for realism (doing that right, like a physics engine does, would need extra work). So I’ve added a simple bounced flag to the balls; each ball that bounces against a wall or another ball is marked, and never hit-tested again in the same keyframe. This trick reduces the hit tests, towards a limit of O(N) as scenes get denser. But for non-aberrant densities, the spatial indexing is still much more scalable: the bounced flag improves the brute-force algorithm from 8M to 2.7M hit-tests/keyframe, but the indexing reduces it a thousand-fold, to ~8,400.
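The spatial index can be sketched in plain Java (a simplification of the real JavaFX Balls code, which also handles the bounced flag and tunes the cell size): balls are bucketed into grid cells sized to the ball diameter, and each ball is hit-tested only against balls in its own and neighboring cells:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified spatial-grid broad phase for circle-circle collision:
// O(N) bucketing, then pair tests only within the 3x3 cell neighborhood,
// instead of all N*(N-1)/2 brute-force pairs.
final class BallGrid {
    static int countCollisions (double[] x, double[] y, double r, double w, double h) {
        final double cell = 2 * r;                    // cell edge = ball diameter
        final int cols = Math.max(1, (int) (w / cell));
        final int rows = Math.max(1, (int) (h / cell));
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < cols * rows; ++i) buckets.add(new ArrayList<>());
        for (int i = 0; i < x.length; ++i) {          // bucket each ball by cell
            int cx = Math.min(cols - 1, (int) (x[i] / cell));
            int cy = Math.min(rows - 1, (int) (y[i] / cell));
            buckets.get(cy * cols + cx).add(i);
        }
        int hits = 0;
        for (int i = 0; i < x.length; ++i) {
            int cx = Math.min(cols - 1, (int) (x[i] / cell));
            int cy = Math.min(rows - 1, (int) (y[i] / cell));
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int nx = cx + dx, ny = cy + dy;
                    if (nx < 0 || ny < 0 || nx >= cols || ny >= rows) continue;
                    for (int j : buckets.get(ny * cols + nx))
                        if (j > i) {                  // test each pair only once
                            double ddx = x[i] - x[j], ddy = y[i] - y[j];
                            if (ddx * ddx + ddy * ddy < 4 * r * r) ++hits;
                        }
                }
        }
        return hits;
    }
}
```

The denser the scene, the bigger the win: the neighborhood scan stays bounded while the brute-force pair count grows quadratically.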

Concurrent Rendering

The first thing you will notice in JavaFX 2 is that it does event handling and rendering in different threads. And you will notice that… drum roll… because it will likely break your code – either legacy code ported from JavaFX 1, or new code written with old habits in mind, from Swing or other single-threaded UI toolkits or game engines. It certainly broke my initial port of JavaFX Balls (see next section). The JavaFX 2 runtime manages three threads:

  • The EDT (event dispatch thread), identified as the “JavaFX Application Thread”. That’s where all your app code runs by default, including animation events (Timeline action handlers) or any other events.
  • The Renderer thread, identified as “QuantumRenderer”. That’s where your scene graph is transformed, composed, rasterized etc., until you get nice pixels in the screen.
  • The Media thread, identified as “JFXMedia Player EventQueueThread”. This takes care of video and sound playback tasks, such as decoding and buffering.

This is a powerful architecture; multicore CPUs are now the rule even in high-end mobile devices, so if your app leaves at least one core free, you get rendering “for free”. JavaFX’s Prism toolkit is fully GPU-accelerated, but the concurrent renderer is still important, because JavaFX’s rendering is not only rasterization; it involves higher-level steps like CSS-driven layout, scene graph transformation, resource and state management, caching, etc. Compared to other GPU-accelerated platforms, JavaFX has the extra advantage of offloading this high-level rendering from the EDT, increasing multicore efficiency without any effort or risk. Not to mention “business” machines without appropriate support for full GPU acceleration; in this case JavaFX will do software rendering, and if the system has at least a dual-core CPU (or even a single-core but hyperthreaded clunker), the concurrent rendering will provide some advantage.

There are still APIs like Platform.runLater() and javafx.async; you don’t want to perform slow operations like I/O in the EDT, to avoid unresponsiveness to user events. And you can also tap into the full power of JavaSE’s concurrency.

New rules for Timelines

The disentanglement between rendering and event processing brings other benefits. In JavaFX 1, a keyframe that should fire 10ms from now might actually fire only after 15ms, due to high CPU usage or latency from the renderer. Timeline actions ran in lock-step with the renderer, so a low rendering framerate would cause skipping (if allowed by KeyFrame.canSkip), often needing special care from the programmer. Interpolators automatically keep track of the actual firing rate and compensate for jitter and delays, but interpolators are more limited and less general than custom timeline handlers.

In JavaFX 2, your keyframes will always fire exactly when they should (unless, of course, your EDT is bogged down by some other code – but then, that’s your fault). KeyFrame doesn’t have a canSkip property anymore. For JavaFX Balls, one important consequence is that I can’t use a timeline to measure the animation’s FPS; I used to do that by counting the activations of the main animation handler per second. But that code was bad and obsolete even in JavaFX 1; I should have used the PerformanceTracker class, which I’m finally doing now. (This is still an internal API, but you can also activate an MBean and get performance data through JMX.) By default Prism v-syncs, so you can’t get more FPS than your monitor’s refresh rate. This can be configured, but the default is the right choice for “real world” apps – and also for good benchmarks, which should lock FPS to a typical monitor frequency and stress the extra work per frame.
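For reference, the frame-counting FPS measurement that PerformanceTracker replaces can be sketched in plain Java; feed it one timestamp per frame, in nanoseconds as AnimationTimer.handle(long now) delivers them (a hypothetical helper, not a JavaFX API):

```java
// Minimal FPS counter (hypothetical helper, not a JavaFX API): feed it one
// timestamp per rendered frame; it reports the rate of the last completed
// one-second window.
final class FpsCounter {
    private long windowStart = -1;
    private int frames;
    private double fps;

    void frame (long nowNanos) {
        if (windowStart < 0) windowStart = nowNanos;
        ++frames;
        long elapsed = nowNanos - windowStart;
        if (elapsed >= 1_000_000_000L) {   // a full second elapsed: close the window
            fps = frames * 1e9 / elapsed;
            windowStart = nowNanos;
            frames = 0;
        }
    }

    double getFps () { return fps; }
}
```

Calling frame(now) from a per-frame handler yields the rate of the last completed one-second window.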

In a final surprise, even with the new hit-testing code, the program was still slow in my first tests with high ball counts. Slower than JavaFX 1.3.1!! What was wrong? It turned out to be another side effect of the new concurrent architecture. My old code used a Timeline with a frequency of 1,000 Hz to move the balls (including hit-testing), but in practice this frequency was throttled down to whatever rate the animation engine could sustain. This reduced the overhead of moving, hit-testing, bouncing, and updating view nodes. At a full 1,000 Hz, the 4,096-balls test (large screen, 1/4th-size balls) would need 8.5 million hit-tests per second.

The solution is obvious for any seasoned game programmer. Instead of a fixed-frequency timeline, I bind the animation to the renderer, to update the scene exactly once after each frame – the well-known “game loop”. How to do this:


new AnimationTimer() { public void handle (long now) { … }}.start();

The new class AnimationTimer is similar to Thread, except that it’s scheduled by the animation engine: its handle() method is invoked once after each frame is rendered. This reduced my animation overhead 16× for a 60fps animation – only 0.5 million hit tests per second; performance was finally normal (and way better than JavaFX 1, as I will soon report). Unlike a thread, but like a timeline, the same timer can be stopped and started multiple times.

Video and Audio

Motivated by other benchmarks such as Microsoft’s FishBowl, and also curious to have a feeling of the new media stack, I decided to add some gratuitous media to the program. So now when you run JavaFX Balls, press ‘V’ to add a background video behind the balls, and press ‘A’ to enable a ticking sound for ball collisions. (For nostalgia’s sake, I’ve linked to the promotional video from JavaFX 1…)


The video worked as expected, and without significant performance overhead. In one 4,096 balls test, performance dropped from 49fps to 48fps. But then, JavaFX 1 already had video support that was good enough for my usage here.


private static void flipVideo () {
    if (commandLock) return;
    commandLock = true;

    if (balls.getChildren().size() == 2) {
        // Initialize the MediaView asynchronously; media loading may be slow.
        final Task&lt;Void&gt; init = new Task&lt;Void&gt;() {
            protected Void call () throws Exception {
                final MediaView mv = MediaViewBuilder.create()
                    .mediaPlayer(MediaPlayerBuilder.create()
                        .media(new Media("http://…"))
                        .autoPlay(true).cycleCount(MediaPlayer.INDEFINITE).volume(0.5)
                        .build())
                    .fitWidth(Ball.WIDTH).fitHeight(Ball.HEIGHT)
                    .opacity(0.25).preserveRatio(false)
                    .build();
                Platform.runLater(new Runnable() { public void run () {
                    balls.getChildren().add(0, mv);  // scene-graph changes back on the EDT
                    commandLock = false;
                }});
                return null;
            }
        };
        new Thread(init).start();
    } else {
        MediaView mv = (MediaView)balls.getChildren().remove(0);
        mv.getMediaPlayer().stop();
        commandLock = false;
    }
}

The code above either adds a MediaView to the scene graph, or removes it (so the ‘V’ key toggles video on/off). I initialize the MediaView object in an asynchronous Task, because it’s not a guaranteed-low-latency operation and I don’t want to block the EDT. When that’s complete, I must go back to the EDT with Platform.runLater(), to add the media node to the scene graph. I also keep a commandLock flag that, when set to true, causes any concurrent video-toggling to be ignored, while still allowing other events to be processed during a slow initialization of the MediaView (always possible with a remote resource). These techniques and best practices are, of course, no surprise for veterans of Swing or other toolkits.

Audio is a harder business, due to the requirements of low latency, light weight and scale (supporting a large number of simultaneous playbacks, or quick bursts of them). JavaFX 1’s media stack simply didn’t have this capability; it was good enough for a great music application, but not good enough to implement even the simplest game with trivial sound effects (ever wondered why not a single JavaFX 1 game sample had sound?). The old media stack didn’t even support the WAV format, which is preferred for sound effects. JavaFX 2’s all-new media stack, based on GStreamer, should fix these limitations.


AudioClip acTick = new AudioClip(…URI…);

acTick.play();

The low-latency API is as simple as it gets: AudioClip keeps the entire audio media in memory, in uncompressed form; this is essential for minimum latency, so when you call play() there is no I/O, no buffering and no decoding. I’ve tested it, and the first impression was pretty good. Well, except that the brand-new media stack shows its early status in two bugs that I’ve reported: one was a JVM crash (with an EA build – possibly gone in the Beta), and another a latency problem (still observable in the Beta). The only really new part of the media stack is the glue between GStreamer and the rest of JavaFX, so I expect these bugs to be shaken out quickly. Meanwhile, I’ve worked around the latency bug, once again with an asynchronous task; but this should not be necessary: you should be able to just call play() from the EDT; that method should return immediately, just enqueuing the clip for the JFXMedia thread.

UPDATE: This bug was just fixed; now the cost of a call to play() is basically zero (small enough to be difficult to measure, given the accuracy of System.nanoTime()). The fix will be available in a future beta refresh, so I'm removing my workaround and just playing the clip from the EDT.

Odds and Ends

If you have read so far, congratulations for not TL;DR on me! ;-) As a bonus, some items that may deserve deep analysis in further blogs:

  • Great browser deployment. The new “plugin3” is once again (and as expected/promised) much better than before; another significant step after JDK 6u10’s plugin2. Oracle finally hit the nail on the head: it’s (at long last) in the same league as Flash. Most of the pending problems discussed here are gone. Only the ultra-cold-start scenario, requiring installation of the JRE and/or JavaFX runtimes, cannot be tested at this time.
  • JavaSE as we know it is deprecated. I wonder how many people realize this; if you don’t, check again Cindy Castillo’s great overview of the JavaFX architecture. It’s not just a new library of components, animation and rich media. It’s something that completely replaces AWT, Java2D, Swing, Java Sound, Applets, ImageIO, Accessibility – in short, the entire client layer of the JavaSE platform. (No, a JavaFX “applet” doesn’t use the java.applet API anymore.) Oracle got rid of the massive legacy of the AWT and everything built on top of it; that’s the major reason why the new browser plugin is so much better.
  • JavaFX’s runtime is not small, but it largely replaces, rather than adds to, the JRE. Corollary of the previous item. And most of the replacement is much lighter-weight, notably due to effective hardware acceleration and massive reliance on native code: the installed Win32 runtime has 14Mb of DLLs, impressive even though this includes the 7Mb WebKit and new versions of deployment components that override the JRE’s. The overall dynamic footprint is down from similar JavaSE applications; the loading time is markedly improved too. The runtime installer still weighs in at almost 13Mb; it may shrink by 1-2Mb once it can rely on newer JREs containing the new deployment parts, but that’s still a nontrivial download size (and this Beta is not yet feature-complete, not to mention future releases). On the other hand, we can expect all sorts of “privileges” like JRE bundling and OEM distribution, so installer size won’t be a big deal.
  • Swing supported, but still in legacy mode. JavaFX 2 did the pragmatic thing here, supporting Swing as well as possible with zero compromises to the new architecture. There’s improved, official support for embedding JavaFX components in Swing apps. JavaFX 1’s preview-quality support for the opposite embedding – Swing components in JavaFX scenes – is gone, as I predicted (it’s just not reasonably possible with Prism). Swing applications that just need a bit of glitz – like a business chart, video playback, or a web browser component – will be well served by this support, even if it’s coarse-grained and carries the still-remaining limitations of embedded “heavyweight components”.
  • The mysterious platforms. The “all the screens of your life” plan (JavaFX Mobile & TV) was a flop, at least as originally envisioned. But Richard’s blog mentions that JavaFX will support not only Windows + OSX + Linux, but also “a whole host of different platforms”. Which ones? Even Linux is barely worth the effort, except to court developers (the few Unix-loving Java developers who haven’t yet defected to OSX…). Solaris could be in line too, just because it belongs to Oracle. But I can’t see anything else to complete that “host”, unless we move to post-PC devices (tablets, TV, etc.). Let’s just wait and see.

Finally, the JavaFX Balls 3.0 application is not yet available; I should publish it soon, in a follow-up blog looking at performance.


Swing 2.0 is Coming Blog

Posted by opinali Sep 22, 2010

The biggest announcement - and the biggest surprise for many - of JavaOne 2010 was certainly Oracle's new plans for JavaFX 2.0... or, should we say, Swing 2.0?

The history of JavaFX has been contentious since its beginning, when it became clear that FX was a new toolkit, even a new platform, while most people in the brave Swing community wanted a "Swing 2.0". Well, this is basically what Oracle is planning to deliver with JavaFX 2.0 - minus the ugly legacy of the AWT/Java2D/Swing APIs and architecture. JavaFX 1.x was an ideal replacement for Swing, but ideals don't always work; JavaFX 2.0 will be a more pragmatic attempt at "Swing 2.0". It's just not a drop-in update: Swing code won't be trivial to convert. The announced "plugin3", capable of running Prism, will be 100% free of AWT dependencies. Swing interop will keep requiring the JavaFX Swing toolkit: the one that's bigger and slower.

Non-evolutionary changes are sometimes necessary. The old UI APIs take much of the blame, in my view, for Java's near-death on the desktop & web. Don't dream that the problem was just Sun's neglect, or that it could be fixed by new VM optimizations, deployment improvements, JDK 7's Jigsaw... or the Tooth Fairy. As the partial success of the JDK 6uN project demonstrated once again, no amount of tuning or fixing can save Swing. Even in the Swing community, many wished for a compatibility-breaking Swing 2.0 that would just be similar enough to allow easy migration and interop.

From EJB 3.0 / JPA to the most-wanted JSR-310, experience shows that sometimes enough is enough: the only way for frameworks that didn't work, is the highway. This tells me that some people have double standards when they strongly oppose JavaFX because it's not a smooth update for the existing Swing ecosystem.

R.I.P. JavaFX Script

The JavaFX Script language was the big casualty of the new plan, and I will heartily miss it; it was quite a fun language to play with. But maybe that's mostly because Java was, and still is, outdated. Even Java 8, with lambdas and all, won't be as nice as JavaFX Script (at least for UIs).

Some people will be quite happy creating JavaFX applications with the Java language (even Java 6). Plan B is relying on alternative languages such as JRuby, Groovy, Clojure or Scala; the Scala examples look really good, often hard to distinguish from JavaFX Script. If Scala is extensible enough to build some sugar for JavaFX's binding, I'm sold.

Dropping JavaFX Script has some performance and footprint advantages. JavaFX 2.0 promises to improve on 1.x even with its much bigger feature set. If that looks hard, it's easy to understand: first, Prism is more lightweight than the Swing-compatible toolkit. Second, JavaFX Script tends to produce bytecode that's quite bloated, and a bit slower than equivalent Java code (in one experiment, I got a 4% speedup by simply rewriting a trivial POJO-like class in Java). These inefficiencies affect both application code and parts of the JavaFX frameworks originally implemented in JavaFX Script - not the perfect tool for the job. Finally, JavaFX Script was still a work in progress: I was hoping they would add more features, optimizations and refinements... but time's up.

Some critics complain that JavaFX Script was a wasteful diversion, a science-fair experiment. I think it was a quite cool language, and the barrier to entry was pretty low for any competent Java programmer willing to actually spend a couple of days with it. And any big new framework needs many months of experience until you really get the concepts and architecture, understand the tradeoffs and performance aspects, and become capable of writing high-quality code. But language barriers-to-entry are more complex:

  • Any JVM framework is better developed in Java. I hate when I read (for example) about Clojure's persistent data structures, as I can't use them in my Java code (at least not in a smooth, natural way). So this great piece of engineering and innovation is locked in the small niche of Clojure fans. Nobody will ever be motivated to try the full Clojure package by a good experience of using its frameworks from Java. From my own small niche of JavaFX Script fans, I didn't mind that others couldn't use JavaFX's awesome frameworks; but this tight coupling clearly didn't do JavaFX any favors. Lesson learned: Write system-level code with the system-level language.
  • Programming language adoption is black magic. Corporate push can only go so far - the developer community can be quite opinionated. But that community alone can also only go so far - languages / compilers / VMs are Cathedrals that typically demand enormous, long-term investment and very organized design and development. On top of that, add: leadership, evolution, standards, timing, luck... for example, I don't think Javascript would've stood any chance in a fair competition with VBScript. Javascript is much better, but it didn't win because it was better. It won because it was there first, in a web that was already big and still owned by Netscape. And because many years would pass before people really had to learn Javascript to write complex Web 2.0 apps: that would have been the opportunity for VB's argument, "it's familiar and Microsoft has a pointy-clicky IDE". Lesson learned: Proposing a new language is always a big risk.

JavaFX Script didn't really fail, though; it taught us many interesting things, and it paved the way. The future JavaFX 2.0, coded in either "good old Java" or in some high-level, DSL-able language like Scala or Groovy, will be a much better system, because we have a very clear and concrete vision of the ideal way to do some things. It seems to me that at least part of this technology (the runtime, if not javafxc's code generation) can be reused in whatever new Java-compatible library they come up with now.

Not a Full Restart

I've seen some skepticism about the JavaFX 2.0 plans, claiming it to be a "full restart", so we'd be heading for another 2-3-year wait until JavaFX is again mature and stable (say, as good as JavaFX 1.3.1 or better). There are some important mistakes here. First, JavaFX 2.0's feature set is much bigger than 1.x's. Had Oracle kept the previous course with JavaFX Script, the roadmap would still be pretty good for next year: new concurrency model; footprint and startup improvements; GA and default Prism; texture paint (thanks!); much-expanded CSS support (animations, grid layout); next-gen media stack; WebView (back from the ashes of JWebPane?); HTML DOM and extra browser interop; and a new batch of controls - complex, critical ones like TableView.

Granted, a few items here (Prism, new controls) were originally supposed to GA in 1.4, later this year. The Design Tool was also supposed to hit public beta now. But the major disruption of the new plan is the "black hole" of the next few months, at least until the EA or Beta ships; any code I write today will have to be ported when 2.0 ships. JavaFX Script will be maintained for some time, but eventually all code that uses the JavaFX Script language and/or the existing APIs will be legacy code (yeah, I can see the smiles of Swing fans at the irony).

JavaFX 2.0 is not a full rewrite. The bulk of JavaFX is already written in Java, C/C++ (the runtime has significant native code), or shader language. Only the higher-level public APIs, like UI controls, are written in JavaFX Script (and I expect even those to rely on some Java where it gets tough). Oracle will have to rewrite some parts of the runtime, but even that will be partially a port - I don't think Jonathan will ignore the entire existing code base and design, and write the new Button control from zero, laboriously coming up again with the same rendering and layout algorithms, etc.

Oracle won't start coding all the new stuff next week, when the team resumes from the JavaOne break. It seems to me that they have been working on the new plan for some time now. You can see in the JIRA that the JFXC project has looked dead since June, when v1.3.1 development finished; 1.3.2 work didn't even start. Oracle has already presented some demos of what JavaFX 2.0 will be, so the new runtime may already have a few months of work behind it - a rough early alpha, at least. Two big-ticket features, the advanced media stack and the HTML5 engine, will mostly integrate/wrap third-party projects; and while these are still not trivial tasks, it seems that work started quite some time ago on both fronts.

Finally, JavaFX is increasingly less dependent on any programming language. Releases 1.2 and 1.3 started a big push to move as much content as possible to the FXD format, and to web-happy CSS stylesheets. The roadmap for 2.0 furthers this trend: animation and layout will be scripted in CSS (using standard CSS specs when available). I didn't hear anything about the FXD format, but I think it will also be maintained and improved. So in the end, we don't really lose much of JavaFX Script's nice declarative syntax. Support for these components - from the CSS and FXD parsers / internal metamodel (already implemented in Java), to the NetBeans plugin and Production Suite (already implemented in Java or native code) - should be unaffected by the transition away from JavaFX Script.

Mark Reinhold announced today that the JDK 7 / JavaSE 7 project has slipped once again: mid-2011 without Jigsaw and Lambda, late 2012 for JavaSE 8 with those. The delay (or some other bad news, like dropped features) was already expected by anyone who tracks the project. But really, how big and bad is this delay?

As a big enthusiast of both Jigsaw and Lambdas - and as a tech writer who just published two massive articles on Java 7 & JavaSE 7 (in the Brazilian Java Magazine) - I was initially... very unsatisfied, to be polite. But doing a reality check, the slip is neither as big, nor as bad as it seems at first sight.

The reason, of course, is JDK 6. I have been continuously tracking the "post-6uN" releases, where Sun/Oracle continues to push the envelope, delivering as many improvements as they can without breaking their own TCK. See my recent coverage of 6u21, for example. I'm already testing the first build of 6u23, which (besides a bunch of Swing fixes) carries another massive VM update, now to the bleeding-edge HotSpot 19 (the very latest one from JDK 7 at the moment). This includes such high-profile JDK 7 features as the latest G1 collector, the complete VM support for JSR-292, and other items like CompressedOops with 64 Gb heaps, CMS fixes and tons of smaller VM/runtime fixes and improvements.

Version numbers are a somewhat arbitrary thing. Sun changed the J2SE/JavaSE versioning scheme a few times, always to the dislike of half the planet. Some people think that 6u10 should have been called 6.1. In that spirit, I'd rebrand 6u14->6.2, 6u18->6.3, 6u21->6.4 and 6u23->6.5. Maybe we would feel better: "Yeah, JDK 7 is delayed, that sucks... but at least we're getting 6.5!"

The restriction of no changes to the language syntax or public APIs puts some limits on the features that can be delivered in the (so-called) maintenance updates. But these restrictions are not impossible to circumvent; they are just cumbersome. For example, you may wonder how useful the VM back-end for JSR-292 is if the front-end (the java.dyn APIs, some bits of syntax sugar) cannot be available. It will probably be good enough for the projects that need it the most: dynamic languages that target the JVM, such as JRuby, Groovy and Jython. I suppose that a future update of the JSR-292 backport will be able to make use of the new back-end, perhaps by adding an extension jar with the missing APIs to the bootclasspath. Even the new invokedynamic bytecode, which needs a new classfile version, may not be a big problem: the backport might use tricks with "magic" internal APIs that the VM would replace to emit a dynamic invoke; or the VM launcher could have some -XX:+InvokeDynamic option. Both would carry all necessary disclaimers as unsupported, VM-specific extensions. Nothing new here; the JDK already contains literally hundreds of private, magic APIs and options. And the argument of "only runs on the Oracle JDK" is much less important than before... which brings up my next subject: OpenJDK.
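For a concrete taste of what that VM back-end enables, here is a minimal method-handle lookup and invocation in Java. Note this is a sketch using the java.lang.invoke names JSR-292 eventually standardized in JDK 7; at the time this post was written the package was still called java.dyn.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class Indy {
    public static void main(String[] args) throws Throwable {
        // Look up String.concat(String) as a method handle - the dynamic
        // dispatch primitive that the JSR-292 back-end adds to the VM.
        MethodHandle concat = MethodHandles.lookup().findVirtual(
            String.class, "concat",
            MethodType.methodType(String.class, String.class));
        // invokeExact requires the call-site types to match the handle's
        // type exactly: (String, String) -> String.
        String s = (String) concat.invokeExact("invoke", "dynamic");
        System.out.println(s); // prints "invokedynamic"
    }
}
```

Dynamic language runtimes build their call sites on exactly this kind of handle, which is why they benefit from the VM support even before the language-level syntax ships.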

Who cares if something works "only" in the Oracle JDK? Any such feature will also work on OpenJDK, that is free software and (thanks to community projects like IcedTea, Zero and Shark) runs on even more platforms than the official Oracle JDK. (The entire OpenJDK project was a great move by Sun as it basically subsumed the older Free Java projects - these days, nobody cares to test Java apps on Classpath or GCJ.)  I suspect that this thinking will do a lot to promote the Oracle/Open JDKs as an even stronger de-facto standard than before. Another important JavaSE implementation, the JRockit VM, is owned by the same company, and will eventually be merged with HotSpot. So in the end there are only two relevant JDKs: Oracle's and IBM's. All other production-quality JDKs, such as Apple's or HP's, are mostly ports of Oracle's code, so they tend to follow its steps including most extensions. The notable exception is the Excelsior JET static compiler, but they have a smaller market share, and their business depends on having the best possible compatibility with the standard (read: Oracle) JDK.

Additionally, features like JSR-292 are not really vendor-specific extensions. They are just standard features... from the next platform release. I don't see a reason why IBM wouldn't want to make a similar move, and backport their own JSR-292 support from the ongoing IBM JDK 7 project to some update of IBM JDK 6. (Just keep the feature disabled by default, so ultra-conservative WebSphere admins won't freak out.) Last time I checked IBM was making some noise in dynamic languages support, so that would even be a business necessity. Maybe a future release of projects like JRuby would need different extension jars and launch scripts for Oracle/Open JDK and IBM JDK, and that would be all.

Programmers love new syntax and new APIs, and it's much neater when these are included in the core platform: less stuff to install, less worries about support, toolchain or portability. But truth be said, these conveniences are not deal-breakers. New APIs have been traditionally available from multiple sources, from open source projects to the "upper" JavaEE specs; and an increasing number of Java (Platform) programmers are now comfortable with alternative languages and compilers, from Scala and Clojure to Oracle's own JavaFX Script.

My last comment for Oracle: Please include the Tiered Compiler in 6u23, and I will definitely be completely fine with the slip of JDK 7. Quality first, it's done when it's ready, and all that. ;-)

If you want to work for DropBox, they have an interesting programming test whose solution must be submitted along with the CV. I'm not considering a position at DropBox, but their test was too much fun to ignore: an interesting challenge in algorithms, and another opportunity to exercise JavaFX, as any geometric problem surely deserves some GUI.

(Don't read this blog if you actually plan to apply for a job at DropBox. I don't think the company would use this problem as its sole recruitment method; besides, solving it unaided is what makes it valuable for the candidate.)

The problem is to find the ideal packing of some number of arbitrary "boxes" (rectangular objects), so the total area is minimized. The boxes can be rotated if necessary, and the area of the smallest rectangle that contains all packed boxes measures the solution.

Like most geometric problems, this one is easy to attack by sketching an empirical solution and pondering how to translate it into an algorithm. The first, somewhat obvious idea:

  • Sorting the boxes: Place those with bigger areas first, to avoid the inefficient placements that can happen when trying to fit large boxes into free space already fragmented by smallish boxes.

I don’t have a formal justification for this heuristic... it's just a reasonable analogy with similar fragmentation problems, from memory allocation to disk filesystems. Partitioning by size is usually a good idea.

Let's start the implementation with the sorting function and some random utilities:

function area (c: Point2D)     { c.x * c.y }
function area (r: Rectangle2D) { r.width * r.height }

class RectComparator extends java.util.Comparator {
    override public function compare (r1: Object, r2: Object) : Integer {
        (area(r2 as Rectangle2D) - area(r1 as Rectangle2D)) as Integer;
    }
}

function sort (rects: Rectangle2D[]) {
    Sequences.sort(rects, RectComparator{}) as Rectangle2D[];
}

I was surprised that I needed to write a Comparator class like that. I can't just pass an equivalent JavaFX Script function to the sort() method, because the language doesn't support SAM conversion (like proposed for Java 7). I wonder if some kind of support for Java's generic types would also be needed in that case, so a function (r1: Rectangle2D, r2: Rectangle2D) {...} would be type-safely converted to a Comparator<Rectangle2D>.
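The SAM conversion missed here did eventually arrive in Java 8 as lambdas and method references. As a sketch of the same descending-area sort in plain Java (using java.awt.geom.Rectangle2D as a stand-in for the JavaFX class), note that Comparator.comparingDouble also sidesteps the subtract-and-cast-to-Integer trick, which silently truncates fractional area differences to zero:

```java
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.Comparator;

public class AreaSort {
    // Area helper, mirroring the JavaFX Script area() function.
    static double area(Rectangle2D r) { return r.getWidth() * r.getHeight(); }

    public static void main(String[] args) {
        Rectangle2D[] rects = {
            new Rectangle2D.Double(0, 0, 2, 3),   // area 6
            new Rectangle2D.Double(0, 0, 10, 4),  // area 40
            new Rectangle2D.Double(0, 0, 5, 1),   // area 5
        };
        // The method reference is converted to a Comparator<Rectangle2D>
        // (SAM conversion); reversed() gives biggest-area-first order.
        Arrays.sort(rects, Comparator.comparingDouble(AreaSort::area).reversed());
        for (Rectangle2D r : rects) System.out.println(area(r)); // 40.0, 6.0, 5.0
    }
}
```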

Also, I had to write helper functions for areas. I noticed that javafx.geometry is much simpler than the equivalent AWT package. JavaFX APIs generally have a minimalist design, due to footprint concerns especially on the lower profiles; but I wonder if we could have some extra power here, at least for the Desktop profile. Sophisticated manipulation of geometric models is something I see a significant number of JavaFX applications doing. The javafx.scene.shape package is much more complete, but its classes are UI components, not adequate for anything else.

The second idea came easily too:

  • Candidate positions: Have a list of possible positions for box placement. In the initial state, the only candidate position is the origin (0, 0): the top-left corner of a virtual bounding box that grows to contain all placed boxes. This list will be updated as each box is positioned.

The figure below shows the virtual bounding box (thick lines); the initial candidate position (red disk); the next box to place (dashed line); and the ideal position for that box (blue arrow). This ideal position is calculated by our brain, not by the algorithm: we're still working on the problem of finding some algorithm that will select this position!

box1

Placing the first box is trivial because there’s only one candidate position, but this trivial special case doesn't help much to further reveal the algorithm.

Let's write down the data structure that will represent each iteration, or step, of the solution:

class State {
    public-init var output: Rectangle2D[] = [];
    public-init var candidates: Point2D[] = Point2D { x: 0 y: 0 };
    public-init var limits: Point2D = Point2D { x: 0 y: 0 };
}

In this class State, I have a sequence of output rectangles (the already-placed boxes), a sequence of candidate positions, and the limits of the virtual bounding box of all placed rectangles. This class will be immutable; functions that operate on its data will be member functions (methods) of State.

box2

I consumed the original placing position, and this reveals two new interesting candidate positions: the northeast (top-right) and southwest (bottom-left) corners of the placed box. The southeast (bottom-right) corner, or other positions (e.g. at the middle of some edge), don't make sense. All boxes must be tightly packed, so each new box should go as much north and west as possible, always touching other boxes' edges (or the origin axes).

This looks like a promising solution, at least a good start. In fact, this solution is already good enough for the simple 3-box problem from DropBox’s challenge. Now I can envision the core algorithms for box placement:

  • Placing a box: For each new box, try placing it in all candidate positions. Try this twice per box - with its original shape, and rotated (inverting its height and width). Discard placements that create an intersection with any existing box. Calculate the total area for each attempt, and choose the option that delivers the smallest bounding area.
  • New candidate positions: Remove the position consumed by the placed box. Add the NE and SW corners of the placed box.

Here is the code for the box-placing part:

function flip (r: Rectangle2D) {
    if (r.height == r.width)
        null
    else
        Rectangle2D { width: r.height height: r.width }
}

function intersects (placed: Rectangle2D) {
    for (o in output) if (o.intersects(placed)) return true;
    false
}

function pack (next: Rectangle2D) {
    var bestLimit = null;
    def currArea = area(limits);
    var bestArea = Float.MAX_VALUE;
    var bestRect: Rectangle2D = null;
    var bestCand: Point2D = null;

    for (n in [ next, flip(next) ], c in candidates) {
        def newLimit = Point2D {
            x: max(c.x + n.width, limits.x),
            y: max(c.y + n.height, limits.y)
        }
        def newArea = area(newLimit);

        if (newArea < bestArea) {
            var placed = Rectangle2D {
                minX: c.x minY: c.y width: n.width height: n.height
            };

            if (not intersects(placed)) {
                bestLimit = newLimit;
                bestArea = area(newLimit);
                bestRect = placed;
                bestCand = c;
                if (newArea == currArea) break;
            }
        }
    }

    State {
        output: [ output, bestRect ],
        candidates: nextCandidates(bestCand, bestRect),
        limits: bestLimit
    }
}

Function State.pack() is basically half the solution; it depends on a nextCandidates() function that we'll see later. It also uses two new helper functions, flip() and intersects().

Notice that flip() returns null if the rectangle is square - in that case we don't try the same placement twice! The use of this null value is a JavaFX Script trick. The loop for (n in [ next, flip(next) ]) iterates a sequence that contains the "next box" and also its flipped version. But if flip(next) returns null, that sequence will contain only next, because sequences cannot contain nulls; sequence construction silently ignores nulls.
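Java has no such implicit null-dropping in collection construction; the closest idiom filters nulls explicitly. A small sketch of the same square-skipping trick (the Rect record and flip() are my illustrative stand-ins, not project code):

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Stream;

public class Flip {
    record Rect(double width, double height) {}

    // Returns null for squares, so the rotated attempt can be skipped.
    static Rect flip(Rect r) {
        return r.width() == r.height() ? null : new Rect(r.height(), r.width());
    }

    public static void main(String[] args) {
        Rect next = new Rect(4, 4);
        // Unlike a JavaFX Script sequence, a Java stream keeps nulls
        // unless we drop them explicitly with filter().
        List<Rect> attempts = Stream.of(next, flip(next))
                                    .filter(Objects::nonNull)
                                    .toList();
        System.out.println(attempts.size()); // 1 for a square box, 2 otherwise
    }
}
```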

One important decision is the handling of multiple candidate placements that don't increase the current bounding area - an extremely frequent event for smaller boxes that fit in some existing "hole". My first instinct was a best-fit decision, scanning all candidates with a bias for placements closer to the top-left corner, or some other heuristic. But the performance cost of this decision was big, because it implies a full Cartesian product of boxes × candidate positions; and the gain in overall placement was virtually zero. So in the end I opted for a mixed best/first-fit strategy: keep searching candidates, but abort the search at the first placement that is "good enough" - i.e., doesn't increase the bounding area, the critical variable that the algorithm must optimize.

box3

Back to our boxes... We placed the second box; so far so good, but look at the next box... this will cause us some trouble.

box4

Now the rule for adding new candidate positions has failed; the new set of candidate positions is clearly not sufficient. The ideal position for the next box is just under the last placed box, touching its bottom edge, but with the X coordinate further to the left, touching the right edge of the bigger box.

Our original rule was too simple: it ignores the gaps caused by placements that don't preserve the neat, ladder-like arrangement of the first three figures.

At this point in the problem, there are some possible solutions:

  1. Brute-force. Add to the candidate list the full Cartesian product of the right and bottom coordinates of each placed box × all existing boxes.

This will produce a large number of candidate positions, most of them useless, like this:

box5

This solution works; the problem is scalability. The number of candidate positions will be roughly O(N²) on N = number of boxes, even with some rules to discard a few positions (repeated positions produced by different combinations of boxes; positions already used by some placed box; the SE position of any placed box). Positions that land on the left or top edge of any box are especially useless: they can't host boxes of any non-zero area. You could trim such useless positions, but that would be expensive, requiring intersection tests against all existing boxes.

  2. Rocket-science. We could design a smarter placement algorithm that uses the candidate positions, but is able to "shift" boxes to the left (or to the top) until they touch another box.

box6

The picture shows this idea: the green arrow is the shift-left that will adjust the placement from the initial candidate position. How to implement this? I considered mapping the free space, starting with the bounding rectangle for all previously-placed boxes, then subtracting all these boxes to produce a shape containing all the free space... then partition this shape into a list of rectangular free-slots... so I can finally make intersection tests between my next rectangle and any free slots immediately to its top or left... this would be a potentially incremental process, stopping when there are no more free neighbors or when I intersect some other box.

This solution will also work; the problem is, it's complex. I am basically implementing a small physics engine - just add gravity, momentum, mass centers, some Newton's formulas... and my head is starting to hurt!

Remember that I will have to do this smart-ass stuff for all candidate positions × all "next" boxes! The preprocessing of the initial state (like creating a map of free space) can be performed incrementally from each state to the next; but the remaining effort per iteration is still significant, and the code would be quite complex. Overall, it doesn't look much better than the brute-force approach.

  3. Think, Iterate, Refine. I actually considered both previous solutions before finding the ideal one - and as you will see, it can be considered a refinement of the previous idea.

I have changed the algorithm to create the new candidate positions after placed boxes. Now it reads like this:

  • New candidate positions: Remove the position consumed by the placed box.  Add the NE and SW corners of the placed box. For the SW corner, find the intersection with the right-most edge of all existing boxes at its left. For the NE corner, find the intersection with the bottom-most edge of all existing boxes above.

You will notice that this simple scan of intersections is a watered-down version of the "rocket-science" solution: this effectively finds the positions that would be created by shifting boxes to the left or top. The next picture illustrates the result; the green arrow shows the SW-originated candidate position that was shifted left.

box7

At this point I considered the algorithm complete and proceeded to code the GUI, write this blog, and analyze the results. But I eventually found a small problem. What happens if, in the state above, the next box to place has a different shape, so its ideal position is that inner corner (the red circle right above the one pointed to by the arrow)? In this case, the position that we just shifted to the left will be unusable, because it will lie in the middle of the new box's left edge. That position is not even removed from the candidates list, because it was not used as a placement position.

box8

This last figure shows what should happen: while placing the central square box as indicated, we must find any existing candidate positions that are captured by its left edge, and move these positions to the right (green arrow), so they coincide with the right edge of that box, at the same vertical coordinate. This trick will keep the candidate position useful, for example for the smaller square box as indicated by the figure. Similar handling will be applied to candidate positions captured by the top edge of a placed box.

The final algorithm:

  • New candidate positions: Remove the position consumed by the placed box. Add the NE and SW corners of the placed box. For the SW corner, find the intersection with the right-most edge of all existing boxes at its left. For the NE corner, find the intersection with the bottom-most edge of all existing boxes above. Move any position captured by the left edge of the placed box to its right edge. Move any position captured by the top edge of the placed box to its bottom edge. Avoid any duplicates.

So now we can code it...

    function findIntersections (ne: Point2D, sw: Point2D) {
        var bestX = 0.0;
        var bestY = 0.0;
    
        for (o in output) {
            if (o.maxX < sw.x and o.maxX > bestX and
                    o.minY < sw.y and o.maxY > sw.y)
                bestX = o.maxX;
                    
            if (o.maxY < ne.y and o.maxY > bestY and
                    o.minX < ne.x and o.maxX > ne.x)
                bestY = o.maxY;
        }
    
        Point2D { x: bestX y: bestY }
    }
    
    function nextCandidates (usedCand: Point2D, next: Rectangle2D) {
        def ne = Point2D { x: next.maxX y: next.minY };
        def sw = Point2D { x: next.minX y: next.maxY };
        def inters = findIntersections(ne, sw);
    
        [
            for (c in candidates) {
                // Drop the position consumed by the placed box (pack() passes
                // the chosen candidate as a Point2D, so compare coordinates).
                if (c.x == usedCand.x and c.y == usedCand.y) null
                else if (c.y == next.minY and c.x >= next.minX and c.x < next.maxX)
                    Point2D { x: c.x y: next.maxY }
                else if (c.x == next.minX and c.y >= next.minY and c.y < next.maxY)
                    Point2D { x: next.maxX y: c.y }
                else c
            }
            ne, sw,
            if (inters.y == ne.y) null else Point2D { y: inters.y x: ne.x }
            if (inters.x == sw.x) null else Point2D { x: inters.x y: sw.y }
        ]
    }
    

I ignore intersections that coincide with the original candidate position - a common case, which happens when no shift is possible because the position is already touching a placed box or the axes.

I consider the code above "self-documenting" - it reads almost identically to the English statement of the algorithm. In a real project, I'd only add comments to explain the reasoning behind the algorithm (why these intersections and moves must be performed, etc.). This is my litmus test for readable code: you don't need comments to explain what it is doing. Higher-level language syntax obviously makes this much easier to achieve.

That's it - our solution is almost complete. It's only missing the "driver" function that packs all boxes from an input sequence:

    function pack (input: Rectangle2D[]) {
        var state = State {};
        for (next in sort(input)) state = state.pack(next);
        state
    }
    

Now let's write some test code. I want to create a sequence of random boxes.

    def rnd = new java.util.Random(77);
    
    function randomInput (size: Integer) {
        def skew = if (size <= 10) 5 else 10;
        for (i in [1 .. size]) Rectangle2D {
            width:  round(pow(2, (rnd.nextFloat() * skew)))
            height: round(pow(2, (rnd.nextFloat() * skew)))
        }
    }
    

An exponential random distribution produces more interesting input data. Also, I'm forcing the random seed to a fixed value so I can run reproducible benchmarks.
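The same input generator can be sketched in plain Java; this is a rough port of randomInput (the Box record is my stand-in for Rectangle2D), mainly to show why the fixed seed matters - two runs with the same seed produce the identical dataset, so timings stay comparable:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomBoxes {
    record Box(long width, long height) {}

    // Exponential-ish distribution: edge lengths are 2^u with u uniform
    // in [0, skew), so small boxes are much more common than huge ones.
    static List<Box> randomInput(int size, long seed) {
        Random rnd = new Random(seed);
        int skew = size <= 10 ? 5 : 10;
        List<Box> boxes = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            long w = Math.round(Math.pow(2, rnd.nextFloat() * skew));
            long h = Math.round(Math.pow(2, rnd.nextFloat() * skew));
            boxes.add(new Box(w, h));
        }
        return boxes;
    }

    public static void main(String[] args) {
        // Same seed => identical dataset => reproducible benchmarks.
        System.out.println(randomInput(5, 77).equals(randomInput(5, 77))); // true
    }
}
```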

User Interface

Finally, let's make some GUI - a JavaFX program wouldn't be complete without that :-)

    def SIZE = 512;
    var solution: State = null;
    var solutionDebug = false;
    
    function solve (panel: Panel, count: Integer) {
        panel.parent.scene.stage.title = "Boxes: {count}...";
        def input = randomInput(count);
        def start = DateTime {};
        solution = pack(input);
        solutionDebug = false;
        def time = (DateTime {}).instant - start.instant;
        var totalArea = 0;
        for (r in solution.output) totalArea = totalArea + area(r) as Integer;
        var limitArea = area(solution.limits);
        def scale = SIZE / max(solution.limits.x, solution.limits.y);
        panel.parent.scene.stage.title =
            "Boxes: {count} Usage: {100.0 * totalArea / limitArea}% "
            "Time: {time}ms ({(time as Float) / count}ms/box)";
        panel.content = [
            Rectangle { fill: null stroke: Color.YELLOW
                width: solution.limits.x * scale height: solution.limits.y * scale
            }
            for (r in solution.output) Rectangle {
                x: r.minX * scale y: r.minY * scale
                width: r.width * scale height: r.height * scale
                strokeWidth: 0.5 stroke: Color.BLACK fill: Color.rgb(0,
                    (r.minX * scale * indexof r) mod 256,
                    (r.minY * scale * indexof r) mod 256)
            }
        ]
    }
    
    function showDebug (panel: Panel) {
        if (solutionDebug) return;
        solutionDebug = true;
        def scale = SIZE / max(solution.limits.x, solution.limits.y);
        def fill = Color.rgb(255, 0, 0, 0.25);
        insert for (c in solution.candidates) Circle {
            centerX: c.x * scale centerY: c.y * scale radius: 8 fill: fill
        } into panel.content;
    }
    
    def stage = Stage {
        width: SIZE height: SIZE title: "DropBox JavaFX - Press SPACE, 1..0, ENTER"
        scene: Scene { content: Panel {
            layoutInfo: LayoutInfo { width: SIZE height: SIZE }
            onKeyTyped: function (e) {
                def panel = e.node as Panel;
                if (e.char.compareTo('1') >= 0 and e.char.compareTo('9') <= 0)
                    solve(panel, 100 * Integer.valueOf(e.char))
                else if (e.char.equals('0'))  solve(panel, 1000)
                else if (e.char.equals(' '))  solve(panel, 10)
                else if (e.char.equals('\n')) showDebug(panel);
        }}}
    };
    
    stage.scene.content[0].requestFocus();
    stage
    

The GUI is quite simple. If you press any key, the program creates a new input dataset, solves the box-placing problem, and creates Rectangle components that show the solution ("placed" boxes). Most of the code is concerned with secondary stuff, like measuring execution time and computing and formatting some interesting statistics. There's also some scaling logic to fit the result neatly in the scene, and some pseudo-random color computation for a clear and interesting display. The dataset will have 10 boxes if you press SPACE, or 100 ... 1.000 boxes for '1' ... '0'.

You may notice the awkward code that maps keys to box counts. I can't do arithmetic like e.char - '0' because e.char is a String; the language has no Character type. Writing e.char.charAt(0) is no help: that returns the same single-char string (bad enough to make this Java API useless, so that's an important interop bug). I can't typecast e.char to any integral type. In short, there is no easy way to manipulate a character as a numeric value. On top of that, e.code is useless for my purposes, because the KeyCode type doesn't expose an ordinal number, so no luck writing e.code - KeyCode.VK_0. Even a range check like e.code.compareTo(KeyCode.VK_0) doesn't work: it compiles, but returns bogus results! (The language has no enum construct.) Finally, JavaFX Script has no switch/case, and it doesn't support a native map type construct that could let me build a KeyCode->Function mapping; programmers depend completely on if-else-if chains for multiple-choice decisions. Add this to the difficulty of making range checks on typed keys (or any enum-like API), and the result is awkward.
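For contrast, the arithmetic missed above is a one-liner in Java, where char is an integral type. A minimal sketch of the same key-to-box-count mapping (boxCount is my name, not project code):

```java
public class KeyDigits {
    // Map a typed character to a box count, as the JavaFX handler wishes
    // it could: '1'..'9' -> 100..900, '0' -> 1000, space -> 10, else -1.
    static int boxCount(char ch) {
        if (ch >= '1' && ch <= '9') return 100 * (ch - '0'); // plain char arithmetic
        if (ch == '0') return 1000;
        if (ch == ' ') return 10;
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(boxCount('3')); // 300
        System.out.println(boxCount('0')); // 1000
    }
}
```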

I love JavaFX Script, but it does show a few traits of an Ivory Tower design: a language that hasn't yet matured in the trenches of real projects, as it fails at such a common "pragmatic" idiom as mapping a range of keys to numeric values through simple arithmetic. Yes, it's trivial to call a small Java class to compensate for these limitations; but this facility should not make the language designers complacent, sitting on top of basic deficiencies. Even the DSL status is no excuse - my code needs to handle some keyboard input, and that certainly belongs to the scope of a GUI DSL; if the language can't do it in a simple and elegant way, it has to be fixed.

DropBox

The picture above - click it to launch a JNLP app - would make Piet Mondrian proud! Notice the semi-transparent red circles showing the full list of candidate positions at the final state: these circles don't appear by default, you must press ENTER after the solution.

The efficiency of the algorithm looks pretty good: packing densities are usually in the 96%-98% range for 100 boxes, and never below 99,4% for 1.000 boxes, for my input data. My naked-eye analysis of a couple dozen solutions didn't show any obvious failure of optimal placement (well, not after all the algorithm iterations summarized above). But this is by no means a formal proof; I doubt the algorithm is optimal. I didn't do any research to find my solution, but I know this is an important and well-researched problem.

For one thing, my algorithm uses a mix of first-fit decisions (for performance) and best-fit heuristics (for good placement). Also, my code does local optimization (finding the best placement for each box independently), instead of global optimization (considering all boxes at once). An ideal solution should be both purely best-fit and globally optimizing (perhaps through backtracking, or maybe smarter preprocessing, like pre-grouping boxes with edges of similar sizes). Ordering bigger boxes first seems to be a very good heuristic, but it's not sufficient for global optimization.

Performance

I'm basically done with algorithm work, so now it's the right time to think about code efficiency. I fired up the NetBeans profiler and looked at one run for 1.000 boxes. On HotSpot Server:

BoxProfileS

The biggest offender is intersects(). I expected this function to be the most called, but I didn't expect it to use so much time! The problem, of course, is that each of the N boxes has at least one candidate position, and each position must be tested for intersection against many previously-placed boxes (up to N-1 in the worst case).

I added some code to display the final number of candidate positions, the total number of invocations of intersects(), and the intersection tests inside that method:

  • Iterate rotation, then candidates: 2.740 candidates, 438.019 calls, 54.142.832 tests
  • Iterate candidates, then rotation: 2.719 candidates, 652.823 calls, 84.139.905 tests

If you wondered why pack() has the iteration c in candidates in the innermost loop and n in [ next, flip(next) ] in the outer loop, that's why: it's more efficient to scan the candidates in the inner loop. Anyway, the number of calls to intersects() is roughly O(N²), while the number of intersection tests is roughly O(N³). (The latter seems closer to O(N³ / 20), but in Big-O analysis that constant divisor is basically irrelevant.)

One obvious solution is sorting or indexing. If the candidate positions were sorted, I could narrow the search and only perform intersection tests against a relatively small number of boxes. But any single-dimensional ordering, e.g. by distance from the origin, would at best cut intersection tests by a linear factor; moving from O(N³ / 20) to (say) O(N³ / 100) doesn't seem worth the extra code and effort. I clearly need something that cuts time quadratically, so the total intersection count becomes O(N²). Some ideas are Binary Space Partitioning, Quadtrees, or maybe just a simple grid (e.g., split the space into regions of 10 × 10 units of length, and keep track of which boxes and/or candidate positions belong to each region). But I have to finish this blog some day, so this improvement is left as an exercise for the reader. ;-)
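The simple-grid idea can be sketched like this in Java (all names here are mine, not from the project, and coordinates are assumed non-negative): each box is registered in every fixed-size cell its bounds overlap, so an intersection query only tests boxes that share a cell with the probe, instead of all placed boxes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GridIndex {
    record Box(double x, double y, double w, double h) {
        boolean intersects(Box o) {
            return x < o.x + o.w && o.x < x + w && y < o.y + o.h && o.y < y + h;
        }
    }

    private final double cell;
    private final Map<Long, List<Box>> cells = new HashMap<>();

    GridIndex(double cellSize) { this.cell = cellSize; }

    private long key(long cx, long cy) { return cx * 1_000_003L + cy; }

    // Register the box in every grid cell its bounds overlap.
    void add(Box b) {
        for (long cx = (long) (b.x() / cell); cx <= (long) ((b.x() + b.w()) / cell); cx++)
            for (long cy = (long) (b.y() / cell); cy <= (long) ((b.y() + b.h()) / cell); cy++)
                cells.computeIfAbsent(key(cx, cy), k -> new ArrayList<>()).add(b);
    }

    // Only boxes sharing a cell with the probe are tested. A box spanning
    // several cells may be tested more than once, which is harmless for a
    // boolean query.
    boolean intersectsAny(Box b) {
        for (long cx = (long) (b.x() / cell); cx <= (long) ((b.x() + b.w()) / cell); cx++)
            for (long cy = (long) (b.y() / cell); cy <= (long) ((b.y() + b.h()) / cell); cy++)
                for (Box o : cells.getOrDefault(key(cx, cy), List.of()))
                    if (o.intersects(b)) return true;
        return false;
    }

    public static void main(String[] args) {
        GridIndex idx = new GridIndex(10);
        idx.add(new Box(0, 0, 8, 8));
        System.out.println(idx.intersectsAny(new Box(4, 4, 2, 2)));   // true
        System.out.println(idx.intersectsAny(new Box(50, 50, 5, 5))); // false
    }
}
```

With a well-chosen cell size, each query touches O(1) cells with a handful of boxes each, which is exactly the quadratic cut the text asks for.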

How fast is the code in real-world units? I benchmarked with the following procedure: start the program; run a single untimed 1.000-box test for warm-up; then run timed runs for 100 ... 2.000 boxes.

Boxes      HS Client   HS Server   Cli / Srv
  100        0,10 ms     0,05 ms      2,00 X
  200        0,18 ms     0,05 ms      3,60 X
  300        0,28 ms     0,13 ms      2,15 X
  400        0,36 ms     0,16 ms      2,25 X
  500        0,52 ms     0,22 ms      2,36 X
  600        0,66 ms     0,29 ms      2,27 X
  700        0,86 ms     0,37 ms      2,32 X
  800        1,04 ms     0,44 ms      2,36 X
  900        1,27 ms     0,54 ms      2,35 X
1.000        1,56 ms     0,66 ms      2,36 X
2.000        6,25 ms     2,40 ms      2,60 X

DropBoxPerf

    HotSpot Server is ~2,35X faster than Client, but both degrade along curves of similar shape. A regression over the most regular section of the series (for Server) fits roughly 0,00002 * N^1,5 - better than expected, but our data series is limited and the cost of per-box intersection tests extremely small. At higher box counts (many thousands and up) it's likely that the curve would reveal a higher exponent.
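    A quick sanity check of the power-law fit, in plain Java (my own arithmetic, not part of the original benchmark). Note: I read the coefficient as 0,00002; with that value the formula reproduces the measured Server column closely (e.g. 900 boxes gives exactly 0,54 ms, matching the table), while a coefficient of 0,0002 would predict 6,3 ms at 1.000 boxes, an order of magnitude off the measured 0,66 ms:

```java
public class FitCheck {
    // Evaluate the fitted power curve for the HotSpot Server series.
    // Coefficient assumed to be 0.00002 (the value that reproduces the
    // measured column; 0.0002 is off by 10x against the table).
    static double predictMs(int boxes) {
        return 0.00002 * Math.pow(boxes, 1.5);
    }
}
```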

    My code is not tuned for performance; for one thing, I could easily get rid of some object allocations - including the full recreation of the output and candidates sequences at each step of the solution. JavaFX Script is not Lisp - its sequences are not linked lists. It's not Clojure either - its sequences are "persistent" but their structure is not optimized for frequent updates of a few items. Anyway, allocation and GC are not a problem here. A 1.000-box solution causes 73 Mb of data to be allocated and collected, in 17 GC events, total time 13 ms (0,83% of the total execution time for HotSpot Client), average pause time 0,7 ms, heap occupation min = 3,8 Mb / max = 8,4 Mb, and back to 2,2 Mb when the solution is complete (even with the JavaFX GUI up). These numbers look excellent.

    But I am a stubborn, low-level, optimization-obsessed hacker, so I had to experiment with changing some code to avoid the perceived language inefficiencies. I made these changes (MainJ.java in the project):

    • Using a java.util.LinkedList (instead of a sequence) for candidates and output;
    • Abandoning all functional purism: these lists, and the State object, are updated in place at each step;
    • Deferring some object allocations (newLimit and placed in the pack() function), even at the inconvenience of writing code like intersects(c.x, c.y, n.width, n.height).

    As I expected, these changes saved a lot of memory churn: the program now burns only 13,3 Mb for the 1.000-box test, more than 5X better. A really amazing feat, so I was all set to pat my own back... and performance (for a 2.000-box test) improved ~5% for HotSpot Client. But it degraded ~20% for Server, which was shocking. JavaFX Script's sequences are really very efficient, especially when they benefit from a superior JIT compiler. And even considering that the only VM available for RIA clients is HotSpot Client, the 5% speedup is probably not worth the much uglier, "optimized" code.

    The exercise revealed a little undocumented secret of JavaFX Script: its for loop can iterate Java collections, so I could declare output: LinkedList, and the code for (o in output) would still compile and run perfectly! The major inconvenience was adding typecasts, because generic types don't carry over to the JavaFX side.

    Inspection of javafxc-generated bytecode shows a lot of useless overhead (e.g. binding support, and missing sequence optimizations that I've already blogged about). This may explain HotSpot Server's very big advantage: it does a much better job of removing redundancy (which is what advanced code optimization is all about). I expect the Server x Client gap to decrease with time, as javafxc evolves to produce more efficient bytecode.

    More Exercises for the Reader

    In the final algorithm for new candidate positions, the positions created by intersection with left/top edges are added together with the base SW/NE positions. Why can't we drop these base positions and only add the ones created by the intersections? At least in this figure, the base SW position seems redundant.

    For large box counts like 1.000, you'll notice that most runs produce a solution with a very skewed shape: either the total width is equal to or a little bigger than that of the widest box, or the total height is equal to or a little higher than that of the tallest box. Can you tune the algorithm to have a bias for a more "square" result, or for a specific shape? (The easiest way of course is using a fixed-size bounding box, which is an important real-world scenario; but I don't want that.)

    Evolve the algorithm and the code (including the GUI) to a 3D version, e.g. for filling shipping containers. :-)

    Now that JDK 6u21, JavaFX 1.3.1 and NetBeans 6.9.1 are all finally released, I'm back to checking the latest news and improvements in JavaFX. The official Release Notes point to the deployment improvements as the single new end-user feature, so I've checked the latest improvements in this area.

    The really major feature of this release is for developers: debugging and profiling will now, well, work as expected. With my apologies to the javafxc team, who worked a lot to make this happen, it's just not exciting, headline-worthy material... still, I have some comments about the compiler update at the end.

    It's the Deployment, Stupid!

    JavaFX's feature set is decent at least since 1.3, although more is needed / is coming (TableView, etc.). But even by 1.2, deployment was already perceived as the biggest problem - for any client Java, including Swing and runners-up like Apache Pivot. A recent blog from Max Katz summarizes this well, in the section The Ugly stuff. Java's deployment has been ugly for so long that it takes some faith to believe it can be fixed.

    Faith has been rewarded, though, even if not with miraculous speed. The JDK team continues the "6uN" project, delivering incremental client-side improvements in a roughly half-yearly pace. The latest such release is 6u21, which brings these significant enhancements:

    • Java HotSpot v17: Not client-specific, but extra VM performance never hurts! More fresh goodies backported from the JDK 7 project, including memory management improvements (better CMS and G1 GCs, Escape Analysis, 64-bit CompressedOops, code cache changes to reduce the risk of PermGen blowups).
    • Support for Custom Loading Progress Indicators: Used to great effect by JavaFX 1.3.1.
    • The usual batch of bugfixes: Java2D, AWT, Swing, plugin2 and other components. Remarkably for Java2D, a fair number of font-related fixes, including rendering quality enhancements and optimizations.

    For JavaFX, 1.3.1 is a maintenance release, but it boasts one significant new feature, the new Application Startup Experience including a cool new progress indicator and more fine touches. Let's check Max's comments:

    1. Browser freezing for cold startup of applets: Problem persists... why? How difficult is it to initialize a plugin in the background, without holding any browser thread / lock / whatever until fully initialized? There are excuses for the JVM's loading time and resource usage (big, 15-year-old platform...); but the Browser Freeze From Hell is hard to accept, especially after the plugin's full rewrite in 6u10.
    2. The dreadful animated Java logo, that didn't report real loading progress: Gone! Point scored.
    3. Scary security dialog: That's slowly improving. The main thing is avoiding nags for apps that shouldn't have security concerns. JDK 6u21 fixes one bug involving drag & drop and security. Other recent JDK builds have tweaked more details (though, too bad that 6u18's removal of the mandatory codebase didn't work and was reverted in 6u20 - temporarily?). You can go very far with the current FX platform without signing or permissions; and when you need them, a "scary" one-time dialog is the right thing to do. A RIA runtime should not be a big backdoor. Notice that unneeded security warnings are sometimes due to poor application programming / packaging.
    4. Error reporting: This is still an issue. The Java Console is great for developers, but a fiasco for end-users. It should be disabled by default, and a new, user-friendly mechanism is needed to report any failures. That must be native code, so it works even if something like JAWS fails.

    Some extra items that come to my mind:

    5. Installation: Also slowly improving. JavaFX 1.3 modularized its core runtime so most apps won't need to fetch all of it from the web (or even from the plugin cache) if they don't use certain features (like the javafx.fxd and javafx.scene.chart packages). JavaFX 1.3.1 further hides the annoyance of first-time FX execution behind the progress indicator.
    6. EULA: Argh, 1.3.1 still shows an atrocious EULA dialog when you run the very first FX app. But that's probably hopeless.
    7. Redundant plugin: Has anybody noticed that all modern browsers have an OOPP (out-of-process plugin) architecture? Java 6u10 created its own out-of-process layer (jp2launcher), but this is not required for the latest browsers. When I run a Java applet in (say) Firefox 3.6.4+, I get this Java plugin process and also Firefox's own plugin-container process. That's stupid: one extra process (and I guess one extra level of IPC indirection, security barriers...) to achieve nothing. The plugin should detect these new browsers and just run "in-place" (that is, isolated in the plugin-container process).
    8. Plugin/browser artifacts: I only see refresh and scrolling artifacts, for Java applets, on Firefox 4, and only with the new Windows rendering acceleration (Direct2D / DirectWrite / Layers) turned on. FF4 is still beta, with many known bugs in the new accelerated pipeline, so that's hopefully just a transient browser issue.
    9. JNLP files: WebStart's launch experience is polluted by these files, which are dumped into some temp directory and appear as regular downloads in your browser's download manager or status bar. If you're unlucky enough, you'll even get a download/run confirmation dialog. The reliance on .jnlp's MIME type is fragile, both in the browser and in the server. The extra HTTP request may be noticeable for small applets over slow connections. The plugin and JAWS should offer an alternative mechanism, like an <object> tag containing all JNLP metadata - hey, that works for both the old-and-busted Flash and the new-and-cool Silverlight, so I assume that <object> is just great. So, anybody care to explain why the external JNLP file was invented in the first place?

    By my count, that's 2 items fixed (2, 8) and 2 items improved (3, 5) out of 9 issues. Not bad, but Oracle definitely needs to keep working hard and fast, especially on item 1, which is very often the single big offender in "Java-powered" webpages.

    The Undocumented Bits

    JavaFX 1.3.1 published some important new documentation: the complete CSS Reference Guide, and a good FXD Specification. Keep 'em coming!  We're still missing some important docs:

    • Updated, complete, high-quality JavaFX Script Language Reference and formal spec;
    • Javadocs for the Preview controls (update: these exist; I had just missed them somehow!);
    • Any official information about the Prism toolkit;
    • Any official "internal" information, e.g. for the various system properties that can be used for tuning and diagnostics.

    I understand that all these items are in the "under construction" category - even the JavaFX Script language is still a moving target, although not as fast-moving as in the past. But some docs help the early adopters, enthusiasts and beta/preview testers a lot.

    Testing the New Deployment

    I exercised 1.3.1 by updating all the JavaFX applets and JAWS apps in my blogs: JavaFX Balls, StrangeAttractor and Game of Life. I performed some startup tests - warm, cold, and "freezing-cold" (cleaning my plugin cache to force a reload of the JavaFX runtime). The combination of the latest Java and JavaFX runtimes, plus JNLP files updated to request the new progress indicator, has definitely improved all startup scenarios. So, this feature worked as advertised; pretty good! Not yet in the Flash league of startup experience and speed, but another significant step forward.

    Now, what about real-world, complex JavaFX apps? The samples are updated, but these are smallish. So I went back to the Vancouver Olympics applet, the big showcase of JavaFX 1.2. Unfortunately it's not updated; not even to 1.3.0, it loaded the crappy old JavaFX 1.2.3 runtime with the now-ancient-and-crude startup experience. Yeah I know that these Winter Games are long over, but their site was apparently a partner of Oracle to promote JavaFX, so I'd expect them to keep it updated to the latest JavaFX release. On the plus side, Sten Anderson's great Music Explorer FX was updated; try it. Maybe Oracle's marketing dept is just pouring cash in the wrong pockets! ;-)

    Testing the New Compiler

    I've also rebuilt all jars, but that's just for my second test: checking whether the updated javafxc compiler had any improvements or regressions. As I blogged before, javafxc 1.3 was a step forward in performance (remarkably the compiled-bind optimization) but a step backward in code size; many further optimizations are planned, though, including several to reduce the current space/speed tradeoffs. 1.3.1's major theme for javafxc is JDI support; its list of fixed bugs doesn't seem to contain any code-generation improvements (the next batch seems planned for 1.3.2). But... who knows? I was also worried that the big debugging work could mean a regression in compiled code size: maybe the new compiler would generate additional debug information, annotations or helper code.

    So, I've repeated my simple Static Footprint benchmark. The numbers for 1.3.0 are slightly different due to a few small updates in my programs. I've also added a pack.gz metric that shows the smallest possible deployment size, just for the classes + META-INF data (removing any resources, such as images, from the source jar). Finally, the stripped metric is for a .pack.gz that's additionally stripped of any debug info (with the --strip-debug option; the pack200 tool won't do that by default!).

                                         
    Program              JavaFX 1.3.0                   JavaFX 1.3.1
    HelloWorld           2 classes, 2.579 bytes         2 classes, 2.731 bytes (+5,8%)
                         pack.gz: 954 bytes             pack.gz: 1.024 bytes (+7,3%)
                         stripped: 782 bytes            stripped: 876 bytes (+12,0%)
    JavaFX Balls         20 classes, 118.559 bytes      20 classes, 116.539 bytes (-1,5%)
                         pack.gz: 18.787 bytes          pack.gz: 17.935 bytes (-4,5%)
                         stripped: 13.828 bytes         stripped: 13.852 bytes (+0,1%)
    Strange Attractor    57 classes, 393.332 bytes      57 classes, 378.641 bytes (-3,7%)
                         pack.gz: 21.119 bytes          pack.gz: 20.038 bytes (-5,1%)
                         stripped: 13.040 bytes         stripped: 13.167 bytes (-0,1%)
    Interesting Photos   46 classes, 457.561 bytes      46 classes, 438.619 bytes (-4,1%)
                         pack.gz: 49.587 bytes          pack.gz: 46.073 bytes (-7,1%)
                         stripped: 37.842 bytes         stripped: 37.845 bytes (0%)
    GUIMark              27 classes, 224.904 bytes      31 classes, 217.390 bytes (-3,3%)
                         pack.gz: 17.224 bytes          pack.gz: 16.552 bytes (-4,0%)
                         stripped: 12.955 bytes         stripped: 12.989 bytes (-0,2%)

    The first results look surprising: all programs (except the unrealistically small HelloWorld) show improved code size, up to 4,1% smaller without Pack200 compression, and up to 7,1% smaller with Pack200. These would be excellent numbers for a maintenance update that's not supposed to contain any code-size optimizations! But the stripped numbers show virtually identical sizes. The conclusion is that the differences are very likely just a side effect of the debugging-support changes: the updated javafxc is smarter, producing debug info that's both smaller and better.

    Of course, for stripped bytecode there is no advantage at all. But the advantage for non-stripped files is still important, because Java developers very rarely strip debug info. (The NetBeans project settings page doesn't even offer an easy checkbox for pack200's --strip-debug option; I'd bet that many Java developers don't even know such an option exists.) Besides that, the fact that the debugging-support maintenance didn't cause any regression in code size is more good news.

    Missing Deployments

    The new SDK contains a new /runtime directory with the redistributable Desktop runtime. But it's not clear if we actually have the right to redistribute these files, and under which conditions - I didn't find a redistribution license. This option is very important for some people; we need some enlightenment about this.

    The absence of a redistributable JavaFX Mobile package is quite remarkable. The mobile runtime was updated in both 1.3 and 1.3.1 cycles, it just wasn't released to the public, so the only version of JavaFX Mobile that you can actually install in a real handset is the now-Jurassic v1.2. JavaFX's mobile plans are stuck for non-technical reasons, as Oracle probably works on its strategy; the M.I.A. JavaStore may also be part of the same imbroglio. The JavaFX TV runtime is not available either, although its status doesn't seem so bleak (it's not late to the race; it didn't have a faux pre-launch like JavaFX Mobile and Java Store; and it depends on Prism and other components so its non-shipping status may be just for the reason of not being ready).

    Well, that was the speculation we already did by 1.3's launch. Now with Oracle's moves against Android, we may just be watching the beginning of the next chapter. I have already posted some thoughts on a specific, technical part of this debate; but I'm holding my breath for the final consequences for everybody - Java and Android developers. In my dreams, my next smartphone would be a 'droid that could also run JavaFX programs... let's see how all this works out. Hopefully we only have to wait another month, as Larry Ellison and Thomas Kurian will spill the beans about Java Strategy and Directions. We need directions indeed, they can't come soon enough.

    I was doing some JavaFX hacking, and I had to create a sequence initially full of zeros. How can you do that? There's apparently only one way:


    var bits = for (i in [1..64]) (0 as Long);

    Problems: First, I need a loop - OK, a comprehension - to initialize the sequence. There is no syntax, no API helper or type constructor, that directly expresses "Long[] with N elements". I could use a literal like [0, 0, 0, 0, ...], but this doesn't scale to large sizes.

    Second, I have to write the (0 as Long), because JavaFX Script doesn't support Java's type suffixes like 0L for zero-as-Long. JavaFX Script drops many complexities from Java; but dropping the numeric type suffixes looks like a wrong move. I mean, it's not like Long numbers are some niche feature. And 0 as Long is butt-ugly.

    JavaFX Script should try being as close to Java as reasonably possible, given their different design criteria. I'm OK with big diffs like no generic types (they don't fit in FX's complexity budget), different attribute and method/function declarations (required by FX-specific features), and most other changes. But numeric literals are something I'd expect FX to just clone from Java. It's also missing Java 5's hexadecimal FP, and will soon miss Java 7's underscores and binary literals (I'd vote to add all these in FX 1.4). These are compile-time features, with no cost of any kind: you don't need it, you don't use it - no impact on APIs, no interaction with other language features. And Java's rich numeric syntax could also be useful in the FXD spec.

    Sequence optimizations are not powerful enough here. For the alternative code var bits:Long[] = for (i in [1..64]) 0, the compiler will create a sequence of type Int[], then in the assignment to bits it will invoke a helper method that converts that to a new Long[] sequence. This could be fixed by new optimizations: in code like xx = for (...) y, where xx is a sequence of x and y needs conversion to x, the compiler could first perform a high-level rewrite to xx = for (...) (y as x), avoiding the cost of allocating a temporary sequence yy only to immediately need a copy-with-conversion to xx. Notice that the per-element conversion (y as x) is often a zero-cost operation, like in our example of Integer->Long.


    LongArraySequence jfx$177sb = new LongArraySequence();
    int i$upper = 64;
    for (int i$ind = 1; i$ind <= 64; i$ind++) {
        int i = i$ind;
        long jfx$178tmp = 0L;
        jfx$177sb.add(0L);
    }
    $bits2 = (Sequence)Sequences.incrementSharing(jfx$177sb);

    The decompiled code above shows that there are no optimizations for initial capacity. The internal LongArraySequence class contains the necessary constructors that take an initial size as argument; but this is probably only used internally, by runtime code written in Java - the javafxc compiler has no intelligence yet to use it.

    There are two ways to fix the performance of this code:

    1) (The Wrong Way) Adding special syntax to allocate a sequence of fixed size, e.g. new Long[64] or just Long[64].

    This is the "wrong way" because we're thinking in Java, not in JavaFX Script. First, the generator syntax is already good and terse enough. Second, the proposed syntax makes more sense for working with mutable sequences, and this is not the paradigm that JavaFX Script's sequences are pushing. Sequences are immutable (more exactly "persistent", to use a Clojure term); only sequence variables are mutable, and each mutation creates a new sequence. This is expensive even though javafxc does some optimizations to minimize the churn.

    Having said that, sometimes we need mutable sequence variables, and the costs can be low if we program carefully, and if the compiler does its part - that's what I am missing here. And there isn't really a better way (except "impure" ways, like using native arrays or Java classes).

    2) Adding the necessary optimizations, so a for that produces a sequence filled with a single value will do it as fast as possible.

    Or, just make the for generator efficient. First, preallocate the sequence whenever the size can be detected. Second, bulk-fill the sequence [slice] by invoking Arrays.fill().
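    Sketched in plain Java, the code the compiler could emit for `var bits = for (i in [1..64]) (0 as Long)` when the body is a constant might look like this (hypothetical compiler output of mine, not actual javafxc behavior): one allocation at the known size plus a bulk fill, instead of 64 add() calls on a growing sequence.

```java
import java.util.Arrays;

public class SeqFill {
    // Hypothetical lowered form of a constant-body generator:
    // the size is known at compile time, so preallocate and bulk-fill.
    static long[] constantSequence(int size, long value) {
        long[] seq = new long[size]; // single allocation, no growth
        Arrays.fill(seq, value);     // bulk fill instead of per-element add()
        return seq;
    }
}
```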

    The latter optimization is very interesting to discuss:

    - It's only valid if the body of the for loop is either a compile-time constant, or a functionally pure expression that also doesn't depend on variables changed inside the loop. The javafxc compiler already does simple pureness detection (for 1.3+'s binding), and I hope that will keep improving, because this makes many new optimizations possible and even easy.

    - Arrays.fill() is JavaSE-only, so this optimization wouldn't be supported when compiling with -profile mobile. But this can be worked around with some runtime-lib stubs.

    Without these enhancements, programmers are tempted to drop to native arrays (ugly) and invoke Arrays.fill() manually (non-portable). We can imagine a new API with functions to fill a sequence [slice] with a single value, and other bulk ops like copy. But this is not the "JavaFX Way". As said before, you are supposed to use for-comprehensions. It's certainly cleaner than something like "bits = new Long[size]; Sequences.fill(bits, 0, sizeof bits, value)" - yuck!!

    Yes there's already a javafx.util.Sequences API with functions like sort() and binarySearch(), but these are complex enough to deserve APIs... at least in the current language. Simpler things like Sequences.max() would disappear in a language with some extra functional programming tricks, e.g. max = reduce(seq, >, Long.MIN_VALUE). [You can do that today, but not with enough clean code or enough performance.]

    javafxc's next steps?

    For the JavaFX Script Compiler project, 1.3 was the Compiled Bind Release, and 1.3.1 is the Debugging Release. 1.3.2 will apparently be mostly a maintenance release fixing low-priority JDI and binding bugs, and a few optimizations - at least JFXC-4388: Iterating over sequences created by bound for loops is very slow is planned and very important (this actually seems to be a closure optimization, not a sequence optimization). Next feature release, 1.4 (Presidio), will address many binding optimizations that slipped the 1.3 deadline - remarkably code-bloat fixes to reduce the speed X size tradeoff from the initial Compiled Bind. I see plenty of sequence-related fixes too, but not (yet) significant sequence optimizations.

    There is a bug JFXC-1964: Umbrella: Sequence optimizations, but this bug (now in "After-Soma" limbo) dates from the JavaFX 1.0-1.2 releases. It enrolls 27 bugs, 24 of which are fixed. Most of the described optimizations seem to be "fundamental" things (e.g. flattening optimizations), or low-hanging fruit like looping with straight indexing instead of Iterators. There are no [visible] plans, yet, for a good set of higher-level sequence optimizations. Given the importance of sequences in JavaFX Script, I hope this won't take too long :-) as it seems to me to be the next logical step in enhancing the implementation of the current language. There are other open avenues, like closure optimizations (but I guess these will have to wait for JDK 7... JavaFX will eventually have to sync with Java's lambdas/closures, both for interop's sake and to benefit from new VM support).

    A high-level language like JavaFX gives the source translator vast opportunity to enhance code performance, without imposing any cost on the runtime (library size, warm-up time, memory usage or anything). This is very different from the tradition of the Java language, whose syntax is very close to the "machine", so javac is purposefully a non-optimizing translator: the responsibility for optimization rests fully on the shoulders of the runtime. A design that doesn't always win, because new language features are often introduced without sufficient new support from the bytecode/VM; or because the weight on the runtime's shoulders has long become an important problem - remarkably for client-side or mobile apps, which need fast startup and low resource usage.

    Before Java, I was a C++ programmer and I appreciated that the compiler would do a massive optimization effort. Long build times could be mitigated with a mountain of hacking (precompiled headers, incremental linking...), and even when they remained big, that's a good tradeoff for the very best application performance. Java turned this upside-down by moving all optimization to the runtime; this has advantages like portability, dynamic features, and dynamic optimizations that often beat C/C++. But Java's move was maybe too radical. Over the last few years and releases, we've been slowly compensating with efforts to move overhead to compile or installation time: Class Data Sharing (CDS); bytecode preverification (created for J2ME and adopted by JavaSE 1.6); the JIT caching of some JVMs; hybrid AOT+JIT VMs like JET. New languages like Scala, JavaFX Script, Clojure and JRuby seem to be yet another evolutionary step, as they offload more optimization responsibility back to the source compiler. JavaFX is a layer atop Java and can't fix problems like long warm-up and poor memory sharing of JVM processes; but it can (and must) avoid adding any extra runtime load.

    I've finished the development of my Game of Life, with a couple final fixes and new features... including a solution to the bad performance reported before. Once again the work has uncovered some surprises; read on.

    Un-Scripting JavaFX Script

    The first version used a "scriptish" style, all code thrown in a single .fx file, with only average effort at structure. Now I have three files: World.fx with the World class (data model and Life algorithms); IO.fx with new support for loading patterns; and Main.fx with the UI. This refactoring required declaring some classes, functions or properties public[-read|-init]. Some extra noise, but the Java veteran inside me feels much warmer and fuzzier with encapsulated code. I still appreciate, though, the facility to bang out prototype code without thinking about such issues.

    I'm a bit annoyed by the absence of private visibility, but arguably that's unnecessary: if you have global functions/variables or multiple classes in the script, you are likely in the prototype stage and won't bother with encapsulation. On the other hand, I'm worried that the javafxc output uses public visibility for all source features, losing VM-level enforcement of visibility. The bytecode contains some annotations like @ScriptPrivate, but these serve only the compiler; they are ignored by the VM's classloading and verification. You cannot trust JavaFX's visibilities for security purposes. A more important impact, perhaps, is that bytecode optimization/obfuscation tools can't take full advantage of restricted visibility for closed-world analyses.

    Some I/O and Parsing

    Several people complained that it's too much work setting up Game of Life (GOL) patterns manually, one click per cell. The Internet is literally infested with GOL resources. (Indeed, the web can be divided in four major groups: Game of Life; Fractals; Retrocomputing; and Boring sites. Thanks to me, java.net is just moving out of the Boring category.) There are many popular GOL programs for every computer since the ENIAC, and they have developed a few standard file formats, the most popular being LIF and RLE (each with a couple of variants...). The LIF (Life 1.06) format is braindead simple; it can be parsed with very modest code:


    function parseLIF (text:String):Point2D[] {
        for (cell in text.split("\n") where indexof cell > 0) {
            def xy = cell.split(" ");
            Point2D { x: Integer.valueOf(xy[0]) y: Integer.valueOf(xy[1]) }
        }
    }

    My parsing function uses JavaFX's Point2D as a cell coordinate; the output is a sequence of such coordinates for all "live" cells in the pattern. I can use Java's string-manipulation facilities, including regular expressions, so the job is pretty easy. JavaFX's sequences and generators contribute again to minimal coding.
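    For comparison, here is roughly the same parser transcribed to plain Java (my sketch; int[] pairs stand in for Point2D, and the whitespace handling is slightly more defensive than the JavaFX version above):

```java
import java.util.*;

public class Lif {
    // Life 1.06: a "#Life 1.06" header line, then one "x y" pair per line,
    // each pair naming a live cell.
    static List<int[]> parseLIF(String text) {
        List<int[]> cells = new ArrayList<>();
        String[] lines = text.split("\n");
        for (int i = 1; i < lines.length; i++) {   // skip the header line
            if (lines[i].trim().isEmpty()) continue; // tolerate blank lines
            String[] xy = lines[i].trim().split("\\s+");
            cells.add(new int[]{ Integer.parseInt(xy[0]), Integer.parseInt(xy[1]) });
        }
        return cells;
    }
}
```

    Without sequences, comprehensions and auto-conversion, the Java version needs an explicit accumulator list; the line count is similar, but the JavaFX version reads more declaratively.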

    Problem: I've used String.split(), not available on the JavaFX Mobile platform. The compiler will catch this only if I reconfigure the project for JavaFX Mobile.

    RFE for the people writing IDE plugins: let me create a project of type "JavaFX library" that I can configure to any of the JavaFX profiles, including common, so the compiler will allow me to use only the strict set of APIs (from both JavaFX and the underlying Java runtime) guaranteed to exist in the selected profile. Notice that the mobile profile is a proper subset of desktop only for the JavaFX APIs; for the underlying Java APIs this is not true. I cannot use mobile as a G.C.D. configuration for code that should run in any profile, because this would allow the project to use JavaME-specific APIs that are not available in JavaFX Desktop (even CLDC alone includes at least javax.microedition.io - the base GCF package).

    This reminds me that the Generic Connection Framework is a great API that should really be available in JavaSE. That was the plan of JSR-197, but unfortunately the idea never took off. JavaFX lacks a full-blown I/O API; javafx.io is a good start as an "80/20 rule" API for higher-level needs, but many complex programs will be tied to tons of java.io / java.nio / java.net / ..., or equivalent JavaME APIs. They wouldn't be, if the GCF were an official part of JavaSE. Perhaps Oracle should promote this idea - add the JSR-197 jars to the JavaFX runtime as an extension package (i.e. a separate jar, only downloaded or loaded by apps that need it). But inclusion in JavaSE would be much better, perhaps also solving that platform's embarrassing lack of standard support for some kinds of I/O (yes, I know about JavaComm, which is another part of the problem, not the solution).

    But the LIF format is very dumb (bloated files); what you really want is the RLE format:


    function parseRLE (text:String):Point2D[] {
        def lines = for (line in text.split("\n") where not line.startsWith('#')) line;
        def header = for (l in lines[0].split(", ")) l.substring(l.lastIndexOf('=') + 1).trim();
        def x = Integer.valueOf(header[0]);
        def y = Integer.valueOf(header[1]);
        var currX = 0;
        var currY = 0;
        def run = new StringBuffer();
        for (line in lines where indexof line > 0) {
            for (i in [0 ..< line.length()]) {
                def c = line.charAt(i);
                if (Character.isDigit(c)) {
                    run.append(c);
                    null
                } else {
                    def len = if (run.length() == 0) 1 else Integer.valueOf(run.toString());
                    run.setLength(0);
                    for (l in [1..len]) {
                        if (c == '$'.charAt(0) or currX == x) {
                            ++currY;
                            currX = 0;
                        }
                        if (c == 'b'.charAt(0) or c == 'o'.charAt(0)) {
                            def cell = if (c == 'o'.charAt(0)) Point2D { x: currX y: currY } else null;
                            ++currX;
                            cell
                        } else null
                    }
                }
            }
        }
    }

    The big, outer for loop that contains most of parseRLE() will produce (and return) a Point2D[] (the return type declaration is optional, the compiler could infer it). The entire state machine that parses the RLE format is inside this for. Each step through the state machine will either deliver a Point2D value that is appended to the return sequence (actually, to a sub-sequence that is eventually flattened into the return), or a null value that is ignored (JavaFX Script's sequences cannot contain null; inserting null is a no-op). It's a nice example that justifies both the auto-flattening and the restriction of nulls. These features let me code parseRLE() in a quasi-functional style, without any ugly explicit sequence mutation. The only explicit variables are the locals currX and currY, part of the state of my state machine. The remaining state is in the iteration variables line, i and l, but these are all "managed" - JavaFX Script fixes Java's mistakes by not allowing user modification of loop control variables, or of function parameters. This makes the for construct functional, unless you throw in extra variables and assignments.
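
    Those flattening/null-skipping semantics have no direct Java equivalent, but a rough analogue (my sketch, not what javafxc generates) can be written with streams: each iteration yields either a value or null, per-line sub-streams are flattened into one result, and nulls are simply dropped:

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlattenDemo {
    // Rough Java analogue of a JavaFX Script for-expression: each iteration
    // yields a value or null, sub-results are auto-flattened into one
    // sequence, and "inserting null" is a no-op (modeled here by a filter).
    static List<Character> nonDigits(List<String> lines) {
        return lines.stream()
            .flatMap(line -> line.chars()                    // flatten per-line sub-sequences
                .mapToObj(c -> Character.isDigit(c) ? null   // null for skipped elements...
                                                    : (char) c))
            .filter(Objects::nonNull)                        // ...which are dropped
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(nonDigits(List.of("a1b", "23c")));   // [a, b, c]
    }
}
```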

    This is the popular Glider pattern in RLE format:


    # The Glider
    x = 3, y = 3
    3o$o$bo
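
    For readers who want to play along without the JavaFX runtime, here is the same RLE state machine ported to plain Java (a sketch; I emit int[]{x, y} pairs instead of Point2D, and also handle the optional '!' terminator):

```java
import java.util.ArrayList;
import java.util.List;

public class Rle {
    // Plain-Java port (a sketch) of parseRLE(): '#' lines are comments, the
    // header gives the bounding box, digits accumulate a run length, 'b' is a
    // dead cell, 'o' a live cell, '$' ends a row, '!' ends the pattern.
    static List<int[]> parseRLE(String text) {
        List<int[]> cells = new ArrayList<>();
        int width = -1, curX = 0, curY = 0, run = 0;
        boolean inHeader = true;
        for (String line : text.split("\n")) {
            if (line.startsWith("#")) continue;
            if (inHeader) {                       // e.g. "x = 3, y = 3"
                width = Integer.parseInt(line.split(",")[0].split("=")[1].trim());
                inHeader = false;
                continue;
            }
            for (char c : line.toCharArray()) {
                if (Character.isDigit(c)) { run = run * 10 + (c - '0'); continue; }
                int len = run == 0 ? 1 : run;     // no digits means a run of 1
                run = 0;
                if (c == '!') return cells;
                for (int i = 0; i < len; i++) {
                    if (c == '$' || curX == width) { curY++; curX = 0; }
                    if (c == 'o') cells.add(new int[]{curX, curY});
                    if (c == 'o' || c == 'b') curX++;
                }
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        String glider = "# The Glider\nx = 3, y = 3\n3o$o$bo";
        for (int[] c : parseRLE(glider)) System.out.println(c[0] + "," + c[1]);
    }
}
```

    Parsing the Glider above yields the five live cells (0,0), (1,0), (2,0), (0,1) and (1,2).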

    The most remarkable thing in parseRLE() is the ugly handling of characters, e.g. if (c == '$'.charAt(0)). JavaFX Script doesn't have a first-class character type, a common trait of scripting languages. The problem is, JavaFX Script does not "box" chars - coming from non-FX APIs like String.charAt() - into strings of length 1. These chars remain with the Character type. But the language doesn't have a character literal syntax; '$' is a string, not a character. Writing if (c == '$') will get you a compiler error about incomparable Character and String types.

    RFE: Either add a character literal syntax, or promote chars to strings (but with the necessary unboxing optimizations, to keep the efficiency of a simple char wherever possible).

    The problem is bigger than this, however; even strings are a second-class type in JavaFX Script. It seems to me that strings should be handled as a special kind of sequence, whose elements are characters (or 1-char strings). I want to iterate a string with for (c in line); I want to get a substring with slicing syntax like line[5..<10]. Today you can declare a variable with the sequence type Character[], which is even optimized internally with a special-cased sequence class CharArraySequence; but that is completely unrelated to the String type.

    First-class support for strings could be added as compiler sugar. The same good old, efficient and interoperable java.lang.String class could be used to store string data, without any extra wrapper; but the compiler would overload the syntaxes of sequences and for to handle strings. As a simple example, the code:


    noSpaces = for (c in line where not Character.isSpace(c)) c;

    could be de-sugared into this (Java) code:


    StringBuilder noSpaces$sb = new StringBuilder();
    for (int c$index = 0; c$index < line.length(); ++c$index) {
        char c = line.charAt(c$index);
        if (!Character.isSpace(c)) {
            noSpaces$sb.append(c);
        }
    }
    String noSpaces = noSpaces$sb.toString();
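
    Wrapped in a class so it actually runs (I substitute Character.isWhitespace for the long-deprecated Character.isSpace):

```java
public class Desugar {
    // The hand de-sugared loop from above, made runnable: copy every
    // non-whitespace character of the input into a StringBuilder.
    static String stripSpaces(String line) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.length(); ++i) {
            char c = line.charAt(i);   // note: String.charAt, not char()
            if (!Character.isWhitespace(c)) sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripSpaces("a b  c"));   // abc
    }
}
```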

    Now I know that this is easier said than done, because it's not just throwing a handful of special-case translations. The right way to do this requires that strings and sequences are "normalized" to a sufficiently homogeneous AST, so the code generation is able to implement either common or separate handling as necessary, for every combination of strings vs. other kinds of sequences, as well as with other language features.

    The language already performs some custom handling of strings, for interpolation with {}. A great start, but we need more :) besides sequences integration, first-class (and portable) regex support would be another hit. This obvious RFE is already filed as JFXC-2757: JavaFX Script should support regex literals, and as the comments explain, it's not as easy as in other languages that have this feature, because there are interactions with binding and triggers. (But this also means that first-class regex would be more powerful than in other languages.)

    Reading from the Web

    I won't embed Life patterns in the program; it will fetch these from the web. The site conwaylife.com contains many patterns, well organized and available at stable URLs and in several formats. The front page also features a great Life Java applet, a surprise for me because it loads very fast and smoothly. When I wrote the original Life blog & program, I didn't find this superior Java GOL (but that's a very complex, optimized implementation - the Game of Life (and cellular automata in general) allows some crazy optimizations - not adequate to my purposes).


    public class LifeRequest extends HttpRequest {
        public-read var result:Point2D[];
        override var onInput = function (is) {
            try {
                def sb = new StringBuffer(is.available());
                while (is.available() > 0) sb.append(is.read() as Character);
                result = parseRLE(sb.toString());
            } finally {
                try { is.close() } catch (e:IOException) {}
            }
        }
    }

    Class LifeRequest makes an HTTP request to a URL that contains a Life pattern in RLE format, then reads the input stream and parses it. Yeah, the code that consumes the stream is stupid (one byte at a time). But it seems the underlying HTTP stream - for the record, an FX-specific com.sun.javafx.io.http.impl.WaitingInputStream - is buffered; I didn't notice any performance impact reading large patterns. Once again I wish we could have some extra string power, or perhaps higher-level I/O APIs. I cannot use methods like read(byte[]) because I don't want to write a Java class just to allocate a nativearray. And I don't want, either, to rely on additional JavaSE-only APIs like BufferedReader; even that wouldn't help a lot - I'd still need a loop, invoking readLine() for each line and using a StringBuilder. What I really need is an API that "slurps" the whole stream into a string. Or perhaps something more JavaFX-style, like being able to create a "view sequence" of several component types (think java.io buffers); this would probably need the language to offer forward-only sequences that support sequential iteration but not random access (but this opens yet another big avenue of new language designs... let's skip that).
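
    For the record, a buffered "slurp" is only a few lines of plain Java (a sketch of the helper I'm missing; decoding chunk-by-chunk is safe here because RLE patterns are plain ASCII):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class Slurp {
    // Minimal "read the whole stream into a String" helper: reads in 8 KB
    // chunks instead of byte-at-a-time. ASCII is a single-byte encoding, so
    // per-chunk decoding cannot split a character.
    static String slurp(InputStream is) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[8192];
        int n;
        while ((n = is.read(buf)) > 0)
            sb.append(new String(buf, 0, n, StandardCharsets.US_ASCII));
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new ByteArrayInputStream(
            "3o$o$bo".getBytes(StandardCharsets.US_ASCII));
        System.out.println(slurp(is));   // 3o$o$bo
    }
}
```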


    def patterns = [
        "b52bomber", "B-52 bomber",
        "blinkerpuffer1", "Blinker puffer",
    ...
    ];

    Next follows a static list of the Popular Patterns offered by the site mentioned above. This is a simple list of key/value pairs, where the key is part of the URL that will fetch the data. Except of course, that this is a flat sequence. Now I can plug once again my favorite RFE: I need a native map data type. :-)
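
    Just to illustrate what that RFE buys: in Java, the same flat pair sequence would naturally collapse into an ordered map (a sketch; names are mine):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class Pairs {
    // Turn a flat [key, value, key, value, ...] list into an ordered map,
    // so lookups no longer need index arithmetic.
    static Map<String, String> toMap(List<String> flat) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < flat.size(); i += 2)
            m.put(flat.get(i), flat.get(i + 1));
        return m;
    }

    public static void main(String[] args) {
        List<String> patterns = List.of(
            "b52bomber", "B-52 bomber",
            "blinkerpuffer1", "Blinker puffer");
        System.out.println(toMap(patterns).get("b52bomber"));   // B-52 bomber
    }
}
```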

    /*** Begin Digression: How "complete" should JavaFX be?

    At this point, I hear some people screaming - "just use Java!!" for these things that JavaFX is not yet ideally suited for, like nontrivial string manipulation or I/O. And not pile up layers of new RFEs demanding the language to become more powerful (read: complex) and the javafx.* APIs more complete (read: bloated).

    In fact, I often don't even need to drop to Java code; I can just use Java APIs directly, in the Java way (without insisting on support for sequences and other JavaFX features) but all inside normal JavaFX Script functions. That would be uglier JavaFX Script code, but would arguably be smoother than moving some code into a separate .java source, with a different syntax and harder integration, e.g. for methods that would need to call back into JavaFX Script objects. (The only problem here is that I cannot allocate a nativearray from JavaFX Script.)

    All so-called scripting / higher-level languages assume that you may have to fall back to "system" code for some tasks. That's why languages like Ruby, Python, Perl etc., have a system interface (to C language / native shared libs) that's much less torturing than JavaSE's JNI. For alternative JVM languages it's even better, the system fallback usually means calling Java classes, not C/native code. Even with issues like the SE-vs-ME fragmentation, Java is usually an order of magnitude better than C as a system-level language for carrying the load that a higher-level language cannot. (For the few exceptions, there's still JNI so you lose nothing... well, except for that JNI=torturing detail.)

    The only issue, of course, is where exactly to draw the line. People coming from Java may consider JavaFX already good enough. You can't build a complex app in pure JavaFX, but so what? "It's a goddamn UI DSL! Just use Java for any non-UI work." I don't see it that way; I think JavaFX has great potential to be a great platform on its own.

    Even if you buy the DSL argument, the frontiers between application layers are blurred and dynamic... even in a well-architected front end, the UI typically shares significant code with other layers: POJOs, validation, general utilities. And you have lots of communication between these layers, e.g. querying some business Facade to populate a form. This is typically smooth when all layers share a single language and SDK, but much harder otherwise. And what happens when you change your mind or find a design mistake, and need to push a bit of code from one layer to another? Any refactoring that straddles a barrier of language/SDK will be much more difficult, certainly beyond the ability of IDEs' automatic and safe refactoring commands... Obviously, it's much more convenient to be able to code the entire application in a single language/SDK. Then you fall back to the system level in a much more limited and ad-hoc manner, e.g. to optimize a performance-critical algorithm, or to better reuse a system library that doesn't have a wrapper for the higher-level language/SDK, or for legacy support, etc.

    The high-level language/SDK should provide at least the reasonable basics, on all fundamental features. That RFE for a built-in map type is fundamental, because you can go very far with "only" sequences and maps, while only sequences is definitely limited (if you ignore performance, having only maps would be less limited; maps are more general). But having a very rich data structures library, like JavaSE's java.util, is not fundamental - I'd say >95% of the Collections API is just performance optimizations (or convenience algorithms/APIs, e.g. Stack) over the basic list/sequence & map that most scripting languages offer as their single built-in data structures.

    Notice that language-integrated data structures are very powerful; the compiler can often perform decisions such as selecting a specialized implementation of sequences or hashtables that's more efficient for a specific program usage. You don't need manual choices such as ArrayList vs. LinkedList: you trust the compiler to do that choice. Only when the compiler fails in such magic optimizations, and only when that failure is found to be a significant performance problem, you optimize it manually.

    I don't want to bloat the JavaFX APIs either, but many interesting FX-specific APIs could be implemented as a thin layer over some SE/ME-specific APIs. We still need that FX layer because it makes the same features more portable, more powerful and easier to program, as the API can take advantage of features like binding, sequences and first-class functions & closures. This is again not different from other JVM languages; see for example Groovy or Scala. Both communities seem to believe that it's worth the effort and runtime size to either wrap or replace many Java APIs like JAXP, Swing, Collections, concurrency, JDBC, or to provide all-new frameworks for critical tasks like web development. Not to mention the languages that are independent from the JVM and carry over their own completely independent set of standard libraries for everything, plus big app frameworks (e.g. Rails for JRuby).

    Compared to these languages, JavaFX would need fewer and lighter API wrappers. The language is very close to Java (Groovy looks closer to a superset of Java; but Groovy's dynamic typing and high reliance on metaprogramming make it actually much less close than the surface syntax suggests). Different from the likes of JRuby, there's no need to support any feature or library that was not designed for the JVM. Different from Clojure, there's no radical paradigm shift towards full-blown functional programming. I think we could have a nice set of "thin wrapper" APIs, with very small weight in runtime size and CPU/memory overhead, to cover a very good range of extra functionality like XML(*), I/O, concurrency, perhaps some enterprise / distribution stuff (CDI and some extra client-side support for trivial consumption of EJB / JMS / JAX-WS servers), etc. The NetBeans JavaFX Composer already has some draft of this - if you add a JDBC Data Source to your design, Composer will spit ten .fx files into your project - a thin FX API for things like RecordSet. But everybody hates these IDE-proprietary libraries. I guess that in the future, these will evolve into official JavaFX APIs, e.g. javafx.sql. The canonical example is JavaSE 6's GroupLayout, first born as a proprietary library of the NetBeans "Matisse" Swing editor.

    (*) Yes JavaFX does XML, but it's a simple API with its own small parser implementation. The same is true for some other JavaFX APIs that one could imagine to be thin wrappers for Java APIs. This is actually nice for light weight (no Mb-size parser like Xerces making your applets slower to load) and portability (exact same parser implementation used in all JavaFX profiles). But some apps will need the full power of JAXP, and JavaFX could make this power available, with a friendly JavaFX wrapper, at least for the higher profiles like desktop and tv.

    End Digression: How "complete" should JavaFX be? ***/

    Back to the UI...


    def patternCB = ChoiceBox {
        layoutInfo: LayoutInfo { width: 160 }
        items: for (p in patterns where indexof p mod 2 == 1) p
    }

    This new ChoiceBox allows me to pick one of the patterns.


    onMouseClicked: function (e:MouseEvent) {
        if (e.button == MouseButton.PRIMARY and not
                (e.altDown or e.controlDown or e.shiftDown or e.metaDown)) {
            world.flip(xx, yy);
        } else {
            def req:IO.LifeRequest = IO.LifeRequest {
                location: "http://www.conwaylife.com/pattern.asp?p={
                    patterns[patternCB.selectedIndex * 2]}.rle"
                onDone: function () { world.set(xx, yy, req.result) }
            }
            req.start();
        }
        toolbar.requestFocus();
    }

    I've changed the existing mouse event handler: now only the left mouse button will toggle a cell. For the right button (UPDATE: or your Mac's single button + any control key), I pick the ChoiceBox selection, do some simple arithmetic to get its "key", build a full URL, then invoke the LifeRequest. I provide an onDone handler that passes the result (as well as the closure-captured cell position) to the new World.set() function:


    public function set (x:Integer, y:Integer, cells:Point2D[]):Void {
        for (cell in cells) {
            def xx = (x + cell.x) as Integer;
            def yy = (y + cell.y) as Integer;
            if (xx >= 0 and xx < SIZE and yy >= 0 and yy < SIZE)
                this.cells[yy*SIZE + xx] = true
        }
    }

    The latter is pretty easy. It would be half the size if I didn't have to cast Point2D's coordinates to Integer (this reuse of Point2D was questionable... but I'm lazy). Notice that the Life pattern is contained in a rectangle, and I rubber-stamp the live cells in that rectangle onto the world, using the selected cell as the top-left corner.
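
    The same rubber-stamping, as a self-contained Java sketch over a boolean[] world (with the bounds check applied consistently on both axes; names are mine):

```java
public class Stamp {
    static final int SIZE = 8;
    static boolean[] cells = new boolean[SIZE * SIZE];

    // Stamp a pattern's live cells with (x, y) as the top-left corner,
    // silently dropping cells that fall outside the world.
    static void set(int x, int y, int[][] pattern) {
        for (int[] c : pattern) {
            int xx = x + c[0], yy = y + c[1];
            if (xx >= 0 && xx < SIZE && yy >= 0 && yy < SIZE)   // clamp on both axes
                cells[yy * SIZE + xx] = true;
        }
    }

    public static void main(String[] args) {
        int[][] glider = {{0,0},{1,0},{2,0},{0,1},{1,2}};
        set(6, 6, glider);                 // partially off-world: two cells clipped
        int pop = 0;
        for (boolean b : cells) if (b) pop++;
        System.out.println(pop);           // 3
    }
}
```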

    Exercise for the reader (or maybe I will do it later): Make the right-click-down event activate an outline rectangle with the exact width/height of the selected pattern, so at right-click-up the pattern is actually set in the world. This needs reading the pattern at right-click-down, so you know its shape... a better idea is reading it even before, when the ChoiceBox selection is set or changed; just do that in the background so the UI doesn't freeze. Then the pattern loading would appear to happen instantly. In a variant of this idea, instead of a boring rectangular outline, the preloaded pattern could be overlaid (with the obvious translucency-with-radial-fade effect) on top of the live world, until you "drop" it in the desired position.

    Behold!...

    The finished program, for this version - click to launch. (If you didn't read the whole blog: use right-mouse click, or click while pressing any control key, to load the selected pattern at the cell under the mouse pointer.)

    Life2

    The source code is now 3 files and ~200 LOC, including imports and metadata for 25 patterns. Notice that the "oscillator" patterns are also good for performance benchmarking.

    The screenshot above is taken with Prism; it's noticeably different from the previous screenshot (antialiasing of the rectangle borders). I'm not sure which toolkit is "wrong" here, but most likely Prism as it is still in early access, and its output looks more "blurred".

    Performance Mystery I: JavaFX Script Functions

    The JavaFX team clarified to me that they don't recreate the internal scene graph nodes after property changes (like I do with Rectangle.fill); this destroys my obvious shot at the cause of bad performance. On the other hand, they found that text formatting and rendering (for my status label) was a bottleneck (at least for the simpler tests without actual Life action). Part of the problem here is bug JFXC-3483: Use of String.format for string concatenation hurts performance.

    I now tried some quick profiling with the NetBeans Profiler, and a lot of cycles go into binding (remarkably runtime methods like notifyDependents()), and into several compiler-generated methods like World$1Local$57.doit$$56(). As it turns out, javafxc is compiling some of my functions into something... different. My World.life() method, which calculates the new state of a single cell, contains an inner class 1Local$57; this class is a closure that captures all local variables from the life() method (the parameters x and y, the local count, and the receiver this). In short, the entire content of the life() function is wrapped as a closure. This is the (decompiled) code generated for the "do it" method of the closure. (The mangled names and synthetic methods should disappear in JavaFX 1.3.1, thanks to JDI support - at least in the debugger and profiler, but not in decompiled bytecode.)


    public boolean doit$$56() {
        _cls57 receiver$ = this;
        VFLG$Local$57$count = (short)(VFLG$Local$57$count & 0xffffffc7 | 8);
        applyDefaults$(0);
        _cls57 _tmp = this;
        int yy$ind = Math.max(y - 1, 0);
        for(int yy$upper = Math.min(y + 1, get$SIZE() - 1); yy$ind <= yy$upper; yy$ind++) {
            int yy = yy$ind;
            int xx$ind = Math.max(x - 1, 0);
            for(int xx$upper = Math.min(x + 1, get$SIZE() - 1); xx$ind <= xx$upper; xx$ind++) {
                int xx = xx$ind;
                if(elem$World$cells(yy * get$SIZE() + xx))
                    $Local$57$count = get$Local$57$count() + 1;
            }
        }
        return get$Local$57$count() == 3 || get$Local$57$count() == 2 &&
            ((Boolean)isLive$bFunc$int__int(FXConstant.make(Integer.valueOf(x)), 0,
            FXConstant.make(Integer.valueOf(y)), 0).get()).booleanValue();
    }

    This code is pretty good... except for all the closure overhead. The closure class contains several other methods, and invocations to life() must go through all this baggage including allocation of the closure, extra indirection for locals lifted to the heap, and full binding support for locals (!). This overhead is not related to the first-class status of JavaFX Script's functions (a different, very efficient mechanism is used to wrap functions into values).

    The life() method finishes by invoking another function, isLive(), which is compiled with even more weird stuff (name mangling, different calling convention) due to being a bound function.

    And it gets worse: if I add to life() a conditional return statement before that function's end, this return is compiled as a closure's non-local return. That means raising a (runtime-internal) NonLocalReturnException that will be handled by the (also generated-code) caller. Non-local returns are necessary to allow the code inside a closure to break/continue a loop that contains the closure, or to return from the method that contains the closure. Java exceptions are a great mechanism to implement non-local returns. But it seems that javafxc is abusing this technique, using the non-local return exception for trivial return statements that are not non-local returns - in javafxc-generated closures, no less. Also, it seems the technique is not implemented efficiently, showing Throwable.<init>() as the third top CPU hotspot in one of my profiling sessions.
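
    The standard fix for that last point is well known: a control-flow exception can be preallocated, and it can skip the expensive stack-trace capture by overriding fillInStackTrace(). A hedged sketch of the technique (my code, not javafxc's; a real compiler would also need per-thread instances):

```java
public class NonLocalReturn {
    // A control-flow exception that is allocated once and never fills in a
    // stack trace, so throwing it costs little more than a goto.
    static final class ReturnException extends RuntimeException {
        Object value;
        @Override public synchronized Throwable fillInStackTrace() { return this; }
    }
    static final ReturnException RETURN = new ReturnException();  // preallocated

    interface Body { void run(); }   // stand-in for a generated closure

    // The enclosing method catches the exception and turns it back into a
    // plain return value - that's the "non-local return" protocol.
    static Object call(Body closure) {
        try {
            closure.run();
            return null;
        } catch (ReturnException e) {
            return e.value;
        }
    }

    public static void main(String[] args) {
        Object r = call(() -> { RETURN.value = 42; throw RETURN; });
        System.out.println(r);   // 42
    }
}
```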

    Then I further investigated this issue, and discovered that this trivial optimization...


    function life (x:Integer, y:Integer) {
        var count = if (cells[y * SIZE + x]) then -1 else 0;
        for (yy in [max(y - 1, 0) .. min(y + 1, SIZE - 1)])
            for (xx in [max(x - 1, 0) .. min(x + 1, SIZE - 1)])
                if (cells[yy * SIZE + xx]) ++count;
        // old: count == 3 or count == 2 and isLive(x, y)
        count == 3 or count == 2 and cells[y * SIZE + x]
    }

    ...would change the generated code into:


    public boolean life(int x, int y) {
        World receiver$ = this;
        int count = elem$World$cells(y * get$SIZE() + x) ? -1 : 0;
        int yy$ind = Math.max(y - 1, 0);
        for(int yy$upper = Math.min(y + 1, get$SIZE() - 1); yy$ind <= yy$upper; yy$ind++) {
            int yy = yy$ind;
            int xx$ind = Math.max(x - 1, 0);
            for(int xx$upper = Math.min(x + 1, get$SIZE() - 1); xx$ind <= xx$upper; xx$ind++) {
                int xx = xx$ind;
                if(elem$World$cells(yy * get$SIZE() + xx))
                    count++;
            }
        }
        return count == 3 || count == 2 && elem$World$cells(y * get$SIZE() + x);
    }

    The whole closure overhead was gone. No closure class anymore. A single method is generated, which bytecode is just as efficient as what javac would produce for equivalent Java code. No locals lifted to the heap, no extra binding support, etc. Notice for example, the simple "count++" instead of the previous gobbledygook "$Local$57$count = get$Local$57$count() + 1".

    The big performance screwup was the fact that I was invoking a bound function, isLive(). This caused the caller function life() to "inherit" a ton of overhead that's apparently necessary to deal with bound functions. But this is probably a compiler bug/limitation, because life() is not itself a bound function - unless I don't understand the reason for that compilation strategy.

    The bad news is that javafxc has some potential performance bugs (or missing optimizations):

    1. Inefficient use of NonLocalReturnException: a) Use in places where it is apparently not necessary; b) should reuse a preallocated exception object;
    2. Absence of optimized compilation of script-private functions that are never used as values (don't need the code for "first-class" support);
    3. Unnecessary propagation of overhead from bound functions to common (non-bound) caller functions;
    4. Induction of binding overheads for local variables that are lifted to closure fields;

    All these issues must be confirmed; I'm not intimate with the javafxc compiler. Alas, the identified overheads are actually pretty common in other high-level languages... although they are often "hidden" inside interpreters or runtimes, they are "exposed" in JavaFX Script, which is fully statically-typed and compiled. This exposure is good because programmers can easily spot useless bloat and complain about it. ;-) The compiler will certainly keep improving its intelligence to only add extra overhead where it is really necessary.

    But if I found a single important new fact about JavaFX's performance, that's it: Bound functions are expensive and dangerous. The extra overhead is not limited to the compiled code of the bound function itself, or even to call-sites; if you have any common function that contains call-sites to any bound function, this entire function will be compiled with lots of extra overhead. In my Life program, the bound function was very simple so I just manually inlined it. Otherwise I would have refactored it into a pair of functions: a (possibly script-private) function that performs the actual work, and a public bound function that wraps over it and is only invoked by code that really needs the bound behavior.

    Performance Mystery II: Redundant Binding

    This section could also be titled: "I am stupid".

    Text rendering performance was still a major problem, so I proceeded to investigate it. I know that Java's string formatting APIs are somewhat expensive, but they shouldn't be that bad - the profiler was showing some enormous overhead, in CPU and memory allocation, coming off places like Matcher.<init>() and Formatter.format().

    Then I noticed the bug. I have a label with a bound expression:


    Label { text: bind "({animSlider.value as Integer}) Gen: {world.gen} Pop: {world.pop}" }

    The bug is simple: the variable world.pop is updated incrementally, once for each live cell, in the method World.run().


    public function run ():Void {
        ++gen;
        var pop = 0;
        cells = for (y in [0 ..< SIZE]) for (x in [0 ..< SIZE]) {
            def cell = life(x, y);
            if (cell) ++pop;
            cell
        }
        this.pop = pop;
    }

    Fixing the bug was trivial: I created a local variable pop, so I can do a single update to the field at the end of the method. The previous code was forcing the entire rendering of the Label (formatting, rasterization, layout, clipping...) to be repeated for each live cell accounted in each generation.

    This is the flipside of JavaFX Script's binding feature being so simple, so seamless: you don't notice the overhead. There are no explicit setters or firePropertyChange() calls. A Swing programmer would never make this kind of mistake, because the property-change stuff is all explicit. Spotting this kind of performance bug is difficult, maybe due to the immaturity of tooling: no JavaFX-specific support in profilers. Two JavaFX engineers, who told me that they found a huge bottleneck in the Label formatting and rendering, didn't notice the cause.

    My new rule of thumb: Don't update public[-read] properties inside loops. Ever. Even for non-public properties, you are advised to avoid repetitive updates. Just mirror the property in a local variable, and update the field only at the method's end.
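
    The rule is easy to demonstrate with plain Java property-change machinery: updating the field inside the loop fires one notification per live cell, while the local-variable mirror fires exactly once. (A sketch; class and names are mine.)

```java
import java.beans.PropertyChangeSupport;
import java.util.concurrent.atomic.AtomicInteger;

public class PopDemo {
    private final PropertyChangeSupport pcs = new PropertyChangeSupport(this);
    private int pop;

    void setPop(int v) { int old = pop; pop = v; pcs.firePropertyChange("pop", old, v); }

    // Naive: one notification (think: one Label re-render) per live cell.
    void runNaive(boolean[] cells) {
        setPop(0);
        for (boolean c : cells) if (c) setPop(pop + 1);
    }

    // Fixed: accumulate in a local variable, publish once at the end.
    void runFixed(boolean[] cells) {
        int local = 0;
        for (boolean c : cells) if (c) local++;
        setPop(local);
    }

    public static void main(String[] args) {
        boolean[] cells = {true, true, false, true, true, true};   // 5 live cells
        AtomicInteger fired = new AtomicInteger();
        PopDemo d = new PopDemo();
        d.pcs.addPropertyChangeListener(e -> fired.incrementAndGet());
        d.runNaive(cells);
        System.out.println("naive events: " + fired.get());   // 5 (one per live cell)
        fired.set(0);
        d.pop = 0;
        d.runFixed(cells);
        System.out.println("fixed events: " + fired.get());   // 1
    }
}
```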

    Even in Java this is an interesting micro-optimization, although in JavaFX Script (definitely not a system-level language) we're not supposed to use such low-level techniques... except if, as demonstrated now, there are new, higher-level reasons for that. ;-)

    Conclusions

    My Life program is now incredibly faster; it runs the "Life" test at the full 64 rows @ 50ms delay, without dropping frames, scoring ~19.9 fps. Memory allocation is much saner at ~1095Kb/s (~1 young-GC of 4Mb, costing only 3ms, every 4s). CPU usage is still higher than that of a competing Swing program, but that's due to my purist use of sequences and binding; I could easily optimize these... but I'm happy that I didn't, because this pushed me to find my real performance problems.

    The graphics / animation engine is not the bad guy that I suspected in the previous blog. It's not doing any stupid reconstruction of the entire internal scenegraph just because I change a trivial fill property of some nodes. Even the string interpolation bug was ultimately insignificant.

    People planning to use JavaFX for advanced animation and games must only take some care, like not allowing an avalanche of binding events in every frame, and not updating bound(able) properties inside tight loops (duh!). I also advise completely avoiding bound functions in code that's even remotely performance-critical.

    As a final note, I know that my animation strategy is "wrong"; I shouldn't trigger direct changes to the scene graph when a new Life generation is calculated. I should use a separate Timeline to refresh the display. The current strategy, coupling internal state changes to display updates, makes it impossible to run GOL in high-speed mode - I can easily calculate many thousands of generations per second, but no graphics technology would be able to keep up with that many frames per second.

    My previous blog, presenting the Life program, was quite long, yet still not really complete. I've continued the work, but soon found some interesting surprises. This new blog starts by investigating an API bug, then trips into some surprising language behavior, and ends with a proposal for a small change in the JavaFX Script language.

    A Slider bug

    I was annoyed to see that the Slider control ignores my attempts to enforce granularity - I wanted only multiples of 50ms to be selectable. Setting the recommended properties, like snapToTicks and majorTickUnit, only works for clicks ("paging"), but not for dragging the thumb. The user can still use the thumb to set the slider's value to any number in the min..max range. I looked up the JIRA and found that this is an already-known bug: RT-5914: No change in behavior after changing the snapToTicks flag for Slider. Fortunately there is a workaround:


    def animSlider = Slider {
      override var value on replace { value = round(value/50) * 50 }
        snapToTicks: true majorTickUnit: 50 min: 0 max: 1000 value: 50 blockIncrement: 50 layoutInfo: LayoutInfo { width: 120 }
    }

    This demonstrates JavaFX Script's nice override var feature. The variable itself (class field) is not overridden, only the default value (including bind) and trigger. Overriding a trigger is similar to overriding a setter method in Java, except that you cannot remove existing triggers from the inherited class. This removes a common pitfall of OO (forgetting the super-call); it has some tradeoff in flexibility, but it's a good tradeoff here because those on replace clauses are often critical to maintain the class state's consistency.

    For much the same reasons, in my Java code all getters and setters are always final - no exceptions, no questions asked. When I have some rare scenario that justifies allowing subclasses to override the behavior of state changes, I create extra methods that don't follow the JavaBean get/set pattern. The final setters can be safely invoked from constructors, avoiding another pitfall from Java semantics, without breaking attribute encapsulation in constructors.
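The pattern above can be sketched in plain Java (a hypothetical Account class, purely for illustration): because the setter is final, no subclass override can observe a half-initialized object when the constructor calls it.

```java
// Hypothetical example: final setters are safe to call from constructors,
// because no subclass can override them and observe partial state.
public class Account {
    private long balanceCents;

    public Account(long initialCents) {
        setBalanceCents(initialCents); // safe: setter cannot be overridden
    }

    public final long getBalanceCents() { return balanceCents; }

    public final void setBalanceCents(long cents) {
        if (cents < 0) throw new IllegalArgumentException("negative balance");
        this.balanceCents = cents;
    }

    // Extension point with a non-JavaBean name, for the rare case where
    // subclasses must customize state-change behavior.
    protected void applyDeposit(long cents) {
        setBalanceCents(getBalanceCents() + cents);
    }
}
```

Subclasses that need custom behavior override applyDeposit(), not the setter itself.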

    Trigger adventures

    But... now I have another bug, although it's less severe and apparently not FX's fault. When I drag the thumb, the on replace trigger rounds value like I wanted, but if you look at the Label that shows this value, it doesn't seem to be working at all - it shows the non-rounded value! See the code:


    Label { text: bind "({animSlider.value}) Generations: {world.gen} - Population: {world.pop}" }

    I update Label.text with binding. When I drag the slider thumb, this actually does two things: 1) set Slider.value to the "raw" value; 2) invalidate this variable, causing any triggers and dependent bound expressions to fire. I understand that I cannot rely on the relative order of execution of multiple dependent bindings; this is yet another reason why javafxc tries to enforce that bound expressions are functionally pure (although this can easily be circumvented). But I should be able to rely on triggers being ordered before bindings. Something evil is going on!!

    Suppose you write this code:


    var x = 7 on replace { x = x * 2 }
    var y = bind x + 1;
    x = 77;
    println("x={x}, y={y}");

    What will be printed? As it turns out, "x=0, y=1". The on replace is recursive. It stops evaluating at -2147483648 * 2 => 0 (this is Integer maths), then 0 * 2 = 0. The field setter ignores attempts to assign the same value, so the recursion ends at 0 -> 0. (If x were a Double, it would stop at x = Infinity.) An on replace clause that never reaches a stable value, e.g. x = if (x == 1) 0 else 1, produces a StackOverflowError.
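A rough Java analogue (not the real javafxc output, just my reconstruction of the semantics) makes the recursion visible: the setter re-enters itself with x * 2, and the re-entry only stops when the new value equals the current one, which with int overflow happens at 0.

```java
// Rough Java analogue of a setter whose "trigger" re-enters the setter
// with x * 2. The recursion only stops when the new value equals the
// current one -- which, thanks to int overflow, happens at 0.
public class TriggerDemo {
    private int x;

    public void setX(int newX) {
        if (newX == x) return;  // setters ignore same-value assignments
        x = newX;
        setX(x * 2);            // the on replace body: x = x * 2
    }

    public int getX() { return x; }

    public static void main(String[] args) {
        TriggerDemo d = new TriggerDemo();
        d.setX(77);             // 77, 154, 308, ... overflows ... ends at 0
        System.out.println(d.getX());  // prints 0
    }
}
```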

    Back to the animSlider.value trigger, it sort-of-works because the clause value = round(value/50) * 50 doesn't risk creating a lot of recursive activations of the trigger. The first execution will round the value; the second execution will get an input already rounded to 50 units, so no extra recursion happens. But the problem is that at least one level of recursion happens (unless, of course, I am really lucky and the initial value is already a multiple of 50).

    I have briefly inspected the code generated by javafxc for the field setters, including the support for triggers and binding notifications. It's a bit confusing, e.g. with multiple "phases" of binding invalidation. It seems the language makes an effort to avoid problems like multiple firings of a binding even if recursion happens due to triggers (or bound expressions with side effects). It's easy to see the resulting side effect in my Label bug, where the recursion plays a critical role in causing bindings to fire before the trigger.

    Another fun puzzler: In the following code,


    var x = 7.0 on replace { x = round(x) }
    var y = bind x + 1;
    x = 77.5;
    println("x={x}, y={y}");

    What gets printed?

    Answer: "x=78.0, y=79.0". This is the expected output, which may be surprising because the on replace does one-time recursion for x = 77.5. But, here is the real puzzler:


    var x = 7.0 on replace { x = round(x) }
    var y = bind x + 1 on replace {};
    x = 77.5;
    println("x={x}, y={y}");

    The only difference is that var y now contains a trigger too (even though it's an empty one). Sparing you from the surprise, the new code will print "x=78.0, y=78.5". This happens because binding is lazy by default in JavaFX 1.3+; but if the bound variable also has a trigger, it becomes eager. This changes the order of the various events (two executions of x's setter due to recursion; execution of x's trigger and y's bound expression). Only eager binding is subject to the whole mess caused by recursion. Lazy binding is safe because (I guess) no matter how many times the setters and triggers (and maybe eager bindings) are invoked, all the language does is mark variables assigned to dependent bound expressions as invalid; they will only be reevaluated when somebody reads their value, usually after the whole recursion dance is complete.
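That laziness can be sketched in Java (invented names, a minimal model of what I believe the runtime does): writes only mark the dependent value dirty; the bound expression runs once, on read, after any recursive churn in the source has settled.

```java
import java.util.function.IntSupplier;

// Minimal sketch of lazy binding: setX only invalidates the dependent
// value; the expression y = x + 1 is evaluated once, on read, no matter
// how many times x was churned in between.
public class LazyBinding {
    private int x;
    private boolean yValid = false;
    private int yCache;
    private final IntSupplier yExpr = () -> x + 1;  // y = bind x + 1

    public void setX(int newX) {
        if (newX == x) return;
        x = newX;
        yValid = false;        // lazy: just invalidate, don't recompute
    }

    public int getY() {
        if (!yValid) { yCache = yExpr.getAsInt(); yValid = true; }
        return yCache;
    }
}
```

An eager binding would instead recompute yCache inside setX, exposing it to every intermediate value of the recursion.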

    In the Label.text bug, the problem is that the Label control should be using a trigger internally, so it can update the presentation of its value when the text property changes.

    Language growing pains

    I proceeded to search the JIRA for this bug, and quickly found it in JFXC-4284: self-modified in on-replace: Wrong binded value is displayed. This bug was a regression caused by the many binding enhancements in JavaFX 1.3; fortunately, a fix is already available and scheduled to ship in 1.3.1.

    The JIRA evaluation includes a comment "I have a fix, however, the order that the on-replace is called is undefined, and it is currently called last, this may even be forced by correct semantics (...) The state is being set to valid after on-replace. Correct behavior is that all state is set before on-replace is called." We're getting a fix soon; but still I'm not happy.

    Defining some behavior as "undefined" is necessary and good in some situations; remarkably in concurrency (e.g. which thread wins a lock when many are blocked waiting for it), or in algorithms that need some flexibility for reasons that contribute to their very purpose (e.g. the iteration order of a HashMap). But this is the exception, not the rule. In most cases, completely well-defined behavior is very important even in APIs, and much more so in language features. Suggestion: The ordering of triggers and binding evaluation should be defined, at least to the extent that's possible and makes sense. (In particular, it's OK to leave unspecified the relative order of multiple bound expressions that must be reevaluated at some point.)

    The javafxc team itself has filed several bugs complaining that the compiler sometimes produces different code for the same input, due to internal HashMaps that didn't preserve iteration order. This order is irrelevant for the correctness of the produced code; still, it's a problem for the toolchain - tools could work around it, but at the expense of extra complexity and effort. These bugs have been fixed, usually by throwing in a LinkedHashMap.

    And just beating a dead horse, the order of trigger execution in 1.3 is not random; that would indeed be much better! The order is well-defined (although not documented), it's just illogical and in the worst possible way: 99% of the time it matches the programmer's intuition, so programmers get used to think that such observed order is reliable; but it can sometimes differ from that - and as a side effect of code artifacts that should not have this impact.

    Looking also at the forest: For binding, the language is already designed to enforce side-effect-free expressions. This is the way to go, and I hope that future releases of javafxc will perfect the purity validation. (This is possible if the compiler marks all functions that have side effects, or invoke others that have, including all APIs; so that no "impure" code can be invoked from a bound expression.) This would have the extra bonus of preventing any new recursion caused by updates to variables with triggers, not to mention extra optimization opportunities.

    But, what about triggers? This feature serves the dominant, imperative part of the language, so functional-purism is not an option. Perhaps we could only forbid self-modification? But the workaround suggested for the Slider bug is actually a great use of this facility: the "post-processing" of the value being set is a common purpose of triggers - maintain object invariants. This is a feature, not a bug. But it produces at least one level of recursion, and that was sufficient to create mysterious bugs with eager binding. (And the latter is another essential feature; even in the über-lazy Haskell language they often need "strictness annotations" to enforce eager execution.)

    A Proposal

    RFE: Change triggers so we can still update the subject variable of a trigger, but self-update will not cause recursion: the trigger just assigns the field value directly (or returns the new value so the calling setter can do that assignment at the most convenient time). I fail to see the utility of recursive trigger execution. If one wants that behavior, just invoke a (locally) recursive function, which wouldn't cause new activations of setters, triggers or bound expressions.

    As a second rationale, the recursion caused by self-update is counter-intuitive. I was surprised when I discovered this behavior, perhaps because, coming from Java, I relate triggers to polymorphic setters. But in a setter method you don't set the field with a recursive call; you may assign directly to the field (the only place where that is allowed, by OO best practices), or you may super-call an inherited setter. But no sane Java developer would write something like setX (double newX) { setX(round(newX)); }. Not even with an if (x != newX) clause. It's just asking for trouble.
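In Java terms, my proposal amounts to this (a hypothetical class, with the round-to-50 normalization mirroring the Slider workaround): the setter normalizes its input and assigns the field directly, no re-entry at all.

```java
// The Java analogue of the proposal: a setter that normalizes its input
// assigns the field directly instead of re-entering itself.
// Hypothetical class; rounding to multiples of 50 mirrors the Slider fix.
public class SnappingValue {
    private double value;

    public void setValue(double v) {
        // direct assignment of the normalized value -- no recursion
        this.value = Math.round(v / 50) * 50;
    }

    public double getValue() { return value; }
}
```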

    I wonder if JavaFX Script's designers created triggers that way because they considered it simple and homogeneous, avoiding a special case - the trigger semantics has a single rule: "any assignment causes invocation of the trigger". Well, except the initialization! That's already one special case, unless you dismiss it as another category (construction, non-destructive assignment)... Anyway, simpler language rules don't always lead to simple global behavior, or to intuitive behavior, not to mention useful behavior. The fix for bug JFXC-4284 is not sufficient in my opinion; it will avoid some bugs caused by the relative order of triggers and binding, but it won't avoid other bugs that we can easily imagine, like my example { x = x * 2 } that reduces any value to zero or infinity.

    My proposed change is not a silver bullet against all bugs of that kind. The trigger can invoke a function that will update the same variable containing that trigger, causing recursion again. Both triggers and bound expressions may invoke functions that will update unrelated variables, but those variables may have their own triggers resulting in indirect recursion, which is even more confusing and much more difficult for a compiler to prevent (although the improvement of javafxc's validation of pure bound expressions could at least isolate these risks to triggers alone). But we can avoid >90% of the bugs, for the most common usage. And there is no tradeoff that I can see (but correct me if I'm wrong).

    Finally, it's tempting to imagine if we can go even further and make triggers bullet-proof, by having a smarter compiler that prevents any "dangerous" code in triggers. But this is an intractable problem; this last mile must be walked by best-practices. My new rule: Triggers and bound expressions should avoid any kind of I/O, any expensive computation, and any large-scale side effects (for bound expressions, no side effects at all). They should (at worst) only have "lightweight" side effects, such as setting a flag that will later cause some I/O or expensive computation to happen. Remember that both triggers and bound expressions (and any external functions they invoke) are executed in the middle of a complex mechanism that has unbounded execution. Even if the JavaFX team accepts my suggestion of removing recursion from triggers with self-modification, it's not hard to write a program that, with too much work and side effects in triggers and bound expressions, will enter a never-ending chain of invalidations.

    UPDATE: Filed the bug JFXC-4382: Avoid trigger recursion for self-update. I've also found a few interesting correlations with other bugs - and that was a 15-minute research in the JIRA.

    opinali

    JavaFX's Game of Life Blog

    Posted by opinali May 21, 2010

    There is an unwritten tradition that John Conway's Game of Life must be implemented in every programming language and every GUI toolkit. Well, OK, I just invented this tradition, but it's a smart introduction and Life is one of the easiest games / cool animations you can program. But it's not so simple that we can't learn a few important things about JavaFX...

    My goal: a good-looking and feature-complete version of the Game of Life (GOL), but keeping the code simple, short, "canonical". I won't resort to low-level optimizations (e.g. reaching into JavaSE APIs), but I may use high-level ones (e.g. good algorithms, careful selection of JavaFX features). How well does JavaFX handle the task when used the way it is intended to be used?

    So, let's start. The complete app is short enough that it fits in this blog, in a few small pieces.


    class World {
        var SIZE:Integer on replace { reset() }
        var cells:Boolean[];
        var gen:Integer;
        var pop:Integer;
        function reset () { gen = pop = 0; cells = for (i in [0 ..< SIZE * SIZE]) false }
        bound function isLive (x:Integer, y:Integer)   { cells[y * SIZE + x] }
        function flip (x:Integer, y:Integer):Void      { cells[y * SIZE + x] = not cells[y * SIZE + x] }

    The World class implements the game's data model and the GOL algorithm. The cells sequence contains true=alive, false=dead; it would ideally be a matrix, but JavaFX Script doesn't support multidimensional sequences so I have to do some index arithmetic. I could have used primitive arrays (with JavaFX Script's nativearray) but that would be impure, as native arrays are only intended for Java integration and don't completely integrate with JavaFX Script.


        function life (x:Integer, y:Integer) {
            var count = if (cells[y * SIZE + x]) then -1 else 0;
            for (yy in [max(y - 1, 0) .. min(y + 1, SIZE - 1)])
                for (xx in [max(x - 1, 0) .. min(x + 1, SIZE - 1)])
                    if (cells[yy * SIZE + xx]) ++count;
            count == 3 or count == 2 and isLive(x, y)
        }

    Function life() is the finite state machine for an individual cell; nothing JavaFX-specific here. Except that I hate writing and instead of &&.

    Oh, I didn't make the obvious optimization of adding two extra border rows and columns, which would avoid the min/max clamping that prevents out-of-bounds errors at border cells without all neighbors, because this would reduce the general seamlessness of working with sequences. (It takes a lot of discipline to resist the urge of micro-optimization... ugh...)
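For readers more at home in Java, the same cell rule translates directly (a sketch with a flat boolean[] standing in for the cells sequence): count the live cells in the clamped 3x3 neighborhood, cancel out the cell itself, then apply Conway's rule.

```java
// Java rendering of life(): count live cells in the clamped 3x3
// neighborhood (the -1 cancels counting the cell itself), then apply
// Conway's rule: born with 3 neighbors, survives with 2 or 3.
public class Life {
    static boolean life(boolean[] cells, int size, int x, int y) {
        int count = cells[y * size + x] ? -1 : 0;
        for (int yy = Math.max(y - 1, 0); yy <= Math.min(y + 1, size - 1); yy++)
            for (int xx = Math.max(x - 1, 0); xx <= Math.min(x + 1, size - 1); xx++)
                if (cells[yy * size + xx]) count++;
        return count == 3 || (count == 2 && cells[y * size + x]);
    }
}
```

The precedence in the JavaFX Script one-liner works because and binds tighter than or, exactly like && and || here.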


        function run ():Void {
            ++gen;
            pop = 0;
            cells = for (y in [0 ..< SIZE]) for (x in [0 ..< SIZE]) {
                def cell = life(x, y);
                if (cell) ++pop;
                cell
            }
        }

    Function run() recomputes the whole world (all cells). The inner for x builds a sequence for each row, and the outer for y concatenates all row sequences into a single big sequence ("auto-flattening"). I didn't worry, because the compiler may optimize this by adding the inner elements directly into a single sequence for the outer loop.

    ((( Begin Parentheses to investigate the compiler (((

    Sequences are immutable; updates are performed by creating a brand-new sequence, copying all non-updated elements. The compiler can optimize this too, with temporary mutable representations in methods that perform multiple updates; ideally you trust the compiler by default, and only optimize if necessary (as indicated by profiling). Having said that, my run() function replaces the current sequence with a new one, requiring a single assignment - but I didn't do it to optimize code, I did it because it's more elegant: the code explicitly calculates the entire state N+1 as a function of state N. In fact, run() was a one-liner before I augmented it to update the generation and population counters.

    Notice that the GOL algorithm cannot be implemented easily with in-place updates because the new state of each cell depends on the current state of all cells around it. I could have used an in-place algorithm, but that would be uglier and also require some mutable data type like a nativearray.
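The same "state N -> state N+1" discipline looks like this in Java (a self-contained sketch, not the generated code): the next generation is computed into a fresh array and swapped in at the end, so no cell ever reads a partially updated neighborhood.

```java
// Double-buffered generation step: the next state is built in a fresh
// array while the old one stays untouched, mirroring the immutable-
// sequence approach of run().
public class Step {
    static boolean[] next(boolean[] cells, int size) {
        boolean[] out = new boolean[size * size];
        for (int y = 0; y < size; y++)
            for (int x = 0; x < size; x++) {
                int n = cells[y * size + x] ? -1 : 0;  // -1 cancels self-count
                for (int yy = Math.max(y - 1, 0); yy <= Math.min(y + 1, size - 1); yy++)
                    for (int xx = Math.max(x - 1, 0); xx <= Math.min(x + 1, size - 1); xx++)
                        if (cells[yy * size + xx]) n++;
                out[y * size + x] = n == 3 || (n == 2 && cells[y * size + x]);
            }
        return out;  // caller swaps in the new state atomically
    }
}
```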

    Another interesting aspect of JavaFX Script is that its sequences are optimized for all basic types. My cells:Boolean[] uses a primitive boolean[] as internal storage, consuming a single byte per element; I've verified this behavior in the profiler. Let's check all these optimizations in the generated bytecode (decompiled):


        @ScriptPrivate
        public void run() {
            World receiver$ = this;
            set$World$gen(get$World$gen() + 1);
            set$World$pop(0);
            BooleanArraySequence jfx$25sb = new BooleanArraySequence();
            int y$ind = 0;
            for (int y$upper = get$World$SIZE(); y$ind < y$upper; ++y$ind) {
                int y = y$ind;
                BooleanArraySequence jfx$26sb = new BooleanArraySequence();
                int x$ind = 0;
                for (int x$upper = get$World$SIZE(); x$ind < x$upper; ++x$ind) {
                    int x = x$ind;
                    boolean cell = life(x, y);
                    if (cell) set$World$pop(get$World$pop() + 1);
                    boolean jfx$27tmp = cell;
                    jfx$26sb.add(jfx$27tmp);
                }
                Sequence jfx$28tmp = jfx$26sb;
                jfx$25sb.add(jfx$28tmp);
            }
            Sequences.set(this, 1, jfx$25sb);
        }

    Oh, crap - the compiler didn't use a single BooleanArraySequence like I expected. Unless my memory fails, javafxc is capable of this optimization, but maybe just for simpler cases. It seems the compiler still has a way to go. Another missing optimization is preallocation: the maximum number of elements that will be inserted can be statically determined (SIZE for the inner sequences, SIZE*SIZE for the outer), so the compiler could create the sequences with these initial sizes, avoiding growth costs. Finally, every iteration of the outer loop allocates, uses and then discards a temporary sequence (its elements are copied to the outer sequence); this inner sequence could be allocated only once and cleared/recycled across all outer loop iterations. The latter optimization is unnecessary if the compiler could just avoid the inner temporary sequence, but I can see other scenarios where this wouldn't be possible and the reuse of temporary sequences would help.
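What I'd like the compiler to emit can be sketched in Java (illustrative only; sizes and names are mine): one flat buffer preallocated to SIZE*SIZE, filled directly by the nested loops, with no inner temporary and no growth reallocations.

```java
import java.util.ArrayList;

// Sketch of the hand-optimized shape: a single outer buffer with exact
// initial capacity; inner-loop elements go straight into it, so there is
// no temporary per-row collection and no capacity growth.
public class Prealloc {
    static ArrayList<Boolean> build(int size) {
        ArrayList<Boolean> out = new ArrayList<>(size * size); // preallocated
        for (int y = 0; y < size; y++)
            for (int x = 0; x < size; x++)
                out.add(false);  // added directly to the outer buffer
        return out;
    }
}
```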

    There are also other gratuitous inefficiencies in the generated code, like several redundant temporary variables. (One of these, receiver$, is an artifact of traits, already planned to disappear from unnecessary places.) Also I wonder if the order of the synthetic $ind and $upper variables in the bytecode may confuse loop optimizations (just like it confused my decompiler). Such small issues won't impact peak runtime performance, as the JIT compiler will just optimize them out; but the redundancies affect startup/warmup performance and also code bloat.

    Why am I complaining so much? JavaFX Script is a high-level programming language, in the sense that its mapping to the compiled form (Java bytecode) is not trivial (as it is for Java). And it actively promotes a high-level programming style, both by offering very convenient high-level features such as sequences and binding, and by not offering alternative low-level features (except for the recourse of "native interface" into Java classes). The net result of this design is that the compiler must assume responsibility for all the low-level optimizations that programmers can't do anymore (or are convinced that it's no longer good style to do - e.g. explicitly mutating sequences). In my Java code, I always do such things as preallocating collections, recycling expensive objects (notably big collections), or eliminating intermediary collections produced by inner loops.

    The javafxc compiler already includes some impressive amount of such high-level optimizations; but we need more. Performance is already pretty good, but there is a lot of potential to be even better; I expect the code generation to keep improving for many updates to come.

    ))) End Parentheses to investigate the compiler )))

    Anyway, let's continue the program...


        function scroll (dx:Integer, dy:Integer):Void {
            cells = for (y in [0 ..< SIZE]) for (x in [0 ..< SIZE]) {
                def yy = y + dy;
                def xx = x + dx;
            yy >= 0 and yy < SIZE and xx >= 0 and xx < SIZE and isLive(xx, yy)
            }
        }
    }

    World's last function, scroll(), allows me to scroll all cells in any direction. Nothing remarkable here.


    def world = World { SIZE: 64 }
    def CELL_SZ = 8;
    def animSlider = Slider {
        min: 0 max: 1000 value: 50 blockIncrement: 50 layoutInfo: LayoutInfo { width: 120 }
    }
    def anim = Timeline {
        repeatCount: Timeline.INDEFINITE
        keyFrames: KeyFrame { time: bind animSlider.value * 1ms canSkip: false action: function () { world.run() } }
    }

    Now we start the game UI. I declare the world object, and the animation timeline that triggers a new generation at a fixed delay. A Slider allows changing this delay; I had to declare it here so I can use binding to automatically adjust the KeyFrame's delay from the slider value.

    Notice the value * 1ms calculation, necessary to convert a Double to a Duration. The multiplication is a no-op, as 1ms is Duration's fundamental unit. You can't use a typecast (value as Duration), because the Duration type needs a unit (ms, s, m, or h) and there is no default unit, not even for 0. I like that, and I'd love to see JavaFX Script evolve to embrace user-defined units in its core type system; this would make a lot of sense for a high-level language serving business applications stuffed with manipulation of "real-world" data.
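Java itself later embraced the same idea in the java.time API: a duration is always constructed with an explicit unit, so there is no unit-less value to misinterpret. A trivial sketch (my own helper name):

```java
import java.time.Duration;

// java.time makes the unit explicit at construction, much like JavaFX
// Script's 1ms literal: Duration.ofMillis states the unit, never assumes it.
public class Units {
    static Duration fromSliderMillis(double sliderValue) {
        return Duration.ofMillis((long) sliderValue);  // unit stated, not assumed
    }
}
```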


    def toolbar =  HBox { spacing: 8
        content: [
            Button {
                text: bind if (anim.running) "Stop" else "Go"
                layoutInfo: LayoutInfo { width: 60 }
                action: function () { if (anim.running) anim.stop() else anim.play() }
            }
            Button {
                text: "Clear" layoutInfo: LayoutInfo { width: 60 }
                action: function () { world.reset() }
            }
            animSlider,
            Label { text: bind "({animSlider.value}) Generations: {world.gen} - Population: {world.pop}" }
        ]
        onKeyPressed: function (e:KeyEvent) {
            if (e.code == KeyCode.VK_DOWN)       world.scroll( 0, -1)
            else if (e.code == KeyCode.VK_UP)    world.scroll( 0,  1)
            else if (e.code == KeyCode.VK_LEFT)  world.scroll( 1,  0)
            else if (e.code == KeyCode.VK_RIGHT) world.scroll(-1,  0)
        }
    }

    I have a top row of controls that allow stopping/starting the animation, resetting it to the initial state, controlling its speed, scrolling the cells in four directions, and showing the generation and population stats. The only remarkable part is the if-else cascade in onKeyPressed(), because JavaFX Script lacks a switch/case statement. The language already has first-class functions and closures, so adding a map type would allow efficient (hashed) branching for larger numbers of keys, and reasonably compact code too.
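What that map-plus-closures dispatch buys is easy to show in Java (invented names; the Scroller interface stands in for World): one hashed lookup from key code to action replaces the whole cascade.

```java
import java.util.Map;

// Hashed dispatch from key code to action, replacing the if-else cascade.
// "Scroller" is a stand-in for the World class; key names are strings
// here just to keep the sketch toolkit-independent.
public class KeyDispatch {
    interface Scroller { void scroll(int dx, int dy); }

    static Map<String, Runnable> handlers(Scroller world) {
        return Map.of(
            "DOWN",  () -> world.scroll( 0, -1),
            "UP",    () -> world.scroll( 0,  1),
            "LEFT",  () -> world.scroll( 1,  0),
            "RIGHT", () -> world.scroll(-1,  0));
    }

    static void onKeyPressed(Map<String, Runnable> handlers, String code) {
        Runnable h = handlers.get(code);
        if (h != null) h.run();  // unknown keys fall through, as before
    }
}
```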


    def life = Group { content: for (yy in [0 ..< world.SIZE]) for (xx in [0 ..< world.SIZE])
        Rectangle { x: xx * CELL_SZ y: yy * CELL_SZ width: CELL_SZ height: CELL_SZ
            fill: bind if (world.isLive(xx, yy)) Color.BEIGE else Color.BLACK
            stroke: Color.BLUE
            onMouseClicked: function (e:MouseEvent) {
                world.flip(xx, yy);
                toolbar.requestFocus();
            }
        }
    }

    The main "game" region is a grid of Rectangles to show each cell. Once again I use nested Y/X loops, producing the sequence expected by Group.content. For each Rectangle, I've used binding to set its fill color according to the corresponding cell state.

    Yup, that design (and even my choice of the Game of Life - mwahahaha!) was a purposeful stress-test of both binding and the scene graph; with the 64x64 world size, this means 4,096 nodes and 4,096 bound properties, so I'm relying a lot on compiler and runtime efficiency.

    The handling of mouse clicks, used to toggle cells, is trivial because I can attach the event handler to each Rectangle, so I don't need any picking logic. Notice also that my event handler is a "full closure" that reaches to the xx, yy variables - the indices of the for loops that built the Rectangle sequence.

    Finally, in that same mouse handler I force the keyboard focus to the toolbar because that's where I installed the KeyEvent handler for scrolling.


    Stage {
        title: "Life" resizable: false
        scene: Scene { content: VBox { content: [ toolbar, life ]}}
    }

    The Stage and its Scene, with the toolbar on top of the game region. It's complete! Click the image below to launch:

    Game of Life screenshot

    The resulting functionality, and even the look, are IMHO surprisingly great for a program that's under 100 lines of code. Just google "Game of Life Patterns" - the web is chock-full of GOL resources. Just click the cells; sorry, no import/export of LIF or RLE files yet - that may appear in the payware version ;-)  This validated my impression about JavaFX's productivity...

    A B[l]inding Puzzler

    ...but I confess that my first code didn't work; the cells didn't change in the screen, as if world was not being recalculated at all. The bug was here:


        function isLive (x:Integer, y:Integer)   { cells[y * SIZE + x] }
    ...
        Rectangle { fill: bind if (world.isLive(xx, yy)) Color.BEIGE else Color.BLACK ... }

    My fill: bind... was not firing when world.cells changed. The problem is, the only variables captured by the bind expression are world, yy and xx. These are the only data whose updates would trigger reevaluation of my bind expression. The cells sequence is encapsulated by the World class, and it's not directly referenced from the bind expression. This may be a significant puzzler, as someone could start with a more "scriptish" prototype full of script-scope variables, and later refactor these into classes.

    The fix was trivial once I found the problem; just declare bound function isLive..., and it works. Now the binding system knows that the subexpression world.isLive(xx, yy) is also invalidated when the cells sequence changes, because that field is used inside isLive() and this dependency propagates to bound expressions that invoke isLive(). (Such propagation is not a completely obvious feature; it shows that the binding mechanism is pretty well rounded, with robust dependency tracking.)

    Timeline issues

    The KeyFrame.time property is read/write; this is very convenient because I can change the animation speed by just updating this property in my single KeyFrame. Unfortunately, this doesn't work very well. The program starts with a configuration of 50ms; if you click Go and then drag the slider (left = smaller delays / faster animation, right = larger delays / slower animation), the animation will adjust its speed, but not smoothly. Sometimes I observe a pause of a few seconds, sometimes a "race" of very fast animation while dragging the slider. The animation engine must be doing some timing/scheduling that becomes temporarily confused when a KeyFrame's time property is changed.

    I have tried some alternate implementations - using an intermediary variable with an on replace trigger that stops (or pauses) the timeline, changes the KeyFrame delay and resumes it; and even creating a new Timeline. But the result was always similar.

    Performance

    This is a simple app, so it shouldn't put a big stress on JavaFX... except perhaps, for my sub-optimal state management, and large counts of nodes and bindings.

    Idle test: Empty world, animation stopped. CPU usage is 0 as expected, and GC log shows zero activity. This test looks trivial, but it's good to assert that no part of the system (binding, scene graph) uses polling, busy-waiting or other brain-dead techniques. Some platforms are known for non-zero CPU usage in idle apps, so it's good to show that JavaFX won't do that. ;-)

    Dead test: Empty world (no live cells), animation on at 50ms (20 gen/s == 20 fps). CPU usage was a lowly ~1.1% (on a quad-core Q6600; so that's ~4.4% of a single core). Most work is due to the recalculation of the world; we can inspect the GC log:


    [GC 15.049: [DefNew: 4421K->5K(4928K), 0.0005543 secs] 13263K->8847K(15872K), 0.0005950 secs]
    [GC 16.201: [DefNew: 4421K->5K(4928K), 0.0006040 secs] 13263K->8847K(15872K), 0.0006480 secs]

    That's ~3.8MB/s = ~190KB per frame/generation = ~46 bytes per cell update. It's a bit higher than I originally expected; the missing sequence optimization is certainly the cause, as it produces a lot of extra allocation. But even that is not so bad, because Java's excellent GC produces near-zero pauses.
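The arithmetic behind those figures checks out (using decimal units; a throwaway sketch just to verify my own numbers): 190KB per generation at 20 gen/s is 3.8MB/s, and 190KB spread over the 64x64 = 4,096 cells is ~46 bytes each.

```java
// Sanity-checking the allocation figures quoted above.
public class GcMath {
    public static void main(String[] args) {
        int perGenBytes = 190_000;                 // ~190KB per generation
        int gensPerSec = 20;                       // 50ms per generation
        int cells = 64 * 64;                       // 4,096 cells
        double mbPerSec = perGenBytes * gensPerSec / 1e6;   // 3.8 MB/s
        double bytesPerCell = (double) perGenBytes / cells; // ~46.4 bytes
        System.out.printf("%.1f MB/s, %.1f bytes/cell%n", mbPerSec, bytesPerCell);
    }
}
```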

    Dead & Headless test: Similar to the previous test, but I commented out the bind in Rectangle.fill, so the entire GUI layer is a no-op after initial startup. GC behavior was identical, but CPU usage dropped to 0.89% (of a single core). This means that the binding was costing 0.22% of a core (or 0.00005% per cell: how's that for precision?). Notice that my code is assigning new values to every cell; the fact that the new values are identical to the old values only saves the effort to repaint the rectangles, but the bind expressions must be reevaluated every time.

    Life test: I changed the life() function to just flip all cells in the first few rows. [I only changed the return statement, so as not to remove the effort of calculating all cells with the normal GOL algorithm.] This provides a stable animation test; a real Life run is difficult to benchmark because the number of cell changes in each generation varies chaotically.

    The animation engine could not keep up with many rows - the updated cells are only refreshed in some frames. Testing with 2 rows (128 cells) is fine; at 4 rows (256 rects) I could already see skipped frames. Garbage collection was intense; let's see it for 2 rows:


    [GC 9.439: [DefNew: 4422K->9K(4928K), 0.0014487 secs] 13267K->8855K(15872K), 0.0014851 secs]
    [GC 9.480: [DefNew: 4425K->6K(4928K), 0.0015159 secs] 13271K->8851K(15872K), 0.0015514 secs]

    The animation engine does a lot of allocation when I simply change the Rectangle.fill property to a different Color. We're up to ~120MB/s = ~6MB per generation = ~46KB per updated cell. It seems that the scene graph completely rebuilds its internal node objects ("SGNode's") when some property changes. These preprocessing techniques are essential to accelerate such things as transforms and effects, but in this case I'm just changing a simple Rectangle's internal painting from one solid color to another solid color.

    The program is doing full vector rendering; as JavaFX Balls showed, drawing things from geometric elements may be much slower than just blitting a bitmap image. But JavaFX Balls used a complex drawing with curves and gradients; Life only draws pretty dull rectangles without rounded corners, transforms, effects or anything else. I've tested the Prism toolkit too, but this time it didn't save JavaFX; basically the same behavior.

    I experimented with some optimizations that I didn't originally want to use:

    • The first obvious thing is using a single ImageViewfor the background of all-dead cells. Then I have one fixedRectangle per live cell, just hiding it when the cell isn't live.
    • Using ImageView also for the live cells (so, full "bitmap rendering").
    • To show/hide the live cells, I've tried both flipping thevisible property, and moving dead cells away from the view (change y to a big negative value - this requires some layout tweaks).
    • Finally, I changed the code so that the entire Group of rectangles is produced by a single bind expression, and only live cells generate a Rectangle. This replaces all live-cell nodes, if any, at every frame. The advantage is that most cells are usually dead, so the scene graph has fewer nodes.

    All these optimizations netted me at most a ~2X speedup; I could animate 4 rows of pulsating cells with proper behavior (no visible frame skipping - still with high CPU and GC activity).

    It seems that JavaFX's scene graph is already well suited for GUIs with controls, but it must improve its support for general animation. Changing the state of a large number of nodes per frame shouldn't have such a high cost. Even adding/removing many nodes should be faster, although I realize this is harder and would accept tradeoffs in coding effort - e.g., carefully breaking the scene graph into many groups, then adding/removing these to the scene, maybe with a hint to let the engine do all preprocessing in parallel and only make the new nodes visible when they're fully realized, without hanging the animation until that happens.

    This would be perfect for problems like Joeri Skora's Isometric tile rendering in JavaFX; notice that JavaFX 1.3 has pretty good performance for a very big scene graph - even in "Use brute force" mode, his animation scrolls over a 65.536-node scene with surprisingly good performance. But that's just because the scene is completely static; the approach of dynamically adding and removing nodes, even with optimizations like quadtrees, suffers from the overhead of changing the scene tree.

    I don't expect JavaFX's scene graph to be optimized for huge scenes (e.g., with a spatially indexed node tree for more efficient clipping); JavaFX is not meant to compete, out of the box, with high-end game engines. But it should be sufficiently powerful and flexible to allow programmers to add the extra tricks and optimizations that become necessary in each application niche. Besides games (a huge business even in the "casual" category), there are other important cases for advanced animation, such as sophisticated data visualization.

    JavaFX versus Swing

    I've quickly googled "Java Swing Game of Life" and found one program that's very close to mine. (Even closer after I stole its idea of having a slider to change the animation speed.) I've made a few changes to the Swing app to make both comparable - same cell number and size, same optional hack for the stable Life tests.

    Code size and clarity: JavaFX wins. The Swing code is ~230 lines, and this after I stripped many redundant comments and {}s. Even removing all remaining comments (not fair, because the code is clearly not all obvious) and tightening the formatting/indentation even more, it's more than 2X the size of the JavaFX Script code - and that's with fewer features (no scrolling). There is no contest in code size or, much more importantly, in code clarity; the latter is more subjective, but if you check the code I don't think there's much room for argument. (This program may not be the best possible Swing code, but I don't think the best would be much better.)

    The Swing program does no custom painting; it creates a custom JLabel for each cell and changes its background color - this is nice because it's the closest thing to a "scene graph-based" Swing program: all rendering is performed by the toolkit. Also, the panel objects contain the cell state and the GOL algorithm, using two state variables and two complete passes over all cells to enable in-place updates. That should put the Swing program at a performance advantage over JavaFX. (Once again, I could optimize my GOL program, notably for in-place update - but I don't want to; I'm focusing on easy-to-write, easy-to-read code.)
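The two-state, two-pass scheme can be sketched like this in plain Java (illustrative names; the real Swing program keeps this state inside its per-cell JLabels rather than in arrays): pass 1 reads only the current state and fills the second state variable; pass 2 commits it, so no per-generation objects are allocated.

```java
// Sketch of in-place Game of Life updates with two state variables per cell.
public final class InPlaceLife {
    final int size;
    final boolean[] alive, aliveNext;   // current and next generation, per cell

    InPlaceLife(int size) {
        this.size = size;
        alive = new boolean[size * size];
        aliveNext = new boolean[size * size];
    }

    void step() {
        for (int y = 0; y < size; y++)          // pass 1: compute, reading only `alive`
            for (int x = 0; x < size; x++) {
                int n = neighbors(x, y);
                aliveNext[y * size + x] = n == 3 || (n == 2 && alive[y * size + x]);
            }
        // pass 2: commit the new generation in place, no new arrays allocated
        System.arraycopy(aliveNext, 0, alive, 0, alive.length);
    }

    int neighbors(int x, int y) {
        int n = 0;
        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                if (dx == 0 && dy == 0) continue;
                int nx = x + dx, ny = y + dy;
                if (nx >= 0 && nx < size && ny >= 0 && ny < size && alive[ny * size + nx]) n++;
            }
        return n;
    }
}
```

This is the kind of steady-state, allocation-free loop that explains the Swing program's near-zero GC activity in the Life test.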

    Memory usage: Swing wins. Measuring the Life test, JavaFX uses 54Mb working set / 100Mb private bytes (8,852Kb heap); JavaFX+Prism is better at 51Mb / 82Mb (8,875Kb heap). The Swing program uses 55Mb / 94Mb (3,170Kb heap, but without any significant allocation/GC activity even in the Life test). JavaFX uses more heap, notably for bindings, closures and the scene graph. JavaFX 1.3 has improved the efficiency of both binding and the scene graph, but if you use both in the range of thousands, there's still enough overhead to care about. And JavaFX really loses with its excessive memory allocation when scene graph nodes are updated.

    Performance: Swing wins. The Swing program doesn't suffer from the issues I discussed with JavaFX's scene graph; it happily runs the Life test with near-zero CPU and GC activity - like we should expect from a simple, 64x64 Game of Life running on a current computer, and executed by native code (in that case JITted) and a reasonably hardware-accelerated toolkit (which includes Java2D/Swing).

    Last Conclusions (and RFE's...)

    I am still quite happy with my Life program. It was a pleasure to write, and some of the performance issues (notably with sequences) are easy to fix if I care. Perhaps even the scene graph limitations have a smarter workaround that I didn't try - e.g., rendering all live cells with a dynamically-built Path (I was just too lazy to try that one...) or some other trick.

    I'm not happy, though, with these scene graph limitations; I should be able to write an efficient version of something like the Game of Life without any optimization effort. Adding/removing many nodes from the scene graph is very expensive; I can accept and understand this, as it's the core tradeoff of scene graphs (but still, in JavaFX the tradeoff seems unreasonably high). But I neither understand nor accept a big overhead for trivial updates to existing nodes - like changing a solid color, flipping the visibility state, or even just translating. At least in this area, JavaFX must still improve significantly. Even if JavaFX is now very close to being an excellent and complete platform for some important use cases - control-centric (e.g., business front-ends) and media-centric - the platform is still at the beginning of a steep adoption curve and can't afford not to serve other niches very well.

    As a JavaFX enthusiast, I like to refer to all former Java GUI toolkits (AWT/Java2D/Swing, LCDUI, even SWT) as "legacy" and "obsolete"; but this is clearly not fair while there are programs that I can write in those old toolkits with excellent results, but not in JavaFX. I'm optimistic, because JavaFX has already improved a lot since v1.0; the foundations are very solid, and the JavaFX team is now catching up very fast in areas like high-quality controls, layout and styling. The compiler is also maturing fast; the optimization of binding in 1.3 was massive (even though not yet complete) and the sequence optimizations are ongoing.

    Finally, we could argue that the scene graph paradigm is not ideal for all graphics applications, but I don't believe that. I see immediate-mode rendering as the future Assembly coding of graphics. On the other hand, shader programming is an important piece of modern graphics stacks; JavaFX uses this intensely (notably in Prism), but unfortunately only internally. With that support, I could write all the cell rendering easily inside a single canvas node - the Life "world" can be rendered as a big functional texture, and its rendering is a ridiculously-parallelizable task that's perfect for the shader paradigm. Shaders are often a great replacement for important use cases that don't favor scene graphs. So, in my (non-expert) opinion, the big missing piece in the JavaFX stack is not a traditional immediate-mode API, but opening Decora (the desktop runtime's portable shading engine) to applications, with a public shading API.

    opinali

    Flash Is a Right Blog

    Posted by opinali May 8, 2010

    Ian Bogost's recent article Flash is Not a Right highlights some new aspects of the debate about Apple's iPhoneOS development restrictions. I have a different opinion.

    I understand Ian's pain as a teacher. Programmers who aren't curious and don't like to explore varied languages and paradigms are doomed to rank-and-file roles. But this is secondary. The purpose of computing is to serve the needs of end users; and for this to happen, computing has to be a healthy industry: one that allows fair competition and rewards efficiency and quality - these are core values of our economic regime, and the reason behind many consumer protection laws.

    Granted, not all platforms are like the PC; many require you to pay a developer fee, sign NDAs, adopt DRM technologies, abide by the vendor's certification and distribution channels, etc. These restrictions have been around since the dawn of computing, and developers generally don't have issues with reasonable terms. But I know of no previous computing platform that enforced the kinds of unreasonable restrictions that Apple wants to enforce now.

    I'm not saying that Apple should help people use their preferred tools. I'm not asking Apple to OEM-install the Adobe Flash Player; it's their product and they deal the deck. But if that deck seems to have some deuces to me, I should be free to use my aces - as long as I put up with the effort and cost, and do it within the behavioral rules imposed by the platform (i.e., only install unprivileged userland code; follow standards of security, reliability, UI guidelines, etc.).

    Blocking high-level tools opens a huge can of worms that nobody is talking about: it creates an artificial, unfair disadvantage for smaller developers. It's a plutocratic move that favors huge companies and screws small shops and indie developers. High-level programming languages and frameworks that create a layer over the raw platform are popular for several reasons. Portability is not the only one; productivity is another huge reason, and it's even more important - Steve Jobs's Thoughts on Flash smartly avoids this issue. Multiplatform support was never Flash's primary selling point; its popularity boomed even when the Mac was down (so Windows was the only desktop that mattered). The major reason for Flash's adoption was, by far, features and productivity. Plain web browsers are now catching up with the features, but Flash still benefits from a powerful suite of design tools; simple validation & deployment; and trivial hosting.

    Tools like Flash bring power and productivity to the masses. Apple can stop Flash, but they cannot stop all similar tools. Take any huge software company (e.g., Electronic Arts - a big iPhone game provider): they have their own high-level platform that they rely on - frameworks, design tools, code generators, even embedded languages and compilers like Lua. (The latter violates even the previous iPhoneOS terms, but Apple pretends not to see it.) They often use game engines too - very big components that actually become the real platform; most "application code" is written to the game engine's API, not to native APIs. Apple's new terms would certainly forbid that. Even if Apple wanted, they couldn't stop the likes of EA from using high-level tools, because these may be in-house and not public, and cannot be detected without expensive reverse-engineering of the binaries.

    Ian Bogost mentions game engines in a purely negative way - "same plain-vanilla experience (...) lowest-common denominator". Jobs complained that cross-platform tools may not provide full access to the system. This is not necessarily true. MonoTouch enables full access to the OS APIs, and it's updated to new iPhoneOS SDKs within days. Others may just need extra work, e.g. a JVM supporting JNI. There's also "right tool for the job" - not all applications need every iPhoneOS feature; if Flash is good enough for your app, why not use it? And when it is not, it's your choice to either create a mediocre product, or use another tool. (If you make the first choice, Apple might reject your app because it's Trash - not because it's Flash. But they probably won't reject it for the former reason, as the absolute majority of mediocre apps in the AppStore shows; and that's fine too, we should just let the free market push bad products to the bottom of the heap.)

    Some of the greatest games of all time were built on a reusable, portable game engine (Maniac Mansion anyone?). Yes, the most innovative games are often those that introduce a new engine - or at least a major revision (e.g., The Day of the Tentacle for SCUMM v4). But this happens mostly because there are only few opportunities for breakthrough innovation: a new smart algorithm or game concept; a next-gen CPU or GPU - some lucky game will be first, and will be famous mostly because it was first. Also, games that use a common engine often step outside it, or customize the engine, for incremental innovation. The engine provides the 80% of common features for some game category and platform generation, stuff that's just stupid to rewrite for each title. While ancient games would access the video hardware directly, all current games instead rely on a thick stack, from GPU microcode to low-level drivers to relatively high-level APIs like D3D and OpenGL. Yet Apple is not telling people to skip these layers and program the iPhone's PowerVR hardware directly. This illustrates the idiocy of opposing higher-level stacks. Multi-layer architectures and increasing abstraction are among the core foundations of Computer Science. Steve Jobs is basically asking us to ignore some of the most important best practices of our profession, and this is Wrong. (Of course he wouldn't know better - Jobs has never written a "10 GOTO 10" program in his life - that's why I'm being a bit scholarly here, just in case he reads me.)

    (Just wrapping up on games, don't forget design and content; these play a major role in providing a unique experience. The majority of all good games don't have innovative coding, and don't milk the platform's utmost capacity. Tetris had amateur graphics even for 1984. To paraphrase Bill Clinton: It's the creativity, stupid.)

    Apple's contempt for developers is way too blatant not to deserve revolt. Apple is a successful company because they have a strong, competent focus on end users; this can only be lauded. But they have crossed the line when they handle developers with fascist manners - authoritarianism, interventionism, indoctrination. For Apple, developers are servile sharecroppers who should be grateful for profiting from the landlord's properties. Apple is ruling over factors that have no objective impact on application quality, such as programming language choice. They are manipulating their developer base for their exclusive benefit - pushing Apple's agenda against Adobe and other competitors. This is ultimately dangerous even for Apple, which is losing its famed customer focus; end users will not benefit from this business.

    Programming to any platform in Flash - or C#, Java, whatever you like - is your right.

    Performance: JavaFX Balls

    As soon as I got JavaFX 1.3 and NetBeans 6.9-beta, the first thing I did, obviously, was run benchmarks - and the new update delivers on its promise. Let's first check JavaFX Balls (a port of Bubblemark). I last reported results for 1.2 here; the 1.2 scores are updated again to account for changes in my test system, notably the JDK (now 6u21-ea-b03).

                                                                                                                                                                     
    Test              | JavaFX 1.2 | JavaFX 1.2 | JavaFX 1.3 | JavaFX 1.3 | JavaFX 1.3 | JavaFX 1.3
                      | (Client)   | (Server)   | (Client)   | (Server)   | (Prism/C)  | (Prism/S)
    1 Ball            | 999 fps    | 1000 fps   | 1000 fps   | 1000 fps   |            |
    16 Balls          | 998 fps    | 998 fps    | 1000 fps   | 1000 fps   |            |
    32 Balls          | 986 fps    | 998 fps    | 998 fps    | 998 fps    |            |
    128 Balls         | 490 fps    | 636 fps    | 608 fps    | 666 fps    |            |
    512 Balls         | 90 fps     | 108 fps    | 124 fps    | 151 fps    |            |
    @ 60 fps          | 642 Balls  | 699 Balls  | 815 Balls  | 878 Balls  | 817 Balls  | 1.173 Balls
    @ 200 fps         | 285 Balls  | 358 Balls  | 366 Balls  | 428 Balls  |            |
    Effect, 1 Ball    | 666 fps    | 666 fps    | 666 fps    | 972 fps    |            |
    Effect, 16 Balls  | 150 fps    | 165 fps    | 162 fps    | 220 fps    |            |
    Effect, @ 60 fps  | 44 Balls   | 47 Balls   | 45 Balls   | 66 Balls   | 377 Balls  | 642 Balls
    Effect, @ 200 fps | 12 Balls   | 13 Balls   | 12 Balls   | 14 Balls   |            |
    2D, @ 60 fps      | 70 Balls   | 70 Balls   | 68 Balls   | 71 Balls   | 96 Balls   | 105 Balls
    2D, @ 200 fps     | 18 Balls   | 20 Balls   | 20 Balls   | 20 Balls   |            |
    2D+Eff, @ 60 fps  | 27 Balls   | 28 Balls   | 25 Balls   | 26 Balls   | 75 Balls   | 82 Balls
    2D+Eff, @ 200 fps | 7 Balls    | 7 Balls    | 7 Balls    | 7 Balls    |            |

    JavaFX 1.3 shows once again good improvements in the scene graph's scalability - its advantage over 1.2 is bigger for higher node counts, topping at 37% more fps for 512 Balls, or 28% more balls at 200 fps. JavaFX Balls is a worst-case animation in some respects, as all its nodes move every frame; in many real-world animations, some elements are either static (background) or semi-static (objects that only move or change in reaction to some event), so these will likely scale up to thousands of nodes, as JavaFX uses standard tricks like dirty regions and bitmap caching to avoid redundant work.

    The performance impacts of vector rendering ("2D" tests) and Effects are unchanged: both options cost a lot. The Effects framework is the worst offender - a simple BoxBlur effect will bring your performance from 815 down to 45 Balls (~18X worse) @ 60 fps. But... this is just for the standard graphics toolkit (identified as "Swing", because it's built on top of some core classes from the legacy AWT/Java2D/Swing stack).

    Now let's activate the next-generation Prism toolkit (with -Xtoolkit prism; currently in Early Access). For the bitmap and vector tests, Prism is just as good as the old toolkit. But enabling Effects changes everything: Prism is almost 10X faster than the Swing toolkit, scoring an incredible 377 Balls @ 60 fps. As long expected, Prism finally renders effects with full hardware acceleration and without the extra buffer copies that spoil effects on the Swing toolkit.

    How good is Prism's score? I didn't focus on benchmarking against other RIA runtimes here, but these results are much better than the top scores I measured last June for PulpCore and LWJGL/Slick. The latter is a dedicated, lightweight 2D game engine and it's also OpenGL-accelerated, which makes Prism's advantage impressive. The Prism Bubblemark program is not available anymore (server down as I write this), but the latest PulpCore scores only 30 fps for 512 Balls. (PulpCore can do 65 fps with a new "Pixel snapping" optimization that rounds all coordinates to integer values - that looks interesting, I will add it to JavaFX Balls later.)

    I've also repeated these tests with HotSpot Server. This VM is not viable for client deployment, but we can notice areas of possible improvement - anything that runs substantially faster with the Server compiler is (a) written in Java, and (b) has optimization potential. And we can really see HotSpot Server beating the pants off Client, in the simpler tests that only measure the scene graph's performance. Combining HotSpot Server with the Prism toolkit, I've broken the 1.000 Balls barrier for the first time in the @60 fps test.

    Problem: When I add many balls in the JavaFX Balls animation in a single step, e.g. from 128 to 512 balls, the animation "freezes" for a noticeable time - close to a second. This happens because JavaFX relies on preprocessing (pre-allocating/computing objects that are reused at each frame for translations and other pipeline tasks). Prism's delay to add many nodes is not worse than Swing's, but not better either. My test case is perhaps extreme - most real-world animations should not add hundreds of nodes to the scene in a single keyframe. Anyway this shows one possible bottleneck that may deserve optimization in future releases.

    Performance: Strange Attractor

    My next test is the Strange Attractor benchmark. I didn't expect any improvement in this program, because it makes minimal use of JavaFX's scene graph - all animation is performed by manual writing of color values to a large array of pixels that is finally blitted to the screen.

                                             
    Test             | JavaFX 1.2 | JavaFX 1.2 | JavaFX 1.3 | JavaFX 1.3
                     | (Client)   | (Server)   | (Client)   | (Server)
    MainListDouble   | 74 fps     | 96 fps     | 80 fps     | 94 fps
    MainSeqDouble    | 62 fps     | 77 fps     | 65 fps     | 78 fps
    MainFloatRaw     | 144 fps    | 166 fps    | 162 fps    | 250 fps
    MainListDouble3D | 62 fps     | 78 fps     | 50 fps     | 64 fps

    The performance delta was modest, as expected - except for the large improvement in MainFloatRaw with HotSpot Server, and the regression in all scores for the MainListDouble3D test. The latter test has extra code inside the inner rendering loop (for smarter calculation of pixel colors), so the lower performance may just be some unlucky effect of different javafxc code generation on JIT optimizations.

    Why no scores for Prism? The Strange Attractor program had to be ported, because it reaches into Image.platformImage, which I empirically found to be a java.awt.image.BufferedImage (containing a DataBufferInt) in previous releases - and still is in JavaFX 1.3 with the Swing toolkit. But Prism's runtime type is com.sun.prism.Image; inside this object there is a java.nio.HeapByteBuffer that contains the pixels. And the pixel data was only 8bpp, because the Image was created from an 8bpp blank.png file. Well, I changed the code that reaches into the pixels and recreated this PNG at 32bpp. But the program still doesn't work - the image has wrong colors, and I can only see the first frame, because my trick to force a refresh (calling ImageView.impl_transformsChanged()) has no effect on Prism. I've tried other methods, including some with promising names like impl_syncPGNodeDirect()... but nothing makes Prism sync the window with the updated pixel buffer. I'll be happy to hear about your findings; otherwise we cannot efficiently program bitmapped animation in JavaFX anymore. Another problem is that performance sucks - I get only ~29 fps with full usage of one CPU core, and that's without any refresh.
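For reference, this is the kind of direct pixel access the Swing-toolkit hack depends on, sketched in plain Java (the class and method names here are mine; BufferedImage and DataBufferInt are the real AWT APIs found behind Image.platformImage): once you hold the backing BufferedImage, its DataBufferInt exposes the raw pixel array, so a frame can be rendered by plain array writes instead of scene-graph operations.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

public final class RawPixels {
    // Returns the live pixel array backing an INT-packed image
    // (TYPE_INT_RGB / TYPE_INT_ARGB); writes to it change the image directly.
    public static int[] pixelsOf(BufferedImage img) {
        return ((DataBufferInt) img.getRaster().getDataBuffer()).getData();
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(64, 64, BufferedImage.TYPE_INT_RGB);
        int[] pixels = pixelsOf(img);
        pixels[10 * 64 + 20] = 0xFF8000;   // write one orange pixel at (x=20, y=10)
        System.out.println(Integer.toHexString(img.getRGB(20, 10)));
    }
}
```

Under Prism this trick breaks, as described above: the backing object is no longer a BufferedImage, and there is no public way to flush the modified buffer to the screen.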

    Performance: GUIMark

    GUIMark is one benchmark that used to be a disaster for JavaFX, as I briefly reported before. The problem is tracked by bug RT-5100: "Text layout in FX is much slower than a pure Swing app". The root cause of this umbrella bug is RT-5069: "Text node computes complete text layout, even if clipped to a much smaller size". These bugs are still open - although they report some progress; for one thing, part of the problem is blamed on JavaSE's bug 6868503: "RuleBasedBreakIterator is inefficient", and that bug is closed as fixed in JDK 6u18. So I decided to test GUIMark again.

                         
    Program | JavaFX 1.2 | JavaFX 1.2 | JavaFX 1.3 | JavaFX 1.3 | JavaFX 1.3 | JavaFX 1.3
            | (Client)   | (Server)   | (Client)   | (Server)   | (Prism/C)  | (Prism/S)
    GUIMark | 1,81 fps   | 2,22 fps   | 2,81 fps   | 4,44 fps   | 78 fps     | 120+ fps

    Text layout performance is better in JavaFX 1.3, but the bug is still alive; the ~2X better scores are still awful. That is only true for the Swing toolkit, though: Prism doesn't suffer from this problem, delivering wonderful GUIMark scores.

    Notice that I've tweaked the benchmark to use a 0ms keyframe, and used JavaFX's internal FPS logger. It's the only way to allow maximum FPS count when the animation runs too fast, and at the same time, get precise performance numbers when it runs too slow. Also, I cannot measure the real score for Prism / HotSpot Server because Prism caps fps at 120 - but in my system this test consumes 20% CPU (0,8 core in a quad-core system), so I can project ~150 fps.

    In the same test machine I get these scores: Java / Swing = 43 fps (Client) / 50 fps (Server); HTML (Firefox 3.7-a4, with DirectDraw & DirectText enabled) = 47 fps; Flash 10.1rc2 = 53 fps; Silverlight 4.0 = 55 fps. Thanks to Prism, the Order of the Universe will be restored, with Java's performance ruling once again.

    The other GUIMark implementations are also capped, either by their own code or by their runtimes. I removed this limit only for Java/Swing, changing a timer's delay from 17ms (= max 60 fps) to 5ms (= max 200 fps); but as expected there was no effect on performance, because the program cannot even reach 60 fps. The HTML, Flash and Silverlight programs can't reach 60 fps either, so they're not limited by capping. Additionally, they all saturate the CPU - HTML uses a full core (25%), Flash uses a bit more (30%). Silverlight uses two full cores (50% of my quad-core CPU!), which is very surprising because I didn't run the multithreaded version of the benchmark, and because its score is actually terrible considering that it consumes 2X more CPU power than other runtimes that deliver similar fps.

    GUIMark was designed to measure only a RIA runtime's animation & graphics pipeline - the layout, drawing and composition engines; it doesn't run any significant amount of "application code", so it should not benefit from a more efficient language and JIT compiler... except, of course, that JavaFX eats a lot of its own dog food - its core runtime is partially Java bytecode. But it also contains significant native code (including "GPU-native" shading code), and notably in Prism I wouldn't expect the Java code to be critical; still, HotSpot Server consistently makes a big difference, roughly doubling GUIMark performance. Profiling the VM, I noticed that HotSpot Server optimizes java.nio's direct buffers much better, as well as some other APIs involved in bulk data manipulation like Arrays.fill(); these methods are all over Client's profile but totally absent from Server's (which means intrinsic compilation). Prism heavily relies on these methods for the interface with the accelerated pipeline (D3D in my tests on Windows). This hints that even after Prism ships, JavaFX could gain yet another significant performance boost: the Client VM just needs to acquire a few critical optimizations that are currently Server-exclusive.

    Static Footprint

    JavaFX 1.3 promises many performance enhancements, including reduced startup time and memory usage; this is critical because - notably now, with 1.3's already very good core feature set - deployment is by far the most important factor for JavaFX's adoption.

                                         
    Program            | JavaFX 1.2                | JavaFX 1.3
    HelloWorld         | 2 classes, 2.726 bytes    | 2 classes, 2.579 bytes
    JavaFX Balls       | 19 classes, 95.19 bytes   | 19 classes, 117.005 bytes
    Strange Attractor  | 62 classes, 563.769 bytes | 62 classes, 427.992 bytes
    Interesting Photos | 53 classes, 238.902 bytes | 46 classes, 431.741 bytes
    GUIMark            | 9 classes, 93.841 bytes   | 27 classes, 224.904 bytes

    The tally of compiled classes, for these few programs, shows a regression in javafxc 1.3 - it may produce 25% less bytecode (Strange Attractor), but will most often produce more bytecode, up to 140% more (GUIMark).

    Strange Attractor is the single app (besides the trivial HelloWorld) that consists of a single .fx script file (more exactly, several .fx files, but they are all independent variations of the same program). The javafxc compiler can perform some important "closed-world optimizations": for example, a private or script-private property that is not involved in any binding expression in that script can be compiled without support for binding. On the other hand, when this overhead cannot be optimized out, generated code is typically bigger than in 1.2 - largely thanks to the awesome enhancements of compiled bind, which delivers higher-performance binding at the cost of more sophisticated code generation. But even for the applications with a bigger static footprint, like Interesting Photos, we are promised a net gain because the dynamic footprint is greatly reduced (no expression trees for interpretation of bound expressions); so you lose some Kb in code size, but win more than you've lost in reduced heap usage. (That's the theory - but check the next section!)

    JavaFX Optimization Rule: Fine-grained decomposition into many small .fx files, with generous public members, will balloon the code footprint of your JavaFX app. Even if this is more than compensated by reduced dynamic footprint, you want both costs down if possible! Existing bytecode optimizers/obfuscators for Java (e.g., ProGuard) may help a bit, but won't be optimal, as javafxc's code generation is very complex and performing the closed-world optimizations I mention above is not just a matter of stripping unused class members. Suggestion: add a javafxc option to request these optimizations for all public properties in a project, assuming that no external, separately-compiled code will have "incoming binding" on our code - a safe assumption for most JavaFX apps (but not for libraries).

    Finally, the worst case of GUIMark is related to binding: this program makes extensive use of binding - 53 bind expressions, in a program that has 430 lines including blanks and comments. The "compiled bind" system seems to be as heavy on the call sites as it is on the definition of bound variables. The program is also written in a very "scriptish" style - 80% of it is variables and functions in the global scope - so it's possible that a more OO style with higher encapsulation would help the binding optimizations. Notice that compiled bind is not really complete; no less than 9 compiled-bind optimizations have slipped to JavaFX "Presidio" (1.4). This includes at least one item that would apparently remove the code bloat at call sites: JFXC-4199: Use a delegate class instead of creating a new class for objlit.

                                                              
    Program                   | JavaFX 1.2    | JavaFX 1.3 (Swing) | JavaFX 1.3 (Prism)
    HelloWorld                | 1.660 classes | 1.717 classes      | 984 classes
    JavaFX Balls              | 1.847 classes | 1.885 classes      | 1.111 classes
    Strange Attractor         | 1.894 classes | 1.996 classes      | 1.201 classes
    Interesting Photos        | 2.095 classes | 2.032 classes      | 1.193 classes
    GUIMark                   | 2.033 classes | 2.207 classes      | 1.360 classes
    AWT HelloWorld (JavaSE)   | 1.050 classes |                    |
    Swing HelloWorld (JavaSE) | 1.206 classes |                    |
    Swing GUIMark (JavaSE)    | 1.511 classes |                    |

    In the table above, I run each program with -verbose:class and check the number of loaded classes up to the startup screen. JavaFX 1.3 loads a few more classes in 3 of the 4 tests; some small average increase is expected considering its much bigger feature set, but it's nothing to worry about.

    On the other hand, Prism scores an impressive win: a minimal HelloWorld program loads fewer than a thousand classes, ~43% less than the Swing toolkit. We shave off 733 classes for HelloWorld, 839 classes for Interesting Photos. The latter is still a simple program, and not even completely loaded - I stop counting when the initial screen appears, before any action or user event - but this is the right way to measure startup overheads. For a bigger real-world app, the ~800 classes saved by Prism may be a small fraction of all loaded classes; but this is irrelevant to the loading-time experience if the app is able to display its welcome screen quickly and then load the rest of its classes and resources on demand or in the background.

    Just ask Flash developers: it's hard to find a real Flash app (excluding tiny widgets like menus, animated ad banners, or video-player shells) that doesn't need incremental loading techniques - including the infamous loading-progress animations. Still, Flash has a reputation for an "instant-load" experience, just because it bootstraps fast - the core runtime and the application's startup code load quickly. When JavaFX can do the same, we are in competitive territory.

    Looking at the classloading stats for standard JavaSE applets, they are almost as lightweight as JavaFX+Prism - an AWT-based HelloWorld applet loads 66 extra classes (6% more), a Swing version loads 222 extra classes (22% more). JavaFX sans Prism has the worst classloading story, because it combines two graphics/GUI toolkits: the JavaFX stack is layered on many core classes from AWT, Java2D and even Swing. JavaFX was initially a big regression in bootstrapping costs; Prism not only fixes this regression, it's even more lightweight than the legacy APIs, which is impressive. Sun has really faced the challenge of dumping a huge piece of Java legacy and rewriting it from the ground up, introducing an all-new toolkit that has zero dependency on the old one. There are more advantages to this approach, like starting from a clean slate (no need to support bug-per-bug backwards compatibility with 15 years of legacy), embracing modern architecture (GPU acceleration), and much better portability (Prism is reportedly very portable across OSes and even JavaFX profiles; JavaFX TV is built on CDC + Prism, and inspection of the SDK's jar files suggests that Prism shares more components with the Mobile runtime too).

    Sun has really pulled off a Swing Killer of their own. Too bad that IBM's SWT, for all the secession it caused in the Desktop Java ecosystem, only focused on conventional GUIs, without much thought for graphics/media-rich apps or direct tapping of GPU capacity... perhaps it was too early for that in ~2001. The fact is that JavaFX+Prism renders obsolete not only all of AWT/Java2D/Swing (and JavaME's LCDUI and more), but also SWT/JFace. The Eclipse Foundation seems to only believe in pure-HTML RIA; they are only evolving their RCP technologies towards the web (RAP & RWT), so JavaFX is the major option forward for rich Java GUI apps that can't fit in the feature and performance envelopes of HTML5 + JavaScript.

    Dynamic footprint

    In this test, I look at each program's memory usage, using -XX:+PrintHeapAtGC and executing jmap -histo:live <pid> on the JVM after it was fully initialized (this triggers a full GC on the target VM, so I can see precise live-heap stats).

                                        
    Program             | JavaFX 1.2                    | JavaFX 1.3                    | JavaFX 1.3 (Prism)
    HelloWorld          | Heap: 628K / Perm: 2.716K     | Heap: 671K / Perm: 3.318K     | Heap: 421K / Perm: 3.199K
    JavaFX Balls        | Heap: 1.085K / Perm: 3.685K   | Heap: 801K / Perm: 4.161K     | Heap: 635K / Perm: 3.779K
    Strange Attractor   | Heap: 13.876K / Perm: 3.764K  | Heap: 18.448K / Perm: 4.306K  | Heap: 17.629K / Perm: 3.957K
    Interesting Photos  | Heap: 2.073K / Perm: 4.264K   | Heap: 2.039K / Perm: 5.308K   | Heap: 739K / Perm: 4.501K

    JavaFX 1.3 uses significantly less heap memory than 1.2 for JavaFX Balls; a bit less for Interesting Photos, a bit more for HelloWorld, and a lot more for Strange Attractor. The latter creates a big linked list of tiny Particle objects, and I wouldn't expect these objects (a simple {x, y, z, next} class) to have a different footprint, so I decompiled the classes emitted by javafxc and checked the instance fields generated for the Particle class. For JavaFX 1.2:


    int VFLGS$0;
    public double $strangeattractor$MainListDouble$Particle$X;
    public double $strangeattractor$MainListDouble$Particle$Y;
    public double $strangeattractor$MainListDouble$Particle$Z;
    public Particle $strangeattractor$MainListDouble$Particle$Next;

    Now let's check JavaFX 1.3:


    public short VFLG$Particle$X;
    public short VFLG$Particle$Y;
    public short VFLG$Particle$Z;
    public short VFLG$Particle$Next;
    public double $Particle$X;
    public double $Particle$Y;
    public double $Particle$Z;
    public Particle $Particle$Next;

    The fields with VFLG$... names are bitmaps used by the binding system to keep track of which variables are dirty and need reevaluation. JavaFX 1.2 needed a single bitmap for the whole object (up to 32 mutable fields; objects with more fields could need additional bitmap fields). But now, it seems that JavaFX 1.3 uses one extra control field per object field - possibly for lazy binding or other compiled-bind enhancements (see my past discussion of binding). This added four short values to my object = 8 bytes (possibly more with alignment), a ton of overhead for a class that will have 300K instances. The odd thing is that this class is script-private, and although its fields are mutable, they are never used in binding expressions - and javafxc 1.3 is supposed to be smart enough to remove these overheads for code that can benefit from closed-world (script-local) optimizations, remember? But it didn't work here; this may be a limitation or a bug. I wondered whether the generated code would be better for the MainSeqDouble variant of the program (where Particle's fields are only modified at initialization), but it's exactly the same, even if I change the fields' visibility from default (script-private) to public-init. (public-init fields can be modified after initialization, but only by code from the same script.)
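    To make the cost difference concrete, here is a plain-Java sketch of the 1.2-style scheme (names are mine, not the actual javafxc-generated code): one shared int bitmap costs 4 bytes per object regardless of field count, while the 1.3-style per-field short flags cost 2 bytes per field.

    ```java
    // Illustrative sketch only: a single int bitmap, in the spirit of
    // JavaFX 1.2's VFLGS$0 field, tracking dirty state for up to 32
    // mutable fields at a flat 4-byte cost per object.
    public class Particle {
        public static final int BIT_X = 1 << 0;
        public static final int BIT_Y = 1 << 1;
        public static final int BIT_Z = 1 << 2;

        private int vflags;            // one shared bitmap, one bit per field
        private double x, y, z;

        public void setX(double v) { x = v; vflags |= BIT_X; }  // mark dirty
        public void setY(double v) { y = v; vflags |= BIT_Y; }
        public void setZ(double v) { z = v; vflags |= BIT_Z; }

        public boolean isDirty(int bit) { return (vflags & bit) != 0; }
        public void clearDirty()        { vflags = 0; }  // after reevaluation

        public static void main(String[] args) {
            Particle p = new Particle();
            p.setX(1.5);
            System.out.println(p.isDirty(BIT_X) + " " + p.isDirty(BIT_Y));
        }
    }
    ```

    With 300K instances, even this 4-byte scheme is noticeable; four extra shorts per object is far worse.
    
    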

    On the bright side, JavaFX Balls' significant heap savings should be attributed to the larger number of scene graph nodes created by this program, even at startup with 16 balls. I repeated the measurement with the heaviest possible configuration: 512 balls, with the 2D and Effect options. The heap usage was: JavaFX 1.2 = 13.703K; JavaFX 1.3 = 6.865K; JavaFX 1.3 (Prism) = 10.787K. Just as promised, JavaFX 1.3 saves a big amount of overhead for scene graph nodes. But it's remarkable that in this benchmark, Prism consumes more memory than the Swing toolkit, landing in the middle of the scale between 1.2's 13Mb and 1.3's 6Mb. It's possible that Prism is more aggressive with optimizations that require more memory (for caches etc.), but this result may also just reflect its current beta-quality stage.

    Finally, I looked at the "Perm" memory usage (PermGen, the region used by HotSpot to store metadata and code for loaded classes). JavaFX 1.3 uses more memory than 1.2 (from +12% to +24% in my tests); but Prism loads less code than the Swing toolkit, so it almost reverts to 1.2's PermGen sizes (+2% to +17%). All tested benchmarks have little code of their own, so virtually all code comes from libraries: the JavaFX runtime and its dependencies in the JavaSE core. Even if JavaFX 1.3 has higher PermGen usage, this is probably much less important because the runtime has a fixed size; its cost doesn't scale linearly with application size or complexity (once you load all APIs and widgets, the runtime won't grow anymore for bigger apps).

    [It's hard to assess GUIMark's dynamic footprint, because the animation cannot be paused; also, the allocation rates are so intense that the next section will be sufficient.]

    Garbage Footprint

    Having fewer objects retained in the heap is not that good if a program allocates too many temporary objects, causing memory pressure that forces the heap to expand, high GC overheads, and long GC pauses. In my last test, I executed the benchmarks with -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps, to calculate the amount of memory that is allocated and recycled per time unit. I executed JavaFX Balls in the 512-ball mode to have a fixed effort per frame. All scores are normalized as bytes-per-frame.

                                 
    Program           | JavaFX 1.2   | JavaFX 1.3    | JavaFX 1.3 (Prism)
    JavaFX Balls      | 82 Kb/frame  | 75 Kb/frame   | 0,33 Kb/frame
    Strange Attractor | 10 Kb/frame  | 5,75 Kb/frame | -
    GUIMark           | 22 Mb/frame  | 24 Mb/frame   | 75 Kb/frame

    Comparing JavaFX 1.2 to 1.3: the new release burns 43% less memory in the Strange Attractor benchmark - which barely does anything other than event handling, some math, and writing to a raw int[]. So, this test shows that JavaFX 1.3 is more efficient, allocating half the temp objects while sitting around doing almost nothing. ;-) The JavaFX Balls test shows that 1.3 is also more efficient, an 8% advantage, doing heavy animation with many nodes. So far, a nice incremental improvement.

    GUIMark shows a small regression in JavaFX 1.3, but it's difficult to judge these results because the frame counts are very low and the animation engine is having to skip frames. Anyway, both allocation scores are extremely high, which certainly helps to explain the awful GUIMark scores of 1,81 fps and 2,81 fps: it's not just some dumb layout calculation algorithm; the system is allocating memory like there's no tomorrow, which is both wasteful in itself and points to other internal inefficiencies.

    Prism is the big winner though, pulling order-of-magnitude gains again. Prism's performance comes from a smarter architecture, and we can see another aspect of that here - the Prism animation engine allocates vastly fewer temporary objects. To be exact, it's under 0,4% of the Swing toolkit's allocation rate for both JavaFX Balls and GUIMark, which is some insane amount of optimization. Well, another way to look at this is that the Swing toolkit is really broken for tasks like text rendering and complex layout.

    The Java/Swing GUIMark scores 5,4Mb/frame, which makes it only ~4X better than JavaFX. The Swing code has no scene graph, it's all immediate-mode rendering, which saves some live memory. Still, a quick profiling session shows abundant allocation of temporary char[], int[] and byte[] arrays - a smoking gun for drawing and text code that needs to copy data into temporary buffers that are not reused. This provides an interesting argument against the common wisdom that scene graphs are less efficient than immediate-mode rendering. The truth may sometimes stand on the opposite side, because a scene graph is able to reuse buffers and other expensive helper objects across frames. My analysis of the Java/Swing GUIMark source code revealed very few allocation inefficiencies - creation of some constant Color and GradientPaint objects in the paint methods - and I fixed them all, but there was no impact on performance; the byte-wasters are definitely inside JavaSE's graphics and text APIs.

    The GUIMark tests need some Full-GCs, except again for Prism. HotSpot Server will avoid Full-GCs even for the Swing toolkit, but that's because it sizes the heap more aggressively. Server's default starts at 64Mb but it stabilizes at ~68Mb for JavaFX 1.2 and ~90Mb for JavaFX 1.3; while HotSpot Client runs the program within its much smaller default heap size of 16Mb. That's one of the reasons why you don't want HotSpot Server for client-side apps. I could have made these numbers better, and probably gain some performance, with some manual tuning - but my execution rules for all RIA benchmarks include zero manual tuning. For GC in particular, the "ergonomics" feature of HotSpot must be good enough for client apps. For Prism, once again we have ideal behavior, both variants of HotSpot run the program with their respective default heap sizes.

    Porting

    My experience running all these tests was not 100% smooth. I've already reported the worst issues, with Strange Attractor, but these were expected/deserved as I'd relied on internal APIs.

    JavaFX 1.3 is not 100% compatible with JavaFX 1.2; check the JavaFX Migration Guide. This time around, the language changes are very minor.

    • The change to forward-reference behavior is very welcome (fixes a language design mistake; previous behavior produced code that didn't behave as expected by the programmer).
    • Binding expressions must now be pure (no side effects), a very welcome change - JavaFX Script has a light functional subset, and bound expressions are one place where enforcing pureness delivers great benefits.
    • The change of binding to lazy behavior by default is in the same spirit - temporal transparency is another important part of functional purity, and once again it allows the compiler to be extra smart; you just need to make sure that your program doesn't depend on eager behavior, e.g. using a bound expression to invoke a method that must run at initialization time.
    • The new on invalidate clause and isReadOnly() synthetic method should not break any code, except perhaps if you use these names as identifiers.

    Then we have some breaking API changes; most are very subtle issues like changed access modifiers, or changed algorithm for preferred size calculation. Some larger changes in layouts, controls and charts mean that apps having complex UIs, e.g. with nontrivial layout code, should be the most affected. Nothing in the lists above affected any of my tested programs.

    When testing Prism, I had some extra trouble with my usage of non-public APIs. The code and properties that I use to disable the animation engine's "pulse" (which caps the number of frames per second) do not work on Prism. Setting the property com.sun.scenario.animation.pulse to 1000 has no effect on Prism, and while I've found Prism's properties (check prism-common.jar / com.sun.prism.pk.PrismSettings), there is no equivalent "pulse" property. There is a property prism.vsync that I can disable, but the only result is that instead of a perfect 120fps rate (on my system/monitor at least), Prism will use other mechanisms (just like the Swing toolkit) to approach the same 120fps target. I will appreciate any hint if you know better.

    Bye-bye Swing Bridge

    Beware too of the javafx.ext.swing package, as it is completely unsupported on Prism - it will bomb with a NoClassDefFoundError. This is not going to change; Prism cannot support legacy Swing controls - in fact, Prism configures the JVM to run in AWT Headless mode, so many AWT/Java2D/Swing classes will fail to work even if you create them directly. Prism does its own access to video hardware and creation of top-level windows, has its own Event Dispatch Thread, etc., so it's just not possible to allow the equivalent stack from JavaSE to run simultaneously. If you've been relying on the javafx.ext.swing package to easily port legacy Swing code or to compensate for missing JavaFX controls, be warned that in some future release, Prism will most certainly become the only supported toolkit, and the JavaFX/Swing bridge will be gone for good. (I suppose that the first release with production-quality Prism will make it the default but still support the Swing toolkit; the latter could be removed in another release or two. But it will be removed. It won't likely get any enhancements or even fixes in this EOL period, either.)

    This also explains why the JavaFX team didn't do a better job on that bridge, not implementing some features requested by Swing programmers, e.g. embedding JavaFX components inside Swing apps - I suppose they could have implemented these features, but that would just push more people to do quick Swing/JavaFX ports or hybrids, which would hit a brick wall in a future JavaFX release. It's not that Sun was evil, refusing to support Swing developers and their assets of code, tools and components. Now that 1.3 already offers a decent set of controls - ok, at least counting the experimental ones, and third-party ones like JFXtras' - and has really powerful and easy building blocks to create new controls, I'll warn any adopters to consider the Swing bridge mostly as a compatibility feature... for JavaFX 1.2 apps that already used Swing controls, not as something that should be used for development of any new JavaFX 1.3 app.

    I believe, though, that it should be possible to create a new compatibility layer: a Swing-over-JavaFX implementation, where the Swing controls and higher-level APIs are implemented on top of the JavaFX scene graph. The SwingWT project does something similar (Swing over SWT), so my suggestion is probably feasible - but it's also very likely a pretty big effort. For one thing, it could be justified as a transitional solution for the NetBeans RCP platform.

    Conclusions

    In this blog I've focused on only a few areas of JavaFX 1.3, mostly performance and Prism, and still with a limited set of benchmarks. I didn't even start looking at the new controls or any other changes, and 1.3 has quite a few new & enhanced features. But performance is critical; it can spell the success or the doom of a platform. JavaFX was already the RIA toolkit to beat in some performance aspects - although mostly due to the superiority of its underlying VM, which (even with the Client version of HotSpot) is second to no competitor's. Java's technology for JIT compilation, GC etc. just rules. But JavaFX still needs to catch up on some important issues, notably startup time and footprint. This requires continuous improvements in both the JavaFX and the JavaSE runtimes. JavaFX 1.3 adds some nice incremental improvements across the board, but it still carries the weight of the legacy AWT/Java2D/Swing toolkit, plus the limitations of JavaFX's first-generation animation package.

    What you really want is Prism, which posts some impressive results here, in every performance aspect from speed to loading time and footprint. Prism wins the previously embarrassing GUIMark test by a landslide; enables upcoming advanced features like additional 3D support; and there's a rumor it will make zero-calorie french fries too.  So, when is Prism shipping? I guess that will be JavaFX 1.3.1 in June. Prism was originally supposed to ship in production quality in 1.3; its current EA release looks surprisingly good (although my tests are admittedly very limited); and JavaFX TV's FCS depends on it. This certainly shouldn't freeze JavaFX adopters again until the next release; yes, the next one will be even better, but even without Prism the JavaFX 1.3 release is already a big step forward, greatly expanding the number of applications that JavaFX could serve very adequately.

    In my last attempt to stress the JavaFX platform, I ported the Strange Attractor demo/benchmark. Unlike JavaFX Balls, this is not scenegraph-driven animation, but old-school "pixel by pixel" drawing… still, it makes for another batch of interesting findings, including a few issues in the JavaFX Script language and its compiler, and other topics like fractal math, BigDecimal, and JDK 7's stack allocation.

    UPDATE: All webstart apps here are now updated for JavaFX 1.3, so their performance may be different from what is described by the article.

    I have found Strange Attractor in Miguel de Icaza's blog, listing three implementations: Canvas/JavaScript, Flash/AS3, and Silverlight – in this order of increasing performance. The general point seems to be that static-typed languages will wipe the floor with dynamic languages when it comes to performance. I happen to agree… except that there's one important RIA platform missing on that list ;-) so I went to Joa Ebert's blog, fetched the Silverlight code, and ported it over to JavaFX.

    The port was easy, once I found my way around JavaFX's limitations. It happens that JavaFX doesn't let you "paint" a component, like AWT/Swing and most other 2D toolkits do. You can only "draw" things by composing scenegraph nodes. But this wouldn't work here: Strange Attractor is a particle animation demo, and it uses 300K particles to render a 3D fractal. I could use a tiny, pixel-sized rectangle for each particle, but this would very likely come dead last in the performance race. Even if JavaFX's scenegraph scales very well, the memory weight of all those nodes and the rendering overhead would certainly kill it.

    But the solution is simple. First, I create an ImageView node for the animation. This contains an Image object that is initialized from a blank image. So far, standard stuff. Now, in order to "paint" the fractal into this Image, I do this:

     

    function move (deltaX:Float, deltaY:Float)
    {
        var pixels = ((img.platformImage as BufferedImage).getRaster()
            .getDataBuffer() as DataBufferInt).getData();
        java.util.Arrays.fill(pixels, 0x000000); 
        ...
                pixels[index] = min(pixels[index] + 0x202020, 0xFFFFFF); 
        ...
        imgView.impl_transformsChanged();
    }
    

     

    I have to resort to two "internal" tricks. First, I access the Image's platformImage property; its declared type is opaque (Object), as the actual type is platform-dependent. For the desktop profile, implemented on top of Java SE APIs like Java2D, that type is BufferedImage. So I just need to cast, then use standard Java SE APIs to put my dirty hands on the int[] array that contains the pixels. I can fill this array with black with Arrays.fill(), read and write individual pixels by just indexing its positions, etc.
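    The same trick works in plain Java against a BufferedImage directly, which may make clearer what the JavaFX Script snippet above is doing (the size and coordinates here are arbitrary):

    ```java
    import java.awt.image.BufferedImage;
    import java.awt.image.DataBufferInt;
    import java.util.Arrays;

    // Plain-Java sketch of the pixel trick: grab the int[] behind a
    // TYPE_INT_RGB BufferedImage and plot pixels by direct array indexing.
    public class RawPixels {
        public static void main(String[] args) {
            BufferedImage img =
                new BufferedImage(640, 480, BufferedImage.TYPE_INT_RGB);
            int[] pixels =
                ((DataBufferInt) img.getRaster().getDataBuffer()).getData();

            Arrays.fill(pixels, 0x000000);        // clear frame to black
            int x = 100, y = 50;
            int index = y * img.getWidth() + x;   // row-major addressing
            pixels[index] = Math.min(pixels[index] + 0x202020, 0xFFFFFF);

            // the image object sees the write immediately - same backing array
            System.out.println(Integer.toHexString(img.getRGB(x, y) & 0xFFFFFF));
        }
    }
    ```

    Writes to the array are immediately visible through the image, which is exactly why the only remaining problem in JavaFX is telling the ImageView to repaint.
    
    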

    In the second trick, as soon as the frame is complete, I call ImageView.impl_transformsChanged(). This is another internal method; it is invoked automatically by the runtime when the node's transforms are changed. Normal apps never need to call it, so it's not an official API. But it has the side effect of forcing the ImageView node to refresh itself from the backing pixels. Notice that my ImageView has no transforms at all, so this should not perform any other redundant work.

    In an ideal world, we'd have an official ImageView.invalidate() method. There are some issues with my hacks, so I filed the bug RT-5548: Provide (official) support for bitmapped rendering. I explore the issue in more depth in that bug, so just read, comment, or vote there if you are interested. This is all we can do to influence/lobby a project that's not open source. I will just paste my final comment here: "Right now JavaFX is pretty hard for third-party extension developers. Suppose I want to create a new Control that really demands custom (non scenegraph-based) rendering, what should I do? Full source code is not available so I can't consult it; in-depth technical documentation does not exist at all; the platform still misses important functionality for some people. If the team at least provides some guideline about JavaFX-to-native-2D integration, at least we can work around these limitations while they exist."

    The performance

    So, just how fast can JavaFX move these particles? I've found that this depends on several factors, so I actually created four variants of the program, identified below by their class names (I've bundled each in a single .fx file, mostly to make the variants easier to manage). In these names, Float = float precision; Double = double precision; List = particles are stored in an ad-hoc singly-linked list with a next pointer in each particle; Seq = particles don't have that pointer, but are stored in a JavaFX Script sequence. The version that matches the other ports of Strange Attractor is MainListDouble.

    I tested with the early access of JDK 6u18, yet another important update for client-side Java in general and JavaFX in particular; for this specific benchmark, 6u18 brings an updated version of HotSpot, so CPU-bound code should benefit. (Click each program's name to launch it; source code here.)

                                   
    Program        | HotSpot Client 6u18ea-b01 | HotSpot Server 6u18ea-b01
    MainListFloat  | 78 fps                    | 111 fps
    MainListDouble | 74 fps                    | 95 fps
    MainSeqFloat   | 68 fps                    | 92 fps
    MainSeqDouble  | 60 fps                    | 75 fps

     

    Some pretty interesting results here. First, the score seems to be very influenced by memory access. The major difference between the four variants is the size and layout of particle data. Each Particle is a simple object with x, y, z fields; plus the Java object header, and an extra int VFLGS$0 field that's used by JavaFX Script's properties (up to 32 properties share one such bitmap field; classes with more than 32 properties need additional bitmap fields). We have 300K particles, so a Float particle is 24 bytes = 7,2Mb for the entire fractal; and a Double particle is 36 bytes, likely 40 due to alignment = 12Mb total (estimates for a 32-bit JVM). Even the lower value doesn't fit entirely in my Q6600 CPU's 4Mb L2 cache, and there are other memory pages involved in rendering (the pixel array that's 880Kb; code from the app, JVM, OS…), so the rendering will hit the FSB hard.
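    These estimates can be reproduced with a little back-of-the-envelope Java (the 8-byte header, 4-byte int/float/reference, and 8-byte alignment are the 32-bit HotSpot assumptions used in the text):

    ```java
    // Back-of-the-envelope object-size math for the Particle variants,
    // assuming a 32-bit JVM: 8-byte header, 4-byte int/float fields,
    // 8-byte doubles, and 8-byte object alignment.
    public class Footprint {
        static long align8(long bytes) { return (bytes + 7) & ~7L; }

        public static void main(String[] args) {
            final int N = 300000;                         // particle count
            long floatParticle  = align8(8 + 4 + 3 * 4);  // header + VFLGS$0 + x,y,z
            long doubleParticle = align8(8 + 4 + 3 * 8);  // 36 bytes -> 40 aligned
            System.out.println(floatParticle);            // bytes per Float particle
            System.out.println(doubleParticle);           // bytes per Double particle
            System.out.println(floatParticle  * N / 1e6); // Mb for the whole fractal
            System.out.println(doubleParticle * N / 1e6);
        }
    }
    ```
    
    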

    The List variants are also faster; why? The datasets are the same size – there is an additional reference field in each Particle, but then I don't need a sequence object with one reference per particle. JavaFX's sequences are well optimized; they are backed by native arrays just like Java SE's ArrayList. (The real story is more complex – there are many concrete sequence impls and the compiler picks and changes the most adequate as needed; sequences of value types can map to optimized sequences without boxing overhead, so it's even better than ArrayList and closer to a growable version of primitive arrays.) But there is some small overhead to iterate the sequence, and once again there's worse memory locality. In the MainSeq* programs, the heap will contain one huge array with at least 300K references = 1,2Mb; plus 300K Particle objects somewhere else. The sequence's backing array is treated by the JVM as a "large object", which by itself may have some performance consequences. But the major problem is that a sequential iteration through all particles will demand a non-sequential memory access pattern, alternating between the sequence's backing array pages and other pages containing the particles.

    The JIT compiler also appears to be a very important performance factor. HotSpot Server shows a whopping 42% better frame rate in the easiest test, MainListFloat; in the hardest, MainSeqDouble, it still produces a very large advantage of 25%. But this result is very interesting because the animation's inner loop is relatively simple: it just performs a few multiplications for coordinate transformation and plots a pixel at the resulting position. The particles are all constant data, the transform matrix is calculated only once per frame, and the inner loop contains no expensive operations like allocation. (There is one call to a tiny method that's trivial to inline even for HotSpot Client; performance didn't change after I refactored some code to introduce this method.) I guess HotSpot Server is just smarter about memory access, e.g. with prefetching instructions.

    Then I said to myself: What a wonderful… no; I said: how could I make this code even faster? One obvious target is avoiding the cost of JavaFX Script objects, which are a bit larger than similar Java objects due to those property bitmap fields. The new variant MainListFloatJava implements the Particle class as a Java object. And while I'm at it, why not eliminate all object model overhead completely and just store all particle data in a raw float[], with 3 consecutive positions for each particle's (x, y, z) data? The variant MainListFloatRaw does this.
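    A minimal Java sketch of that layout change (illustrative only; the real MainListFloatRaw is JavaFX Script code): every per-particle object disappears and the inner loop walks one packed array strictly sequentially, with no pointer chasing.

    ```java
    // Sketch of the MainListFloatRaw idea: one packed float[] with
    // stride 3 (x, y, z per particle) instead of 300K Particle objects.
    public class RawParticles {
        public static void main(String[] args) {
            final int n = 4;                       // tiny demo; real code uses 300K
            float[] p = new float[n * 3];
            for (int i = 0; i < n; i++) {          // init: particle i at (i, 2i, 3i)
                p[i * 3]     = i;
                p[i * 3 + 1] = 2 * i;
                p[i * 3 + 2] = 3 * i;
            }
            // the per-frame loop touches memory in strict sequential order
            float sumY = 0;
            for (int i = 0; i < n; i++) sumY += p[i * 3 + 1];
            System.out.println(sumY);
        }
    }
    ```

    The price is readability: particles[i].y becomes p[i * 3 + 1] everywhere.
    
    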

                       
    Program           | HotSpot Client 6u18ea-b01 | HotSpot Server 6u18ea-b01
    MainListFloatJava | 81 fps                    | 115 fps
    MainListFloatRaw  | 142 fps                   | 170 fps

     

    Once again, very interesting speedups. The Java variant is almost 4% better for both HotSpot Client and Server, a nice although not vast advantage; but the Raw variant is an incredible 82% better for HotSpot Client, and 53% better for HotSpot Server. (The fact that Server gets a smaller speedup reinforces the thesis that its advantage in the previous tests was mostly related to more efficient memory access – Server caught most of the low-hanging fruit in the previous test.)

    Whining/Wishing Dept.

    The fact that I could make this program a full 4% faster just by rewriting a trivial class in Java means that the overhead of the binding bitmaps inserted by javafxc is pretty annoying. I did my best to help the compiler: my Particle class is script-private, its properties have no triggers and are never involved in binding expressions, which allows javafxc to optimize out some of the generated code. But that was not enough. Checking the bytecode, I identified several optimization opportunities, so I filed bug JFXC-3456: Optimize handling of VFLGS$N bitmaps and other property-related code. (This blog was a long time in the making, so this bug is already fixed for SoMa and marked as a dupe; it seems the Compiled bind rework of the next release, mentioned in my investigation of binding performance, is making fast progress.)

    And I was also horrified by the way javafxc (or rather, the JavaFX Script language) handles null values: all nulls are masked out, so we never get a NullPointerException. I failed to perceive this in previous explorations of JavaFX (it's not documented in the Language Reference). I reported this as a bug in JFXC-3447: Support NullPointerException. Yeah, I'm asking to have my NPEs (and also a few other important exceptions) back – at least, to have some control over this critical behavior; please check the bug report before you conclude I have some fetish for stack traces.

    And yes, these results are yet more evidence that the Java platform would benefit from value types, so I could have a headerless Particle class (ok, struct) and put it in a by-value array (objects stored directly in the array without references). This would produce exactly the same memory layout as MainListFloatRaw, except that my code wouldn't need several changes for the worse (low-level array manipulation like particles[i + 1] instead of particles[i].y, etc.). This memory layout requires 300K*4*3 = 3,6Mb, just half the footprint of MainListFloat, and it's even better as all particles are laid out in a perfectly sequential disposition. We're still overflowing the L2 cache, but much less than before, so the performance gain is huge.

    The Java community has been asking for some kind of value type support since forever; the last attempt came from John Rose's Tuples proposal – a relatively modest and easy change, but still left out of JDK 7. The Java language is basically frozen when it comes to fundamental capabilities… but not the Java platform. See the great JSR-292 that basically "fixes" the JVM for all dynamically-typed languages. This is a good precedent because this huge platform enhancement is basically useless for the Java language itself. The DaVinci project is also working hard on all sorts of cool features to support immediate (headerless) objects, including fixnums, tuples, structs, inline arrays in tail position, etc.; and also tail calls, continuations and other fundamental techniques that are worth gold for many languages; see this presentation. These enhancements from the DaVinci Machine will most probably not come to a future version of the Java language, but they will eventually be adopted by other JVM languages like Scala, Clojure, JRuby etc.; and obviously JavaFX Script, if Sun gets its act together.

    Meanwhile, JDK 7 is also making great progress on the Escape Analysis-based stack allocation optimization, which was recently turned on by default. Some days ago, Slava Pestov was happily tweeting how Factor kills HotSpot Server in a Mandelbrot program (we all love fractal calculation microbenchmarks…). I found not only that Java was faster (from 160ms to 46ms) after eliminating a Complex class that caused 300Mb of allocation per run of the benchmark, but that even keeping this class, JDK 7b72 could run 2,15X faster (74ms) thanks to reducing the churn to 110Mb per run. Keep in mind, however, that Escape Analysis is by definition only good for temporary objects that don't "escape" a single method (or basic block, trace, or whatever the optimization unit is); this optimization won't be any help for long-lived data like Strange Attractor's particles.
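    For illustration, here is the kind of temporary that Escape Analysis targets (my own toy loop, not Slava's benchmark): the zz object below is created, read, and dropped inside one iteration, so HotSpot can scalar-replace it with plain double locals instead of heap-allocating it.

    ```java
    // Toy escape-analysis example: zz never escapes its iteration, so the
    // JIT can eliminate the allocation entirely; z and c do escape (they
    // live across iterations), so they are not candidates.
    public class EscapeDemo {
        static final class Complex {
            final double re, im;
            Complex(double re, double im) { this.re = re; this.im = im; }
            Complex times(Complex o) {
                return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
            }
        }

        public static void main(String[] args) {
            Complex z = new Complex(0, 0);
            Complex c = new Complex(0.5, 0.25);
            int i = 0;
            while (i < 20 && z.re * z.re + z.im * z.im < 4) {
                Complex zz = z.times(z);                  // temporary, never escapes
                z = new Complex(zz.re + c.re, zz.im + c.im);
                i++;
            }
            System.out.println(i);                        // iterations until divergence
        }
    }
    ```

    Strange Attractor's 300K particles, by contrast, live for the whole run, so no amount of Escape Analysis helps them.
    
    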

    JavaFX vs. Other RIAs

    I didn't "officially" compare JavaFX to the other versions of Strange Attractor; some comments, though. The JavaScript/Canvas version is dog slow, max ~7 fps here even in the latest browsers with post-modern JavaScript JITs - not surprising, because dynamic typing sucks (ok, I'm repeating myself). The AS3/Flash version is better but not stellar at ~25 fps, thanks to its optional static typing (which is for real, and not a joke like in Groovy). Silverlight is easily the best of these three, and my JavaFX version appears to be even faster (apples-to-apples, the fair comparison is the MainListDouble version with HotSpot Client = 74 fps). But the Silverlight program is missing a FPS counter, and even if that was available, the animation would probably be capped by the display frequency or some standard rate like 60Hz; JavaFX usually does that too, but in my code I resorted to the same tricks used by JavaFX Balls to reach the maximum possible fps.

    Also, you cannot compare CPU usage, because my JavaFX version is smart enough to only render new frames when the 3D image's position changes; none of the other versions do that, so they will peg one CPU core at 100% all the time, even if you keep your mouse parked and all frames are identical.

    The .NET platform does support value types, so the Silverlight version could potentially be optimized to use this feature, and (unless the .NET JIT compiler is really poor) Java's only hope to match it would be resorting to a low-level implementation like MainListFloatRaw (or waiting for the fruits of the DaVinci Machine project).

    Fractal Mystery

    Now, the most interesting comparison is not the performance, but the actual image produced by each program. The three original versions produce basically the same image, modulo details like colors and the FPS display. But my JavaFX version is distinctly different – check this:

    Silverlight:

    JavaFX:

    I captured the images in similar positions (mouse parked at the lower-right corner); the difference is very noticeable. My program produces a bigger image, most remarkably in the outer "corona" of the fractal – the deviation seems to grow as a function of the distance from the center. Now, I'm just a rookie in the maths involved in these graphics: since the old times of FractInt for DOS (most awesomest fractal platform evar!), I'm content to code formulas that I find somewhere else, and amuse myself with the result without really understanding it in depth. In this case I didn't even code anything, I just ported C# code to JavaFX Script. The JavaFX image looks better and more complete, but this may be just my bias. Can anybody explain this difference?

    While investigating this, I changed the code that calculates the color for each rendered pixel, so particles closest to the observer are brighter, the figure looks solid, and the object is easier to inspect. Performance goes down ~4% in HotSpot Server, ~10% in Client; but the 3D effect is pretty nice (especially when animated).

    JavaFX (3D enhanced, Float):


    But the lack of bitwise operators in JavaFX Script is irritating (I have to use a clumsy Bits class with methods like shiftLeft(), etc.). JavaFX Script aims to be a high-level language, but come on – the extra operators would not add any significant complexity, they are bread-and-butter material; even competing "languages for designers" like JavaScript and ActionScript have those operators. Perhaps it's just a leftover from JavaFX 1.0, which didn't even have integral numeric types. But there's more – the language omits the symbolic operators &&, || and !, forcing us to the keywords and, or and not… which are much less readable (compare "aa and bb or cc" to "aa && bb || cc"). And it's not consistent, either: not-equals is !=, so the exclamation point still lives on, meaning negation. And the vertical bar in "[a | a > 5]" has the same role as where in "for (a in b where a > 5)". IMHO, operator syntax is one area where JavaFX Script's design should be fixed.
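    For reference, a workaround class in the spirit of the Bits helper mentioned above might look like this in Java (my own sketch; the exact method names are assumptions):

```java
// Hypothetical static-helper workaround for JavaFX Script's missing bitwise
// operators: every operator becomes a method call.
public final class Bits {
    private Bits() {}
    public static int shiftLeft(int a, int n)  { return a << n; }
    public static int shiftRight(int a, int n) { return a >> n; }
    public static int and(int a, int b)        { return a & b; }
    public static int or(int a, int b)         { return a | b; }
    public static int xor(int a, int b)        { return a ^ b; }
}
```

So an expression that should read `rgb << 8 | blue` becomes the far clumsier `Bits.or(Bits.shiftLeft(rgb, 8), blue)` – which is the irritation being complained about.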

    Now, the most interesting discovery, facilitated by this enhancement, is that the fractal changes significantly with numeric precision. Compare the previous image with the following:

    JavaFX (3D enhanced, Double) (click image to run):


    The last image, created with Double precision, is noticeably different in the outer corona, where you can see series of bands, like in a snail's shell. Most of these bands are nonexistent, or very hard to see, in Float precision, because the particles that should form the frontiers are in slightly wrong positions. If you're a veteran fractal lover, this is not news – fractals are remarkably dependent on numeric precision; in fact it's one of the very few CG techniques where floats are not good enough to avoid severe artifacts. Most good fractal programs offer better-than-Double precision, necessary to render at deep zoom or high iteration levels. Strange Attractor performs 300K iterations over a single (x, y, z) position; this is an enormous number of iterations, so any imprecision will quickly escalate into noticeable artifacts.
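    You can watch this precision sensitivity in plain Java with a much simpler chaotic iteration (the logistic map, not the attractor's actual formula): run the same map in float and in double, and measure how far the two trajectories drift apart as rounding errors compound.

```java
public class PrecisionDrift {
    // Iterate the chaotic logistic map x -> 4x(1-x) in float and double side
    // by side; each step roughly doubles any existing difference, so tiny
    // float rounding errors are quickly amplified.
    public static double drift(int iterations) {
        float xf = 0.3f;
        double xd = 0.3;
        for (int i = 0; i < iterations; i++) {
            xf = 4f * xf * (1f - xf);
            xd = 4.0 * xd * (1.0 - xd);
        }
        return Math.abs(xd - (double) xf);
    }

    public static void main(String[] args) {
        System.out.println(drift(10));   // still tiny
        System.out.println(drift(100));  // typically order-1: the trajectories have decorrelated
    }
}
```

After a few hundred steps the float and double trajectories are effectively unrelated; now scale that to Strange Attractor's 300K iterations and the banding difference between the two images stops being surprising.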

    If Double is better than Float, wouldn't big decimals be even better? I recoded the calculation method with Java's BigDecimal. The resulting code is horrendous (as usual), which is bad enough in Java but definitely doesn't "fit" in JavaFX Script… you'd expect a language like that to offer a seamless arbitrary-precision decimal type. We could just have some syntax sugar over BigDecimal, to be able to use * instead of multiply(), etc. But the performance would still suck (as usual) because BigDecimal is immutable, and the churn of object allocation and GC burns more CPU than the actual calculations. The Java platform desperately needs mutable counterparts of BigDecimal and BigInteger (some implementations already have these for the internal implementation of some operations, but the mutable classes are not public). Then, many high-level languages like JavaFX Script, Groovy, Scala etc. could offer a decimal type complete with operators and other special syntax and semantics, but reusing java.math's implementation and interoperable representation. The mutable objects wouldn't eliminate the advantages of immutability if programmers don't use them explicitly – the source compiler could do that automatically to compile expressions requiring temporary values (much like javac does for string concatenations since JDK 1.0); still, public mutable APIs would allow much further manual optimization (and a value-type BigDecimal would be even better…). Anyway, after waiting a few seconds for this calculation (limiting precision to IEEE128, ~2X better than Double), the resulting image is not any different to the naked eye, so Double was already good enough.
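    To illustrate the verbosity and churn gap (with a made-up iteration step, not the attractor's real formula), compare one update written with double against the same update in BigDecimal: every BigDecimal operation allocates a fresh immutable object.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class DecimalChurn {
    // Hypothetical iteration step x <- x*x - c, written both ways.
    // The double version is one expression and zero allocations.
    static double stepDouble(double x, double c) {
        return x * x - c;
    }

    // The BigDecimal version: method calls instead of operators, and a new
    // immutable object for every intermediate result (two per call here).
    static BigDecimal stepBig(BigDecimal x, BigDecimal c, MathContext mc) {
        return x.multiply(x, mc).subtract(c, mc);
    }

    public static void main(String[] args) {
        MathContext mc = MathContext.DECIMAL128; // IEEE 754 decimal128, ~34 digits
        double xd = 0.5;
        BigDecimal xb = new BigDecimal("0.5");
        BigDecimal c = new BigDecimal("0.3");
        for (int i = 0; i < 10; i++) {
            xd = stepDouble(xd, 0.3);
            xb = stepBig(xb, c, mc); // 20+ short-lived BigDecimals for 10 steps
        }
        System.out.println(xd);
        System.out.println(xb);
    }
}
```

Multiply that allocation pattern by 300K iterations per frame and it's clear why the GC churn, not the arithmetic, dominates the BigDecimal version's cost.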

    But after this digression, the conclusion of this experiment with numeric precision is that… I still don't know why the other languages produce different output even at same precision. The Java platform is well-known to have a very strict math spec, but the fractal calculation uses extremely basic arithmetic (only multiplications and sums) so this should not be a factor.

    Sun just released the first maintenance update for JavaFX 1.2. This release brings mostly a batch of important javafxc fixes, which I dissect in this blog...

    Java programmers are used to the fact that compilation of Java source code is a relatively straightforward process, because the Java language has a simple mapping to Java bytecode. So javac is a trivial compiler, at least in the code generation phase. (Other phases may be more complex – in particular, type-checking has become significantly complex after Java 5; but due to erasure, even this basically doesn’t impact code generation.) There haven’t been many code generation bugs over the lifetime of javac, and there are no significant differences in code quality between releases or between javac and other independent compilers (like ECJ). This is different, for example, from C/C++, where different compilers may produce code of wildly distinct quality.

    Of course, the performance and correctness of your code can still vary between JVMs, but that happens in the bytecode-to-native translation, performed by the JIT compiler. So Java compilation is not really different from C/C++; it’s just split in two layers, where all the hard work goes in the lower layer of JIT compilation.

    Now with JavaFX, we are closer to C/C++ compilers than javac ever was. The JavaFX Script language is not a near-1-to-1 mapping of the Java bytecode spec. Quite the opposite: this language – although very similar to Java – contains a few features that require significant extra intelligence to be mapped to bytecode. In ascending order of complexity (as I estimate it):

    • Mixins
    • Sequences
    • Binding

    No surprise that the javafxc compiler is still distant from the standards long established by javac, i.e., reliably generating bytecode that contains no errors and is as efficient as physically possible on the current JVM/bytecode spec.

    Now you may think that javafxc is simply still immature, and this is not really wrong, but reality is more complex: like most higher-level languages, many language features depend on runtime support. Code produced by javafxc doesn’t run on top of a plain JRE even if you could avoid explicit use of any javafx.* APIs; it depends on a significant runtime library (javafxrt.jar: ~1.1Mb in the SDK). So, part of the issues you may have with the JavaFX Script language are proper compiler problems, but others may be caused by bugs or inefficiencies in runtime classes.

    When a bug is fixed in javac, you can just recompile your sources and you get bytecode that’s faster or more correct, even when executed on older JREs. But this rule doesn’t always work for javafxc: first, because some issues may live in the runtime; second, because the JavaFX platform is not yet mature enough to have a stable ABI. Maintenance releases keep at least backwards compatibility, so applications compiled by javafxc 1.2.1 will run on 1.2.0 runtimes. But the downside, which comes from the split compiler/runtime implementation of the language, is that the range of bugfixes that can be delivered in a maintenance update is limited to those not requiring ABI-incompatible changes.

    A stable ABI doesn’t have to be immutable, or to provide forward compatibility (allowing newer apps to run on older runtimes). The ABI’s public interfaces can grow; they just can’t remove or change public interfaces present in previous versions. Even this level of compatibility may seem unimportant because the JRE can manage multiple JavaFX RTs, and new apps that require a new version will fetch it seamlessly on demand… but that’s mostly true only for the Desktop profile. Hopefully JavaFX Mobile could support OTA auto-update, and even JavaFX TV might be updateable; but I’m not holding my breath. It’s more likely, at least on some devices, that such updates would be supported only through firmware updates, which doesn’t qualify as user-friendly / seamless. These profiles are still in development, so I am just guessing, but as soon as they start shipping in millions of devices, developers will demand stable evolution of javafxc’s ABI from that point forward.

    Well, but I digress. What’s new and improved in JavaFX 1.2.1 (codename Lafayette Park for javafxc)? The official Release Notes has a broken link that gives you the fixes for JavaFX 1.0.1, so just use the Release Notes feature from the JIRA: JavaFX Script, Runtime. Many interesting javafxc issues, including:

    • Fixes to memory leaks, all related to binding (JFXC-3370, JFXC-3369, JFXC-3337, JFXC-3290).
    • Slacker Binding optimizations (JFXC-3247, JFXC-3236). These seem to be the “low-hanging fruit” of a general Slacker Binding feature (umbrella: JFXC-3235) whose remaining items are scheduled for SoMa (the next major version). “The idea of Slacker Binding is that, for side-effect free binds, unless a Location is externally required, in-line code is used instead of Locations.” In other words, the generated code is much more efficient (in both memory and execution time) for special cases where not all the functionality of binding is actually used.
    • Another similar umbrella item is JFXC-3055: Remove intermediate Locations from bind translation. In this case most fixes were delivered in 1.2; now another particular case is optimized in 1.2.1 (JFXC-3060) and there’s just one to go, probably for SoMa (JFXC-3059). These optimizations are similar to those in Slacker Binding, but apparently even better, wiping out significant amounts of generated code.
    • More assorted binding optimizations, like JFXC-3317 (Optimize bound JavaFX var select with mutable selectors), JFXC-3315 (Optimize away bound select processing when selector is immutable), and JFXC-3089 (Collapse on-replace triggers into setter).

    If you have some complex JavaFX app that uses lots of binding and you’ve been suffering from weird performance or memory-leaking behavior, perhaps now you know why. ;-) And there’s more of the same baking in the JIRA, but 1.2.1 is already close to the point of diminishing returns in optimizations of the current binding support (but read on).

    This release also brings a few runtime fixes, the most important IMO being:

    • RT-5193 (Direct3D Acceleration does not work with unsigned applications), that could be a major performance issue for some people since hardware acceleration is a night-and-day factor for graphics-heavy apps.
    • RT-4959 (General FPS reporting (RT-997) broken in Marina b11): Essential for benchmarking junkies like me.
    • RT-4802 (Applets sometime do not respond to user input): Seems to be a rare bug (I’ve never hit it even with tabbed browsing), but pretty severe when it hits.

    Overall, this is a more interesting maintenance release than past ones like 1.0.1 or 1.1.1 that mostly fixed major screw-ups of their previous dot-zero releases. JavaFX 1.2.1 seems indeed to be a nice refinement update, while you wait for SoMa (1.5?) with the next batch of major features and improvements.

    Binding (and Triggers): the Next Generation

    But wait, there’s more! It seems that the new round of binding optimizations in 1.2.1 was just a workaround while the real solution for this problem is not yet implemented – that will happen in the Lombard release (apparently a minor update planned for a couple months after SoMa). The recently posted bug JFXC-3423 (Umbrella: compiled bind) says it all, but I’ll copy & comment its description here because it teaches a lot about JavaFX Script and javafxc:

    Currently, bind and on-replace are implemented by building an object graph which represents the components of the bind expression (Locations, BindingExpressions, etc.), the on-replace (ChangeListeners), and the relationships between them (dependencies, containment). This, let's call it a Location-graph, is then executed – effectively interpretively executing the bind – to update values and maintain dependencies. Within an environment of developing semantics, limited resources, and shifting requirements, the compiler would never have been delivered without this flexible approach.

    However, the execution time overhead and footprint of the current implementation is judged to be roughly ten times what is acceptable. Many optimizations have already been applied, and the result is a factor of two or more improvement – but there is a long way to go. Some optimizations have been simply making the existing implementation more efficient, or handling special cases in a more light-weight manner. [I think this refers to JFXC-3055.] But the most seemingly promising optimizations are those that circumvent the Location-graph, instead implementing variables with bind or on-replace with in-lined code. [Now I think this refers to Slacker Binding, JFXC-3235.] These dynamically inflate to a Location-graph on-demand. In micro-benchmarks these give order of magnitude or more improvements in dynamic footprint. However, in real-world testing, the improvements are only a few percent. The reason is that references to these variables from inflated variables inflate the optimized variables – unraveling almost all optimized variables.

    The proposal is to move to compiling bind and on-replace into bytecode. Like the in-lining mentioned above, this means compiling the value computation into generated code; but, more significantly, the dependency calculation and the state needed by that calculation must be generated as well. As a result, Locations and their supporting structures (BindingExpressions, ChangeListeners, etc.) are no longer used, and would be removed. This, of course, saves the footprint used by these structures, but also saves the execution time to build and interpretively execute them, as well as removing the GC churn as these Location-graphs are built and torn down.
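    To make the contrast concrete, here is a minimal Java sketch (entirely my own illustration, nothing like javafxc's actual generated code) of the two strategies for a bound variable total = a + b: a Location-graph version that wires listener objects together and is "interpreted" on every change, versus an inlined version whose value computation is plain code with no wrapper objects at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

public class BindSketch {
    // Location-graph style: every variable is boxed in a "Location" object
    // that holds its value plus a list of change listeners.
    static class IntLocation {
        private int value;
        private final List<Runnable> listeners = new ArrayList<>();
        IntLocation(int v) { value = v; }
        int get() { return value; }
        void set(int v) { value = v; listeners.forEach(Runnable::run); }
        void addListener(Runnable r) { listeners.add(r); }
    }

    // total = bind a + b, graph style: three heap objects plus listener links,
    // all built and walked at runtime.
    static IntLocation bindSum(IntLocation a, IntLocation b) {
        IntLocation total = new IntLocation(a.get() + b.get());
        Runnable update = () -> total.set(a.get() + b.get());
        a.addListener(update);
        b.addListener(update);
        return total;
    }

    // "Compiled bind" style: the value computation is just inlined code that
    // recomputes on read; no dependency objects exist unless observed.
    static IntSupplier compiledSum(int[] a, int[] b) {
        return () -> a[0] + b[0];
    }

    public static void main(String[] args) {
        IntLocation a = new IntLocation(1), b = new IntLocation(2);
        IntLocation total = bindSum(a, b);
        a.set(10);
        System.out.println(total.get()); // prints 12

        int[] x = {1}, y = {2};
        IntSupplier t = compiledSum(x, y);
        x[0] = 10;
        System.out.println(t.getAsInt()); // prints 12
    }
}
```

Both produce the same answer; the point is what the heap looks like. The graph version allocates and traverses objects for every bound variable, which is exactly the footprint and GC churn the compiled-bind proposal eliminates.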

    Executive Summary: If even after 1.2.1 you have complex JavaFX apps that use binding too heavily and experience an important performance hit from that, don’t despair, the cavalry is coming. The good news is that we’re always finding that JavaFX’s implementation still has plenty of room to improve. In the runtime, SoMa will debut a big revamp of the Scene Graph, even bigger than 1.2’s: the Prism engine (mostly not public in the JIRA); RT-5252 (Support 3D perspective transforms for existing 2D JavaFX objects); RT-5474 (Scene graph optimizations); and many more. The bad news is that, if you tend to see the glass as half-empty, these pending improvements mean that even JavaFX 1.2.1 is not yet the *really* *hot* release that we’ve wished to have since Dec 2008. The same logic applies to the JRE: 6u10 was a Giant Step for Mankind, and 6u14 even better; but now we’re waiting for 6u18 (to ship in tandem with SoMa) with another batch of important optimizations for loading time and general JPI/JAWS behavior. Not to mention JDK 7 with promising fundamental improvements like Jigsaw (enabling even better loading time, footprint and JavaKernel-like installation), and XRender (potential improvements for any Java graphics over X11).
