

Swing 2.0 is Coming

Posted by opinali Sep 22, 2010

The biggest announcement - and the biggest surprise for many - of JavaOne 2010 was certainly Oracle's new plans for JavaFX 2.0... or, should we say, Swing 2.0?

The history of JavaFX has been contentious since its beginning, when it was clear that FX was a new toolkit, even a new platform, while most people in the brave Swing community wanted a "Swing 2.0". Well, this is basically what Oracle is planning to deliver with JavaFX 2.0 - minus the ugly legacy of the AWT/Java2D/Swing APIs and architecture. JavaFX 1.x was an idealistic replacement for Swing, but ideals don't always work; JavaFX 2.0 will be a more pragmatic attempt at "Swing 2.0". It's just not a drop-in update: Swing code won't be trivial to convert. The announced "plugin3", capable of running Prism, will be 100% free of AWT dependencies, and Swing interop will still require the JavaFX Swing toolkit - the bigger, slower one.

Non-evolutionary changes are sometimes necessary. I blame the old UI APIs for Java's near-death on the desktop and the web. Don't dream that the problem was just Sun's neglect, or that it could be fixed by new VM optimizations, deployment improvements, JDK 7's Jigsaw... or the Tooth Fairy. As the partial success of the JDK 6uN project demonstrated once again, no amount of tuning or fixing can save Swing. Even in the Swing community, many wished for a compatibility-breaking Swing 2.0 that would just be similar enough to allow easy migration and interop.

From EJB 3.0 / JPA to the most-wanted JSR-310, experience shows that sometimes enough is enough: for frameworks that didn't work, the only way is the highway. This tells me that some people apply a double standard when they strongly oppose JavaFX because it's not a smooth update for the existing Swing ecosystem.

R.I.P. JavaFX Script

The JavaFX Script language was the big casualty of the new plan, and I will heartily miss it; it was quite a fun language to play with. But maybe that's mostly because Java was, and still is, outdated. Even Java 8, with lambdas and all, won't be as nice as JavaFX Script (at least for UIs).

Some people will be quite happy creating JavaFX applications with the Java language (even Java 6). Plan B is relying on alternative languages such as JRuby, Groovy, Clojure or Scala; the Scala examples look really good, often hard to distinguish from JavaFX Script. If Scala is extensible enough to build some sugar for JavaFX's binding, I'm sold.

Dropping JavaFX Script has some advantages in performance and footprint. JavaFX 2.0 promises to improve on 1.x even with its much bigger feature set. If that sounds hard, it's easy to explain: first, Prism is more lightweight than the Swing-compatible toolkit. Second, JavaFX Script tends to produce bytecode that's quite bloated, and a bit slower than equivalent Java code (in one experiment, I got a 4% speedup by simply rewriting a trivial POJO-like class in Java). These inefficiencies affect both application code and parts of the JavaFX frameworks, originally implemented in JavaFX Script - not the perfect tool for the job. Finally, JavaFX Script was still a work in progress: I was hoping they would add more features, optimizations and refinements... but time's up.

Some critics complain that JavaFX Script was a wasteful diversion, a science-fair experiment. I think it was a quite cool language, and the barrier to entry was pretty low for any competent Java programmer willing to actually spend a couple of days with it. And any big new framework needs many months of experience before you really get the concepts and architecture, understand the tradeoffs and performance aspects, and become capable of writing high-quality code. But language barriers to entry are more complex:

  • Any JVM framework is better developed in Java. I hate when I read (for example) about Clojure's persistent data structures, because I can't use them in my Java code (at least not in a smooth, natural way). So this great piece of engineering and innovation is locked in the small niche of Clojure fans, and nobody will be lured into the full Clojure package by a good experience of using its libraries from Java. From my own small niche of JavaFX Script fans, I didn't mind that others couldn't use JavaFX's awesome frameworks; but this tight coupling clearly didn't do JavaFX any favors. Lesson learned: write system-level code in the system-level language.
  • Programming language adoption is black magic. Corporate push can only go so far - the developer community can be quite opinionated. But that community alone can also only go so far - languages / compilers / VMs are Cathedrals that typically demand enormous, long-term investment and very organized design and development. On top of that, add leadership, evolution, standards, timing, luck... For example, I don't think Javascript would have stood any chance in a fair competition with VBScript. Javascript is much better, but it didn't win because it was better: it won because it was there first, in a web that was already big and still owned by Netscape. And because many years would pass before people really had to learn Javascript to write complex Web 2.0 apps - that would have been the opportunity for VB's argument, "it's familiar and Microsoft has a pointy-clicky IDE". Lesson learned: proposing a new language is always a big risk.

JavaFX Script didn't really fail, though; it taught us many interesting things, and it paved the way. The future JavaFX 2.0, coded either in "good old Java" or in some high-level, DSL-able language like Scala or Groovy, will be a much better system because we have a very clear and concrete vision of the ideal way to do some things. It seems to me that at least part of this technology (the runtime, if not javafxc's code generation) can be reused in whatever new Java-compatible library they come up with now.

Not a Full Restart

I've seen some skepticism about the JavaFX 2.0 plans, claiming it to be a "full restart", so we'd be heading for another 2-3 year wait until JavaFX is again mature and stable (say, as good as JavaFX 1.3.1 or better). There are some important mistakes here. First, JavaFX 2.0's feature set is much bigger than 1.x's. Had Oracle kept the previous course with JavaFX Script, the roadmap would still be pretty good for next year: new concurrency model; footprint and startup improvements; Prism going GA and becoming the default; texture paint (thanks!); much expanded CSS support (animations, grid layout); next-gen media stack; WebView (back from the ashes of JWebPane?); HTML DOM and extra browser interop; and a new batch of controls - complex, critical ones like TableView.

Granted, a few items here (Prism, new controls) were originally supposed to GA in 1.4, later this year. The Design Tool was also supposed to hit public beta now. But the major disruption of the new plan is the "black hole" of the next few months, at least until the EA or Beta ships; any code I write today will have to be ported when 2.0 ships. JavaFX Script will be maintained for some time, but eventually all code that uses the JavaFX Script language and/or the existing APIs will be legacy code (yeah, I see the smiles of Swing fans at the irony).

JavaFX 2.0 is not a full rewrite. The bulk of JavaFX is already written in Java, C/C++ (the runtime has significant native code), or shader language. Only the higher-level public APIs, like UI controls, are written in JavaFX Script (and I expect even those to rely on some Java where the going gets tough). Oracle will have to rewrite some parts of the runtime, but even that will be partially a port - I don't think Jonathan will ignore the entire existing code base and design, writing the new Button control from zero and laboriously coming up again with the same rendering and layout algorithms.

Oracle won't start coding all the new stuff next week, when the team comes back from the JavaOne break; it seems to me that they have been working on the new plan for some time now. You can see in the JIRA that the JFXC project has looked dead since June, when v1.3.1 development finished; 1.3.2 work didn't even start. Oracle has already presented some demos of what JavaFX 2.0 will be, so the new runtime may already have a few months of work behind it - a rough early alpha, at least. Two big-ticket features, the advanced media stack and the HTML5 engine, will mostly integrate/wrap third-party projects; and while these are still not trivial tasks, it seems that work started quite some time ago on both fronts.

Finally, JavaFX is increasingly less dependent on any programming language. Releases 1.2 and 1.3 started a big push to move as much content as possible to the FXD format and to web-happy CSS stylesheets. The roadmap for 2.0 furthers this trend: animation and layout will be scripted by CSS (using standard CSS specs when available). I didn't hear anything about the FXD format, but I think it will also be maintained and improved. So in the end, we don't really lose much of JavaFX Script's nice declarative syntax. Support for these components - from the CSS and FXD parsers / internal metamodel (already implemented in Java) to the NetBeans plugin and Production Suite (already implemented in Java or native code) - should be unaffected by the transition away from JavaFX Script.

Mark Reinhold announced today that the JDK 7 / JavaSE 7 project has slipped once again: mid-2011 without Jigsaw and lambdas, late 2012 for JavaSE 8 with those. The delay (or some other bad news, like dropping features) was already expected by anyone who tracks the project. But really, how big and bad is this delay?

As a big enthusiast of both Jigsaw and Lambdas - and as a tech writer who just published two massive articles on Java 7 & JavaSE 7 (in the Brazilian Java Magazine) - I was initially... very unsatisfied, to be polite. But doing a reality check, the slip is neither as big, nor as bad as it seems at first sight.

The reason, of course, is JDK 6. I have been continuously tracking the "post-6uN" releases, where Sun/Oracle keeps pushing the envelope, delivering as many improvements as they can without breaking their own TCK. See my recent coverage of 6u21, for example. I'm already testing the first build of 6u23, which (besides a bunch of Swing fixes) carries another massive VM update, now to the bleeding-edge HotSpot 19 (the very latest one from JDK 7 at the moment). This includes such high-profile JDK 7 features as the latest G1 collector, the complete VM support for JSR-292, and other items like CompressedOops with 64 GB heaps, CMS fixes and tons of smaller VM/runtime fixes and improvements.

Version numbers are a somewhat arbitrary thing. Sun changed the J2SE/JavaSE versioning scheme a few times, always to the dislike of half the planet. Some people think that 6u10 should have been called 6.1. In that spirit, I'd rebrand 6u14->6.2, 6u18->6.3, 6u21->6.4 and 6u23->6.5. Maybe we would feel better: "Yeah, JDK 7 is delayed, that sucks... but at least we're getting 6.5!"

The restriction of no changes to the language syntax or public APIs puts some limits on the features that can be delivered in these (so-called) maintenance updates. But these restrictions are not impossible to circumvent; they are just cumbersome. For example, you may wonder how useful the VM back-end for JSR-292 is if the front-end (the java.dyn APIs, some bits of syntax sugar) cannot be available. It will probably be good enough for the projects that need it the most: dynamic languages that target the JVM, such as JRuby, Groovy and Jython. I suppose that a future update of the JSR-292 backport will be able to make use of the new back-end, perhaps by adding an extension jar with the missing APIs to the bootclasspath. Even the new invokedynamic bytecode, which needs a new classfile version, may not be a big problem: the backport might use tricks with "magic" internal APIs that the VM would replace with a dynamic invoke; or the VM launcher could grow some -XX:+InvokeDynamic option. Both would carry all the necessary disclaimers as unsupported, VM-specific extensions. Nothing new here; the JDK already contains literally hundreds of private, magic APIs and options. And the argument of "only runs on the Oracle JDK" is much less important than before... which brings me to my next subject: OpenJDK.

Who cares if something works "only" on the Oracle JDK? Any such feature will also work on OpenJDK, which is free software and (thanks to community projects like IcedTea, Zero and Shark) runs on even more platforms than the official Oracle JDK. (The entire OpenJDK project was a great move by Sun, as it basically subsumed the older Free Java projects - these days, nobody cares to test Java apps on Classpath or GCJ.) I suspect this will do a lot to promote the Oracle/Open JDKs as an even stronger de facto standard than before. Another important JavaSE implementation, the JRockit VM, is owned by the same company and will eventually be merged with HotSpot. So in the end there are only two relevant JDKs: Oracle's and IBM's. All other production-quality JDKs, such as Apple's or HP's, are mostly ports of Oracle's code, so they tend to follow its steps, including most extensions. The notable exception is the Excelsior JET static compiler, but they have a smaller market share, and their business depends on having the best possible compatibility with the standard (read: Oracle) JDK.

Additionally, features like JSR-292 are not really vendor-specific extensions. They are just standard features... from the next platform release. I don't see a reason why IBM wouldn't want to make a similar move and backport their own JSR-292 support from the ongoing IBM JDK 7 project to some update of IBM JDK 6. (Just keep the feature disabled by default, so ultra-conservative WebSphere admins won't freak out.) Last time I checked, IBM was making some noise about dynamic language support, so that would even be a business necessity. Maybe a future release of projects like JRuby would need different extension jars and launch scripts for the Oracle/Open JDK and the IBM JDK, and that would be all.

Programmers love new syntax and new APIs, and it's much neater when these are included in the core platform: less stuff to install, fewer worries about support, toolchain or portability. But truth be told, these conveniences are not deal-breakers. New APIs have traditionally been available from multiple sources, from open source projects to the "upper" JavaEE specs; and an increasing number of Java (platform) programmers are now comfortable with alternative languages and compilers, from Scala and Clojure to Oracle's own JavaFX Script.

My last comment for Oracle: Please include the Tiered Compiler in 6u23, and I will definitely be completely fine with the slip of JDK 7. Quality first, it's done when it's ready, and all that. ;-)

Now that JDK 6u21, JavaFX 1.3.1 and NetBeans 6.9.1 are all finally released, I'm back to checking the latest news and improvements in JavaFX. The official Release Notes point to the deployment improvements as the single new end-user feature, so I've checked the latest improvements in this area.

The really major feature of this release is for developers: debugging and profiling will now, well, work as expected. With my apologies to the javafxc team, who worked a lot to make this happen, it's just not exciting, headline-worthy material... still, I have some comments about the compiler update at the end.

It's the Deployment, Stupid!

JavaFX's feature set has been decent at least since 1.3, although more is needed / is coming (TableView, etc.). But even by 1.2, deployment was already perceived as the biggest problem - for any Client Java, including Swing and runners-up like Apache Pivot. A recent blog from Max Katz summarizes this well, in the section "The Ugly stuff". Java's deployment has been ugly for so long that it takes some faith to believe it can be fixed.

Faith has been rewarded, though, even if not with miraculous speed. The JDK team continues the "6uN" project, delivering incremental client-side improvements at a roughly half-yearly pace. The latest such release is 6u21, which brings these significant enhancements:

  • Java HotSpot v17: Not client-specific, but extra VM performance never hurts! More fresh goodies backported from the JDK 7 project, including memory management improvements (better CMS and G1 GCs, Escape Analysis, 64-bit CompressedOops, code cache changes to reduce the risk of PermGen blowups).
  • Support for Custom Loading Progress Indicators: Used to great effect by JavaFX 1.3.1.
  • The usual batch of bugfixes: Java2D, AWT, Swing, plugin2 and other components. Notably for Java2D, a fair number of font-related fixes, including rendering quality enhancements and optimizations.

For JavaFX, 1.3.1 is a maintenance release, but it boasts one significant new feature, the new Application Startup Experience, including a cool new progress indicator and other fine touches. Let's check Max's comments:

  1. Browser freezing for cold startup of applets: Problem persists... why? How difficult is it to initialize a plugin in the background, without holding any browser thread / lock / whatever until it's fully initialized? There are excuses for the JVM's loading time and resource usage (big, 15-year-old platform...); but the Browser Freeze From Hell is hard to accept, especially after the plugin's full rewrite in 6u10.
  2. The dreadful animated Java logo, which didn't report real loading progress: Gone! Point scored.
  3. Scary security dialog: That's slowly improving. The main thing is to avoid nagging for apps that shouldn't have security concerns. JDK 6u21 fixes one bug involving drag&drop and security. Other recent JDK builds have tweaked more details (but, too bad that 6u18's removal of the mandatory codebase didn't work and was reverted in 6u20 - temporarily?). You can go very far with the current FX platform without signing or permissions; and when you do need them, a "scary" one-time dialog is the right thing to do. A RIA runtime should not be a big backdoor. Notice that unneeded security warnings are sometimes due to poor application programming / packaging.
  4. Error reporting: This is still an issue. The Java Console is great for developers, but a fiasco for end-users. It should be disabled by default, and a new, user-friendly mechanism is needed to report any failures. That must be native code, so it works even if something like JAWS fails.

Some extra items that come to my mind:

  5. Installation: Also slowly improving. JavaFX 1.3 modularized its core runtime, so most apps won't need to fetch all of it from the web (or even from the plugin cache) if they don't use certain features (like the javafx.fxd and javafx.scene.chart packages). JavaFX 1.3.1 further hides the annoyance of first-time FX execution behind the progress indicator.
  6. EULA: Argh, 1.3.1 still shows an atrocious EULA dialog when you run the very first FX app. But that's probably hopeless.
  7. Redundant plugin: Has anybody noticed that all modern browsers have an OOPP (out-of-process plugin) architecture? Java 6u10 created its own out-of-process layer (jp2launcher), but this is not required for the latest browsers. When I run a Java applet in (say) Firefox 3.6.4+, I get this Java plugin process and also Firefox's own plugin-container process. That's stupid: one extra process (and I guess one extra level of IPC indirection, security barriers...) to achieve nothing. The plugin should detect these new browsers and then just run "in-place" (that is, isolated in the plugin-container process).
  8. Plugin/browser artifacts: I only see refresh and scrolling artifacts, for Java applets, on Firefox 4, and only with the new Windows rendering acceleration (Direct2D / DirectWrite / Layers) turned on. FF4 is still beta, with many known bugs in the new accelerated pipeline, so that's hopefully just a transient browser issue.
  9. JNLP files: WebStart's launch experience is polluted by these files, which are dumped into some temp directory and appear as regular downloads in your browser's download manager or status bar. If you're unlucky enough, you'll even get a download/run confirmation dialog. The reliance on .jnlp's MIME type is fragile, both in the browser and on the server. The extra HTTP request may be noticeable for small applets over slow connections. The plugin and JAWS should create some alternative mechanism, like an <object> tag containing all the JNLP metadata - hey, that works for both the old-and-busted Flash and the new-and-cool Silverlight, so I assume <object> is just great. So, anybody care to explain why the external JNLP file was invented in the first place?

By my count, that's 2 items fixed (2, 8) and 2 items improved (3, 5) out of 9 issues. That's not bad, but Oracle definitely needs to keep working hard and fast, especially on item 1, which is very often the single big offender in "Java-powered" webpages.

The Undocumented Bits

JavaFX 1.3.1 published some important new documentation: the complete CSS Reference Guide, and a good FXD Specification. Keep 'em coming!  We're still missing some important docs:

  • Updated, complete, high-quality JavaFX Script Language Reference and formal spec;
  • Javadocs for the Preview controls - update: these are here, I had just missed them somehow!;
  • Any official information about the Prism toolkit;
  • Any official "internal" information, e.g. for the various system properties that can be used for tuning and diagnostics.

I understand that all these items are in the "under construction" category - even the JavaFX Script language is still a moving target, although not as fast-moving as in the past. But some docs help early adopters, enthusiasts and preview testers a lot.

Testing the New Deployment

I exercised 1.3.1 by updating all the JavaFX applets and JAWS apps in my blogs: JavaFX Balls, StrangeAttractor and Game of Life. I performed some startup tests - warm, cold, and "freezing-cold" (cleaning my plugin cache to force a reload of the JavaFX runtime). The combination of the latest Java and JavaFX runtimes, and the updated JNLP files to request the new progress indicator, has improved all startup scenarios. So, this feature worked as advertised - pretty good! Not yet in the Flash league of startup experience and speed, but definitely another significant step forward.

Now, what about real-world, complex JavaFX apps? The samples are updated, but those are smallish. So I went back to the Vancouver Olympics applet, the big showcase of JavaFX 1.2. Unfortunately it's not updated, not even to 1.3.0; it loaded the crappy old JavaFX 1.2.3 runtime with the now-ancient-and-crude startup experience. Yeah, I know these Winter Games are long over, but their site was apparently an Oracle partner for promoting JavaFX, so I'd expect them to keep it updated to the latest JavaFX release. On the plus side, Sten Anderson's great Music Explorer FX was updated; try it. Maybe Oracle's marketing dept is just pouring cash into the wrong pockets! ;-)

Testing the New Compiler

I've also rebuilt all the jars, but that's just for my second test: checking whether the updated javafxc compiler had any improvements or regressions. As I blogged before, javafxc 1.3 was a step forward in performance (notably the compiled-bind optimization), but a step backward in code size; many further optimizations are planned, including several to reduce the current space/speed tradeoffs. 1.3.1's major theme for javafxc is the JDI support; its list of fixed bugs doesn't seem to contain any code generation improvements (the next batch seems to be planned for 1.3.2). But... who knows? Also, I was worried that the big debugging work could mean a regression in compiled code size: maybe the new compiler would generate additional debug information, annotations or helper code.

So, I've repeated my simple Static Footprint benchmark. The numbers for 1.3.0 are slightly different due to a few small updates in my programs. I've also added a pack.gz metric that shows the smallest possible deployment size, just for the classes + META-INF data (removing any resources, such as images, from the source jar). Finally, the stripped metric is for a .pack.gz that's additionally stripped of any debug info (with the --strip-debug option; the pack200 tool won't do that by default!).

Program             JavaFX 1.3.0                   JavaFX 1.3.1
HelloWorld          2 classes, 2.579 bytes         2 classes, 2.731 bytes (+5,8%)
                    pack.gz: 954 bytes             pack.gz: 1.024 bytes (+7,3%)
                    stripped: 782 bytes            stripped: 876 bytes (+12,0%)
JavaFX Balls        20 classes, 118.559 bytes      20 classes, 116.539 bytes (-1,5%)
                    pack.gz: 18.787 bytes          pack.gz: 17.935 bytes (-4,5%)
                    stripped: 13.828 bytes         stripped: 13.852 bytes (+0,1%)
Strange Attractor   57 classes, 393.332 bytes      57 classes, 378.641 bytes (-3,7%)
                    pack.gz: 21.119 bytes          pack.gz: 20.038 bytes (-5,1%)
                    stripped: 13.040 bytes         stripped: 13.167 bytes (-0,1%)
Interesting Photos  46 classes, 457.561 bytes      46 classes, 438.619 bytes (-4,1%)
                    pack.gz: 49.587 bytes          pack.gz: 46.073 bytes (-7,1%)
                    stripped: 37.842 bytes         stripped: 37.845 bytes (0%)
GUIMark             27 classes, 224.904 bytes      31 classes, 217.390 bytes (-3,3%)
                    pack.gz: 17.224 bytes          pack.gz: 16.552 bytes (-4,0%)
                    stripped: 12.955 bytes         stripped: 12.989 bytes (-0,2%)

The first results look surprising: all programs (except the unrealistically small HelloWorld) show improved code size, up to 4,1% smaller without pack200 compression and up to 7,1% smaller with Pack200. These would be excellent numbers for a maintenance update that's not supposed to contain any code-size optimization! But checking the stripped numbers shows virtually identical sizes. The conclusion is that all the differences are very likely just a side effect of the debugging support changes: the updated javafxc is smarter, producing debug info that's both smaller and better.

Of course, for stripped bytecode there is no advantage at all. But the advantage for non-stripped files is still important, because Java developers very rarely strip debug info. (The NetBeans project settings page doesn't even offer an easy checkbox for pack200's --strip-debug option; I'd bet that many Java developers don't even know such an option exists.) Besides that, the fact that the maintenance work for debugging support didn't cause any regression in code size is more good news.
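For the record, the stripping can also be done programmatically through the standard java.util.jar.Pack200 API - below is a rough sketch of roughly what the pack200 tool's --strip-debug flag does (the class name and argument handling are mine, and the attribute list is not exhaustive):

import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.jar.JarFile;
import java.util.jar.Pack200;
import java.util.zip.GZIPOutputStream;

public class StripDebugPack {
    public static void main(String[] args) throws Exception {
        Pack200.Packer packer = Pack200.newPacker();
        // Ask the packer to drop the usual debug attributes before packing.
        packer.properties().put(Pack200.Packer.CODE_ATTRIBUTE_PFX + "LineNumberTable", Pack200.Packer.STRIP);
        packer.properties().put(Pack200.Packer.CODE_ATTRIBUTE_PFX + "LocalVariableTable", Pack200.Packer.STRIP);
        packer.properties().put(Pack200.Packer.CLASS_ATTRIBUTE_PFX + "SourceFile", Pack200.Packer.STRIP);

        OutputStream out = new GZIPOutputStream(new FileOutputStream(args[1]));
        try {
            packer.pack(new JarFile(args[0]), out);  // args[0] = input jar, args[1] = output .pack.gz
        } finally {
            out.close();
        }
    }
}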

Missing Deployments

The new SDK contains a new /runtime directory with the redistributable Desktop runtime. But it's not clear if we actually have the right to redistribute these files, and under which conditions - I didn't find a redistribution license. This option is very important for some people; we need some enlightenment about this.

The absence of a redistributable JavaFX Mobile package is quite remarkable. The mobile runtime was updated in both 1.3 and 1.3.1 cycles, it just wasn't released to the public, so the only version of JavaFX Mobile that you can actually install in a real handset is the now-Jurassic v1.2. JavaFX's mobile plans are stuck for non-technical reasons, as Oracle probably works on its strategy; the M.I.A. JavaStore may also be part of the same imbroglio. The JavaFX TV runtime is not available either, although its status doesn't seem so bleak (it's not late to the race; it didn't have a faux pre-launch like JavaFX Mobile and Java Store; and it depends on Prism and other components so its non-shipping status may be just for the reason of not being ready).

Well, that's the same speculation we already made at 1.3's launch. Now, with Oracle's moves against Android, we may just be watching the beginning of the next chapter. I have already posted some thoughts on a specific, technical part of this debate; but I'm holding my breath for the final consequences for everybody - Java and Android developers. In my dreams, my next smartphone would be a 'droid that could also run JavaFX programs... let's see how all this works out. Hopefully we only have to wait another month, when Larry Ellison and Thomas Kurian will spill the beans about Java Strategy and Directions. We need directions indeed; they can't come soon enough.

I was doing some JavaFX hacking, and I had to create a sequence initially full of zeros. How can you do that? There's apparently only one way:

var bits = for (i in [1..64]) (0 as Long);

Problems: First, I need a loop - OK, a comprehension - to initialize the sequence. There is no syntax, no API helper or type constructor, that directly expresses "Long[] with N elements". I could use a literal like [0, 0, 0, 0, ...], but this doesn't scale to large sizes.

Second, I have to write the (0 as Long), because JavaFX Script doesn't support Java's type suffixes like 0L for zero-as-Long. JavaFX Script drops many complexities from Java; but dropping the numeric type suffixes looks like a wrong move. I mean, it's not like Long numbers are some niche feature. And 0 as Long is butt-ugly.

JavaFX Script should try to stay as close to Java as reasonably possible, given their different design criteria. I'm OK with big differences like no generic types (they don't fit in FX's complexity budget), different attribute and method/function declarations (required by FX-specific features), and most other changes. But numeric literals are something I'd expect FX to just clone from Java. It's also missing Java 5's hexadecimal floating point, and will soon miss Java 7's underscores and binary base (I'd vote to add all of these in FX 1.4). These are compile-time features, with no cost of any kind: if you don't need them, you don't use them - no impact on APIs, no interaction with other language features. And Java's rich numeric syntax could also be useful in the FXD spec.
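For reference, this is what those literal forms look like on the Java side; the binary/underscore forms are the then-upcoming Java 7 (Project Coin) syntax, so they are shown commented out:

class LiteralDemo {
    static final long   ZERO = 0L;           // type suffix: no "0 as Long" dance
    static final double HFP  = 0x1.8p3;      // Java 5 hexadecimal floating point: 1.5 * 2^3 = 12.0
    // Java 7 (Project Coin) literals, not yet valid in Java 6:
    // static final int  MASK  = 0b1010_0101;   // binary base with underscores
    // static final long NANOS = 1_000_000_000L;
}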

Sequence optimizations are not powerful enough here. For the alternative code var bits:Long[] = for (i in [1..64]) 0, the compiler will create a sequence of type Integer[], then in the assignment to bits it will invoke a helper method that converts that into a new Long[] sequence. This could be fixed by new optimizations: in code like xx = for (...) y, where xx is a sequence of x and y needs conversion to x, the compiler could first perform a high-level rewrite to xx = for (...) (y as x), avoiding the cost of allocating a temporary sequence only to immediately copy-with-conversion it into xx. Notice that the per-element conversion (y as x) is often a zero-cost operation, as in our Integer->Long example. Here is the relevant decompiled output:

LongArraySequence jfx$177sb = new LongArraySequence();
int i$upper = 64;
for (int i$ind = 1; i$ind <= i$upper; i$ind++) {
    int i = i$ind;
    long jfx$178tmp = 0L;
    // ...per-element append of jfx$178tmp into jfx$177sb (elided in this listing)...
}
$bits2 = (Sequence)Sequences.incrementSharing(jfx$177sb);

The decompiled code above shows that there are no optimizations for initial capacity. The internal LongArraySequence class contains the necessary constructors that take an initial size as argument; but this is probably only used internally, by runtime code written in Java - the javafxc compiler has no intelligence yet to use it.

There are two ways to fix the performance of this code:

1) (The Wrong Way) Adding special syntax to allocate a sequence of fixed size, e.g. new Long[64] or just Long[64].

This is the "wrong way" because we're thinking in Java, not in JavaFX Script. First, generator syntax is already good and terse enough. Second, the proposed syntax makes more sense to work with mutable sequences, and this is not the paradigm that JavaFX Script's sequences are pushing. Sequences are immutable (more exactly "persistent", to use a Clojure term); only sequence variables are mutable, each mutation creates a new sequence. This is expensive even though javafxc does some optimizations to minimize the churn.

Having said that, sometimes we need mutable sequence variables, and the costs can be low if we program carefully, and if the compiler does its part - that's what I am missing here. And there isn't really a better way (except "impure" ways, like using nativearrays or Java classes).

2) Adding the necessary optimizations, so a for that produces a sequence filled with a single value will do it as fast as possible.

Or, just make the for generator efficient. First, preallocate the sequence whenever the size can be detected. Second, bulk-fill the sequence [slice] by invoking Arrays.fill().
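To make that concrete, here's a rough sketch, in plain Java, of the kind of code the generator could boil down to when the bound and the body are known up front; the wrapping into the runtime's internal sequence class is only described in a comment, since that API is undocumented:

import java.util.Arrays;

class SequenceFillSketch {
    // What "var bits = for (i in [1..64]) (0 as Long)" could compile down to.
    static long[] zeros() {
        long[] elements = new long[64];   // preallocate: the bound [1..64] gives the size up front
        Arrays.fill(elements, 0L);        // bulk-fill; redundant for 0L (a new long[] is already zeroed),
                                          // but this is the general shape for any constant value
        // The runtime would then wrap `elements` in its internal Long sequence class,
        // sharing the array instead of appending 64 boxed elements one by one.
        return elements;
    }
}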

The latter optimization is very interesting to discuss:

- It's only valid if the body of the for loop is either a compile-time constant or a functionally-pure expression that doesn't depend on variables changed inside the loop. The javafxc compiler already does simple purity detection (for 1.3+'s binding), and I hope that will keep improving, because it makes many new optimizations possible and even easy.

- Arrays.fill() is JavaSE-only, so this optimization wouldn't be supported when compiling with -profile mobile. But this can be worked around with some runtime-lib stubs.

Without these enhancements, programmers are tempted to drop to native arrays (ugly) and invoke Arrays.fill() manually (non-portable). We could imagine a new API with functions to fill a sequence [slice] with a single value, and other bulk ops like copy. But that is not the "JavaFX Way". As said before, you are supposed to use for-comprehensions. They're certainly cleaner than something like "bits = new Long[size]; Sequences.fill(bits, 0, sizeof bits, value)" - yuck!!

Yes there's already a javafx.util.Sequences API with functions like sort() and binarySearch(), but these are complex enough to deserve APIs... at least in the current language. Simpler things like Sequences.max() would disappear in a language with some extra functional programming tricks, e.g. max = reduce(seq, >, Long.MIN_VALUE). [You can do that today, but not with enough clean code or enough performance.]
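Just to make the reduce idea concrete, here's a minimal sketch in plain Java; the Op interface and the reduce/max functions are illustrative names of mine, not part of any JavaFX API:

class ReduceSketch {
    interface Op<T> { T apply(T acc, T item); }

    // Fold the sequence left to right, starting from an initial value.
    static <T> T reduce(Iterable<T> seq, Op<T> op, T initial) {
        T acc = initial;
        for (T item : seq) acc = op.apply(acc, item);
        return acc;
    }

    // Sequences.max() expressed as a reduce.
    static Long max(Iterable<Long> seq) {
        return reduce(seq, new Op<Long>() {
            public Long apply(Long acc, Long item) { return item > acc ? item : acc; }
        }, Long.MIN_VALUE);
    }
}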

javafxc's next steps?

For the JavaFX Script Compiler project, 1.3 was the Compiled Bind release, and 1.3.1 is the Debugging release. 1.3.2 will apparently be mostly a maintenance release, fixing low-priority JDI and binding bugs plus a few optimizations - at least JFXC-4388: "Iterating over sequences created by bound for loops is very slow" is planned and very important (this actually seems to be a closure optimization, not a sequence optimization). The next feature release, 1.4 (Presidio), will address many binding optimizations that slipped the 1.3 deadline - notably code-bloat fixes to reduce the speed vs. size tradeoff of the initial Compiled Bind. I see plenty of sequence-related fixes too, but not (yet) significant sequence optimizations.

There is a bug JFXC-1964: "Umbrella: Sequence optimizations", but this bug (now in "After-Soma" limbo) dates from the JavaFX 1.0-1.2 releases. It enrolls 27 bugs, 24 of which are fixed. Most of the described optimizations seem to be "fundamental" things (e.g. flattening optimizations), or low-hanging fruit like looping with straight indexing instead of Iterators. There are no [visible] plans, yet, for a good set of higher-level sequence optimizations. Given the importance of sequences in JavaFX Script, I hope this won't take too long :-) - it seems to me to be the next logical step in enhancing the implementation of the current language. There are other open avenues, like closure optimizations (but I guess those will have to wait for JDK 7... JavaFX will eventually have to sync with Java's lambdas/closures, both for interop's sake and to benefit from new VM support).

A high-level language like JavaFX Script gives the source translator vast opportunities to enhance code performance without imposing any cost on the runtime (library size, warm-up time, memory usage or anything else). This is very different from the tradition of the Java language, whose syntax is very close to the "machine", so javac is purposefully a non-optimizing translator: the responsibility for optimization rests fully on the shoulders of the runtime. That design doesn't always win, because new language features are often introduced without sufficient new support from the bytecode/VM, or because the weight on the runtime's shoulders has long become an important problem - notably for client-side or mobile apps, which need fast startup and low resource usage.

Before Java, I was a C++ programmer, and I appreciated that the compiler would make a massive optimization effort. Long build times could be mitigated with a mountain of hacking (precompiled headers, incremental linking...), and even when they stayed big, that was a good tradeoff for the very best application performance. Java turned this upside-down by moving all optimization to the runtime; this has advantages like portability, dynamic features, and dynamic optimizations that often beat C/C++. But Java's move was maybe too radical. Over the last few years and releases, we've been slowly compensating for this with efforts to move overhead to compile or installation time: Class Data Sharing (CDS); bytecode preverification (created for J2ME and adopted by JavaSE 1.6); the JIT caching of some JVMs; hybrid AOT+JIT VMs like JET. New languages like Scala, JavaFX Script, Clojure and JRuby seem to be yet another evolutionary step, as they offload more optimization responsibility back to the source compiler. JavaFX is a layer atop Java and can't fix problems like the long warm-up and poor memory sharing of JVM processes; but it can (and must) avoid adding any extra runtime load.

I've finished the development of my Game of Life, with a couple final fixes and new features... including a solution to the bad performance reported before. Once again the work has uncovered some surprises; read on.

Un-Scripting JavaFX Script

The first version used a "scriptish" style, all code thrown into a single .fx file, with only average effort at structure. Now I have three files: World.fx with the World class (data model and Life algorithms); IO.fx with new support for loading patterns; and Main.fx with the UI. This refactoring required declaring some classes, functions or properties as public[-read|-init]. Some extra noise, but the Java veteran inside me feels much warmer and fuzzier with encapsulated code. I still appreciate, though, the ability to bang out prototype code without thinking about such issues.

I'm a bit annoyed by the absence of private visibility, but arguably that's unnecessary: if you have global functions/variables or multiple classes in the script, you are likely in the prototype stage and won't bother with encapsulation. On the other hand, I'm worried that the javafxc output uses public visibility for all source features, losing VM-level enforcement of visibility. The bytecode contains some annotations like @ScriptPrivate, but these serve only the compiler; they are ignored by the VM's classloading and verification. You cannot trust JavaFX's visibilities for security purposes. A more important impact, perhaps, is that bytecode optimization/obfuscation tools can't take full advantage of restricted visibility for closed-world analyses.
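A quick way to see this for yourself from plain Java (Foo is a hypothetical class compiled from a .fx script): no setAccessible() tricks are needed, because the members are plain public as far as the VM is concerned.

import java.lang.reflect.Field;

class VisibilityCheck {
    public static void main(String[] args) throws Exception {
        Class<?> cls = Class.forName("Foo");     // any class compiled by javafxc (hypothetical name)
        for (Field f : cls.getFields()) {        // getFields() returns only *public* members...
            System.out.println(f.getName());     // ...yet script-private vars show up here too
        }
    }
}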

Some I/O and Parsing

Several people complained that it's too much work to set up Game of Life (GOL) patterns manually, one click per cell. The Internet is literally infested with GOL resources. (Indeed, the web can be divided into four major groups: Game of Life; Fractals; Retrocomputing; and Boring sites. Thanks to me, java.net is just moving out of the Boring category.) There are many popular GOL programs for every computer since the ENIAC, and they have developed a few standard file formats, the most popular being LIF and RLE (each with a couple of variants...). The LIF (Life 1.06) format is brain-dead simple; it can be parsed with very modest code:

function parseLIF (text:String):Point2D[] {
    for (cell in text.split("\n") where indexof cell > 0) {
        def xy = cell.split(" ");
        Point2D { x: Integer.valueOf(xy[0]) y: Integer.valueOf(xy[1]) }
    }
}
My parsing function uses JavaFX's Point2D as a cell coordinate; the output is a sequence of such coordinates for all "live" cells in the pattern. I can use Java's string manipulation facilities, including regular expressions, so the job is pretty easy. JavaFX's sequences and generators again make for minimal coding.

Problem: I've used String.split(), not available on the JavaFX Mobile platform. The compiler will catch this only if I reconfigure the project for JavaFX Mobile.

RFE for the people writing IDE plugins: let me create a project of type "JavaFX library" that I can configure to any of the JavaFX profiles, including common, so the compiler will only allow me to use the strict set of APIs (from both JavaFX and the underlying Java runtime) that are guaranteed to exist in the selected profile. Notice that the mobile profile is a proper subset of desktop only for the JavaFX APIs; for the underlying Java APIs this is not true. I cannot use mobile as a G.C.D. configuration for code that should run on any profile, because that would allow the project to use JavaME-specific APIs that are not available in JavaFX Desktop (even CLDC alone includes at least javax.microedition.io - the base GCF package).

This reminds me that the Generic Connection Framework is a great API that should really be available on JavaSE. That was the plan of JSR-197, but unfortunately the idea never took off. JavaFX lacks a full-blown I/O API; javafx.io is a good start as an "80/20 rule" API for higher-level needs, but many complex programs will be tied to tons of java.io / java.nio / java.net / ..., or the equivalent JavaME APIs. Except that they wouldn't be, if the GCF were an official part of JavaSE. Perhaps Oracle should promote this idea - add the JSR-197 jars to the JavaFX runtime as an extension package (i.e. a separate jar, only downloaded or loaded by apps that need it). But inclusion in JavaSE would be much better, perhaps also solving that platform's embarrassing lack of standard support for some kinds of I/O (yes, I know about JavaComm, which is another part of the problem, not the solution).
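To illustrate why the GCF is attractive as a portable I/O layer, here's roughly what fetching a pattern would look like with javax.microedition.io - this compiles on JavaME today, and on JavaSE only if some JSR-197-style implementation jar is on the classpath (the class and method names below are mine):

import java.io.InputStream;
import javax.microedition.io.Connector;
import javax.microedition.io.HttpConnection;

class GcfFetch {
    // One factory call covers http:, socket:, file:, comm:, etc.
    static String fetch(String url) throws Exception {
        HttpConnection conn = (HttpConnection) Connector.open(url);
        InputStream is = conn.openInputStream();
        try {
            StringBuffer sb = new StringBuffer();   // StringBuffer keeps this CLDC-friendly
            int b;
            while ((b = is.read()) != -1) sb.append((char) b);
            return sb.toString();
        } finally {
            is.close();
            conn.close();
        }
    }
}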

But the LIF format is very dumb (bloated files); what you really want is the RLE format:

function parseRLE (text:String):Point2D[] {
    def lines = for (line in text.split("\n") where not line.startsWith('#')) line;
    def header = for (l in lines[0].split(", ")) l.substring(l.lastIndexOf('=') + 1).trim();
    def x = Integer.valueOf(header[0]);
    def y = Integer.valueOf(header[1]);
    var currX = 0;
    var currY = 0;
    def run = new StringBuffer();
    for (line in lines where indexof line > 0) {
        for (i in [0 ..< line.length()]) {
            def c = line.charAt(i);
            if (Character.isDigit(c)) {
                run.append(c);   // accumulate run-length digits (body elided in the original listing)
                null
            } else {
                def len = if (run.length() == 0) 1 else Integer.valueOf(run.toString());
                run.setLength(0);   // reset the run counter (elided in the original listing)
                for (l in [1..len]) {
                    if (c == '$'.charAt(0) or currX == x) {
                        currX = 0;
                        currY++;    // '$' (or wrapping past the width) ends the current row
                    }
                    if (c == 'b'.charAt(0) or c == 'o'.charAt(0)) {
                        def cell = if (c == 'o'.charAt(0)) Point2D { x: currX y: currY } else null;
                        currX++;
                        cell        // 'o' yields a live cell; 'b' yields null (ignored)
                    } else null
                }
            }
        }
    }
}

The big, outer for loop that contains most of parseRLE() produces (and returns) a Point2D[] (the return type declaration is optional; the compiler could infer it). The entire state machine that parses the RLE format is inside this for. Each step through the state machine either delivers a Point2D value that is appended to the return sequence (actually, to a sub-sequence that is eventually flattened into the return), or a null value that is ignored (JavaFX Script's sequences cannot contain null; inserting null is a no-op). It's a nice example that justifies both the auto-flattening and the restriction on nulls. These features let me code parseRLE() in a quasi-functional style, without any ugly explicit sequence mutation. The only explicit variables are the locals currX and currY, part of the state of my state machine. The remaining state is in the iteration variables line, i and l, but those are all "managed" - JavaFX Script fixes Java's mistakes by not allowing user modification of loop control variables, or of function parameters. This keeps the for construct functional, unless you throw in extra variables and assignments.

This is the popular Glider pattern in RLE format:

# The Glider
x = 3, y = 3
bob$2bo$3o!

The most remarkable thing in parseRLE() is the ugly handling of characters, e.g. if (c == '$'.charAt(0)). JavaFX Script doesn't have a first-class character type, a common trait of scripting languages. The problem is that JavaFX Script does not "box" chars - coming from non-FX APIs like String.charAt() - into strings of length 1; these chars keep the Character type. But the language doesn't have a character literal syntax either: '$' is a string, not a character. Writing if (c == '$') will get you a compiler error about incomparable Character and String types.

RFE: Either add a character literal syntax, or promote chars to strings (but with the necessary unboxing optimizations, to keep the efficiency of a simple char wherever possible).

The problem is bigger than this, however; even strings are a second-class type in JavaFX Script. It seems to me that strings should be handled as a special kind of sequence whose elements are characters (or 1-char strings). I want to iterate a string with for (c in line); I want to get a substring with slicing syntax like line[5..<10]. Today you can declare a variable with the sequence type Character[], which is even optimized internally with a special-cased sequence class, CharArraySequence; but that is completely unrelated to the String type.

First-class support for strings could be added as compiler sugar. The same old good, efficient and interoperable java.lang.String class could be used to store string data, without any extra wrapper; but the compiler would overload the syntaxes of sequences and for to handle strings. As a simple example, the code:

noSpaces = for (c in line where not Character.isSpace(c)) c;

could be de-sugared into this (Java) code:

StringBuilder noSpaces$sb = new StringBuilder();
for (int c$index = 0; c$index < line.length(); ++c$index) {
    char c = line.charAt(c$index);
    if (!Character.isSpace(c)) {
        noSpaces$sb.append(c);
    }
}
String noSpaces = noSpaces$sb.toString();

Now I know that this is easier said than done, because it's not just a matter of throwing in a handful of special-case translations. The right way to do this requires that strings and sequences be "normalized" to a sufficiently homogeneous AST, so the code generation can implement either common or separate handling as necessary, for every combination of strings vs. other kinds of sequences, as well as with other language features.

The language already performs some custom handling of strings, for interpolation with {}. A great start, but we need more :) Besides sequence integration, first-class (and portable) regex support would be another hit. This obvious RFE is already filed as JFXC-2757: "JavaFX Script should support regex literals", and as the comments explain, it's not as easy as in other languages that have this feature, because there are interactions with binding and triggers. (But this also means that first-class regexes would be more powerful than in other languages.)

Reading from the Web

I won't embed Life patterns in the program; it will fetch them from the web. The site conwaylife.com contains many patterns, well organized and available at stable URLs in several formats. The front page also hosts a great Life Java applet, a surprise for me because it loads very fast and smoothly. When I wrote the original Life blog & program, I hadn't found this superior Java GOL (but that's a very complex, optimized implementation - the Game of Life, and cellular automata in general, allow some crazy optimizations - not adequate for my purposes).

public class LifeRequest extends HttpRequest {
    public-read var result:Point2D[];
    override var onInput = function (is) {
        try {
            def sb = new StringBuffer(is.available());
            while (is.available() > 0) sb.append(is.read() as Character);
            result = parseRLE(sb.toString());
        } finally {
            try { is.close() } catch (e:IOException) {}
        }
    }
}

Class LifeRequest makes an HTTP request to a URL that contains a Life pattern in RLE format, then reads the input stream and parses it. Yeah, the code that consumes the stream is stupid (one byte at a time). But it seems the underlying HTTP stream - for the record, an FX-specific com.sun.javafx.io.http.impl.WaitingInputStream - is buffered; I didn't notice any performance impact reading large patterns. Once again I wish we had some extra string power, or perhaps higher-level I/O APIs. I cannot use methods like read(byte[]) because I don't want to write a Java class just to allocate a nativearray. And I don't want to rely on additional JavaSE-only APIs like BufferedReader either; even that wouldn't help a lot - I'd still need a loop, invoking readLine() for each line and using a StringBuilder. What I really need is an API that "slurps" the whole stream into a string. Or perhaps something more JavaFX-style, like being able to create a "view sequence" over several component types (think java.io buffers); this would probably need the language to offer forward-only sequences that support sequential iteration but not random access (but that opens yet another big avenue of new language design... let's skip it).
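In the absence of such an API, a small Java helper can do the slurping with a buffer instead of byte-by-byte reads; the names are mine, and it assumes the pattern files are plain ASCII:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class Streams {
    // Read the whole stream into a String in 8 KB chunks.
    static String slurp(InputStream is) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = is.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toString("US-ASCII");   // RLE pattern files are plain ASCII
    }
}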

def patterns = [
    "b52bomber", "B-52 bomber",
    "blinkerpuffer1", "Blinker puffer",
    // ...remaining key/description pairs elided...
];

This is a static list of the Popular Patterns offered by the site mentioned above: a simple list of key/value pairs, where the key is the part of the URL that will fetch the data. Except, of course, that it's a flat sequence. So I can plug my favorite RFE once again: I need a native map data type. :-)
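With a native map type - or, failing that, a small Java helper class using java.util - the catalog would read more naturally as key/description pairs; a sketch of what I mean (names are mine):

import java.util.LinkedHashMap;
import java.util.Map;

class Patterns {
    // LinkedHashMap keeps insertion order, so the ChoiceBox can list descriptions in the same order.
    static final Map<String, String> PATTERNS = new LinkedHashMap<String, String>();
    static {
        PATTERNS.put("b52bomber", "B-52 bomber");
        PATTERNS.put("blinkerpuffer1", "Blinker puffer");
        // ...remaining patterns elided...
    }
}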

/*** Begin Digression: How "complete" should JavaFX be?

At this point, I hear some people screaming "just use Java!!" for these things that JavaFX is not yet ideally suited for, like nontrivial string manipulation or I/O - and not to pile up layers of new RFEs demanding that the language become more "powerful" (read: complex) and the javafx.* APIs more "complete" (read: bloated).

In fact, I often don't even need to drop to Java code; I can just use Java APIs directly, in the Java way (without insisting on support for sequences and other JavaFX features), all inside normal JavaFX Script functions. That would be uglier JavaFX Script code, but arguably smoother than moving some code into a separate .java source, with a different syntax and harder integration, e.g. for methods that would need to call back into JavaFX Script objects. (The only problem here is that I cannot allocate a nativearray from JavaFX Script.)

All so-called scripting / higher-level languages assume that you may have to fall back to "system" code for some tasks. That's why languages like Ruby, Python, Perl etc. have a system interface (to the C language / native shared libs) that's much less torturous than JavaSE's JNI. For alternative JVM languages it's even better: the system fallback usually means calling Java classes, not C/native code. Even with issues like the SE-vs-ME fragmentation, Java is usually an order of magnitude better than C as a system-level language for carrying the load that a higher-level language cannot. (For the few exceptions, there's still JNI, so you lose nothing... well, except for that JNI=torture detail.)

The only issue, of course, is where exactly to draw the line. People coming from Java may consider JavaFX already good enough. You can't build a complex app in pure JavaFX, but so what? "It's a goddamn UI DSL! Just use Java for any non-UI work." I don't see it that way; I think JavaFX has the potential to be a great platform on its own.

Even if you buy the DSL argument, the frontiers between application layers are blurred and dynamic... even in a well-architected front end, the UI typically shares significant code with other layers: POJOs, validation, general utilities. And you have lots of communication between these layers, e.g. querying some business Facade to populate a form. This is typically smooth when all layers share a single language and SDK, but much harder otherwise. And what happens when you change your mind or find a design mistake, and need to push a bit of code from one layer to another? Any refactoring that straddles a language/SDK barrier will be much more difficult, certainly beyond the ability of IDEs' automatic and safe refactoring commands... Obviously, it's much more convenient to be able to code the entire application in a single language/SDK. Then you fall back to the system level in a much more limited and ad hoc manner, e.g. to optimize a performance-critical algorithm, to better reuse a system library that doesn't have a wrapper for the higher-level language/SDK, or for legacy support, etc.

The high-level language/SDK should provide at least the reasonable basics, on all fundamental features. That RFE for a built-in map type is fundamental, because you can go very far with "only" sequences and maps, while only sequences is definitely limited (if you ignore performance, having only maps would be less limited; maps are more general). But having a very rich data structures library, like JavaSE's java.util, is not fundamental - I'd say >95% of the Collections API are just performance optimizations (or convenience algorithms/APIs e.g. Stack) over the basic list/sequence & map that most scripting languages offer as their single built-in data structures.

Notice that language-integrated data structures are very powerful; the compiler can often make decisions such as selecting a specialized implementation of sequences or hashtables that's more efficient for a specific program's usage. You don't need manual choices such as ArrayList vs. LinkedList: you trust the compiler to make that choice. Only when the compiler fails at such magic optimizations, and only when that failure is found to be a significant performance problem, do you optimize manually.

I don't want to bloat the JavaFX APIs either, but many interesting FX-specific APIs could be implemented as a thin layer over some SE/ME-specific APIs. We still need that FX layer because it makes the same features more portable, and more powerful and easier to program as the API can take advantage of features like binding, sequences and first-class functions & closures. This is again not different from other JVM languages, see for example Groovy or Scala. Both communities seem to believe that it's worth the effort and runtime size, to either wrap or replace many Java APIs like JAXP, Swing, Collections, concurrency, JDBC; or to provide full-new frameworks for critical tasks like web development. Not to mention the languages that are independent from the JVM and carry over their own completely independent set of standard libraries for everything, plus big app frameworks (e.g. Rails for JRuby).

Compared to these languages, JavaFX would need fewer and lighter API wrappers. The language is very close to Java (Groovy looks closer, almost a superset of Java; but Groovy's dynamic typing and heavy reliance on metaprogramming make it actually much less close than the surface syntax suggests). Unlike the likes of JRuby, there's no need to support any feature or library that was not designed for the JVM. Unlike Clojure, there's no radical paradigm shift towards full-blown functional programming. I think we could have a nice set of "thin wrapper" APIs, with very little weight in runtime size and CPU/memory overhead, covering a very good range of extra functionality like XML(*), I/O, concurrency, perhaps some enterprise / distribution stuff (CDI and some extra client-side support for trivial consumption of EJB / JMS / JAX-WS servers), etc. The NetBeans JavaFX Composer already has a draft of this - if you add a JDBC Data Source to your design, Composer will spit ten .fx files into your project, a thin FX API for things like RecordSet. But everybody hates these IDE-proprietary libraries. I guess that in the future these will evolve into official JavaFX APIs, e.g. javafx.sql. The canonical example is JavaSE 6's GroupLayout, first born as a proprietary library of the NetBeans "Matisse" Swing editor.

(*) Yes JavaFX does XML, but it's a simple API with its own small parser implementation. The same is true for some other JavaFX APIs that one could imagine to be thin wrappers for Java APIs. This is actually nice for light weight (no Mb-size parser like Xerces making your applets slower to load) and portability (exact same parser implementation used in all JavaFX profiles). But some apps will need the full power of JAXP, and JavaFX could make this power available, with a friendly JavaFX wrapper, at least for the higher profiles like desktop and tv.

End Digression: How "complete" should JavaFX be? ***/

Back to the UI...

def patternCB = ChoiceBox {
    layoutInfo: LayoutInfo { width: 160 }
    items: for (p in patterns where indexof p mod 2 == 1) p
}

This new ChoiceBox allows me to pick one of the patterns.

onMouseClicked: function (e:MouseEvent) {
    if (e.button == MouseButton.PRIMARY and not
            (e.altDown or e.controlDown or e.shiftDown or e.metaDown)) {
        world.flip(xx, yy);
    } else {
        def req:IO.LifeRequest = IO.LifeRequest {
            location: "http://www.conwaylife.com/pattern.asp?p={
                patterns[patternCB.selectedIndex * 2]}.rle"
            onDone: function () { world.set(xx, yy, req.result) }
        }
    }
}

I've changed the existing mouse event handler: now only the left mouse button will toggle a cell. For the right button (UPDATE: or your Mac's single button + any control key), I pick the ChoiceBox selection, do some simple arithmetic to get its "key", build a full URL, then invoke the LifeRequest. I provide an onDone handler that passes the result (as well as the closure-captured cell position) to the new World.set() function:

public function set (x:Integer, y:Integer, cells:Point2D[]):Void {
    for (cell in cells) {
        def xx = (x + cell.x) as Integer;
        def yy = (y + cell.y) as Integer;
        if (xx >= 0 and xx < SIZE and yy >= 0 and yy < SIZE)
            this.cells[yy*SIZE + xx] = true
    }
}

The latter is pretty easy. It would be half the size if I didn't have to cast Point2D's coordinates to Integer (this reuse of Point2D was questionable... but I'm lazy). Notice that the Life pattern is contained in a rectangle, and I rubber-stamp the live cells in that rectangle to the world, using the selected cell as the top-left corner.

Exercise for the reader: (or maybe I will do it later) Make the right-click-down event activate an outline rectangle with the exact width/height of the selected pattern; so at right-click-up the pattern is actually set in the world. This needs reading the pattern at right-click-down, so you know its shape... a better idea is reading it even before, when the ChoiceBox selection is set or changed; just do that in background so the UI doesn't freeze. Then the pattern loading would appear to happen instantly. In a variant of this idea, instead of a boring rectangular outline, the preloaded pattern could be overlaid (with the obvious translucency-with-radial-fade effect) on top of the live world, until you "drop" it in the desired position.


The finished program, for this version - click to launch. (If you didn't read the whole blog: use right-mouse click, or click while pressing any control key, to load the selected pattern at the cell under mouse pointer.)


The source code is now 3 files and ~200 LOC, including imports and metadata for 25 patterns. Notice that the "oscillator" patterns are also good for performance benchmarking.

The screenshot above is taken with Prism; it's noticeably different from the previous screenshot (antialiasing of the rectangle borders). I'm not sure which toolkit is "wrong" here, but most likely Prism as it is still in early access, and its output looks more "blurred".

Performance Mystery I: JavaFX Script Functions

The JavaFX team clarified to me that they don't recreate the internal scene graph nodes after property changes (like I do with Rectangle.fill); this destroys my obvious guess about the cause of the bad performance. On the other hand, they found that text formatting and rendering (for my status label) was a bottleneck (at least for the simpler tests without actual Life action). Part of the problem here is bug JFXC-3483: Use of String.format for string concatenation hurts performance.

I now tried some quick profiling with the NetBeans Profiler, and a lot of cycles go into binding (remarkably runtime methods like notifyDependents()) and into several compiler-generated methods like World$1Local$57.doit$$56(). As it turns out, javafxc is compiling some of my functions into something... different. My World.life() method, which calculates the new state of a single cell, contains an inner class 1Local$57; this class is a closure that captures all local variables from the life() method (the parameters x and y, the local count, and the receiver this). In short, the entire content of the life() function is wrapped as a closure. This is the (decompiled) code generated for the "do it" method of the closure. (The mangled names and synthetic methods should disappear in JavaFX 1.3.1, thanks to JDI support - at least in the debugger and profiler, but not in decompiled bytecode.)

public boolean doit$$56() {
    _cls57 receiver$ = this;
    VFLG$Local$57$count = (short)(VFLG$Local$57$count & 0xffffffc7 | 8);
    _cls57 _tmp = this;
    int yy$ind = Math.max(y - 1, 0);
    for(int yy$upper = Math.min(y + 1, get$SIZE() - 1); yy$ind <= yy$upper; yy$ind++) {
        int yy = yy$ind;
        int xx$ind = Math.max(x - 1, 0);
        for(int xx$upper = Math.min(x + 1, get$SIZE() - 1); xx$ind <= xx$upper; xx$ind++) {
            int xx = xx$ind;
            if(elem$World$cells(yy * get$SIZE() + xx))
                $Local$57$count = get$Local$57$count() + 1;
        }
    }
    return get$Local$57$count() == 3 || get$Local$57$count() == 2 &&
        ((Boolean)isLive$bFunc$int__int(FXConstant.make(Integer.valueOf(x)), 0,
        FXConstant.make(Integer.valueOf(y)), 0).get()).booleanValue();
}

This code is pretty good... except for all the closure overhead. The closure class contains several other methods, and invocations to life() must go through all this baggage including allocation of the closure, extra indirection for locals lifted to the heap, and full binding support for locals (!). This overhead is not related to the first-class status of JavaFX Script's functions (a different, very efficient mechanism is used to wrap functions into values).

The life() method finishes by invoking another function, isLive(), which is compiled with even more weird stuff (name mangling, a different calling convention) due to its being a bound function.

And it gets worse: if I add to life() a conditional return statement before that function's end, this return is compiled as a closure's non-local return. That means raising a (runtime-internal) NonLocalReturnException that will be handled by the (also generated-code) caller. Non-local returns are necessary to allow the code inside a closure to break/continue a loop that contains the closure, or return from the method that contains the closure. Java exceptions are a great mechanism to implement non-local returns. But it seems that javafxc is abusing this technique, using the non-local return exception for trivial return statements that are not non-local returns - in javafxc-generated closures, no less. Also, it seems the technique is not implemented efficiently, showing Throwable.<init>() as the third top CPU hotspot in one of my profiling sessions.

Then I further investigated this issue, and discovered that this trivial optimization...

function life (x:Integer, y:Integer) {
    var count = if (cells[y * SIZE + x]) then -1 else 0;
    for (yy in [max(y - 1, 0) .. min(y + 1, SIZE - 1)])
        for (xx in [max(x - 1, 0) .. min(x + 1, SIZE - 1)])
            if (cells[yy * SIZE + xx]) ++count;
    // count == 3 or count == 2 and isLive(x, y)      <- the bound-function call, replaced by:
    count == 3 or count == 2 and cells[y * SIZE + x]
}

...would change the generated code into:

public boolean life(int x, int y) {
    World receiver$ = this;
    int count = elem$World$cells(y * get$SIZE() + x) ? -1 : 0;
    int yy$ind = Math.max(y - 1, 0);
    for(int yy$upper = Math.min(y + 1, get$SIZE() - 1); yy$ind <= yy$upper; yy$ind++) {
        int yy = yy$ind;
        int xx$ind = Math.max(x - 1, 0);
        for(int xx$upper = Math.min(x + 1, get$SIZE() - 1); xx$ind <= xx$upper; xx$ind++) {
            int xx = xx$ind;
            if(elem$World$cells(yy * get$SIZE() + xx))
                count++;
        }
    }
    return count == 3 || count == 2 && elem$World$cells(y * get$SIZE() + x);
}

The whole closure overhead was gone. No closure class anymore. A single method is generated, whose bytecode is just as efficient as what javac would produce for equivalent Java code. No locals lifted to the heap, no extra binding support, etc. Notice, for example, the simple "count++" instead of the previous gobbledygook "$Local$57$count = get$Local$57$count() + 1".

The big performance screwup was the fact that I was invoking a bound function, isLive(). This caused the caller function life() to "inherit" a ton of overhead that's apparently necessary to deal with bound functions. But this is probably a compiler bug/limitation, because life() is not itself a bound function - unless I'm missing the reason for that compilation strategy.

The bad news is that javafxc has some potential performance bugs (or missing optimizations):

  1. Inefficient use of NonLocalReturnException: a) it's used in places where it is apparently not necessary; b) it should reuse a preallocated exception object;
  2. Absence of optimized compilation of script-private functions that are never used as values (these don't need the code for "first-class" support);
  3. Unnecessary propagation of overhead from bound functions to common (non-bound) caller functions;
  4. Induction of binding overheads for local variables that are lifted to closure fields.

All these issues must be confirmed - I'm not intimate with the javafxc compiler. Alas, the identified overheads are actually pretty common in other high-level languages, although they are often "hidden" inside interpreters or runtimes, while they're "exposed" in JavaFX Script, which is fully statically typed and compiled. This exposure is good because programmers can easily spot useless bloat and complain about it. ;-) The compiler will certainly keep getting smarter about adding extra overhead only where it is really necessary.

But if I found a single important new fact about JavaFX's performance, that's it: Bound functions are expensive and dangerous. The extra overhead is not limited to the compiled code of the bound function itself, or even to call sites; if you have any common function that contains call sites to any bound function, this entire function will be compiled with lots of extra overhead. In my Life program, the bound function was very simple so I just manually inlined it. Otherwise I would have refactored it into a pair of functions: a (possibly script-private) function that performs the actual work, and a public bound function that wraps over it and is only invoked by code that really needs the bound behavior.
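As a minimal sketch of that refactoring, applied to this program's own isLive() (the isLiveImpl name is mine, not from the original source):

// Plain worker: no bound-function overhead, safe to call from hot code like life()
function isLiveImpl (x:Integer, y:Integer):Boolean { cells[y * SIZE + x] }

// Thin bound wrapper: only for callers that actually need dependency tracking,
// e.g. fill: bind if (world.isLive(xx, yy)) Color.BEIGE else Color.BLACK
public bound function isLive (x:Integer, y:Integer):Boolean { isLiveImpl(x, y) }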

Performance Mystery II: Redundant Binding

This section could also be titled: "I am stupid".

Text rendering performance was still a major problem, so I proceeded to investigate it. I know that Java's string formatting APIs are somewhat expensive, but they shouldn't be that bad - the profiler was showing some enormous overhead, in CPU and memory allocation, coming off places like Matcher.<init>() and Formatter.format().

Then I noticed the bug. I have a label with a bound expression:

Label { text: bind "({animSlider.value as Integer}) Gen: {world.gen} Pop: {world.pop}" }

The bug is simple: the variable world.pop is updated incrementally, once for each live cell, in the method World.run().

public function run ():Void {
    var pop = 0;
    cells = for (y in [0 ..< SIZE]) for (x in [0 ..< SIZE]) {
        def cell = life(x, y);
        if (cell) ++pop;
        cell
    }
    this.pop = pop;
}

Fixing the bug was trivial: I created a local variable pop, so I can do a single update to the field at the end of the method. The previous code was forcing the entire rendering of the Label (formatting, rasterization, layout, clipping...) to be repeated for each live cell counted in each generation.

This is the flip side of JavaFX Script's binding being so simple, so seamless: you don't notice the overhead. There are no explicit setters or firePropertyChange() calls. A Swing programmer would never make this kind of mistake, because the property-change stuff is all explicit. Spotting this kind of performance bug is difficult, maybe due to the immaturity of tooling: there is no JavaFX-specific support in profilers. Two JavaFX engineers, who told me that they found a huge bottleneck in the Label formatting and rendering, didn't notice the cause.

My new rule of thumb: Don't update public[-read] properties inside loops. Ever. Even for non-public properties, you are advised to avoid repetitive updates. Just mirror the property in a local variable, and update the field only at method's end.

Even in Java this is an interesting micro-optimization, although in JavaFX Script (definitely not a system-level language) we're not supposed to use such low-level techniques... except if, as demonstrated now, there are new, higher-level reasons for that. ;-)


My Life program is now incredibly faster; it runs the "Life" test at the full 64 rows @ 50ms delay, without dropping frames, scoring ~19.9 fps. Memory allocation is much saner at ~1095Kb/s (~1 young-GC of 4Mb, costing only 3ms every 4s). CPU usage is still higher than that of a competing Swing program, but that's due to my purist use of sequences and binding; I could easily optimize these... but I'm happy that I didn't, because this pushed me to find my real performance problems.

The graphics / animation engine is not the bad guy that I suspected in the previous blog. It's not doing any stupid reconstruction of the entire internal scenegraph just because I change a trivial fill property of some nodes. Even the string interpolation bug was ultimately insignificant.

People planning to use JavaFX for advanced animation and games must only take some care, like not allowing an avalanche of binding events in every frame, and not updating bound(able) properties inside tight loops (duh!). I also advise to completely avoid bound functions in code that's even remotely performance-critical.

As a final note, I know that my animation strategy is "wrong"; I shouldn't trigger direct changes to the scene graph when a new Life generation is calculated. I should use a separate Timeline to refresh the display. The current strategy, coupling internal state changes to display updates, makes it impossible to run GOL in high-speed mode - I can easily calculate many thousands of generations per second, but no graphics technology would be able to keep up with that many frames per second.


JavaFX's Game of Life Blog

Posted by opinali May 20, 2010

There is an unwritten tradition that John Conway's Game of Life must be implemented in every programming language and every GUI toolkit. Well, OK, I just invented this tradition, but it's a smart introduction and Life is one of the easiest games / cool animations you can program. But it's not so simple that we can't learn a few important things about JavaFX...

My goal: a good-looking and feature-complete version of the Game of Life (GOL), but keeping the code simple, short, "canonical". I won't resort to low-level optimizations (e.g. reaching to JavaSE APIs), but I may use high-level ones (e.g. good algorithms, careful selection of JavaFX features). How well does JavaFX handle the task when used the way it is intended to be used?

So, let's start. The complete app is short enough that it fits in this blog, in a few small pieces.

class World {
    var SIZE:Integer on replace { reset() }
    var cells:Boolean[];
    var gen:Integer;
    var pop:Integer;
    function reset () { gen = pop = 0; cells = for (i in [0 ..< SIZE * SIZE]) false }
    bound function isLive (x:Integer, y:Integer)   { cells[y * SIZE + x] }
    function flip (x:Integer, y:Integer):Void      { cells[y * SIZE + x] = not cells[y * SIZE + x] }

The World class implements the game's data model and the GOL algorithm. The cells sequence contains true=alive, false=dead; it would ideally be a matrix, but JavaFX Script doesn't support multidimensional sequences, so I have to do some index arithmetic. I could have used primitive arrays (with JavaFX Script's nativearray), but that would be impure, as native arrays are only intended for Java integration and don't completely integrate with JavaFX Script.

    function life (x:Integer, y:Integer) {
        var count = if (cells[y * SIZE + x]) then -1 else 0;
        for (yy in [max(y - 1, 0) .. min(y + 1, SIZE - 1)])
            for (xx in [max(x - 1, 0) .. min(x + 1, SIZE - 1)])
                if (cells[yy * SIZE + xx]) ++count;
        count == 3 or count == 2 and isLive(x, y)
    }

Function life() is the finite state machine for an individual cell; nothing JavaFX-specific here. Except that I hate and instead of &&.

Oh, I didn't make the obvious optimization of creating two extra rows and columns to avoid the min/max tests (which prevent out-of-bounds errors at border cells that don't have all neighbors), because this would reduce the general seamlessness of working with sequences. (It takes a lot of discipline to resist the urge of micro-optimization... ugh...)

    function run ():Void {
        pop = 0;
        cells = for (y in [0 ..< SIZE]) for (x in [0 ..< SIZE]) {
            def cell = life(x, y);
            if (cell) ++pop;
            cell
        }
    }

Function run() recomputes the whole world (all cells). The inner for xx builds a sequence for each row, and the outer for yy concatenates all row sequences in a single big sequence ("auto-flattening"). I didn't worry, because the compiler may optimize this by adding the inner elements directly into a single sequence for the outer loop.
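A tiny illustration of that auto-flattening (the grid name and the toy bounds are mine, just for this example):

def grid = for (y in [0 ..< 2]) for (x in [0 ..< 2]) "{x},{y}";
// grid is a single flat sequence: [ "0,0", "1,0", "0,1", "1,1" ]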

((( Begin Parentheses to investigate the compiler (((

Sequences are immutable; updates are performed by creating a brand-new sequence, copying all non-updated elements. The compiler can optimize this too, with temporary mutable representations in methods that perform multiple updates; ideally you trust the compiler by default, and only optimize if necessary (as indicated by profiling). Having said that, my run() function replaces the current sequence with a new one, requiring a single assignment - but I didn't do it to optimize code, I did it because it's more elegant: the code explicitly calculates the entire state N+1 as a function of state N. In fact, run() was a one-liner before I augmented it to update the generation and population counters.
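For instance (a trivial illustration, not from the original program):

var s = [1, 2, 3];
s[1] = 99;         // conceptually builds a new sequence [ s[0], 99, s[2] ]
insert 4 into s;   // again a new sequence, unless the compiler optimizes the copy away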

Notice that the GOL algorithm cannot be implemented easily with in-place updates because the new state of each cell depends on the current state of all cells around it. I could have used an in-place algorithm, but that would be uglier and also require some mutable data type like a nativearray.

Another interesting aspect of JavaFX Script is that its sequences are optimized for all basic types. My cells:Boolean[] uses a primitive boolean[] as internal storage, consuming a single byte per element; I've verified this behavior in the profiler. Let's check all these optimizations in the generated bytecode (decompiled):

    public void run() {
        World receiver$ = this;
        set$World$gen(get$World$gen() + 1);
        BooleanArraySequence jfx$25sb = new BooleanArraySequence();
        int y$ind = 0;
        for (int y$upper = get$World$SIZE(); y$ind < y$upper; ++y$ind) {
            int y = y$ind;
            BooleanArraySequence jfx$26sb = new BooleanArraySequence();
            int x$ind = 0;
            for (int x$upper = get$World$SIZE(); x$ind < x$upper; ++x$ind) {
                int x = x$ind;
                boolean cell = life(x, y);
                if (cell) set$World$pop(get$World$pop() + 1);
                boolean jfx$27tmp = cell;
            }
            Sequence jfx$28tmp = jfx$26sb;
        }
        Sequences.set(this, 1, jfx$25sb);
    }

Oh, crap - the compiler didn't use a single BooleanArraySequence like I expected. Unless my memory fails, javafxc is capable of this optimization, but maybe just for simpler cases. It seems the compiler still has a way to go. Another missing optimization is preallocation: the maximum number of elements that will be inserted can be statically determined (SIZE for the inner sequences, SIZE*SIZE for the outer), so the compiler should create the sequences with these initial sizes, avoiding growth costs. Finally, every iteration of the outer loop allocates, uses and then discards a temporary sequence (its elements are copied to the outer sequence); this inner sequence could be allocated only once and cleared/recycled across all outer loop iterations. The latter optimization is unnecessary if the compiler could just avoid the inner temporary sequence, but I can see other scenarios where this wouldn't be possible but the reuse of temporary sequences would.

There are also other gratuitous inefficiencies in the generated code, like several redundant temporary variables. (One of these, receiver$, is an artifact of traits, already planned to disappear from unnecessary places). Also I wonder if the order of the synthetic $ind and $upper variables in the bytecode may confuse loop optimizations (just like it confused my decompiler). Such small issues won't impact runtime performance as the JIT compiler will just optimize them out; but the redundancies affect startup/warmup performance and also code bloat.

Why am I complaining so much? JavaFX Script is a high-level programming language, in the sense that its mapping to the compiled form (Java bytecode) is not trivial (as it is for Java). And it actively promotes a high-level programming style, both by offering very convenient high-level features such as sequences and binding, and by not offering alternative low-level features (except for the recourse of "native interface" to Java classes). The net result of this design is that the compiler must assume responsibility for all the low-level optimizations that programmers can't do anymore (or are convinced that it's not good style to do anymore - e.g. explicitly mutating sequences). In my Java code, I always do such things as preallocating collections, recycling expensive objects (remarkably big collections), or eliminating intermediate collections produced by inner loops.

The javafxc compiler already includes some impressive amount of such high-level optimizations; but we need more. Performance is already pretty good, but there is a lot of potential to be even better; I expect the code generation to keep improving for many updates to come.

))) End Parentheses to investigate the compiler )))

Anyway, let's continue the program...

    function scroll (dx:Integer, dy:Integer):Void {
        cells = for (y in [0 ..< SIZE]) for (x in [0 ..< SIZE]) {
            def yy = y + dy;
            def xx = x + dx;
            yy >= 0 and yy < SIZE and xx >= 0 and xx < SIZE and isLive(xx, yy)
        }
    }

World's last function, scroll(), allows me to scroll all cells in any direction. Nothing remarkable here.

def world = World { SIZE: 64 }
def CELL_SZ = 8;

def animSlider = Slider {
    min: 0 max: 1000 value: 50 blockIncrement: 50 layoutInfo: LayoutInfo { width: 120 }
}
def anim = Timeline {
    repeatCount: Timeline.INDEFINITE
    keyFrames: KeyFrame { time: bind animSlider.value * 1ms canSkip: false action: function () { world.run() } }
}

Now we start the game UI. I declare the world object, and the animation timeline that triggers a new generation at a fixed delay. A Slider lets me change this delay; I had to declare it here so I can use binding to automatically adjust the KeyFrame's delay from the slider value.

Notice the value * 1ms calculation, necessary to convert a Double to a Duration. The multiplication is a no-op, as 1ms is Duration's fundamental unit. You can't use a typecast (value as Duration), because the Duration type needs a unit (ms, s, m, or h) and there is no default unit, not even for 0. I like that, and I'd love to see JavaFX Script evolving to embrace user-defined units in its core typesystem; this would make a lot of sense for a high-level language serving business applications stuffed with manipulation of "real-world" data.

def toolbar =  HBox { spacing: 8
    content: [
        Button {
            text: bind if (anim.running) "Stop" else "Go"
            layoutInfo: LayoutInfo { width: 60 }
            action: function () { if (anim.running) anim.stop() else anim.play() }
        }
        Button {
            text: "Clear" layoutInfo: LayoutInfo { width: 60 }
            action: function () { world.reset() }
        }
        Label { text: bind "({animSlider.value}) Generations: {world.gen} - Population: {world.pop}" }
    ]
    onKeyPressed: function (e:KeyEvent) {
        if (e.code == KeyCode.VK_DOWN)       world.scroll( 0, -1)
        else if (e.code == KeyCode.VK_UP)    world.scroll( 0,  1)
        else if (e.code == KeyCode.VK_LEFT)  world.scroll( 1,  0)
        else if (e.code == KeyCode.VK_RIGHT) world.scroll(-1,  0)
    }
}

I have a top row of controls that allow me to stop/start the animation, reset it to the initial state, control its speed, scroll the cells in four directions, and show the generation and population stats. The only remarkable part is the if-else cascade in onKeyPressed(), because JavaFX Script lacks a switch/case statement. The language already has first-class functions and closures, so adding a map type would allow efficient (hashed) branching for larger numbers of keys, and reasonably compact code too.
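Just to sketch what that could look like today with Java interop - this is my own illustration, not code from the program; the Action class and the keyActions map are invented:

import java.util.HashMap;

// A tiny holder so the HashMap can store first-class functions without casts to function types
class Action { public var run:function():Void }

def keyActions = new HashMap();
keyActions.put(KeyCode.VK_DOWN,  Action { run: function () { world.scroll( 0, -1) } });
keyActions.put(KeyCode.VK_UP,    Action { run: function () { world.scroll( 0,  1) } });
keyActions.put(KeyCode.VK_LEFT,  Action { run: function () { world.scroll( 1,  0) } });
keyActions.put(KeyCode.VK_RIGHT, Action { run: function () { world.scroll(-1,  0) } });

// inside onKeyPressed:
def a = keyActions.get(e.code) as Action;
if (a != null) a.run();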

def life = Group { content: for (yy in [0 ..< world.SIZE]) for (xx in [0 ..< world.SIZE])
    Rectangle { x: xx * CELL_SZ y: yy * CELL_SZ width: CELL_SZ height: CELL_SZ
        fill: bind if (world.isLive(xx, yy)) Color.BEIGE else Color.BLACK
        stroke: Color.BLUE
        onMouseClicked: function (e:MouseEvent) {
            world.flip(xx, yy);
        }
    }
}

The main "game" region is a grid of Rectangles to show each cell. Once again I use nested Y/X loops, producing the sequence expected by Group.content. For each Rectangle, I've used binding to set its fill color according to the corresponding cell state.

Yup, that design (and even my choice of the Game of Life - mwahahaha!) was a purposeful stress-test of both binding and the scene graph; with the 64x64 world size, this means 4.096 nodes and 4.096 bound properties, so I'm relying a lot on compiler and runtime efficiency.

The handling of mouse clicks, used to toggle cells, is trivial because I can attach the event handler to each Rectangle, so I don't need any picking logic. Notice also that my event handler is a "full closure" that reaches to the xx, yy variables - the indices of the for loops that built the Rectangle sequence.

Finally, in that same mouse handler I force the keyboard focus to the toolbar because that's where I installed the KeyEvent handler for scrolling.

Stage {
    title: "Life" resizable: false
    scene: Scene { content: VBox { content: [ toolbar, life ]}}
}

The Stage and its Scene, with the toolbar on top of the game region. It's complete! Click the image below to launch:

Game of Life screenshot

The resulting functionality and even the look are, IMHO, surprisingly great for a program that's under 100 lines of code. Just google "Game of Life Patterns" - the web is chock-full of GOL resources. Just click the cells; sorry, no import/export of LIF or RLE files yet - that may appear in the payware version ;-)  This validated my impression about JavaFX's productivity...

A B[l]inding Puzzler

...but I confess that my first code didn't work; the cells didn't change in the screen, as if world was not being recalculated at all. The bug was here:

    function isLive (x:Integer, y:Integer)   { cells[y * SIZE + x] }

    Rectangle { fill: bind if (world.isLive(xx, yy)) Color.BEIGE else Color.BLACK ... }

My fill: bind... was not firing when world.cells changed. The problem is, the only variables captured by the bind expression are world, yy and xx. These are the only data which updates would trigger reevaluation of my bind expression. The cells sequence is encapsulated by the World class, and it's not directly referenced from the bind expression. This may be a significant puzzler as someone could start with a more "scriptish" prototype code full of script-scope variables, and later refactor these into classes.

The fix was trivial once I found the problem; just declare bound function isLive..., and it works. Now the binding system knows that the subexpression world.isLive(xx, yy) is also invalidated when the cells sequence is changed, because that field is used inside isLive() and this dependency propagates to bound expressions that invoke isLive(). (Such propagation is not a completely obvious feature; it shows that the binding mechanism is pretty well rounded, with robust dependency tracking.)

Timeline issues

The KeyFrame.time property is read/write; this is very convenient because I can change the animation speed by just updating this property in my single KeyFrame. Unfortunately, this doesn't work very well. The program starts with a configuration of 50ms; if you click Go and then drag the slider (left = smaller delays / faster animation, right = larger delays / slower animation), the animation will adjust its speed, but not smoothly. Sometimes I observe a pause of a few seconds, sometimes a "race" of very fast animation while dragging the slider. The animation engine must be doing some timing/scheduling that becomes temporarily confused when a KeyFrame's time property is changed.

I have tried some alternative implementations - using an intermediary variable with an on replace trigger that stops (or pauses) the timeline, changes the KeyFrame delay and resumes it; and even creating a new Timeline. But the result was always similar.
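For reference, here is roughly what the first of those attempts looked like - a hedged sketch in the spirit described above, with the delay variable being my own name:

var delay = bind animSlider.value on replace {
    def wasRunning = anim.running;
    anim.stop();
    anim.keyFrames = KeyFrame {
        time: delay * 1ms canSkip: false action: function () { world.run() }
    };
    if (wasRunning) anim.play();
}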


This is a simple app, so it shouldn't put a big stress on JavaFX... except perhaps, for my sub-optimal state management, and large counts of nodes and bindings.

Idle test: Empty world, animation stopped. CPU usage is 0 as expected, and GC log shows zero activity. This test looks trivial, but it's good to assert that no part of the system (binding, scene graph) uses polling, busy-waiting or other brain-dead techniques. Some platforms are known for non-zero CPU usage in idle apps, so it's good to show that JavaFX won't do that. ;-)

Dead test: Empty world (no live cells), animation on at 50ms (20 gen/s == 20 fps). CPU usage was a lowly ~1,1% (on a quad-core Q6600; so that's ~4,4% of a single core). Most work is due to the recalculation of the world; we can inspect the GC log:

[GC 15.049: [DefNew: 4421K->5K(4928K), 0.0005543 secs] 13263K->8847K(15872K), 0.0005950 secs]
[GC 16.201: [DefNew: 4421K->5K(4928K), 0.0006040 secs] 13263K->8847K(15872K), 0.0006480 secs]

That's ~3,8Mb/s = ~190Kb per frame/generation = ~46 bytes per cell update. It's a bit higher than I originally expected; the missing sequence optimization is certainly the cause, as it produces a lot of extra allocation. But even that is not so bad, because Java's excellent GC produces near-zero pauses.

Dead & Headless test: Similar to the previous test, but I commented out the bind in Rectangle.fill, so the entire GUI layer is a no-op after initial startup. GC behavior was identical, but CPU usage down to 0,89% (of a single core). This means that the binding was costing 0,22% core (or 0,00005% per cell: how's that for precision?). Notice that my code is assigning new values to every cell; the fact that the new values are identical to the old values only saves the effort to repaint the rectangles, but the bind expressions must be reevaluated every time.

Life test: I changed the life() function to just flip all cells in the first few rows. [I only change the return statement, to not remove the effort of calculating all cells with the normal GOL algorithm.] This provides a stable animation test; a real Life run is difficult to benchmark because the number of cell changes in each generation varies chaotically.

The animation engine could not keep up with many rows - the updated cells are only refreshed in some frames. Testing with 2 rows (128 cells) is fine; at 4 rows (256 rects) I could already see skipped frames. Garbage collection was intense, let's see it for 2 rows:

[GC 9.439: [DefNew: 4422K->9K(4928K), 0.0014487 secs] 13267K->8855K(15872K), 0.0014851 secs] 
[GC 9.480: [DefNew: 4425K->6K(4928K), 0.0015159 secs] 13271K->8851K(15872K), 0.0015514 secs]

The animation engine does a lot of allocation when I simply change the Rectangle.fill property to a different Color. We're up to ~120Mb/s = ~6Mb / generation = ~46Kb per updated cell. It seems that the scene graph completely rebuilds its internal node objects ("SGNode"s) when some property changes. These preprocessing techniques are essential to accelerate such things as transforms and effects, but in this case I'm just changing a simple Rectangle's internal painting from one solid color to another solid color.

The program is doing full vector rendering; as JavaFX Balls shows, drawing things from geometric elements may be much slower than just blitting a bitmap image. But JavaFX Balls used a complex drawing with curves and gradients; Life only draws pretty dull rectangles without rounded corners, transforms, effects or anything else. I've tested the Prism toolkit too, but this time it didn't save JavaFX; it shows basically the same behavior.

I experimented with some optimizations that I didn't originally want to use:

  • The first obvious thing is using a single ImageView for the background of all-dead cells. Then I have one fixed Rectangle per live cell, just hiding it when the cell isn't live (see the sketch after this list).
  • Using ImageView also for the live cells (so, full "bitmap rendering").
  • To show/hide the live cells, I've tried both flipping the visible property, and moving dead cells away from the view (change y to a big negative value - this requires some layout tweaks).
  • Finally, I changed the code so that the entire content of the Group of rectangles is generated by a single bind expression, and only live cells generate a Rectangle. This replaces all live-cell nodes, if any, at every frame. The advantage is that most cells are usually dead, so the scene graph has fewer nodes.
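A minimal sketch of the first of these optimizations, reusing the program's world and CELL_SZ declarations and assuming a pre-rendered background image for the dead board (the file name is invented):

def background = ImageView { image: Image { url: "{__DIR__}dead-board.png" } }
def liveCells = for (yy in [0 ..< world.SIZE]) for (xx in [0 ..< world.SIZE])
    Rectangle { x: xx * CELL_SZ y: yy * CELL_SZ width: CELL_SZ height: CELL_SZ
        fill: Color.BEIGE
        visible: bind world.isLive(xx, yy)   // toggle visibility instead of rebinding fill
    }
def life = Group { content: [ background, liveCells ] }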

All these optimizations net me a maximum of ~2X speedup; I could animate 4 rows of pulsating cells with proper behavior (no visible frame skipping - still, high CPU and GC activity).

It seems that JavaFX's scene graph is already perfect for GUIs with controls, but it must improve its support for general animation. Changing the state of a large number of nodes per frame shouldn't have such a high cost. Even adding/removing many nodes should be faster, although I realize this is harder and will accept tradeoffs in coding effort - e.g., carefully breaking the scene graph into many groups, then adding/removing these to the scene, maybe with a hint to let the engine do all preprocessing in parallel and only make the new nodes visible when they're fully realized, without hanging the animation until that happens.

This would be perfect for problems like Joeri Skora's Isometric tile rendering in JavaFX; notice that JavaFX 1.3 has pretty good performance for a very big scene graph - even in "Use brute force" mode, his animation scrolls over a 65.536-node scene with surprisingly good performance. But that's just because the scene is completely static; and the approach of dynamically adding and removing nodes, even with optimizations like quadtrees, suffers from the overhead of changing the scene tree.

I don't expect that JavaFX's scene graph would be optimized for huge scenes (e.g., with a spatially indexed node tree for more efficient clipping). JavaFX is not meant to compete, out of the box, as a high-end game engine. But it should be sufficiently powerful and flexible to allow programmers add extra tricks and optimizations that become necessary in each application niche. Besides games (a huge business even in its "casual" category), there are other important cases for advanced animation, such as sophisticated data visualization.

JavaFX versus Swing

I've quickly googled "Java Swing Game of Life" and found one program that's very close to mine. (Even closer after I stole its idea of having a slider to change the animation speed.) I've made a few changes to the Swing app to make both comparable - same cell number and size, same optional hack for the stable Life tests.

Code size and clarity: JavaFX wins. The Swing code is ~230 lines, and this after I've stripped many redundant comments and {}s. Even removing all remaining comments (not fair, because the code is clearly not all obvious) and tightening the formatting/indentation even more, it's more than 2X the size of the JavaFX Script code. And that's with fewer features (no scrolling). There is no contest in code size - or, much more important, code clarity; that one is more subjective, but if you check the code I don't think there's much room for argument. (This program may not be the best possible Swing code, but I don't think the best possible version would be much better.)

The Swing program does no custom painting; it creates a custom JLabel for each cell and changes its background color - this is nice because it's the closest thing to a "scene graph-based" Swing program: all rendering is performed by the toolkit. Also, the panel objects contain the cell state and the GOL algorithm, using two state variables and two complete passes over all cells to enable in-place updates. That should put the Swing program at a performance advantage over JavaFX. (Once again, I could optimize my GOL program, remarkably for in-place updates - but I don't want to; I'm focusing on easy-to-write, easy-to-read code.)

Memory usage: Swing wins. Measuring the Life test, JavaFX uses 54Mb working set / 100Mb private bytes (8,852Kb heap); JavaFX+Prism is better at 51Mb / 82Mb (8,875Kb heap).  The Swing program uses 55Mb / 94Mb (3,170K heap, but without any significant allocation/GC activity even in the Life test). JavaFX uses more heap, remarkably for bindings and closures and the scene graph. JavaFX 1.3 has improved the efficiency of both binding and the scene graph, but if you use both in the range of thousands, there's still enough overhead to care. But JavaFX really loses in its excessive memory allocation when scene graph nodes are updated.

Performance: Swing wins. The Swing program doesn't suffer from the issues I discussed with JavaFX's scene graph; it happily runs the Life test with near-zero CPU and GC activity - like we should expect from a simple, 64x64 Game of Life running on a current computer, and executed by native code (in that case JITted) and a reasonably hardware-accelerated toolkit (which includes Java2D/Swing).

Last Conclusions (and RFE's...)

I am still quite happy with my Life program. It was a pleasure to write, and some of the performance issues (remarkably with sequences) are easy to fix if I care. Perhaps even the scene graph limitations have a smarter workaround that I didn't try - e.g., rendering all live cells with a dynamically-built Path (I was just too lazy to try that one...) or some other trick.

I'm not happy though with these scene graph limitations; I should be able to write an efficient version of something like the Game of Life without any optimization effort. Adding/removing many nodes from the scene graph is very expensive; I can accept and understand this, it's the core tradeoff of scene graphs (but still, in JavaFX the tradeoff seems unreasonably high). But I neither understand, nor accept a big overhead for trivial updates in existing nodes - like changing a solid color, flipping the visibility state, or even just translating. At least in this area, it seems that JavaFX must still improve significantly. Even if JavaFX is now very close to be an excellent and complete platform for some important use cases - control-centric (e.g. business front-ends) and media-centric - the platform is still in the beginning of a steep adoption curve and it can't afford to not serve other niches very well.

As a JavaFX enthusiast, I like to refer to all former Java GUI toolkits (AWT/Java2D/Swing, LCDUI, even SWT) as "legacy" and "obsolete"; but this is clearly not fair while there are programs that I can write in these old toolkits with excellent results, but not in JavaFX. I'm optimistic because JavaFX has already improved a lot since its v1.0; the foundations are very solid and the JavaFX team is now catching up very fast in areas like high-quality controls, layout and styling. The compiler is also maturing fast; the optimization of binding in 1.3 was massive (even though not yet complete) and the sequence optimizations are ongoing.

Finally, we could argue that the scene graph paradigm is not ideal for all graphics applications, but I don't believe that. I see immediate-mode rendering as the future Assembly coding of graphics. On the other hand, shader programming is an important piece of modern graphics stacks; JavaFX uses this intensely (remarkably in Prism), but unfortunately only internally. With that support, I could write all the cell rendering easily inside a single canvas node - the Life "world" can be rendered as a big functional texture, and its rendering is a ridiculously parallelizable task that's perfect for the shader paradigm. Shaders are often a great replacement for important use cases that don't favor scene graphs. So, in my (non-expert) opinion, the big missing piece in the JavaFX stack is not a traditional immediate-mode API, but opening Decora (the desktop runtime's portable shading engine) to applications, with a public shading API.


Flash Is a Right Blog

Posted by opinali May 8, 2010

Ian Bogost's recent article Flash is Not a Right highlights some new aspects of the debate about Apple's iPhoneOS development restrictions. I have a different opinion.

I understand Ian's pain as a teacher. Programmers who aren't curious and don't like to explore varied languages and paradigms are doomed to rank-and-file roles. But this is secondary. The purpose of computing is to serve the needs of end users. But for this to happen, computing has to be a healthy industry: one that allows fair competition, rewards efficiency and quality - these are core values of our economic regime, and the reason behind many consumer protection laws.

Granted, not all platforms are like the PC; many require you to pay a developer fee, sign NDAs, adopt DRM technologies, abide by the vendor's certification and distribution channels, etc. These restrictions have been around since the dawn of computing, and developers generally don't have issues with reasonable terms. But I have no knowledge of a previous computing platform that would enforce the kinds of non-reasonable restrictions that Apple wants to enforce now.

I'm not saying that Apple should help people to use their preferred tools. I'm not asking Apple to OEM-install Adobe Flash Player; it's their product and they deal the deck. But if that deck seems to have some deuces to me, I should be free to use my aces - as long as I put up with the effort and cost, and do it within the behavioral rules imposed by the platform (i.e., only install unprivileged userland code; follow standards of security, reliability, UI guidelines, etc.).

Blocking high-level tools opens a huge can of worms that nobody is talking about: it creates an artificial, unfair disadvantage for smaller developers. It's a plutocratic move that favors huge companies and screws small shops and indie developers. High-level programming languages and frameworks that create a layer over the raw platform are popular for several reasons. Portability is not the only reason; productivity is another huge reason, and it's even more important - Steve Jobs's Thoughts on Flash smartly avoids this issue. Multiplatform was never Flash's primary selling point; its popularity boomed even when the Mac was down (so Windows was the only desktop that mattered). The major reason for Flash's adoption was, by far, features and productivity. Pure web browsers are now catching up with the features, but Flash still benefits from a powerful suite of design tools; simple validation & deployment; and trivial hosting.

Tools like Flash bring power and productivity for the masses. Apple can stop Flash; but they cannot stop all similar tools. Take any huge software company (e.g. Electronic Arts - a big iPhone game provider), and they have their own high-level platform that they rely on: frameworks, design tools, code generators. Even embedded languages and compilers like Lua. (The latter violates even the previous iPhoneOS terms, but Apple pretends to not see it.) They often use game engines too - very big components that actually become the real platform; most "application code" is written to the game engine's API, not to native APIs. Apple's new terms would certainly forbid that. Even if Apple wanted, they couldn't block the likes of EA to use high-level tools, because these may be in-house and not public, and cannot be detected without expensive reverse-engineering of the binaries.

Ian Bogost mentions game engines in a purely negative way - "same plain-vanilla experience (...) lowest-common denominator". Jobs complained that cross-platform tools may not provide full access to the system. This is not necessarily true. MonoTouch enables full access to the OS APIs, and it's updated to new iPhoneOS SDKs within days. Others may just need extra work, e.g. a JVM supporting JNI. There's also "right tool for the job" - not all applications need every iPhoneOS feature; if Flash is good enough for your app, why not use it? And when it is not, it's your choice to either create a mediocre product, or use another tool. (If you make the first choice, Apple might reject your app because it's Trash - not because it's Flash. But they probably won't reject it for the former reason, as the absolute majority of mediocre apps in the AppStore shows; and that's fine too, we should just let the free market push bad products to the bottom of the heap.)

Some of the greatest games of all time were built on a reusable, portable game engine (Maniac Mansion, anyone?). Yes, the most innovative games are often those that introduce a new engine - or at least a major revision (e.g., The Day of the Tentacle for SCUMM v4). But this happens mostly because there are only a few opportunities for breakthrough innovations: a new smart algorithm or game concept; a next-gen CPU or GPU - some lucky game will be first, and will be famous mostly because it was first. Also, games that use a common engine often step outside it, or customize the engine, for incremental innovation. The engine provides the 80% of common features for some game category and platform generation, stuff that's just stupid to rewrite for each title. While ancient games would access the video hardware directly, all current games instead rely on a thick stack, from GPU microcode to low-level drivers to relatively high-level APIs like D3D and OpenGL. Yet Apple is not telling people to skip these layers and program the iPhone's PowerVR hardware directly. This illustrates the idiocy of opposing higher-level stacks. Multi-layer architectures and increasing abstraction are among the core foundations of Computer Science. Steve Jobs is basically asking us to ignore some of the most important best practices of our profession, and this is Wrong. (Of course he wouldn't know better - Jobs has never written a "10 GOTO 10" program in his life - that's why I'm being a bit scholarly here, just in case he reads me.)

(Just wrapping up on games: don't forget design and content; these play a major role in providing a unique experience. The majority of all good games don't have innovative coding, and don't milk the platform's utmost capacity. Tetris had amateur graphics even for 1984. To paraphrase Bill Clinton: It's the creativity, stupid.)

Apple's contempt for developers is way too blatant to not deserve revolt. Apple is a successful company because they have a strong, competent focus on end-users; this can only be lauded. But they have crossed the line when they handle developers with fascist manners - authoritarianism, interventionism, indoctrination. For Apple, developers are servile sharecroppers who should be grateful for profiting from the landlord's properties. Apple is ruling over factors that have no objective impact on application quality, such as programming language choice. They are manipulating their developer base, for their exclusive benefit - pushing Apple's agenda against Adobe and other competitors. This is ultimately dangerous even for Apple, that's losing its famed customer focus; end-users will not benefit from this business.

Programming to any platform in Flash - or C#, Java, whatever you like - is your right.

Performance: JavaFX Balls

As soon as I've got JavaFX 1.3 and NetBeans 6.9-beta, first thing I did was obviously running benchmarks, and the new update delivers on its promise. Let's first check JavaFX Balls (port of Bubblemark). I've last reported results for 1.2 here; but scores for 1.2 are updated again to account for changes in my test system, remarkably the JDK (now 6u21-ea-b03).

Test               | JavaFX 1.2 (Client) | JavaFX 1.2 (Server) | JavaFX 1.3 Swing (Client) | JavaFX 1.3 Swing (Server) | JavaFX 1.3 Prism (Client) | JavaFX 1.3 Prism (Server)
1 Ball             | 999 fps   | 1000 fps  | 1000 fps  | 1000 fps  |           |
16 Balls           | 998 fps   | 998 fps   | 1000 fps  | 1000 fps  |           |
32 Balls           | 986 fps   | 998 fps   | 998 fps   | 998 fps   |           |
128 Balls          | 490 fps   | 636 fps   | 608 fps   | 666 fps   |           |
512 Balls          | 90 fps    | 108 fps   | 124 fps   | 151 fps   |           |
@ 60 fps           | 642 Balls | 699 Balls | 815 Balls | 878 Balls | 817 Balls | 1.173 Balls
@ 200 fps          | 285 Balls | 358 Balls | 366 Balls | 428 Balls |           |
Effect, 1 Ball     | 666 fps   | 666 fps   | 666 fps   | 972 fps   |           |
Effect, 16 Balls   | 150 fps   | 165 fps   | 162 fps   | 220 fps   |           |
Effect, @ 60 fps   | 44 Balls  | 47 Balls  | 45 Balls  | 66 Balls  | 377 Balls | 642 Balls
Effect, @ 200 fps  | 12 Balls  | 13 Balls  | 12 Balls  | 14 Balls  |           |
2D, @ 60 fps       | 70 Balls  | 70 Balls  | 68 Balls  | 71 Balls  | 96 Balls  | 105 Balls
2D, @ 200 fps      | 18 Balls  | 20 Balls  | 20 Balls  | 20 Balls  |           |
2D+Eff, @ 60 fps   | 27 Balls  | 28 Balls  | 25 Balls  | 26 Balls  | 75 Balls  | 82 Balls
2D+Eff, @ 200 fps  | 7 Balls   | 7 Balls   | 7 Balls   | 7 Balls   |           |

JavaFX 1.3 shows once again good improvements in the scene graph's scalability - its advantage over 1.2 is bigger for higher node counts, topping at 37% more fps for 512 Balls, or 28% more balls at 200 fps. JavaFX Balls is a worst-case animation in some aspects: all its nodes move every frame. In many real-world animations, some elements are either static (background) or semi-static (objects that only move or change when reacting to some event), so these will likely scale up to thousands of nodes, as JavaFX uses standard tricks like dirty regions and bitmap caching to avoid redundant work.

The performance impacts of vector rendering ("2D" tests) and Effects are unchanged: both options cost a lot. The Effects framework is the worst offender - a simple BoxBlur effect will bring your performance from 815 down to 45 Balls (almost 20X worse) @ 60 fps. But... this is just for the standard graphics toolkit (identified as "Swing", because it's built on top of some core classes from the legacy AWT/Java2D/Swing stack).

Now let's activate the next-generation Prism toolkit (with -Xtoolkit prism; currently in Early Access). For the bitmap and vector tests, Prism is just as good as the old toolkit. But enabling Effects changes everything; Prism is almost 10X faster than the Swing toolkit, scoring an incredible 377 Balls @ 60 fps. As long expected, Prism finally renders effects with full hardware acceleration and without the extra buffer copies that spoil effects on the Swing toolkit.

How good is Prism's score? I didn't focus on benchmarking against other RIA runtimes here, but these results are much better than the top scores I measured last June for PulpCore and LWJGL/Slick. The latter is a dedicated, lightweight 2D game engine and it's also OpenGL-accelerated, which makes Prism's advantage impressive. The Prism Bubblemark program is not available anymore (server down as I write this), but the latest PulpCore scores only 30 fps for 512 Balls. (PulpCore can do 65 fps with a new "Pixel snapping" optimization that rounds all coordinates to integer values - that looks interesting, I will add it to JavaFX Balls later.)

I've also repeated these tests with HotSpot Server. This VM is not viable for client deployment, but we can notice areas of possible improvement - anything that runs substantially faster with the Server compiler is (a) written in Java, and (b) has optimization potential. And we can really see HotSpot Server beating the pants off Client, in the simpler tests that only measure the scene graph's performance. Combining HotSpot Server with the Prism toolkit, I've broken the 1.000 Balls barrier for the first time in the @60 fps test.

Problem: When I add many balls in the JavaFX Balls animation in a single step, e.g. from 128 to 512 balls, the animation "freezes" for a noticeable time - close to a second. This happens because JavaFX relies on preprocessing (pre-allocating/computing objects that are reused at each frame for translations and other pipeline tasks). Prism's delay to add many nodes is not worse than Swing's, but not better either. My test case is perhaps extreme - most real-world animations should not add hundreds of nodes to the scene in a single keyframe. Anyway this shows one possible bottleneck that may deserve optimization in future releases.

Performance: Strange Attractor

My next test is the Strange Attractor benchmark. I didn't expect any improvement in this program, because it makes minimal use of JavaFX's scene graph - all animation is performed by manual writing of color values to a large array of pixels that is finally blitted to the screen.

Test              | JavaFX 1.2 (Client) | JavaFX 1.2 (Server) | JavaFX 1.3 (Client) | JavaFX 1.3 (Server)
MainListDouble    | 74 fps  | 96 fps  | 80 fps  | 94 fps
MainSeqDouble     | 62 fps  | 77 fps  | 65 fps  | 78 fps
MainFloatRaw      | 144 fps | 166 fps | 162 fps | 250 fps
MainListDouble3D  | 62 fps  | 78 fps  | 50 fps  | 64 fps

The performance delta was modest, as expected - except for the large improvement in MainFloatRaw with HotSpot Server, and the regression in all scores for the MainListDouble3D test. The latter test has extra code inside the inner rendering loop (for smarter calculation of pixel colors), so the lower performance may just be an unlucky effect of different javafxc code generation on JIT optimizations.

Why no scores for Prism? The Strange Attractor program had to be ported, because it reaches into Image.platformImage, which I empirically found to be a java.awt.image.BufferedImage (containing a DataBufferInt) in previous releases - and still in JavaFX 1.3 with the Swing toolkit. But Prism's runtime type is com.sun.prism.Image; inside this object there is a java.nio.HeapByteBuffer that contains the pixels. And the pixel data was only 8bpp, because the Image was created from an 8bpp blank.png file. Well, I changed the code that reaches into the pixels and recreated this PNG at 32bpp. But the program still doesn't work - the image has wrong colors, and I can only see the first frame, because my trick to force a refresh (calling ImageView.impl_transformsChanged()) has no effect on Prism. I've tried other methods, including some with promising names like impl_syncPGNodeDirect()... but nothing makes Prism sync the window with the updated pixel buffer. I'll be happy to hear about your findings; otherwise, we cannot efficiently program bitmapped animation in JavaFX anymore. Another problem is that performance sucks - I get only ~29fps with full usage of one CPU core, and that's without any refresh.
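For context, this is roughly the kind of "impure" access the Swing-toolkit version relies on - a hedged sketch based on the description above, with the local variable names being mine and the blank.png asset taken from that description:

import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;
import javafx.scene.image.Image;

def img = Image { url: "{__DIR__}blank.png" };
// Swing toolkit only: platformImage is empirically a BufferedImage backed by a DataBufferInt
def buffered = img.platformImage as BufferedImage;
def buf = buffered.getRaster().getDataBuffer() as DataBufferInt;
def pixels = buf.getData();   // int[]: write ARGB values directly, then force a repaint (e.g. via the ImageView hack mentioned above)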

Performance: GUIMark

GUIMark is one benchmark that used to be a disaster for JavaFX, as I briefly reported before. The problem is tracked by bug RT-5100: Text layout in FX is much slower than a pure Swing app. The root cause of this umbrella bug is RT-5069: Text node computes complete text layout, even if clipped to a much smaller size. These bugs are still open - although they report some progress; for one thing, part of the problem is blamed on JavaSE's Bug 6868503: RuleBasedBreakIterator is inefficient, and that bug is closed as fixed in JDK 6u18. So I decided to test GUIMark again.

Program  | JavaFX 1.2 (Client) | JavaFX 1.2 (Server) | JavaFX 1.3 Swing (Client) | JavaFX 1.3 Swing (Server) | JavaFX 1.3 Prism (Client) | JavaFX 1.3 Prism (Server)
GUIMark  | 1,81 fps            | 2,22 fps            | 2,81 fps                  | 4,44 fps                  | 78 fps                    | 120+ fps

Text layout performance is better in JavaFX 1.3, but the bug is still alive; the ~2X better scores are still awful. But, that's only true for the Swing toolkit. Prism doesn't suffer that problem, delivering wonderful GUIMark scores.

Notice that I've tweaked the benchmark to use a 0ms keyframe, and used JavaFX's internal FPS logger. It's the only way to allow maximum FPS count when the animation runs too fast, and at the same time, get precise performance numbers when it runs too slow. Also, I cannot measure the real score for Prism / HotSpot Server because Prism caps fps at 120 - but in my system this test consumes 20% CPU (0,8 core in a quad-core system), so I can project ~150 fps.

In the same test machine I get these scores: Java / Swing = 43 fps (Client) / 50 fps (Server); HTML (Firefox 3.7-a4, with DirectDraw & DirectText enabled) = 47 fps; Flash 10.1rc2 = 53 fps; Silverlight 4.0 = 55 fps. Thanks to Prism, the Order of the Universe will be restored, with Java's performance ruling once again.

The other GUIMark implementations are also capped, either by their own code or by their runtimes. I removed this limit only for Java/Swing, changing a timer's delay from 17ms (= max 60 fps) to 5ms (= max 200 fps); but, as I expected, there was no effect on performance because the program cannot reach even 60 fps. The HTML, Flash and Silverlight programs can't reach 60 fps either, so they're not limited by capping. Additionally, they all saturate the CPU - HTML uses a full core (25%), Flash uses a bit more (30%). Silverlight uses two full cores (50% of my quad-core CPU!), which is very surprising because I didn't run the multithreaded version of the benchmark, and because its score is actually terrible considering that it consumes 2X more CPU power than other runtimes delivering similar fps.
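For the record, the Java/Swing uncapping was a one-line change in the benchmark's animation timer. A minimal sketch of the idea (class, method and variable names are mine, not GUIMark's):

import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import javax.swing.JPanel;
import javax.swing.Timer;

final class UncappedAnimation {
    static Timer startAnimation(final JPanel canvas) {
        // Original benchmark: a 17ms delay caps redraws at ~60 fps; 5ms raises the
        // ceiling to ~200 fps (irrelevant here, since Swing never even reaches 60 fps).
        Timer timer = new Timer(5, new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                canvas.repaint();
            }
        });
        timer.start();
        return timer;
    }
}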

GUIMark was designed to measure only a RIA runtime's animation & graphics pipeline - layout, drawing and composition engines; it doesn't run any significant amount of "application code", so it should not benefit from a more efficient language and JIT compiler... except, of course, that JavaFX eats a lot of its own dog food - its core runtime is partially Java bytecode. But it also contains significant native code (including "GPU-native" shading code), and in Prism especially I wouldn't expect the Java code to be critical; still, HotSpot Server consistently makes a big difference: roughly double the GUIMark performance. Profiling the VM, I noticed that HotSpot Server optimizes java.nio's direct buffers much better, as well as some other APIs involved in bulk data manipulation like Arrays.fill(); these methods are all over Client's profile but totally absent from Server's (which means intrinsic compilation). Prism relies heavily on these methods for the interface with the accelerated pipeline (D3D in my tests on Windows). This hints that even after Prism ships, JavaFX could gain yet another significant performance boost: the Client VM just needs to acquire a few critical optimizations that are currently Server-exclusive.
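You can see the same Client/Server gap outside JavaFX with a crude microbenchmark of the two patterns that dominate the profiles - bulk puts into direct nio buffers and Arrays.fill(). This is only a sketch of mine, not Prism code; run it with -client and then with -server to compare:

import java.nio.ByteBuffer;
import java.util.Arrays;

public final class BulkOpsBench {
    public static void main(String[] args) {
        // The two bulk operations that show up all over HotSpot Client's profile under
        // Prism, and disappear (get intrinsified) under HotSpot Server.
        ByteBuffer direct = ByteBuffer.allocateDirect(1 << 20);   // 1Mb "pipeline" buffer
        byte[] staging = new byte[1 << 20];
        int[] scanlines = new int[(1 << 20) / 4];

        long t0 = System.nanoTime();
        for (int i = 0; i < 1000; i++) {
            Arrays.fill(scanlines, 0xFF000000);   // clear pass
            direct.clear();
            direct.put(staging);                  // bulk upload
        }
        System.out.printf("1000 iterations: %.1f ms%n", (System.nanoTime() - t0) / 1e6);
    }
}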

Static Footprint

JavaFX 1.3 promises many performance enhancements, including reduced startup time and memory usage, and this is critical because - particularly now that 1.3's core feature set is already very good - deployment is by far the most important factor for JavaFX's adoption.

Program              JavaFX 1.2                   JavaFX 1.3
HelloWorld           2 classes, 2.726 bytes       2 classes, 2.579 bytes
JavaFX Balls         19 classes, 95.19 bytes      19 classes, 117.005 bytes
Strange Attractor    62 classes, 563.769 bytes    62 classes, 427.992 bytes
Interesting Photos   53 classes, 238.902 bytes    46 classes, 431.741 bytes
GUIMark              9 classes, 93.841 bytes      27 classes, 224.904 bytes

The tally of compiled classes and bytecode, for these few programs, shows a regression in javafxc 1.3 - it may produce 25% less bytecode (Strange Attractor), but it will most often produce more, up to 140% more (GUIMark).

Strange Attractor is the only app (besides the trivial HelloWorld) that consists of a single .fx script file (more exactly, several .fx files, but they are all independent variations of the same program). There the javafxc compiler can perform some important "closed-world optimizations": for example, a private or script-private property that is not involved in any binding expression in that script can be compiled without support for binding. On the other hand, when this overhead cannot be optimized out, generated code is typically bigger than in 1.2 - largely thanks to the awesome enhancements of compiled bind, which delivers higher-performance binding at the cost of more sophisticated code generation. But even for the applications with a bigger static footprint, like Interesting Photos, we are promised a net gain because the dynamic footprint is greatly reduced (no expression trees for interpretation of bound expressions); you lose some KB in code size, but you win back more than you've lost in reduced heap usage. (That's the theory - but check the next section!)

JavaFX Optimization Rule: fine-grained decomposition into many small .fx files, with generous public members, will balloon the code footprint of your JavaFX app. Even if this is more than compensated by reduced dynamic footprint, you want both costs down if possible! Existing bytecode optimizers/obfuscators for Java (e.g. ProGuard) may help a bit, but they won't be optimal because javafxc's code generation is very complex, and performing the closed-world optimizations I mentioned above is not just a matter of stripping unused class members. Suggestion: add a javafxc option to request these optimizations for all public properties in a project, assuming that no external, separately-compiled code will have "incoming binding" on our code - a safe assumption for most JavaFX apps (but not for libraries).

Finally, the worst case of GUIMark is related to binding: this program makes extensive use of binding - 53 bind expressions in a program that has 430 lines including blanks and comments. The "compiled bind" system seems to be as heavy on the call sites as it is on the definitions of bound variables. The program is also written in a very "scriptish" style - 80% of it is variables and functions in the global scope - so it's possible that a more OO style with higher encapsulation would help the binding optimizations. Notice that compiled bind is not really complete; no less than 9 compiled-bind optimizations have slipped to JavaFX "Presidio" (1.4). This includes at least one item that would apparently remove the code bloat at call sites: JFXC-4199: Use a delegate class instead of creating a new class for objlit.

Program              JavaFX 1.2       JavaFX 1.3       JavaFX 1.3 (Prism)
HelloWorld           1.660 classes    1.717 classes    984 classes
JavaFX Balls         1.847 classes    1.885 classes    1.111 classes
Strange Attractor    1.894 classes    1.996 classes    1.201 classes
Interesting Photos   2.095 classes    2.032 classes    1.193 classes
GUIMark              2.033 classes    2.207 classes    1.360 classes
AWT HelloWorld       1.050 classes
Swing HelloWorld     1.206 classes
Swing GUIMark        1.511 classes

For the table above, I ran each program with -verbose:class and counted the number of loaded classes up to the startup screen. JavaFX 1.3 loads a few more classes in 3 of the 4 tests; a small increase is expected considering its much bigger feature set, and it's nothing to worry about.

On the other hand, Prism turns in an impressive win: a minimal HelloWorld program loads less than a thousand classes, cutting the Swing toolkit's count by more than 40%. We shave off 733 classes for HelloWorld, 839 classes for Interesting Photos. The latter is still a simple program, and not even completely loaded - I stop counting when the initial screen appears, before any action or user event - but this is the right way to measure startup overheads. For a bigger real-world app the ~800 classes saved by Prism may be a small fraction of all loaded classes; but the total count matters little for the loading-time experience if the app is able to display its welcome screen quickly and then load the rest of its classes and resources on demand or in the background.

Just ask Flash developers: it's hard to find a real Flash app (excluding tiny widgets like menus, animated ad banners, or video-player shells) that doesn't need incremental loading techniques - including the infamous loading-progress animations. Still, Flash has a reputation for an "instant-load" experience, just because it bootstraps fast - the core runtime and the application's startup code load quickly. When JavaFX can do the same, we'll be in competitive territory.

Looking at the classloading stats for standard JavaSE applets, they are almost as lightweight as JavaFX+Prism - an AWT-based HelloWorld applet loads 66 extra classes (6% more), a Swing version loads 222 extra classes (22% more). JavaFX without Prism has the worst classloading story, because it stacks two graphics/GUI toolkits: the JavaFX runtime is layered on many core classes from AWT, Java2D and even Swing... so JavaFX was initially a big regression in bootstrapping costs. Prism not only fixes this regression, it's even more lightweight than the legacy APIs, which is impressive. Sun has really faced the challenge of dumping a huge piece of Java legacy and rewriting it from the ground up, introducing a brand-new toolkit with zero dependencies on the old one. There are more advantages to this approach, like starting from a clean slate (no need to support bug-per-bug backwards compatibility with 15 years of legacy), embracing a modern architecture (GPU acceleration), and much better portability (Prism is reportedly very portable across OSes and even across JavaFX profiles; JavaFX TV is built on CDC + Prism, and inspection of the SDK's jar files apparently shows that Prism shares more components with the Mobile runtime too).

Sun has really pulled a Swing Killer of its own. Too bad that IBM's SWT, for all the schism it caused in the Desktop Java ecosystem, only focused on conventional GUIs, without much thought for graphics/media-rich apps or direct tapping of GPU capacity... perhaps it was too early for that in ~2001. The fact is that JavaFX+Prism renders obsolete not only all of AWT/Java2D/Swing (and JavaME's LCDUI, and more), but also SWT/JFace. The Eclipse Foundation seems to believe only in pure-HTML RIA - they are evolving their RCP technologies towards the web (RAP & RWT) - so JavaFX is the major option forward for rich Java GUI apps that can't fit in the feature and performance envelopes of HTML5 + JavaScript.

Dynamic Footprint

In the next test, I look at each program's memory usage, using -XX:+PrintHeapAtGC and executing jmap -histo:live <pid> on the JVM after it is fully initialized (this triggers a full GC on the target VM, so I can see precise live-heap stats).
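If you'd rather read these numbers from inside the process instead of using jmap, a rough equivalent - not what I used for the table below, just a sketch based on the standard java.lang.management API - would be:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

final class HeapSnapshot {
    static void print() {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        mem.gc();   // request a full GC, so the numbers approximate live data (like jmap -histo:live)
        MemoryUsage heap = mem.getHeapMemoryUsage();
        MemoryUsage nonHeap = mem.getNonHeapMemoryUsage();   // PermGen plus code cache: only a rough "Perm" proxy
        System.out.printf("Heap: %dK   Non-heap: %dK%n",
                heap.getUsed() / 1024, nonHeap.getUsed() / 1024);
    }
}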

Program              JavaFX 1.2                     JavaFX 1.3                     JavaFX 1.3 (Prism)
HelloWorld           Heap: 628K / Perm: 2.716K      Heap: 671K / Perm: 3.318K      Heap: 421K / Perm: 3.199K
JavaFX Balls         Heap: 1.085K / Perm: 3.685K    Heap: 801K / Perm: 4.161K      Heap: 635K / Perm: 3.779K
Strange Attractor    Heap: 13.876K / Perm: 3.764K   Heap: 18.448K / Perm: 4.306K   Heap: 17.629K / Perm: 3.957K
Interesting Photos   Heap: 2.073K / Perm: 4.264K    Heap: 2.039K / Perm: 5.308K    Heap: 739K / Perm: 4.501K

JavaFX 1.3 uses significantly less heap memory than 1.2 for JavaFX Balls; a bit less for Interesting Photos, a bit more for HelloWorld, and a lot more for Strange Attractor. The latter creates a big linked list of tiny Particle objects, and I wouldn't expect these objects (a simple {x, y, z, next} class) to have a different footprint, so I decompiled the classes emitted by javafxc and checked the instance fields generated for the Particle class. For JavaFX 1.2:

int VFLGS$0;
public double $strangeattractor$MainListDouble$Particle$X;
public double $strangeattractor$MainListDouble$Particle$Y;
public double $strangeattractor$MainListDouble$Particle$Z;
public Particle $strangeattractor$MainListDouble$Particle$Next;

Now let's check JavaFX 1.3:

public short VFLG$Particle$X;
public short VFLG$Particle$Y;
public short VFLG$Particle$Z;
public short VFLG$Particle$Next;
public double $Particle$X;
public double $Particle$Y;
public double $Particle$Z;
public Particle $Particle$Next;

The fields with VFLG$... names are bitmaps used by the binding system to keep track of which variables are dirty and need reevaluation. JavaFX 1.2 needed a single bitmap for the whole object (up to 32 mutable fields; objects with more fields could need additional bitmap fields). But JavaFX 1.3 now uses an extra control field per object field - possibly for lazy binding or other compiled-bind enhancements (see my past discussion of binding). This added four short values to my object = 8 bytes (possibly more with alignment), roughly 2,4Mb of pure overhead for a class that will have 300K instances. The odd thing is that this class is script-private, and although its fields are mutable, they are never used in binding expressions - and javafxc 1.3 is supposed to be smart enough to remove these overheads for code that can benefit from closed-world (script-local) optimizations, remember? But it didn't work here; this may be a limitation or a bug. I wondered whether the generated code would be better for the MainSeqDouble variant of the program (where Particle's fields are only modified at initialization), but it's exactly the same, even if I change the fields' visibility from the default (script-private) to public-init. (public-init fields can be modified after initialization, but only by code from the same script.)

On the bright side, JavaFX Balls's significant heap savings should be attributed to the larger number of scene graph nodes created by this program, even at startup with 16 balls. I repeated the test with the heaviest possible setup: 512 balls, with the 2D and Effect options. The heap usage was: JavaFX 1.2 = 13.703K; JavaFX 1.3 = 6.865K; JavaFX 1.3 (Prism) = 10.787K. Just as promised, JavaFX 1.3 saves a big amount of overhead for scene graph nodes. But it's remarkable that in this benchmark Prism consumes more memory than the Swing toolkit, landing in the middle of the scale between 1.2's 13Mb and 1.3's 6Mb. It's possible that Prism is more aggressive in optimizations that require more memory (caches etc.), but this result may also just reflect its current beta-quality stage.

Finally, I looked at the "Perm" memory usage (PermGen, the region used by HotSpot to store loaded classes). JavaFX 1.3 uses more Perm memory than 1.2 (from +12% to +24% in my tests); but Prism loads less code than the Swing toolkit, so it almost reverts to 1.2's PermGen sizes (+2% to +17%). All tested benchmarks have little code of their own, so virtually all code comes from libraries: the JavaFX runtime and its dependencies in the JavaSE core. Even if JavaFX 1.3 has higher PermGen usage, this matters much less because the runtime has a fixed size - its cost doesn't scale with application size or complexity (once you load all APIs and widgets, the runtime won't grow any further for bigger apps).

[It's hard to assess GUIMark's dynamic footprint, because the animation cannot be paused; also, the allocation rates are so intense that the next section will be sufficient.]

Garbage Footprint

Having fewer objects retained in the heap is not worth much if a program allocates too many temporary objects, causing memory pressure that forces the heap to expand, high GC overhead, and long GC pauses. In my last test, I executed the benchmarks with -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps, to calculate the amount of memory allocated and recycled per unit of work. I executed JavaFX Balls in the 512-ball mode to have a fixed effort per frame. All scores are normalized as bytes per frame.

Program              JavaFX 1.2     JavaFX 1.3      JavaFX 1.3 (Prism)
JavaFX Balls         82 Kb/frame    75 Kb/frame     0,33 Kb/frame
Strange Attractor    10 Kb/frame    5,75 Kb/frame   -
GUIMark              22 Mb/frame    24 Mb/frame     75 Kb/frame
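For reference, the Kb/frame figures above come from trivial post-processing of the GC log. Here is a sketch of the idea - it assumes the classic heap-transition format like "8128K->696K(15872K)" (exact log formats vary per collector and JDK build) and simply ignores Full GC lines, which is good enough for a ballpark figure:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class GcLogAllocPerFrame {
    // Matches heap transitions like "8128K->696K(15872K)". On a minor-GC line printed by
    // -XX:+PrintGCDetails this appears twice (young gen, then whole heap); we keep the last.
    private static final Pattern HEAP = Pattern.compile("(\\d+)K->(\\d+)K\\(\\d+K\\)");

    public static void main(String[] args) throws Exception {
        long allocatedKb = 0;
        long afterPrevious = -1;
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        for (String line; (line = in.readLine()) != null; ) {
            if (line.contains("Full GC")) continue;   // keep the sketch simple: minor GCs only
            Matcher m = HEAP.matcher(line);
            long before = -1, after = -1;
            while (m.find()) {                        // last match = whole-heap occupancy
                before = Long.parseLong(m.group(1));
                after = Long.parseLong(m.group(2));
            }
            if (before < 0) continue;                 // not a GC line
            if (afterPrevious >= 0) allocatedKb += before - afterPrevious;
            afterPrevious = after;
        }
        in.close();
        long frames = Long.parseLong(args[1]);        // frames rendered during the same run
        System.out.printf("~%.1f Kb/frame%n", (double) allocatedKb / frames);
    }
}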

Comparing JavaFX 1.2 to 1.3 in the first two tests: the new release burns 43% less memory in the Strange Attractor benchmark - which barely does anything other than event handling, some math, and writing to a raw int[]. So this test shows that JavaFX 1.3 is more efficient, allocating half the temp objects while sitting around doing almost nothing. ;-) The JavaFX Balls test shows that 1.3 is also more efficient, with an 8% advantage, doing heavy animation with many nodes. So far, a nice incremental improvement.

GUIMark shows a small regression in JavaFX 1.3, but it's difficult to judge these results because the frame rates are very low and the animation engine has to skip frames. Anyway, both allocation scores are extremely high, which certainly helps explain the awful GUIMark scores of 1,81 fps and 2,81 fps: it's not just some dumb layout-calculation algorithm; the system is allocating memory like there's no tomorrow, which is both wasteful in itself and points to other internal inefficiencies.

Prism is the big winner though, pulling order-of-magnitude gains again. Prism's performance comes from a smarter architecture, and we can see another aspect of that here: the Prism animation engine allocates vastly fewer temporary objects. To be exact, it's under 0,4% of the Swing toolkit's allocation ratio for both JavaFX Balls and GUIMark, which is an insane amount of optimization. Another way to look at this is that the Swing toolkit is really broken for tasks like text rendering and complex layout.

The Java/Swing GUIMark scores 5,4Mb/frame, which makes it only ~4X better than JavaFX. The Swing code has no scene graph - it's all immediate-mode rendering, which saves some live memory. Still, a quick profiling session shows abundant allocation of temporary char[], int[] and byte[] arrays - a smoking gun for drawing and text code that copies data into temporary buffers that are not reused. This provides an interesting argument against the common wisdom that scene graphs are less efficient than immediate-mode rendering. The truth may sometimes lie on the opposite side, because a scene graph is able to reuse buffers and other expensive helper objects across frames. My analysis of the Java/Swing GUIMark source code revealed very few allocation inefficiencies - the creation of some constant Color and GradientPaint objects in the paint methods - and I fixed them all, but there was no impact on performance; the byte-wasters are definitely inside JavaSE's graphics and text APIs.
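The kind of fix I mean is trivial: hoist the paints that never change out of the paint path. A generic sketch (not GUIMark's actual code; names and values are mine):

import java.awt.Color;
import java.awt.GradientPaint;
import java.awt.Graphics;
import java.awt.Graphics2D;
import javax.swing.JPanel;

final class HoistedPaints extends JPanel {
    // Before the fix these were created on every paint call; now they're built once and reused.
    private static final Color BAR_COLOR = new Color(0x33, 0x66, 0x99);
    private final GradientPaint background =
            new GradientPaint(0, 0, Color.WHITE, 0, 300, Color.LIGHT_GRAY);

    @Override protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        Graphics2D g2 = (Graphics2D) g;
        g2.setPaint(background);
        g2.fillRect(0, 0, getWidth(), getHeight());
        g2.setColor(BAR_COLOR);
        g2.fillRect(10, 10, 100, 20);
    }
}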

The GUIMark tests need some full GCs, except again with Prism. HotSpot Server avoids full GCs even with the Swing toolkit, but that's because it sizes the heap more aggressively: Server's heap starts at its 64Mb default but stabilizes at ~68Mb for JavaFX 1.2 and ~90Mb for JavaFX 1.3, while HotSpot Client runs the program within its much smaller default heap size of 16Mb. That's one of the reasons why you don't want HotSpot Server for client-side apps. I could have made these numbers better, and probably gained some performance, with manual tuning - but my execution rules for all RIA benchmarks include zero manual tuning; for GC in particular, HotSpot's "ergonomics" must be good enough for client apps. With Prism, once again we have ideal behavior: both variants of HotSpot run the program within their respective default heap sizes.


My experience running all these tests was not 100% smooth. I've already reported the worst issues, with Strange Attractor, but those were expected (and deserved), since I rely on internal APIs.

JavaFX 1.3 is not 100% compatible with JavaFX 1.2 - check the JavaFX Migration Guide. This time around, the language changes are very minor:

  • The change to forward-reference behavior is very welcome (it fixes a language design mistake; the previous behavior produced code that didn't behave as the programmer expected).
  • Binding expressions must now be pure (no side effects) - another very welcome change; JavaFX Script has a light functional subset, and bound expressions are one place where enforcing purity delivers great benefits.
  • The change of binding to lazy-by-default behavior is in the same spirit - temporal transparency is another important part of functional purity, and once again it allows the compiler to be extra smart; you just need to make sure that your program doesn't depend on eager behavior, e.g. using a bound expression to invoke a method that must run at initialization time.
  • The new on invalidate clause and the isReadOnly() synthetic method should not break any code, except perhaps if you use these names as identifiers.

Then we have some breaking API changes; most are very subtle issues like changed access modifiers or a changed algorithm for preferred-size calculation. Some larger changes in layouts, controls and charts mean that apps with complex UIs, e.g. nontrivial layout code, should be the most affected. Nothing in the lists above affected any of my tested programs.

When testing Prism, I had some extra trouble with my usage of non-public APIs. The code and properties that I use to disable the animation engine's "pulse" (which caps the number of frames per second) do not work on Prism. Setting the property com.sun.scenario.animation.pulse to 1000 has no effect on Prism, and while I've found Prism's own settings (check prism-common.jar / com.sun.prism.pk.PrismSettings), there is no equivalent "pulse" property. There is a prism.vsync property that I can disable, but the only result is that instead of a perfect 120fps rate (on my system/monitor at least), Prism uses other mechanisms (just like the Swing toolkit) to approach the same 120fps target. I will appreciate any hint if you know better.

Bye-bye Swing Bridge

Beware too of the javafx.ext.swing package, as it is completely unsupported on Prism - it will bomb with a NoClassDefFoundError. This is not going to change; Prism cannot support legacy Swing controls - in fact, Prism configures the JVM to run in AWT headless mode, so many AWT/Java2D/Swing classes will fail to work even if you create them directly. Prism does its own access to video hardware, creates its own top-level windows, has its own event dispatch thread, etc., so it's just not possible to allow the equivalent stack from JavaSE to run simultaneously. If you've been relying on the javafx.ext.swing package to easily port legacy Swing code or to compensate for missing JavaFX controls, be warned that in some future release Prism will most certainly become the only supported toolkit, and the JavaFX/Swing bridge will be gone for good. (I suppose that the first release with production-quality Prism will make it the default but still support the Swing toolkit; the latter could be removed another release or two later. But it will be removed - and it won't likely get any enhancements or even fixes in this EOL period, either.)

This also explains why the JavaFX team didn't do a better job on that bridge, not implementing some features requested by Swing programmers, e.g. embedding JavaFX components inside Swing apps. I suppose they could have implemented these features, but that would just push more people into quick Swing/JavaFX ports or hybrids that would hit a brick wall in a future JavaFX release. It's not that Sun was evil, refusing to support Swing developers and their assets of code, tools and components. Now that 1.3 already offers a decent set of controls - ok, at least counting the experimental ones and third-party efforts like JFXtras's - and has really powerful, easy building blocks for creating new controls, I'd warn any adopters to consider the Swing bridge mostly as a compatibility feature... for JavaFX 1.2 apps that already used Swing controls, not as something to be used in the development of any new JavaFX 1.3 app.

I believe, though, that it should be possible to create a new compatibility layer: a Swing-over-JavaFX implementation, where the Swing controls and higher-level APIs are implemented on top of the JavaFX scene graph. The SwingWT project does something similar (Swing over SWT), so my suggestion is probably feasible - but it's also very likely a pretty big effort. For one thing, it could be justified as a transitional solution for the NetBeans RCP platform.


In this blog I've focused on only a few areas of JavaFX 1.3, mostly performance and Prism, and still with a limited set of benchmarks. I didn't even start looking at the new controls or other changes, and 1.3 has quite a few new and enhanced features. But performance is critical - it can spell the success or the doom of a platform. JavaFX was already the RIA toolkit to beat in some performance aspects, although mostly due to the superiority of its underlying VM, which (even in the Client version of HotSpot) is second to no competitor's; Java's technology for JIT compilation, GC etc. just rules. But JavaFX still needs to catch up on some important issues, notably startup time and footprint. This requires continuous improvements in both the JavaFX and the JavaSE runtimes. JavaFX 1.3 adds some nice incremental improvements across the board, but it still carries the weight of the legacy AWT/Java2D/Swing toolkit, plus the limitations of JavaFX's first-generation animation package.

What you really want is Prism, which puts up some impressive results here, in every performance aspect from speed to loading time and footprint. Prism wins the previously-embarrassing GUIMark test by a landslide; enables upcoming advanced features like additional 3D support; and rumor has it that it will make zero-calorie french fries too. So, when is Prism shipping? My guess is JavaFX 1.3.1 in June. Prism was originally supposed to ship in production quality with 1.3; its current EA release looks surprisingly good (although my tests are admittedly very limited); and JavaFX TV's FCS depends on it. This certainly shouldn't freeze JavaFX adopters again until the next release: yes, the next one will be even better, but even without Prism the JavaFX 1.3 release is already a big step forward, greatly expanding the number of applications that JavaFX can serve very adequately.
