In the last updates, I did a quick port to JavaFX 1.2 and evaluated its performance again (and again). But as I keep playing with this benchmark and learning JavaFX, I added a few extra enhancements:

  • New options of 512 Balls (desktop) / 128 Balls (mobile), and Adaptive 60fps. These make it easier to compare to some other versions of Bubblemark.
  • Binding-related enhancements recommended here.
  • More benchmarking-friendly behaviors. The animation is off when the applet starts; this allows you to change the options (with preview!) before the animation starts, which is important to measure warm-up performance. The status is now updated more often, logged to the console, more precise (2 decimal digits), and more correct (previously, when some option changed, the status would mix new configuration info with a score related to the old config).
  • (desktop) New rendering option to enable use of JavaFX's Effects framework. I chose the BoxBlur effect, because it's a simple, popular effect that I'd expect any competing animation package to offer or be capable of programming easily. Other Bubblemark ports don't have this option, but I'd be thrilled if they'd implement it.
  • (mobile) A help screen that documents the keypad controls, and disappears only when the animation is started.
  • (desktop) The control toolbar was removed, because its cost is not insignificant even with caching. I adopted keyboard controls similar to the mobile version, just with a different selection of keys. The status line shows a short help text, similar to the help screen from the new mobile version, but always-on.

Check the source code. The applet below shows the desktop version. I still refuse to offer a WebStart deployment because the java.net site doesn't serve JNLP files with the correct content-type and that would force me to host the JNLP somewhere else. (The kind of problem that would never happen in a site from Adobe or Microsoft.)

UPDATE: Applet now updated for JavaFX 1.3, so its performance may be different from what is described by the article.

   

A portability glitch

The new FPS counter adds two decimal digits of precision, so in a very hard test like 512 Balls + Effect, when the performance drops to <10fps, we still have 2-3 digits of total precision. But this uncovered a subtle portability bug. I used JavaFX Script's string formatting syntax, e.g. "{%02.2f frames}fps", where %02.2f is the printf()-style formatter for the value of frames. But that code failed for JavaFX Mobile, because floating-point formatting is not supported. In fact, not even zero padding is supported. I had to add some manual formatting code as a workaround.
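To give an idea of what such a workaround looks like, here is a minimal sketch in plain Java (not the actual JavaFX Script code from the benchmark). The names FpsFormat and twoDecimals are made up for this illustration, and the sketch assumes non-negative values; it only relies on integer arithmetic and string concatenation, which is the kind of code that doesn't need any formatter API.

```java
// Sketch only: two-decimal formatting without printf() or String.format().
// FpsFormat and twoDecimals are hypothetical names, not part of JavaFX Balls.
final class FpsFormat {
    // Formats a non-negative value with exactly two decimal digits.
    static String twoDecimals(double value) {
        // Scale to hundredths and round half up by adding 0.5 before truncating.
        long hundredths = (long) (value * 100 + 0.5);
        long whole = hundredths / 100;
        long frac = hundredths % 100;
        // Manual zero padding of the fractional part (e.g. 5 becomes "05").
        return whole + "." + (frac < 10 ? "0" : "") + frac;
    }

    public static void main(String[] args) {
        System.out.println(twoDecimals(2.15) + "fps");
    }
}
```

The zero padding is the part that is easy to forget: without it, a value like 5.05 would be printed as "5.5".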

JavaFX Mobile supports formatting with internal packages, because there are no formatting APIs in Java ME, neither printf() nor older APIs like DecimalFormat. The JavaFX Language Reference (which btw is in pretty bad shape - very incomplete and not significantly updated since JavaFX 1.0) doesn't document the formatting options, having only a placeholder: "[To do: describe portable Formatter that handles the common subset]". Hopefully a future version of javafxc will have an option to generate warnings for code that uses syntax that's specific to the desktop profile, because this kind of error is not directly related to missing APIs, so the compiler is currently silent even if the project is configured for mobile.

Benchmarking Effects

Enabling the BoxBlur effect makes the animation fuzzy, but not warm. The effect has a severe impact on performance:

  • 1 Ball: 665fps (from 995fps)
  • 16 Balls: 116fps (from 665fps)
  • 32 Balls: 60fps (from 665fps)
  • 128 Balls: 17fps / 15% CPU (from 400fps / 22% CPU: 23X slower)
  • 512 Balls: 2.15fps / 25% CPU (from 82fps / 23% CPU: 38X slower)

Notice that the base 1 Ball score is meaningless, because in that test the animation is capped by the animation engine's 1KHz "pulse"; in fact, even 10 Balls score the same 995fps in my system. Also, the base scores for 16 and 32 Balls are identical; both tests have minimal CPU usage, so once again the FPS rate should only be limited by timing artifacts. Because of this, I can't compute with any significant precision how much slower the effect makes these tests. I did that calculation only for 128 and 512 Balls, where the score seems to be limited by CPU.
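The slowdown factors in the list are just the base FPS divided by the FPS with the effect enabled. A tiny Java sketch of that arithmetic, using the 128 and 512 Balls scores from the list (the class and method names are invented for this example):

```java
// Sketch only: slowdown factor = FPS without the effect / FPS with the effect.
// Slowdown and factor are hypothetical names used for illustration.
final class Slowdown {
    static double factor(double baselineFps, double effectFps) {
        return baselineFps / effectFps;
    }

    public static void main(String[] args) {
        // Scores taken from the list above.
        System.out.println("128 Balls: " + factor(400, 17) + "X slower");
        System.out.println("512 Balls: " + factor(82, 2.15) + "X slower");
    }
}
```

Truncated to whole numbers, these are the 23X and 38X figures quoted in the list.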

I was expecting a noticeable cost in performance, but not in the range of 20-40X slower. Looking at CPU usage in the same test, it's <1% in the standard run (Windows's Task Manager shows a stable "00"), but it jumps to ~8% (i.e., 32% of a single core from my Q6600). For the 128 Balls test the CPU usage costs an extra 7%, but this must be normalized to performance, so it's 11.62% CPU/frame, which is 13X worse than 0.88% CPU/frame without Effects.

The program dumps the acceleration used by JavaFX for this effect, and the result in my system is "Direct3D", so I suppose the BoxBlur is fully implemented with shaders. There is some bottleneck in the activation of the Effects, perhaps from imposing additional steps in the pipeline, extra buffers, or something.

As it is today, the performance of Effects is not viable for higher-end animations, such as action games. It's perfectly fine for RIA apps, e.g. to cast a drop shadow in some internal frame that can be dragged. Notice however that I didn't test on machines with onboard graphics. I tested on my main test system, which has an NVidia Quadro 1700, and also on a laptop with an NVidia GeForce 8400M. The Effects framework can also be accelerated by OpenGL, or by an x86/SSE pipeline for machines without shading-capable GPUs.

Dude, where are my frames?

In the bullet list above, I didn't report the score that the program actually displayed for the 512 Balls test with Effects. The program reported ~5.5fps, but that was obviously false. This is very odd, because my animation timeline uses the canSkip:true option, so it should only be executed as often as the animation engine can actually produce frames. This is important to keep the FPS score honest. But something was not working as advertised. I added code to print nanoTime() at each execution of the timeline's action:

244768994737808 (delta: 467ms)
244769462624160 (delta: 13ms)
244769475647323 (delta: 13ms)
244769941612132 (delta: 465ms)
244769954661486 (delta: 13ms)

The first and fourth deltas above reveal that frames are being produced every ~465ms, so the score is a little north of 2fps. The ~13ms deltas are unreal; the runtime should be skipping frames or repeating the keyframe. The reason is linked to the bugs RT-2943 and RT-4052; both report problems for 0-duration timelines. My bug has different symptoms, so I filed a new bug, RT-5024: Animation drops frames but repeats keyframes. The JavaFX team responded quickly and I learned a few more things about the animation runtime.
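The diagnostic itself is simple: subtract consecutive nanoTime() readings and convert to milliseconds; the long intervals then translate directly to a frame rate. Below is a minimal Java sketch of that calculation (not the original JavaFX Script code). The names FrameDeltas, deltasMs and fpsFromIntervalMs are invented for this example, and the timestamps in main are made-up round values that only mimic the pattern of the log.

```java
// Sketch only: turning nanoTime() readings into per-frame deltas and an FPS estimate.
// All names and the sample timestamps are hypothetical.
final class FrameDeltas {
    // Returns the interval, in whole milliseconds, between consecutive timestamps.
    static long[] deltasMs(long[] nanos) {
        long[] out = new long[Math.max(0, nanos.length - 1)];
        for (int i = 1; i < nanos.length; i++) {
            out[i - 1] = (nanos[i] - nanos[i - 1]) / 1_000_000L;
        }
        return out;
    }

    // Converts a frame interval in milliseconds to frames per second.
    static double fpsFromIntervalMs(double intervalMs) {
        return 1000.0 / intervalMs;
    }

    public static void main(String[] args) {
        // Made-up readings: a long interval, a short one, then a long one again.
        long[] stamps = {0L, 465_000_000L, 478_000_000L, 943_000_000L};
        for (long d : deltasMs(stamps)) {
            System.out.println("delta: " + d + "ms");
        }
        System.out.println("fps at 465ms per frame: " + fpsFromIntervalMs(465));
    }
}
```

An interval of 465ms works out to roughly 2.15fps, which is the number I trusted for the bullet list instead of the one shown by the status line.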

I can avoid the 0-duration bugs by using a tiny duration, like 1ms (the smallest possible). I tested that and it fixed the FPS status, making it exact again. But there was a severe disadvantage: top performance dropped to 500fps, even for the most trivial test with 1 Ball. This turns out to happen due to OS scheduling latency (in my case Vista SP2, so I'm curious to know if the same behavior happens in other OSes, especially one with a realtime scheduler like Solaris).

Then, I can gain my high scores back by enabling the option com.sun.scenario.animation.fullspeed=true. With this option, the animation will not yield to the OS when there's nothing to do, so any impact from scheduling (delays, jitter) is eliminated or at least greatly reduced. This worked wonderfully, so I could reach a round 1000fps score with low Ball counts, and even at 512 Balls the performance improved from 82fps to 90fps, a very significant boost of almost 10%.

The bad news: fullspeed=true will cause JavaFX to 0wn your CPU, sucking it at 100% all the time, regardless of the amount of work done, even 1 Ball. (In fact I have a quad-core machine so this translates to "only" 25% CPU usage, so my system didn't get any less responsive.) This high and unconditional CPU usage happens because JavaFX will busy-wait instead of yielding. Notice that this doesn't disable multitasking; Windows can still preempt the JavaFX process to run other apps if there are no other CPU resources available. Still, it's a significant impact on the system in a bad scenario, like a single-core machine or even a multicore box that's loaded.
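The difference between the two waiting strategies can be illustrated with a few lines of plain Java. This is only a sketch of the general technique, not JavaFX's actual implementation; WaitStrategies, spinFor and parkFor are names invented for this example.

```java
import java.util.concurrent.locks.LockSupport;

// Sketch only: busy-waiting versus yielding to the OS for a short delay.
// Class and method names are hypothetical.
final class WaitStrategies {
    // Busy-wait: polls the clock in a tight loop, keeping one core fully busy,
    // and returns as soon as the requested time has passed.
    static long spinFor(long nanos) {
        long start = System.nanoTime();
        long elapsed;
        do {
            elapsed = System.nanoTime() - start;
        } while (elapsed < nanos);
        return elapsed;
    }

    // Yielding wait: asks the OS to park the thread; the CPU is free meanwhile,
    // but the wake-up time depends on the scheduler.
    static long parkFor(long nanos) {
        long start = System.nanoTime();
        LockSupport.parkNanos(nanos);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long oneMs = 1_000_000L;
        System.out.println("spin: " + spinFor(oneMs) + "ns");
        System.out.println("park: " + parkFor(oneMs) + "ns");
    }
}
```

On a typical desktop OS, the parked wait tends to overshoot a 1ms request by a variable amount, while the spinning wait returns almost exactly on time but burns CPU for the whole interval.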

The busy-wait, or "spinning", technique is common in animation and game engines, as it can extract the last ounce of system performance for a few extra FPS. For example, I noticed that the LWJGL/Slick version of Bubblemark, with V-Sync off, uses 20% of my CPU with 1 Ball - almost a full core, and it would probably reach 25% / 1 full core if run outside the browser. That's an insane CPU overhead for a single Ball, even at 3070fps - JavaFX Balls can do 1000fps at <1% CPU, so it could theoretically do 3000fps at <3% CPU if JavaFX got rid of its 1000fps cap. I'm pretty sure that LWJGL/Slick, and perhaps other Bubblemark competitors, resort to busy waiting. But I decided not to enable this technique in JavaFX Balls, as the cost/benefit is certainly not good for most potential apps; I think this only makes sense for advanced action games.

Deployment issues

After all updates since JDK 6u10 (I'm using 6u14) and JavaFX 1.0, Java applets still suffer deployment problems that make it difficult to keep the faith in Java for the desktop.

In my Vista SP2 PC, JavaFX presents a dialog saying that it must install the JavaFX Desktop Runtime. I click Accept and the applet runs, but this stupid dialog will return every time. I checked the JPI cache and the JavaFX runtime is there all the time. This happens even for the JavaFX applets in javafx.com/samples; but not all applets, for example the old version of JavaFX Balls in my blog doesn't suffer this - but that one uses an older JavaFX version. It's probably some bug in the JavaFX Deployment Toolkit v1.2. I noticed this bug before but I was ignoring it because I blamed it on my messed-up developer's environment. So I uninstalled all my JDKs and JREs (as I keep everything from 1.3.1 up installed), manually cleaned up Java's entries in Windows' Registry, installed everything again, and the problem persists. In my laptop running Windows 7 RC, this problem doesn't happen.

I still suffer from the problem of multiple JavaPlugIn icons appearing in the Windows system tray. This bug was supposedly fixed, but it happens in my dev/test sessions because I typically run several "Java host" programs: at least one Java IDE, multiple browsers (usually the latest FF and Chrome), and sometimes Windows Live Writer which also shows embedded applets. The problem is that each "host" needs its own jp2launcher.exe process. That sucks, because I'm probably not getting all resource sharing I should get with a single launcher.

Conclusions

JavaFX 1.2 was a much needed update that fixed important holes and performance bottlenecks. But as they fix the ugliest issues, we, developers / enthusiasts / early adopters, just move to find smaller issues so we still have something to complain about. :-) The relatively poor performance of Effects is certainly something Sun should investigate, although it's certainly not a disastrous performance problem like the scene graph scalability that I complained about for JavaFX 1.0/1.1. I reported this new problem as issue RT-5035.

Moving from severe bugs / bottlenecks / missing functionality to less severe ones certainly shows progress in the platform; but perhaps this progress could be faster. I've seen other people criticizing JavaFX 1.2 for bugs in the new APIs like controls, and I think Sun deserves all the flak they might receive, simply because the development of JavaFX is still disgustingly closed - how do you expect the highest quality when such a major release is shipped without any public beta? In practice, all JavaFX "FCS" releases are betas, and if you intend to ship some app for JavaFX 1.2, my advice is to wait for the first maintenance update, as usual (1.2.1 or whatever). At least the project has an open issue tracker and the dev team is very responsive.

UPDATE: I refreshed the applet (and sources), fixing a bug with the Adaptive mode and also adding a new feature to scale the balls (keys UP/DOWN cycle from 1X to 4X scale in each axis, i.e. 1X to 16X in area). Larger balls create virtually no cost for bitmap rendering, but the cost is significant for vector rendering and/or effects. Notice that I didn't bother to adjust the bouncing against walls and other balls, so the animation (especially with multiple balls) will look a little wrong, but I didn't want to make even more changes in the moving/collision code, which should be kept as close as possible to the reference Bubblemark. Finally, I also changed some key bindings to make the desktop and mobile controls more similar and natural.