In my last entry, I briefly introduced the major features of the upcoming Consumer JRE. I'd like to now go into details on my pet project, code-named Java Kernel.
As previously mentioned, the idea is to create a 'minimal' JRE which has enough code to run System.out.println("Hello world!") and... well, that's about it. Every class or native library that isn't strictly necessary to boot up the JVM is excluded.
This minimal JRE has a few tricks up its sleeve, of course. It can detect when you try to access a class, such as javax.swing.JFrame, which isn't currently installed. It will then go download and install a "bundle" containing the required functionality. As far as your program can tell, nothing unusual happened -- it requested javax.swing.JFrame, it got javax.swing.JFrame. The only real difference is that (due to the required download) the classload took longer than usual.
Naturally, we display a progress dialog for any downloads taking a meaningful amount of time. If you use a freshly-installed Kernel JRE to run a Java program, you'll see a dialog telling you that a few components are being downloaded, and then the program window will pop up and life will continue as normal.
You usually won't see any other progress dialogs -- most programs download everything they need before the main window shows up. Even with the ones that don't, Swing and AWT are by far the biggest bundles you will end up downloading, and both of them will be there before the main window appears. The other bundles are mostly quite small and won't involve an objectionable delay (and, of course, if the delay is short enough we don't pop up a dialog at all).
Other than this, the Kernel JRE looks and feels exactly like any other JRE.
The Kernel JRE is currently divided into a hundred or so different bundles. These bundles generally follow package boundaries -- if you touch any class in (say) java.rmi, the entire java.rmi package will be downloaded. This means you'll end up downloading more classes than strictly necessary to run your program, but the alternative, downloading classes one-by-one, would be ridiculously slow due to all of the individual HTTP requests involved. We are trying to strike the proper balance between reducing the number of bytes downloaded and reducing the number of HTTP requests made.
Some bundles involve more than one package. javax.swing, for example, is entirely useless without javax.swing.event and several other packages. Since they are so tightly interconnected, they are packaged together into a single bundle. A few bundles don't cleanly follow package lines. In java.awt, for example, it makes sense to separate out the subset of AWT used by Swing programs. A Swing program isn't likely to touch AWT components like java.awt.Button, so we have a separate bundle (internally named java_awt_core) which includes only the AWT classes that a typical Swing program would use.
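To make the package-to-bundle idea concrete, here is a minimal sketch of how a missing class name might be resolved to a bundle. This is not the Kernel JRE's actual code. The class BundleLookup, its methods, and every bundle name except java_awt_core are invented for illustration, and the choice of which AWT classes fall outside the core bundle is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: decide which bundle a missing class would come from. */
public class BundleLookup {
    // Package name -> bundle name. Only "java_awt_core" is a name from the
    // text; every other bundle name here is invented for illustration.
    private final Map<String, String> bundleByPackage = new HashMap<>();
    // Individual classes split out of their package's default bundle.
    private final Map<String, String> bundleByClass = new HashMap<>();

    public static BundleLookup example() {
        BundleLookup lookup = new BundleLookup();
        lookup.bundleByPackage.put("java.rmi", "java_rmi");
        // Tightly interconnected packages share a single bundle.
        lookup.bundleByPackage.put("javax.swing", "javax_swing");
        lookup.bundleByPackage.put("javax.swing.event", "javax_swing");
        // AWT does not follow package lines: by default a java.awt class is
        // assumed to be in the subset that Swing programs use...
        lookup.bundleByPackage.put("java.awt", "java_awt_core");
        // ...while components such as Button are assumed to live elsewhere.
        lookup.bundleByClass.put("java.awt.Button", "java_awt_widgets");
        return lookup;
    }

    /** Returns the bundle name, or null if no bundle is registered for the class. */
    public String bundleFor(String className) {
        String override = bundleByClass.get(className);
        if (override != null) {
            return override;
        }
        int lastDot = className.lastIndexOf('.');
        String packageName = lastDot < 0 ? "" : className.substring(0, lastDot);
        return bundleByPackage.get(packageName);
    }
}
```

With this table, javax.swing.JFrame and javax.swing.event.ChangeListener resolve to the same bundle, while java.awt.Button and java.awt.Component resolve to different ones.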
Still not small enough...
We've got other space-saving tricks, as well. Take a look at one of the core, absolutely essential files in Java 6: jvm.dll. This is (obviously) the JVM itself, needed to run all Java code. It's 2.3MB. And that doesn't include any classes, launchers, the installer, the Java Plug-In, Java Web Start, or any of the other essential JRE features. When you're trying to deliver an entire JRE in under 2MB, the fact that one of the required files is 2.3MB puts you at a pretty severe disadvantage.
Compression helps, obviously, but it takes more than a good compressor to squeeze things down this small. Java Kernel has its own version of jvm.dll, which omits a lot of optional features like JVMTI and additional garbage collectors. The current prototype's jvm.dll is a much more svelte 1.1MB. And when the Kernel JRE finishes downloading itself in the background, it will swap in the good old full client JVM, so you won't be without these optional features for long.
The Kernel JRE will continue to download its missing bundles in the background, whether they were specifically requested or not. Over a broadband connection, this will only take a couple of minutes, so the window of time during which you might run into missing bundles is brief.
After the last bundle is downloaded, the Kernel JRE will reassemble itself into an exact replica of the "normal" JRE. All of the disparate bundles will be repackaged into a unified rt.jar file, the Kernel JVM mentioned above will be replaced with the traditional client JVM, and so forth. A "finished" Kernel JRE will be byte-for-byte identical to a "normal" offline JRE.
But what if I want to pre-download everything I need?
The single most frequently asked question is "Can I force the Kernel JRE to go ahead and download everything I need, so that there are no pauses or download progress dialogs while my program is running?"
I mentioned during my JavaOne session that we were well aware of the need for this, and working on a solution, but that we weren't ready to discuss it yet. I'm pleased to announce that the plans for this have been finalized (well, as final as anything gets in the software industry...) and I can reveal them now.
The JDK will include a tool which allows you to assemble a "custom bundle" containing all of the classes and files needed by your particular program. You determine the entire set of JRE classes needed by your program (for instance by running java -verbose or by using a static analyzer) and then use this list to create the bundle.
(Command names and options likely to change)
> java -verbose -jar MyProgram.jar > class_list.txt
> jkernel -create custom_bundle.zip -classes class_list.txt
You can then install this bundle into a freshly installed Kernel JRE:
> jkernel -install custom_bundle.zip
You can run the jkernel -install command as part of your program's installation or startup. With a custom bundle installed, you can rely upon the absolute minimum set of classes and files needed to support your program, and thus get the smallest possible download size.
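The output of java -verbose also contains your program's own output and other noise, so you may want to filter it before using it as a class list. The sketch below assumes the class-loading lines look like "[Loaded java.lang.Object from ...]", as printed by JVMs of this era, and that a plain list of class names is what you want. The format the jkernel tool will actually accept is not settled, and the class VerboseClassList is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper: pull class names out of "java -verbose" output. */
public class VerboseClassList {
    // Assumed line format: "[Loaded <class name> from <source>]".
    private static final String PREFIX = "[Loaded ";

    /** Returns the sorted, de-duplicated class names found in the given lines. */
    public static List<String> extract(List<String> lines) {
        TreeSet<String> names = new TreeSet<>();
        for (String line : lines) {
            if (!line.startsWith(PREFIX)) {
                continue; // program output or some other message
            }
            // The class name ends at the next space, or at ']' if there is no source.
            int end = line.indexOf(' ', PREFIX.length());
            if (end < 0) {
                end = line.indexOf(']', PREFIX.length());
            }
            if (end > PREFIX.length()) {
                names.add(line.substring(PREFIX.length(), end));
            }
        }
        return new ArrayList<>(names);
    }
}
```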
This isn't yet optimal for applets or web start programs, as (unlike standalone programs) they don't have the ability to install the bundle before they start to execute, and thus before any bundles are automatically downloaded. Ideally I'd like the ability to simply specify "And my program needs this custom bundle, also" in the applet tag or JNLP file somewhere -- the only question is whether we'll be able to get this into the first release or not.
Remember how the Java 6 jvm.dll is 2.3MB by itself?
The Kernel JRE's installer includes jvm.dll, the other native files and hundreds of classes needed to boot the JVM, the Java Plug-In, Java Web Start, java.exe, javaw.exe, javaws.exe, the installation code, and various support libraries needed to support the installer (such as unpack200).
And it's only 1.9MB.
If you build a custom bundle containing the classes required to run a typical Swing program, it comes out to about 1.5MB, for a total download of around 3.4MB for the JRE + custom bundle. Bigger programs might use as much as 4MB-5MB of the total JRE size, but it would be rare to exceed that.
Compared to the current JRE's size of somewhere between 10MB and 15MB, depending on how you measure it, hopefully you will agree that this is quite an improvement.
So, I'm sure you've got lots of questions for me. Shoot.
Dieter Krachtus just sent me a link to a project he's working on, a shell extension which allows you to treat JAR files as executable programs under Windows. Now, double-clicking on a JAR file has long caused it to be launched under "java -jar", but with the generic "Java document" icon it doesn't exactly scream "executable program". I'm not sure how many people even know that you can double-click on a JAR to launch it; that, together with the generic icon, probably explains why I've never seen a Java program that took advantage of the ability.
With the ability to embed multiple resolutions / color depths of icons directly into your JAR files, as well as Ant integration and a GUI, this looks like a nifty little project. It's currently limited by the fact that it has to be installed on the end-user's system to function, but... what if this sort of capability were integrated directly into the JRE? Is that something you would find useful?
It's also worth mentioning that, as a Mac user, I'm used to being able to "install" most applications by simply dragging them to my Applications folder or other convenient location, and "uninstall" them by dragging them to the trash. JAR files potentially represent the same capability offered to users of other platforms -- just download the JAR file, and that is the program, with no need to install it before using it or uninstall it when you're done with it. Just double-click on it to run it, and if you decide you don't want it anymore you merely need to delete it. I think there's quite a bit of merit to this idea.
Update: Sorry, I should have explicitly stated that Dieter does not work for Sun and this is not a Sun project -- it's just something I thought was neat. I should also mention that most of the credit goes to Chris Deckers, the project lead.
Ok, this isn't strictly Java-related, but it's geeky enough that I hope you find it interesting regardless.
Various sites have recently broken the news that the next version of MacOS X, code-named Leopard, will feature support for Sun's ZFS filesystem. As a Mac user, I find this news particularly exciting, but those of you still using Windows may want to take note as well.
If I sat down and wrote a list of all the things a super-powerful futuristic filesystem should do, completely without regard for practicality or implementation difficulty, not only would ZFS already do everything I came up with, but I doubt I would have imagined even half of its actual features. Suppose you want to clone your hard drive, install a test application, and then roll back to the previous state of your hard drive. How long would that take you? For most of you, long enough that you'd rather just cross your fingers and hope for the best.
Under ZFS, creation of a writable clone of a filesystem is essentially instant. It only has to maintain the difference between the two states, rather than two complete copies of the data, so the clone initially takes no space and virtually no time to create. Once you're done with your tests, destroying the clone is also essentially instant. The ability to instantly create, restore, and destroy snapshots and clones is incredibly powerful and something I'm very excited about, but it's not the only trick up ZFS' proverbial sleeve.
Among other things, ZFS is a 128-bit filesystem, meaning that the total storage it can manage in a single storage pool is 2^128 blocks, which is a very big number. In fact, 2^128 is such a big number that I'm going to unequivocally state that we will never, ever need more storage than that.
That's a bold claim. Many computational limits have been thought sufficient in the past -- who ever thought we would need more than 4GB of memory in a desktop system? I've seen people making similar claims in response to ZFS, thinking that we've passed every other limit, so why not this one? That's a reasonable question to ask, so let's take a look at how much data a 128-bit filesystem can actually hold.
We need to store a lot of data for this thought experiment, and nothing fills hard drives like video. Let's say it's high-definition video, complete with surround sound -- maybe 10GB / hour after compression. And you record this video 24 hours a day, 365.25 days a year. That's 85.6 terabytes a year, which is certainly a lot of data, but it's well within the reach of modern storage systems. So let's record this video for a very long time, say since the formation of the Earth 4.5 billion years ago. That's an inconceivable amount of data, roughly 359 billion terabytes, and is already more than a 64-bit filesystem can handle.
But what good is only one camera? It might end up at the bottom of the ocean and spend a billion years filming a family of sponges. We clearly need many, many cameras. Let's put one camera for each square meter of the Earth's surface, all of them recording high-definition video for 4.5 billion years. We're up to 2 x 10^38 bytes now, an inconceivably large number. You could also express it as 200 trillion trillion terabytes, but that doesn't make it any easier to handle -- it's just too big for human understanding. We must have filled up the filesystem by now, right?
Well, this incomprehensibly gargantuan amount of data has indeed put a dent in our 128-bit filesystem, which is now about 0.1% full. All the data ever produced by the human race -- all speech, books, plays, movies, music, emails, everything -- is a tiny, tiny drop in the bucket in comparison.
A 128-bit filesystem effectively cannot be filled. The laws of physics set an upper bound on the amount of information we can cram into a certain amount of mass and volume, which means that it would take at least 136 billion kilograms worth of matter to hold that much data. And that's just a lower bound on the amount of matter necessary; it might not be a very tight bound (meaning the actual requirement is probably many orders of magnitude greater). Even ignoring the obvious impossibility of creating a storage device that large, you could never create enough data to fill it. Even with a high-definition camera on each square meter of the Earth's surface, it would take almost 4 trillion years' worth of video to fill it.
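If you'd like to check the arithmetic yourself, here is a rough sketch of it. The post doesn't spell out its inputs, so the following are assumptions chosen to be consistent with the figures quoted above: binary units (1 GB = 2^30 bytes), 512-byte blocks, and an Earth surface area of about 5.1 x 10^14 square meters. The class and method names are made up.

```java
/** Back-of-the-envelope check of the 128-bit filesystem thought experiment. */
public class FilesystemCapacity {
    // Assumptions not stated in the text: binary units, 512-byte blocks,
    // and roughly 5.1e14 square meters of Earth surface.
    static final double BYTES_PER_GB = Math.pow(2, 30);
    static final double BYTES_PER_TB = Math.pow(2, 40);
    static final double BLOCK_SIZE_BYTES = 512;
    static final double EARTH_SURFACE_M2 = 5.1e14;
    static final double YEARS_RECORDED = 4.5e9;

    /** One camera at 10GB per hour, 24 hours a day, 365.25 days a year. */
    static double bytesPerCameraYear() {
        return 10 * BYTES_PER_GB * 24 * 365.25;
    }

    /** One camera per square meter, recording for 4.5 billion years. */
    static double allCamerasBytes() {
        return bytesPerCameraYear() * YEARS_RECORDED * EARTH_SURFACE_M2;
    }

    /** 2^128 blocks of the assumed block size. */
    static double capacityBytes() {
        return Math.pow(2, 128) * BLOCK_SIZE_BYTES;
    }

    static double fractionFull() {
        return allCamerasBytes() / capacityBytes();
    }

    /** Years of planet-wide recording needed to fill the pool. */
    static double yearsToFill() {
        return capacityBytes() / (bytesPerCameraYear() * EARTH_SURFACE_M2);
    }

    public static void main(String[] args) {
        System.out.printf("TB per camera-year: %.1f%n", bytesPerCameraYear() / BYTES_PER_TB);
        System.out.printf("All cameras, bytes: %.2e%n", allCamerasBytes());
        System.out.printf("Percent full:       %.2f%n", fractionFull() * 100);
        System.out.printf("Years to fill:      %.2e%n", yearsToFill());
    }
}
```

Different assumptions (decimal units, a larger block size) shift the exact numbers, but not the conclusion.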
I think there's a lesson here. Our computers are now powerful enough that it's reasonable to choose limits so large that we can be essentially 100% confident that they will never, ever be reached, not in this universe at least. The question "How large can I imagine this value getting?" is very dangerous, because we humans are creatures of small imaginations. I'm not suggesting that every single limit must be so ridiculously large as 2^128, but it's important to remember that arbitrarily chosen small limits have historically been a much, much greater problem than asking the computer to process an extra couple of bytes here and there.
And, because I want to at least mention Java here, take a look at JSR-202, "Java Class file Specification Update". One of the major changes is increasing various limits, because those initially chosen for the sizes of methods and so forth turned out to be too small. The limits of human imagination strike again.
I mentioned in my last entry that I have left Yahoo! and am now officially a Sun employee. After an all-too-short break between jobs, my first day with Sun was this past Monday, and it's been quite an experience so far.
As with joining any big company, most of my first week was spent trying to get someone to actually set up my access badge and email account, figuring out how to access documentation on various subjects (there is documentation, right?), and dealing with various other miscellaneous getting-started headaches. Most of that is sorted out now and I expect that I should actually be able to, you know, work starting next week.
The most exciting part so far has been the fact that I've gotten to be involved in the Dolphin planning sessions we're having this week. I've requested some deployment features in the past, including a gigantic whopper of one, and while I can't give away too many details, I can say that there is definitely some hope.
I mentioned wanting an "updatejava.exe" program which would allow you to install specific versions of Java upon request, tremendously simplifying the process of writing installers for Java programs. There's actually a very good chance of such a tool making it into Dolphin, although it would probably have a slightly different name. And lest I erroneously receive credit for this idea, I should point out that this feature was already being investigated before I even suggested it.
The "Browser Edition" I suggested in this entry is a more complex problem. When I wrote that entry, I was in the enviable position of being able to request ridiculous improvements and not having to actually write any of the code. As you can imagine, my position is quickly shifting from "Sun should add support for feature <X> right now!" to "Ummm... well... you see, that's a really hard problem and it would be a lot of work...". Regardless, I think it's okay if I reveal that there is a feature more-or-less identical to what I suggested in the infamous "Browser Edition" blog entry currently under consideration for Dolphin. This should not in any way, shape, or form be construed as a promise that we will actually do it -- in other words, don't get your hopes up -- but it's being considered. At the very least, you should be aware that Java applets are the subject of intense scrutiny around here and we are trying to figure out how to improve them, within the limitations of the time and manpower we actually have available to throw at the problem.
There's a lot of other neat stuff on the table, most of which I probably shouldn't talk about yet. Hopefully the tidbits I've tossed out so far aren't revealing anything that will get me in trouble... In any case, expect some neat stuff from Dolphin's deployment enhancements. It's also not too late to suggest things: we are definitely interested in your feedback.
When you work at a major Internet company like Yahoo!, deployment is a Big Deal. You have millions of customers running every version of every OS imaginable, some with marginally working computers, and they all need to be able to run your software. And they need to run it now -- make them wait too long, or download too much, and they'll give up and move on to your competitors.
While I'm definitely a huge Java fan, it's a hard technology to deploy to end-users. If a particular user doesn't have Java installed, or doesn't have the right version of Java installed, there are major challenges surrounding the detection, installation, and upgrading process. Even if users have the right version of Java installed, its behavior in web browsers isn't necessarily all that reliable. I posted a couple of high-profile rants about Java deployment issues recently, to try to call some attention to these issues.
I wasn't really expecting much of a response. I figured some fellow complainers would show up, we'd talk amongst ourselves for a little while, and that would be the end of things. I wasn't expecting Sun to even notice my complaints, let alone actually do something about them. I was happy to be proven wrong.
I was asked to put together a resume, and invited to interview with the Java Deployment team. After two rounds of phone interviews and a grueling eight-hour interview process in Burlington, I accepted an offer to join Sun, and will finally have the opportunity to address some of the problems that have been bugging me for so long. This is a really exciting change for me -- to finally be working on Java, instead of just with it, and to be able to influence where things are headed... well, it's a Java geek's dream come true. Or at least it's this Java geek's dream come true.
We're still working out the details, but I should be starting at Sun in about a month. Yahoo! is a great company, and has been very good to me over the years, but this was an opportunity I just couldn't pass up.
To dispel any rumors...
Before any rumors get started, let me be the first to say that just because I blogged about some deployment ideas does not mean that any of those ideas will necessarily get implemented. Sun obviously found them interesting, or I daresay I wouldn't have been hired, but there's a big difference between finding an idea interesting and actually putting in the time and money necessary to implement it. In particular, don't expect a Java Browser Edition -- much as I would love to see it happen, I'm not naive enough to believe that it's at all likely.
I do have a lot of ideas for improvements to the Java Plug-In and Java Web Start; reasonably small, practical features that will nevertheless make a huge difference in how easy it is to deploy Java programs. Hopefully some of them will actually get implemented eventually, but at this point it's far too early to speculate on how things will work out. Rest assured that I will be doing my best to push for easier deployment solutions, and I'm all ears if you have any suggestions of your own.
What about JAXX?
What does this mean for JAXX, the declarative XML user-interface language I'm working on? In the short term, not much. The position I was hired for has nothing to do with JAXX, and I will still be doing all JAXX work in my spare time, rather than as a Sun-sponsored activity.
What about the long term? Your guess is as good as mine. All I know for sure is that I have a lot of ideas for the future of JAXX, and will continue to crank away on it. If the Swing team decides to add a user interface language to Java at some point (which I know they have considered), I expect I would at least be involved in the discussion. Other than that... who knows?
The decision to leave Yahoo! was both difficult and painful, but I think I made the right choice. I'm very excited to be joining Sun, and I hope that I will be able to really make a difference. Wish me luck!
Strings are a fundamental part of any modern programming language, every bit as important as numbers. So you'd think that Java programmers would go out of their way to have a solid understanding of them -- and sadly, that isn't always the case.
I was going through the source code to Xerces (the XML parser included in Java) today, when I found a very surprising line:
com.sun.org.apache.xerces.internal.impl.XMLScanner:395
protected final static String fVersionSymbol = "version".intern();
There are a number of strings defined like this, and every one of them is being interned. So what exactly is intern()? Well, as you no doubt know, there are two different ways to compare objects in Java. You can use the == operator, or you can use the equals() method. The == operator compares whether two references point to the same object, whereas the equals() method compares whether two objects contain the same data.
One of the first lessons you learn in Java is that you should usually use equals(), not ==, to compare two strings. If you compare, say, new String("Hello") == new String("Hello"), you will in fact receive false, because they are two different string instances. If you use equals() instead, you will receive true, just as you'd expect. Unfortunately, the equals() method can be fairly slow, as it involves a character-by-character comparison of the strings.
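Here's that comparison as a small runnable sketch. The class and method names are mine, chosen just for the example.

```java
/** Demonstrates identity (==) versus equality (equals()) for strings. */
public class StringComparison {
    /** Compares two separately constructed strings by reference. */
    static boolean sameInstance() {
        String a = new String("Hello");
        String b = new String("Hello");
        return a == b; // false: two distinct String objects
    }

    /** Compares the same two strings by contents. */
    static boolean sameContents() {
        String a = new String("Hello");
        String b = new String("Hello");
        return a.equals(b); // true: same characters
    }

    public static void main(String[] args) {
        System.out.println(sameInstance()); // prints false
        System.out.println(sameContents()); // prints true
    }
}
```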
Since the == operator compares identity, all it has to do is compare two pointers to see if they are the same, and obviously it will be much faster than equals(). So if you're going to be comparing the same strings repeatedly, you can get a significant performance advantage by reducing it to an identity comparison rather than an equality comparison. The basic algorithm is:
1) Create a hash set of Strings
2) Check to see if the String you're dealing with is already in the set
3) If so, return the one from the set
4) Otherwise, add this string to the set and return it
After following this algorithm, you are guaranteed that if two strings contain the same characters, they are also the same instance. This means that you can safely compare strings using == rather than equals(), gaining a significant performance advantage with repeated comparisons.
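The four steps above can be sketched in a few lines. This is only an illustration of the idea, not how any real library does it. SimpleInterner is a made-up name, and I've used a HashMap that maps each string to itself, since a plain HashSet has no convenient way to hand back the instance it already holds.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative hand-rolled string interner. */
public class SimpleInterner {
    // Step 1: the pool of canonical strings, keyed by their own contents.
    private final Map<String, String> pool = new HashMap<>();

    public String intern(String s) {
        // Step 2: is an equal string already in the pool?
        String existing = pool.get(s);
        if (existing != null) {
            // Step 3: yes, so return the pooled instance.
            return existing;
        }
        // Step 4: no, so this instance becomes the canonical one.
        pool.put(s, s);
        return s;
    }
}
```

Two strings that come out of the same SimpleInterner can then be compared with ==.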
Fortunately, Java already includes an implementation of the algorithm above. It's the intern() method on java.lang.String. new String("Hello").intern() == new String("Hello").intern() returns true, whereas without the intern() calls it returns false.
So why was I so surprised to see protected final static String fVersionSymbol = "version".intern(); in the Xerces source code? Obviously this string will be used for many comparisons, so doesn't it make sense to intern it?
Sure it does. That's why Java already does it. All constant strings that appear in a class are automatically interned. This includes both your own constants (like the above "version" string) as well as other strings that are part of the class file format -- class names, method and field signatures, and so forth. It even extends to constant string expressions: "Hel" + "lo" is processed by javac exactly the same as "Hello", and "Hel" + "lo" == "Hello" will return true.
So the result of calling intern() on a constant string like "version" is by definition going to be the exact same string you passed in. "version" == "version".intern(), always. You only need to intern strings when they are not constants, and you want to be able to quickly compare them to other interned strings.
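A short sketch makes the distinction between constants and non-constants easy to see for yourself. The class and method names are hypothetical. The prefix parameter is there only to force a concatenation that happens at run time rather than in javac.

```java
/** Shows which strings are interned automatically and which are not. */
public class ConstantInterning {
    /** A constant expression is folded by javac into a single literal. */
    static boolean constantExpression() {
        return ("Hel" + "lo") == "Hello"; // true
    }

    /** Interning a constant returns the very same instance. */
    static boolean internOfConstant() {
        return "version" == "version".intern(); // true
    }

    /** A string built at run time is a new, un-interned instance. */
    static boolean runtimeString(String prefix) {
        String built = prefix + "sion";
        return built == "version"; // false when prefix is "ver"
    }

    /** Interning the run-time string makes identity comparison work. */
    static boolean runtimeStringInterned(String prefix) {
        String built = (prefix + "sion").intern();
        return built == "version"; // true when prefix is "ver"
    }

    public static void main(String[] args) {
        System.out.println(constantExpression());
        System.out.println(internOfConstant());
        System.out.println(runtimeString("ver"));
        System.out.println(runtimeStringInterned("ver"));
    }
}
```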
There can also be a memory advantage to interning strings -- you only keep one copy of the string's characters in memory, no matter how many times you refer to it. That's the main reason why class file constant strings are interned: think about how many classes refer to (say) java.lang.Object. The name of the class java.lang.Object has to appear in every single one of those classes, but thanks to the magic of intern(), it only appears in memory once.
The bottom line? intern() is a useful method and can make life easier -- but make sure that you're using it responsibly.
I'm on vacation with my family right now. Vacation time is pretty hard for me to come by -- one of the dangers of being "essential" is that nobody wants to let you leave -- so this is a noteworthy event, made possible only by the fact that I agreed to bring my cell phone and work laptop, ensure the availability of Internet access at my destination, and remain reachable twenty-four hours a day. When we arrived, I got out my laptop and booted it up to check my email. It got as far as showing a blank Windows desktop and then... sat there.
I spent a while fiddling with it, but sadly when your machine won't even boot into Safe Mode and you have no other bootable disks with you, there really isn't much you can do. So here I was, stuck without a functioning computer, when having a functioning computer and Internet connection was part of my vacation deal. A whole week without being able to check email, or read my webcomics, or being able to post blog entries... a whole week without Internet access of any kind.
I don't think I can take that kind of punishment, so I did whatever any true geek would do: I used this situation as an excuse to buy a new computer. I (a long-time Windows user) had been lusting after the new MacBooks for quite a while, and my wife was well aware of this. She was also well aware that Fathers' Day was just around the corner, and, well, to make a long story short I love my wife very much and I'm typing this on my new 13" MacBook.
Obviously I'm not the first Java programmer to realize that a Mac is a pretty great Java environment. When I was at JavaOne, I was shocked at the number of Mac laptops being toted around -- it seemed like every other system was a Mac. But after setting my system up, installing Eclipse, and getting to work on it, I really don't understand how I managed to put up with Windows for so many years. My Mac runs all of the software I need, Java programs run as smooth as silk, and (despite having theoretically less power) it feels faster than my most powerful Windows system. I could spend all day babbling about all of the things I love about it, but one thing is certain: I'm not going back.
Oh, and the Windows laptop? I decided to leave it on for an extended period of time, and checked on it periodically. After eight hours, there was still no change, so I went to bed. When I woke up in the morning, I saw that it had actually managed to finish booting. So after somewhere between eight and sixteen hours of sitting there, it finally got to the desktop. I realize that this is pathological -- when your computer takes more than eight hours to boot, something is clearly screwed up and I can't just say "Ha! Windows sucks!", but I sure feel like saying exactly that. I was eventually able to fix it by disabling some startup items... but hey, I got a new Mac out of the deal. Thanks for dying, work laptop!
First things first: JAXX 1.0.1 is finally out. This version contains a lot of bugfixes and significant improvements to the quality and size of the generated Java code. Download it here.
Now that the major 1.0 bugs are fixed and a solid baseline has been established, I'm making plans for the future. Where is JAXX headed? What's next? I've posted a first pass at the JAXX roadmap and am seeking feedback. It's still early and subject to change, and there will be a lot more detail added as time goes on -- but I think it's a fairly decent stab at where things are headed.
I'm particularly excited about the addition of animation, based on Chet Haase's Timing Framework. The first animation features are going to be simple and straightforward, based on the current CSS pseudoclasses. Right now you can use a pseudoclass to, say, make a label turn blue when moused over:
<JLabel id='hoverLink' text='Mouse over to turn me blue'/>
This effect is applied instantly -- the second the mouse enters the label, it turns blue. The initial animation features will allow you to have effects like this be applied gradually over time:
<JLabel id='hoverLink' text='Mouse over to turn me blue'/>
Note the [duration=500ms] on the pseudoclass. There will be other properties for controlling acceleration, deceleration, and perhaps other features of the animation. This is a simple change, to be sure, but I don't want to go overboard yet. More dramatic animation features will be added in good time.
What else is going on with JAXX? You'll have to take a look at the JAXX roadmap -- just be sure to let me know what you think!