The last ten years of software development have seen the rise of the Internet and open standards, most prominently HTML. To most non-technical people, web pages (just HTML over a TCP/IP connection) are The Internet. And now HTML is so pervasive, its usefulness outweighing its flaws, that we find it in many applications that aren't strictly web browsers. Chat programs, help files, and even a certain online music store are all built on top of the flexibility and ubiquity of HTML.
There's just one problem with HTML, though. To display it, you need some kind of web browser, usually a component that actually renders the HTML on screen. For Java developers, a good HTML renderer can be hard to come by. The built-in viewer, the Swing HTMLEditorKit, is quite lacking, and there aren't many alternatives. However, the situation isn't as bleak as you might think; there are other renderers out there, and we just have to look harder. In this article, we will review 11 different HTML renderers, comparing their features, compliance, and speed in search of the best one for any project. Part one will consider free (as in either "speech" or "beer") products, while part two will consider licensed commercial offerings.
What Features Are We Looking For?
When deciding how to rate each renderer, we should consider why we need one. What do we need to do with it? HTML is essentially styled text and images, loaded over a network, with hyperlinks. Java, often called the networked programming language, works with all sorts of network components, including URLs, quite easily. So the key point we are lacking is styled text. HTML (and by HTML I mean HTML, XHTML, and CSS 1/2) has become the standard for styled text. And it's everywhere.
As processor speed and display quality have increased during the last ten years, more and more applications have some form of styled text in them, either for editing or display. A quick look through my Start menu turns up the following: Outlook (HTML email and the "Outlook Today" screen), Media Player (advertising and shortcuts), iTunes (the music store), File Explorer (the stylized sidebar), Trillian chat (for message display), the Address Book, Microsoft Office (Word, Excel, and PowerPoint), and the Palm Desktop. This list doesn't even include the styled wizard text and help files for virtually every application on my computer. These are all programs that don't really have anything to do with HTML. If we count programs that in some way edit or produce HTML, then I've got my editor, jEdit, Photoshop, Flash, and iPhoto. The common thread between all of these is that they have styled text that could be (and often is) HTML.
The other thing these programs have in common is that they don't view normal pages on the Web. They each have specific functions, and the HTML they use is tailored to that function. The browser in iTunes only has to display the HTML coming out of Apple's music database, not the HTML of the average broken web page out there. For that reason, the first criterion for our roundup will be "an adherence to standards, as modern standards as possible." This principally means full XHTML support with as much of CSS1 and CSS2 as possible. We want to use fewer table hacks and more divs with style. Being able to show malformed HTML on the Web is nice, but not essential. The most important thing is that we can get attractive display using standard mechanisms. To test this, we will run the browsers against an XHTML and CSS2 site known to push the envelope while being compliant: the CSS Zen Garden, a showcase for the possibilities of CSS-only style. To measure compliance with older web sites (which may be required for some applications), we will also run against the front pages of Amazon and Slashdot, since these are two heavily used web sites with a good mixture of text and graphics.
Each product we survey will have figures showing how these three sites are rendered. The small figure shown in the page links to an image of a full-size browser window, so you can get a complete picture of how the browser handles layout, images, blocks of text, etc.
Next, we care about speed. A lot of our non-traditional uses for HTML only require small portions of pages (such as a chat program's message display), but speed still matters. It's especially important for larger text blocks such as help files and book readers. To test speed, we will use a copy of Shakespeare's Hamlet (from ClassicReader.com) formatted as one gigantic HTML file. The styling is simple, but it's a large file (over 10,000 lines) to parse into memory and scroll.
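For readers who want to try a rough version of this kind of measurement themselves, here is a minimal sketch that times how long the Swing HTMLEditorKit takes to parse a large page into memory. It is not the harness used for this roundup: the class name ParseTimer, its method names, and the synthetic page (a stand-in for the Hamlet file) are all assumptions made for illustration, and it measures parsing only, not layout or scrolling.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper, not part of any renderer reviewed here.
public class ParseTimer {

    /** Parses the HTML from the reader into a new HTMLDocument. */
    public static HTMLDocument parse(Reader html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Keep the parser from aborting if the page declares its own charset.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(html, doc, 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    /** Returns how many milliseconds one parse of the given HTML takes. */
    public static long timeParseMillis(String html) {
        long start = System.nanoTime();
        parse(new StringReader(html));
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Builds a synthetic page of simple paragraphs (an assumed stand-in for a long play). */
    public static String syntheticPage(int lines) {
        StringBuilder sb = new StringBuilder("<html><body>");
        for (int i = 0; i < lines; i++) {
            sb.append("<p>Line ").append(i).append(" of a very long play.</p>\n");
        }
        return sb.append("</body></html>").toString();
    }
}
```

Calling ParseTimer.timeParseMillis(ParseTimer.syntheticPage(10000)) gives a ballpark figure for a file of roughly the size described above. A single run includes JVM warm-up, so repeat the call a few times before trusting the number.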
The final condition for this article is that there must be a freely downloadable demo. Some of the commercial products we'll see in part two come with licensing fees, but they all have something you can download right now and try out. I've also added the condition that there must have been some update to the package, or at least the web page, in the last year. There are a lot of dead projects with questionable status out there that we want to avoid.
The Types of Renderers
There are two types of HTML renderers: 100 percent Java and native wrappers. The 100-percent-Java renderers are just what they sound like: HTML renderers written completely in Java without calls to any native libraries. They have the advantage of being portable to almost anywhere, depending only on the standard JRE libraries (usually Swing). Renderers of the second type are actually wrappers around a native platform web browser like Internet Explorer or Mozilla. They have the advantage of using a fast and reliable browser that can handle virtually any HTML you throw at it. The downside is that you may be tied to one platform and there is less opportunity for hacking the display from Java. Plus, loading a full web browser may be overkill (and slow) for something like a chat program.
The license is another distinguishing feature between these renderers. Some are open source or at least available for no cost. Some are free for non-commercial use, and some require licensing fees. Depending on your needs, one type may be preferable to another, so be sure to read the actual license before you decide.
On the rendering tests, we will use a recent build of Mozilla Firebird as our control program. Figures 1, 2, and 3 show Mozilla viewing Slashdot, Amazon, and the Zen Garden.
Free HTML Renderers
The Swing HTMLEditorKit
Company: Sun Microsystems
License: Part of the standard JRE
Type: 100 percent Java
Our first renderer is the venerable Swing HTMLEditorKit. Though it has a bad rap, it is a lot better than it used to be. Recent revisions (I tested using the Java 1.4.2 JDK) have added preliminary XHTML and CSS support, though it still fails on a lot of complicated web sites. Since it plugs into a standard JEditorPane, it can integrate easily with any application, and the use of Views and Documents from javax.swing.text gives it a high hackability factor. Most importantly, it's included with every Java Runtime, so you can depend on it being there. Its one downside is that while you can view the source and modify it for your own use, you can't recompile it and distribute it to others along with your application. I'm not a big fan of the idea that we need to open source Java, but I do think that there would be a lot to gain from open sourcing the HTML component (or perhaps all of Swing).
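To give a feel for what that hackability looks like in practice, here is a small sketch that parses a page into an HTMLDocument and walks its anchors to collect their link targets. The class name LinkLister and its hrefs method are made up for this illustration; the Swing calls themselves (HTMLEditorKit.read, HTMLDocument.getIterator, and the HTML.Tag and HTML.Attribute constants) are part of the standard JRE. It is written against a modern JDK rather than the 1.4.2 release tested above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper showing one way to poke at the document model.
public class LinkLister {

    /** Returns the href value of every anchor in the given HTML, in document order. */
    public static List<String> hrefs(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Keep the parser from aborting if the page declares its own charset.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }

        List<String> result = new ArrayList<>();
        // The iterator visits each run of text that carries the <a> tag.
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
            Object href = it.getAttributes().getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                result.add(href.toString());
            }
        }
        return result;
    }
}
```

The same parsed document is what a JEditorPane displays, so an application can inspect or rewrite it before (or after) it reaches the screen.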
Here's how our three tested pages look with the HTMLEditorKit (Figures 4, 5, and 6).
Not too bad. The HTMLEditorKit clearly has some issues with horizontal tables, but it's passable. There is almost no modern CSS support, but it shows the degraded version of the Zen Garden properly (the @import hack notwithstanding). If you use it, be sure to call setEditable(false) on your JEditorPane, or else all of the script tags will be visible. Speedwise, the HTMLEditorKit pulled up Hamlet in about one second, no slower than Mozilla, so it's pretty speedy with large amounts of text, at least.
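The setEditable(false) advice can be sketched in a few lines. The class name ReadOnlyHtmlPane and its create method are invented for this example, and the hyperlink listener simply prints the clicked URL as a placeholder for whatever your application would do. Note that a JEditorPane only fires hyperlink events when it is not editable, which is another reason to make the call.

```java
import javax.swing.JEditorPane;
import javax.swing.event.HyperlinkEvent;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical factory for a display-only HTML pane.
public class ReadOnlyHtmlPane {

    /** Builds a read-only JEditorPane that displays the given HTML string. */
    public static JEditorPane create(String html) {
        JEditorPane pane = new JEditorPane();
        pane.setEditorKit(new HTMLEditorKit());
        // Display mode: hides editing artifacts and enables hyperlink events.
        pane.setEditable(false);
        pane.setText(html);
        pane.addHyperlinkListener(e -> {
            if (e.getEventType() == HyperlinkEvent.EventType.ACTIVATED) {
                // Placeholder action; a real application might load the page here.
                System.out.println("Clicked: " + e.getURL());
            }
        });
        return pane;
    }
}
```

Drop the returned pane into a JScrollPane and add it to your frame like any other Swing component.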
All in all, I would say that the HTMLEditorKit's presence in the JRE trumps its failings, and if you can work around its CSS bugs, then use it. It's probably best used in applications with simple styling, such as chat programs or help windows. I wouldn't use it for web previews or anything where you want lots of graphics or tricky alignment.
Modern Compliance: Virtually none
Legacy Web: Passable
Speed: Pretty good
JRex
License: Mozilla Public License
Type: Native Wrapper
In terms of hackability, JRex stacks up pretty well. There are APIs to receive events, and direct DOM access is under development. Since this is Mozilla, we also get support for XUL, which may be useful for some developers. My only real complaints are the trouble with version issues and the lack of a way to auto-detect an existing installation of Mozilla. However, since you can simply include an entire copy of Mozilla (about 5MB of DLLs and binaries) with your application, this may not be as much of an issue.
For people who need to embed a true browser into an application, either for general web surfing or proofing in a dev tool, I recommend JRex. And since it's still a work in progress, if you are an open source developer looking for a project to contribute to, this is one to consider. In particular, one of the leaders mentioned wanting contributors with "knowledge of XPCOM, SWING/AWT, and JNI." He also said that "knowing JUNIT would be an added advantage."
Figures 7, 8, and 9 show JRex's handling of our sample sites:
Modern Compliance: Excellent
Legacy Web: Excellent
Hackability: Pretty good
Multivalent
Company: UC Berkeley's Digital Library Project
License: Open source (GPL)
Type: 100 percent Java
Multivalent is an interesting research web browser. Meant primarily for browsing documentation, its HTML features are a bit behind. It rendered Amazon pretty well, but showed only the unstyled version of the Zen Garden. It loaded Hamlet reasonably fast, but nothing spectacular. Strangely, I couldn't get it to load Slashdot. I kept getting GZip errors, but that may stem from some strange headers on Slashdot's front page. Multivalent supports complete visual and behavioral customization. Plus, since it's open source, you can always start banging on the code. It does have some interesting features, such as lenses for magnifying the screen, full text searching, on-screen annotation, PDF support, on-the-fly decompression, and a speed-reading mode.
See Figures 10, 11, and 12 for a look at Multivalent.
Modern Compliance: Virtually none
Legacy Web: Poor
Jazilla
Company: Matt McBride
License: Open source (MPL)
Type: 100 percent Java
Jazilla is a resurrection of the Javagator, Netscape's Navigator-in-Java project started before they open sourced Mozilla in 1998. Speed is poor, and the rendering for general web sites is almost unusable. Since it's based on so much legacy code, it will probably never support modern features such as CSS2. Still, it can be useful for certain things, especially where a small-footprint browser is required (such as a chat application).
Figures 13, 14, and 15 show Jazilla in action (or, perhaps, Jazilla inaction).
Modern Compliance: Poor
Legacy Web: Poor
CalPane
Company: Andrew Moulden
License: Free for non-commercial and some commercial apps.
Type: 100 percent Java
As far as speed goes, CalPane is pretty snappy on pages it supports. There is support for event callbacks, and you can override certain features such as how images are loaded, but there isn't too much hackability. In the long run, the lack of CSS and XHTML means that more and more sites will fail in CalPane. The greatest problem with CalPane is that its site doesn't appear to have been updated since 2002. I bent my own rule and included it in this roundup because the renderer is perfectly usable in its current state, as seen in Figures 16, 17, and 18.
Modern Compliance: None
Legacy Web: Decent
Overall, our choices are a very mixed bag. JRex offers high compliance and speed, but requires integration with native code. The 100-percent-Java renderers have little support for modern standards, but some (CalPane and Multivalent) can at least render some popular pages accurately.
In part two of this series, we'll take a look at what commercial HTML renderers can do, and we'll collect some other renderers that didn't make the cut for this survey but might yet find their place.