Java Sketchbook: The HTML Renderer Shootout, Part 1


    Contents
    What Features Are We Looking For?
    The Types of Renderers
    Free HTML Renderers
       The Swing HTMLEditorKit
       JRex
       Multivalent
       Jazilla
       CalPane
    Conclusions

    The last ten years of software development have seen the rise of the Internet and open standards, most prominently HTML. To most non-technical people, web pages (just HTML over a TCP/IP connection) are The Internet. And now HTML is so pervasive, its usefulness outweighing its flaws, that we find it in many applications that aren't strictly web browsers. Chat programs, help files, and even a certain online music store are all built on top of the flexibility and ubiquity of HTML.

    There's just one problem with HTML, though. To display it, you need some kind of web browser, usually a component that actually renders the HTML on screen. For Java developers, a good HTML renderer can be hard to come by. The built-in viewer, the Swing HTMLEditorKit, is quite lacking, and there aren't many alternatives. However, the situation isn't as bleak as you might think; there are other renderers out there, but we have to look harder to find them. In this article, we will review 11 different HTML renderers, comparing their features, standards compliance, and speed in search of the best one for each kind of project. Part one will consider free (as in either "speech" or "beer") products, while part two will consider licensed commercial offerings.

    What Features Are We Looking For?

    When deciding how to rate each renderer, we should consider why we need one. What do we need to do with it? HTML is essentially styled text and images, loaded over a network, with hyperlinks. Java, often called the networked programming language, handles all sorts of network resources, including URLs, quite easily. So the missing piece is styled text. HTML (and by HTML I mean HTML, XHTML, and CSS1/CSS2) has become the standard for styled text, and it's everywhere.

    As processor speed and display quality have increased during the last ten years, more and more applications have some form of styled text in them, either for editing or display. A quick look through my Start menu turns up the following: Outlook (HTML email and the "Outlook Today" screen), Media Player (advertising and shortcuts), iTunes (the music store), File Explorer (the stylized sidebar), Trillian chat (for message display), the Address Book, Microsoft Office (Word, Excel, and PowerPoint), and the Palm Desktop. This list doesn't even include the styled wizard text and help files for virtually every application on my computer. These are all programs that don't really have anything to do with HTML. If we count programs that in some way edit or produce HTML, then I've got my editor, jEdit, Photoshop, Flash, and iPhoto. The common thread between all of these is that they have styled text that could be (and often is) HTML.

    The other thing these programs have in common is that they don't view normal pages on the Web. They each have specific functions, and the HTML they use is tailored to that function. The browser in iTunes only has to display the HTML coming out of Apple's music database, not the HTML of the average broken web page out there. For that reason, the first criterion for our roundup will be adherence to standards, the more modern the better. This principally means full XHTML support with as much of CSS1 and CSS2 as possible. We want fewer table hacks and more divs with style. Being able to show malformed HTML from the Web is nice, but not essential. The most important thing is that we can get attractive display using standard mechanisms. To test this, we will run the browsers against the CSS Zen Garden, an XHTML and CSS2 site known to push the envelope while remaining compliant, and a showcase for the possibilities of CSS-only styling. To measure compatibility with older web sites (which some applications may require), we will also run against the front pages of Amazon and Slashdot, since these are two heavily used web sites with a good mixture of text and graphics.

    Each product we survey will have figures showing how these three sites are rendered. The small figure shown in the page links to an image of a full-size browser window, so you can get a complete picture of how the browser handles layout, images, blocks of text, etc.

    Next, we care about speed. A lot of our non-traditional uses for HTML only require small portions of pages (such as a chat program's message display) but speed still matters. It's especially important for larger text blocks such as help files and book readers. To test speed we will use a copy of Shakespeare's Hamlet (from ClassicReader.com) formatted as one gigantic HTML file. The styling is simple, but it's a large file (over 10,000 lines) to parse into memory and scroll.
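
    If you want to repeat the Hamlet test yourself, here is a minimal sketch of a load-time check for the one renderer every JRE includes, Swing's JEditorPane. The class name LoadTimer and the file name hamlet.html are placeholders of mine, and the timing covers only handing the text to the HTML parser, not layout or scrolling, so treat the numbers as rough.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.swing.JEditorPane;

class LoadTimer {

    /** Parses html into a fresh, read-only JEditorPane and returns the elapsed time in milliseconds. */
    static long timeLoadMillis(String html) {
        JEditorPane pane = new JEditorPane();
        pane.setEditable(false);
        pane.setContentType("text/html"); // installs the stock HTMLEditorKit
        long start = System.nanoTime();
        pane.setText(html);               // hands the string to the kit's HTML parser
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // "hamlet.html" is a placeholder for a local copy of the test file.
        String path = args.length > 0 ? args[0] : "hamlet.html";
        String html = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        // Repeat a few times: the first run also pays for class loading and JIT warm-up.
        for (int run = 1; run <= 3; run++) {
            System.out.println("run " + run + ": " + timeLoadMillis(html) + " ms");
        }
    }
}
```

    The other renderers each need their own loading calls, so this harness only times the Swing kit; the loop structure carries over unchanged.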

    We won't test JavaScript, Flash, or applets, because most of our embedded browsers give us direct programmatic control from Java. Plus, the HTML is often generated by our application itself, which reduces the need for client-side scripting to validate input or generate content. Some of the browsers below do support JavaScript, though, and I will mention it when they do. More important is how hackable they are. How much can we control or change from the Java side? Can we capture click events or trigger pop-up menus? Can we extend the rendering at all? All of this falls under the heading "Hackability."

    The final condition for this article is that there must be a freely downloadable demo. Some of the commercial products we'll see in part two come with licensing fees, but they all have something you can download right now and try out. I've also required some update to the package, or at least its web page, within the last year. There are a lot of dead projects with questionable status out there that we want to avoid.

    The Types of Renderers

    There are two types of HTML renderers: 100 percent Java and native wrappers. The 100-percent-Java renderers are just what they sound like: HTML renderers written completely in Java, without calls to any native libraries. They have the advantage of being portable to almost anywhere, depending only on the standard JRE libraries (usually Swing). The second type is a wrapper around a native web browser such as Internet Explorer or Mozilla. These have the advantage of using a fast and reliable browser that can handle virtually any HTML you throw at it. The downside is that you may be tied to one platform, and there is less opportunity for hacking the display from Java. Plus, loading a full web browser may be overkill (and slow) for something like a chat program.

    The license is another distinguishing feature of these renderers. Some are open source or at least available at no cost, some are free for non-commercial use, and some require licensing fees. Depending on your needs, one type may be preferable to another, so be sure to read the actual license before you decide.

    On the rendering tests, we will use a recent build of Mozilla Firebird as our control program. Figures 1, 2, and 3 show Mozilla viewing Amazon, Slashdot, and the CSS Zen Garden.

    Figure 1. Amazon in Mozilla (You can click on the screen shot to open a full-size view.)

    Figure 2. Slashdot in Mozilla (You can click on the screen shot to open a full-size view.)

    Figure 3. CSS Zen Garden in Mozilla (You can click on the screen shot to open a full-size view.)

    Free HTML Renderers


    The Swing HTMLEditorKit

    Company: Sun Microsystems
    License: Part of the standard JRE
    URL: java.sun.com/j2se/1.4.2/docs/api/index.html
    Type: 100 percent Java

    Our first renderer is the venerable Swing HTMLEditorKit. Though it has a bad rap, it is a lot better than it used to be. Recent revisions (I tested using the Java 1.4.2 JDK) have added preliminary XHTML and CSS support, though it still fails on a lot of complicated web sites. Since it's just an editor kit that plugs into a standard JEditorPane, it integrates easily with any application, and its use of Views and Documents from javax.swing.text gives it a high hackability factor. Most importantly, it's included with every Java Runtime, so you can depend on it being there. Its one downside is that while you can view the source and modify it for your own use, you can't recompile it and distribute it to others along with your application. I'm not a big fan of the idea that we need to open source Java, but I do think that there would be a lot to gain from open sourcing the HTML component (or perhaps all of Swing).
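
    As a small taste of that hackability, the sketch below parses a page with a stock HTMLEditorKit and walks the resulting HTMLDocument to collect every link target. LinkLister and the sample markup are my own illustrative names; HTMLDocument.getIterator and HTML.Attribute.HREF are standard javax.swing.text.html API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class LinkLister {

    /** Parses html with the stock HTMLEditorKit and returns every href, in document order. */
    static List<String> hrefs(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);

        List<String> result = new ArrayList<>();
        // Each step of the iterator is one run of text wrapped in an <a> tag;
        // getAttributes() returns that tag's own attributes (href, name, ...).
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object href = (attrs == null) ? null : attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                result.add(href.toString());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><p>See <a href=\"act1.html\">Act I</a>"
                + " or <a href=\"act2.html\">Act II</a>.</p></body></html>";
        System.out.println(hrefs(html));
    }
}
```

    Note that getIterator only handles character-level tags such as A; for block tags such as P it returns null, so walking the element tree is the way to get at those.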

    Here's how our three tested pages look with the HTMLEditorKit (Figures 4, 5, and 6).

    Figure 4. Amazon in HTMLEditorKit

    Figure 5. Slashdot in HTMLEditorKit

    Figure 6. CSS Zen Garden in HTMLEditorKit

    Not too bad. The HTMLEditorKit clearly has some issues with horizontal tables, but it's passable. There is almost no modern CSS support, but it shows the degraded version of the Zen Garden properly (the @import hack notwithstanding). If you use it, be sure to call setEditable(false) on your JEditorPane, or else all of the script tags will be visible. Speedwise, the HTMLEditorKit pulled up Hamlet in about one second, no slower than Mozilla, so it's pretty speedy with large amounts of text, at least.
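
    Here is a minimal sketch of such a read-only viewer, assuming you want to handle link clicks yourself (in a help window or chat program, say) rather than letting the pane navigate. The HTMLEditorKit only turns mouse clicks into HyperlinkEvents while the pane is non-editable, so setEditable(false) matters for click handling as well as for display. ReadOnlyViewer and createViewer are hypothetical names.

```java
import javax.swing.JEditorPane;
import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.SwingUtilities;
import javax.swing.event.HyperlinkEvent;
import javax.swing.event.HyperlinkListener;

class ReadOnlyViewer {

    /** Builds a non-editable HTML pane that forwards hyperlink events to onLink. */
    static JEditorPane createViewer(String html, HyperlinkListener onLink) {
        JEditorPane pane = new JEditorPane();
        pane.setContentType("text/html"); // installs the stock HTMLEditorKit
        pane.setEditable(false);          // display mode; also required for click-generated link events
        pane.setText(html);
        pane.addHyperlinkListener(onLink);
        return pane;
    }

    public static void main(String[] args) {
        SwingUtilities.invokeLater(() -> {
            JEditorPane pane = createViewer(
                    "<html><body><p>Open <a href=\"help/index.html\">the help index</a>.</p></body></html>",
                    e -> {
                        // ACTIVATED is a click; ENTERED and EXITED are mouse-overs.
                        if (e.getEventType() == HyperlinkEvent.EventType.ACTIVATED) {
                            // getURL() can be null for relative links when the document
                            // has no base URL, so print the raw href text instead.
                            System.out.println("clicked: " + e.getDescription());
                        }
                    });
            JFrame frame = new JFrame("Read-only viewer");
            frame.setDefaultCloseOperation(JFrame.DISPOSE_ON_CLOSE);
            frame.add(new JScrollPane(pane));
            frame.setSize(400, 300);
            frame.setVisible(true);
        });
    }
}
```

    To follow links inside the same pane instead, call pane.setPage(e.getURL()) from the listener when the URL is non-null.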

    All in all, I would say that the HTMLEditorKit's presence in the JRE trumps its failings, and if you can work around its CSS bugs, then use it. It's probably best used in applications with simple styling, such as chat programs or help windows. I wouldn't use it for web previews or anything where you want lots of graphics or tricky alignment.
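
    For the chat-window case, one approach is to keep a single HTMLDocument and append each message to the end of its body, as sketched below. ChatLog, append, and the nick CSS class are hypothetical; the style rules use only simple CSS1 properties, and they go into the document's own style sheet because the one returned by HTMLEditorKit.getStyleSheet() is shared by every HTMLEditorKit in the application. Messages are escaped first so that user text can't inject markup.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.JEditorPane;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class ChatLog {
    private final JEditorPane pane = new JEditorPane();
    private final HTMLDocument doc;
    private final Element body;

    ChatLog() throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        doc = (HTMLDocument) kit.createDefaultDocument(); // also installs the parser insertBeforeEnd needs
        // Per-document rules; kit.getStyleSheet() would change every HTMLEditorKit in the app.
        doc.getStyleSheet().addRule("body { font-family: sans-serif; }");
        doc.getStyleSheet().addRule(".nick { color: #336699; font-weight: bold; }");
        kit.read(new StringReader("<html><body></body></html>"), doc, 0);
        body = doc.getElement(doc.getDefaultRootElement(), StyleConstants.NameAttribute, HTML.Tag.BODY);

        pane.setEditorKit(kit);
        pane.setDocument(doc);
        pane.setEditable(false);
    }

    /** Escapes the characters that would otherwise be read as markup. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Appends one message as a new paragraph at the end of the body. */
    void append(String nick, String message) throws IOException, BadLocationException {
        doc.insertBeforeEnd(body,
                "<p><span class=\"nick\">" + escape(nick) + ":</span> " + escape(message) + "</p>");
    }

    /** The rendered text without markup, handy for logging or searching. */
    String plainText() throws BadLocationException {
        return doc.getText(0, doc.getLength());
    }

    JEditorPane component() {
        return pane;
    }
}
```

    In a real window you would wrap component() in a JScrollPane and make the document edits on the event dispatch thread.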

    Modern Compliance: Virtually none
    Legacy Web: Passable
    JavaScript: None
    Hackability: Lots
    Speed: Pretty good


    JRex

    Company: MozDev
    License: Mozilla Public License
    URL: jrex.mozdev.org
    Type: Native Wrapper

    JRex is a complete wrapper for Mozilla. It is still very much under development, but shows real potential. I was not able to get it to work with Mozilla Firebird, but it worked flawlessly with Mozilla 1.4. I'm guessing that this is just a version issue that will hopefully be worked out soon. Since it uses Mozilla underneath, its rendering and JavaScript support are as good as Mozilla's own. Plugins are also supported, except, strangely, the Java Plugin for Windows.

    In terms of hackability, JRex stacks up pretty well. There are APIs to receive events, and direct DOM access is under development. Since this is Mozilla, we also get support for XUL, which may be useful for some developers. My only real complaints are the version-compatibility problems and the lack of a way to auto-detect an existing Mozilla installation. However, since you can simply ship an entire copy of Mozilla (about 5MB of DLLs and binaries) with your application, this may not be much of an issue.

    For people who need to embed a true browser into an application, either for general websurfing or proofing in a dev tool, I recommend JRex. And since it's still a work in progress, if you are an open source developer looking for a project to contribute to, this is one to consider. In particular, one of the leaders mentioned wanting contributors with "knowledge of XPCOM, SWING/AWT, and JNI." He also said that "knowing JUNIT would be an added advantage."

    Figures 7, 8, and 9 show JRex's handling of our sample sites:

    Figure 7. Amazon in JRex (You can click on the screen shot to open a full-size view.)

    Figure 8. Slashdot in JRex (You can click on the screen shot to open a full-size view.)

    Figure 9. CSS Zen Garden in JRex (You can click on the screen shot to open a full-size view.)

    Modern Compliance: Excellent
    Legacy Web: Excellent
    JavaScript: Excellent
    Hackability: Pretty good
    Speed: Excellent


    Multivalent

    Company: UC Berkeley's Digital Library Project
    License: Open source (GPL)
    URL: multivalent.sourceforge.net
    Type: 100 percent Java

    Multivalent is an interesting research web browser. Because it is meant primarily for browsing documentation, its HTML features lag a bit behind. It rendered Amazon pretty well, but showed only the unstyled version of the Zen Garden. It loaded Hamlet reasonably fast, though nothing spectacular. Strangely, I couldn't get it to load Slashdot at all; I kept getting GZip errors, which may stem from some unusual headers on Slashdot's front page. Multivalent supports complete visual and behavioral customization, and since it's open source, you can always start banging on the code. It also has some interesting features, such as lenses for magnifying the screen, full-text searching, on-screen annotation, PDF support, on-the-fly decompression, and a speed-reading mode.
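
    I didn't track down the cause of those errors, so the following is not a diagnosis of Multivalent. It is a minimal sketch of how a Java client can offer gzip and then decompress the body only when the server's Content-Encoding header says it is compressed. HttpURLConnection does not decompress gzip on its own, and feeding uncompressed bytes to a GZIPInputStream fails with a ZipException, so a client that guesses wrong about the encoding produces an error of this kind. GzipAwareFetch is a hypothetical name, and the UTF-8 charset is a simplifying assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

class GzipAwareFetch {

    /** Wraps the raw body in a GZIPInputStream only when the server says it is gzip-encoded. */
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw); // throws ZipException if the bytes are not really gzip
        }
        return raw; // identity (or an encoding we don't handle): pass through unchanged
    }

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Fetches a page, offering gzip but decoding according to the response header. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream in = decode(conn.getInputStream(), conn.getContentEncoding())) {
            // Assumes UTF-8; a real browser would honor the charset in Content-Type.
            return new String(readAll(in), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String page = fetch(args.length > 0 ? args[0] : "http://slashdot.org/");
        System.out.println(page.length() + " characters");
    }
}
```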

    See Figures 10, 11, and 12 for a look at Multivalent.

    Figure 10. Amazon in Multivalent (You can click on the screen shot to open a full-size view.)

    Figure 11. Slashdot in Multivalent (You can click on the screen shot to open a full-size view.)

    Figure 12. CSS Zen Garden in Multivalent (You can click on the screen shot to open a full-size view.)

    Modern Compliance: Virtually none
    Legacy Web: Poor
    JavaScript: None
    Hackability: Good
    Speed: Good


    Jazilla

    Company: Matt McBride
    License: Open source (MPL)
    URL: jazilla.mcbridematt.dhs.org/
    Type: 100 percent Java

    Jazilla is a resurrection of the Javagator, Netscape's Navigator-in-Java project started before they open sourced Mozilla in 1998. Speed is poor, and the rendering for general web sites is almost unusable. Since it's based on so much legacy code, it will probably never support modern features such as CSS2. Still, it can be useful for certain things, especially where a small-footprint browser is required (such as a chat application).

    Figures 13, 14, and 15 show Jazilla in action (or, perhaps, Jazilla inaction).

    Figure 13. Amazon in Jazilla (You can click on the screen shot to open a full-size view.)

    Figure 14. Slashdot in Jazilla (You can click on the screen shot to open a full-size view.)

    Figure 15. CSS Zen Garden in Jazilla (You can click on the screen shot to open a full-size view.)

    Modern Compliance: Poor
    Legacy Web: Poor
    JavaScript: None
    Hackability: Some
    Speed: Slow


    CalPane

    Company: Andrew Moulden
    License: Free for non-commercial and some commercial apps.
    URL: www.netcomuk.co.uk/~offshore/index.html
    Type: 100 percent Java

    CalPane is an older browser without JavaScript support, but it can render legacy HTML fairly well. As you can see in the screenshots below, both Amazon and Slashdot render pretty well, though the lack of anti-aliasing is especially apparent on Slashdot. It doesn't support CSS at all, though it does degrade properly. This also highlights the principle of using CSS properly so that sites are still usable without it.

    As far as speed goes, it is pretty snappy on pages it supports. There is support for event callbacks, and you can override certain features such as how images are loaded, but there isn't too much hackability. In the long run, the lack of CSS and XHTML means that more and more sites will fail in CalPane. The greatest problem with CalPane is that its site doesn't appear to have been updated since 2002. I bent my own rule and included it in this roundup because the renderer is perfectly usable in its current state, as seen in Figures 16, 17, and 18.

    Figure 16. Amazon in CalPane (You can click on the screen shot to open a full-size view.)

    Figure 17. Slashdot in CalPane (You can click on the screen shot to open a full-size view.)

    Figure 18. CSS Zen Garden in CalPane (You can click on the screen shot to open a full-size view.)

    Modern Compliance: None
    Legacy Web: Decent
    JavaScript: None
    Hackability: Decent
    Speed: Good

    Conclusions

    Overall, our choices are a very mixed bag. JRex offers high compliance and speed, but requires integration with native code. The 100-percent-Java renderers have little support for modern standards, though some (CalPane and Multivalent) can at least render some popular pages accurately.

    In part two of this series, we'll take a look at what commercial HTML renderers can do, and we'll collect some other renderers that didn't make the cut for this survey but might yet find their place.
