Reading the News with Sun's RSS Utilities Blog



    RSS is the de facto standard for web syndication, and several frameworks allow Java developers to work with it. Sun even offers its own RSS library, distributed as part of the SDN article "RSS Utilities: A Tutorial."

    Sun's RSS utilities were created by Rodrigo Oliveira under contract from Sun as a simple JSP tag library for the manipulation of RSS. For the technical folks out there who have been living in a hole for the past two years and only read news from a newspaper, RSS stands for "Really Simple Syndication" and is used for the distribution of news feeds and podcasts. RSS feeds are merely XML files that conform to a version of the RSS standard. Atom is another standard that can also be used to produce feeds.
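
    To make "merely XML files" concrete, here is a minimal sketch that reads a hand-written RSS 2.0 document using nothing but the XML parser that ships with the JDK. It does not use Sun's RSS utilities at all; the class name RssXmlSketch, the method itemTitles, and the sample feed content are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RssXmlSketch {
    // A hand-written, minimal RSS 2.0 document (illustrative data only).
    static final String SAMPLE =
          "<rss version=\"2.0\"><channel>"
        + "<title>Example News</title>"
        + "<link>http://example.com/</link>"
        + "<description>Made-up feed for illustration</description>"
        + "<item><title>First story</title><link>http://example.com/1</link></item>"
        + "<item><title>Second story</title><link>http://example.com/2</link></item>"
        + "</channel></rss>";

    // Returns the text of the <title> inside every <item> of the feed.
    static List<String> itemTitles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> titles = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                NodeList title = item.getElementsByTagName("title");
                if (title.getLength() > 0) {
                    titles.add(title.item(0).getTextContent());
                }
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed XML", e);
        }
    }

    public static void main(String[] args) {
        for (String title : itemTitles(SAMPLE)) {
            System.out.println(title);
        }
    }
}
```

    A dedicated RSS library saves us from writing this kind of element-by-element code for every field and every RSS version.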

    RSS feed content is delivered over HTTP through a web server of choice. This is one of the main reasons for the success of RSS. Before RSS, news was delivered through other types of news servers, like a Usenet news server. Today, almost everyone is comfortable setting up their own web server, and XML has become a juggernaut, driving the development of every kind of software imaginable since it came into wide use. Its flexibility and extensibility make it hard to beat. In all likelihood, XML will still be in widespread use when most of today's programming languages have been replaced by the next big thing.

    If you like reading the news, almost every website has an RSS news feed. All you need is an RSS aggregator client and you can aggregate content from all of your favorite sites into one interface for viewing and reading. No more surfing from site to site and digging to get your news fix. Slashdot and just about every other site and its cousin has an RSS feed; just look for the RSS icon. There are hundreds of RSS reader applications out there. If you just want to get a quick start with RSS, Mozilla Firefox supports RSS bookmarks out of the box.

    Parsing a Feed

    Let's get started with Sun's RSS utilities and parse a feed or two. The most practical application of this would be if we wanted to create our own web-based RSS reader, but our options have no bounds.

    At this point, it is fair to mention that Sun's RSS utility is not a very robust library for the task at hand. It was intended to be a simple-to-use tag library for parsing RSS feeds from a JSP page, using a set of custom JSP tags provided with the download. The RSS utilities were a side effect of the development that Oliveira did. Like any good Java developer, he applied the principles of reusability and encapsulation to separate the lower-level RSS manipulations into a set of back-end utilities. Those utilities are what we are going to look at here, along with the original intended uses of the library. The library was intended for novice Java or JSP developers, or web developers who want to parse feeds in a simple way. We are going to look at this library and a few examples; more information on its usage can be found in the tutorial on the Sun website. That being said, if you're a more experienced developer looking for a more robust solution, we recommend that after reading this article and the tutorial, you examine the RSS library ROME.

    To use Sun's RSS utility, the first thing we have to do is download the library and put it on our classpath. Let's look at how we get started using the library programmatically. For the purpose of demonstration, we'll parse one of the CNN news feeds for world news.

    public void readRSSDocument() throws Exception {
        // Create the parser
        RssParser parser = RssParserFactory.createDefault();
        // Parse our URL
        Rss rss = parser.parse(new URL(""));
    }

    We've now parsed the RSS feed using the RssParser. Now we can use the Rss object to access the elements of the RSS XML document. The composition of the RSS feed is represented as it would be in XML: as a hierarchy of elements. The RSS utilities are provided with a JUnit testing class (RssTest). We are going to use an excerpt from this class to test parsing the elements from our feed.

    public void readRSSDocument() throws Exception {
        RssParser parser = RssParserFactory.createDefault();
        Rss rss = parser.parse(new URL(""));

        // Get all items in the channel; there should be one for each article
        Collection items = rss.getChannel().getItems();
        if (items != null && !items.isEmpty()) {
            for (Iterator i = items.iterator(); i.hasNext(); ) {
                Item item = (Item) i.next();
                System.out.println("Title: " + item.getTitle());
                System.out.println("Link: " + item.getLink());
                System.out.println("Description: " + item.getDescription());
                System.out.println(); // blank line between articles
            }
        }

        // Iterate over categories if we are provided with any
        Collection categories = rss.getChannel().getCategories();
        if (categories != null && !categories.isEmpty()) {
            for (Iterator i = categories.iterator(); i.hasNext(); ) {
                Category cat = (Category) i.next();
                System.out.println("Category: " + cat);
                System.out.println("Category Domain: " + cat.getDomain());
            }
        }
    }

    Now we are doing a little more work. The output of the method will reveal our news in a human-readable format (note that when parsing the CNN world news feed, we don't get any categories, because CNN divides categories into separate feeds). The following is just an excerpt from the actual output.

    Title: Al Qaeda video condemns cartoons
    Link:
    Description: In a video aired Saturday, Osama bin Laden's deputy Ayman al-Zawahiri, complimented the Islamic militant group Hamas on its Palestinian election victory and spoke against published Prophet Mohammed cartoons that have sparked violence throughout the Muslim world. The cartoons, al-Zawahiri said, are a continuation of the "crusaders' war" against Islam.

    Title: Iran issues warning on uranium enrichment
    Link:
    Description: Iran will resume large-scale nuclear enrichment if the International Atomic Energy Agency board of governors refers the Islamic Republic to the U.N. Security Council, the country's chief nuclear negotiator said Sunday.

    As you can see, using this simple parsing mechanism, we could output the elements of the feed in any manner we choose. For example, we might create a Swing application that lets users add feeds and displays them in a Swing panel. We could add a component to our website that displays this information in a web page. Giving users an RSS aggregation feature for their user account on our example website would keep them coming back more often. The library might be simple, but the application could form the basis of something much more grandiose. We are limited only by our imagination here.

    We definitely need to provide some validation when parsing our feeds. Assume that no two feeds are alike and that they may not be properly formed. The parser itself gives some assurances, since it will throw RssParserException in certain cases (such as if the server is down). Then again, the feed may be delivered but be missing elements we can't live without. Nothing would be more frustrating for our users than to read a headline, want more, and find out the feed doesn't provide the URL or description. We'll take a cue from Oliveira and do a little checking of our feed after it is parsed in a unit test. Here we check to make sure certain key elements of the feed are not missing.

    Assert.assertNotNull(rss);
    Assert.assertNotNull(rss.getChannel());
    Assert.assertNotNull(rss.getChannel().getTitle());
    Assert.assertNotNull(rss.getChannel().getLink());
    Assert.assertNotNull(rss.getChannel().getDescription());
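
    Outside of a unit test, the same idea can be turned into a check that reports which key elements are absent instead of failing on the first one. The following is only a sketch under a few assumptions: it works on the raw XML with the JDK's own parser rather than on Sun's Rss object, it treats title, link, and description as the required channel elements (the three checked above, which RSS 2.0 also requires), and the names FeedCheckSketch and missingChannelElements are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FeedCheckSketch {
    // Channel elements we refuse to live without.
    static final List<String> REQUIRED = Arrays.asList("title", "link", "description");

    // Returns the required channel elements that are absent or empty, in REQUIRED order.
    static List<String> missingChannelElements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList channels = doc.getElementsByTagName("channel");
            if (channels.getLength() == 0) {
                return new ArrayList<>(REQUIRED); // no channel at all
            }
            // Look only at direct children of <channel>, so that an item's
            // <title> or <link> is not mistaken for the channel's own.
            Set<String> present = new HashSet<>();
            NodeList children = channels.item(0).getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && !child.getTextContent().trim().isEmpty()) {
                    present.add(child.getNodeName());
                }
            }
            List<String> missing = new ArrayList<>(REQUIRED);
            missing.removeAll(present);
            return missing;
        } catch (Exception e) {
            throw new IllegalArgumentException("Feed is not well-formed XML", e);
        }
    }
}
```

    An aggregator could use a list like this to decide whether to show a feed, skip it, or warn the user that it is incomplete.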

    Using the Library as a Tag Library

    The meat of the tutorial written by Oliveira is geared toward the application of his utility as a tag library, so let's see how we can apply the library in that fashion. We'll assume you can read the tutorial or you know how to use a JSP tag library and are ready to begin. To get started using the tag library, we need to tell it where to find our feed.

    <rss:feed url="" feedId="cnnworld" proxyAddress="host" proxyPort="80"/>

    If you don't have a proxy server, leave out the proxy information from this tag. The feedId attribute will be used to access elements of our feed later on. This is how we could handle multiple feeds in a single page. Remember, though, the more grandiose application of the utilities we talked about earlier. We could, for example, display the top story from each feed a user has stored in our RSS client on their home page. Let's assume we are learning and are just going to deal with a single feed for now. Still, pay attention to the use of the feedId attribute.

    <rss:feed url="" feedId="cnnworld"/>
    <b>Image: </b><rss:channelImage feedId="cnnworld"/><br>
    <b>Title: </b><rss:channelTitle feedId="cnnworld"/><br>
    <b>Link: </b><rss:channelLink feedId="cnnworld" asLink="true"/><br>
    <b>Description: </b><rss:channelDescription feedId="cnnworld"/><br>
    <b>Copyright: </b><rss:channelCopyright feedId="cnnworld"/><br>
    <b>Docs: </b><rss:channelDocs feedId="cnnworld"/><br>
    <b>Generator: </b><rss:channelGenerator feedId="cnnworld"/><br>
    <b>Language: </b><rss:channelLanguage feedId="cnnworld"/><br>
    <b>Last Build Date: </b><rss:channelLastBuildDate feedId="cnnworld"/><br>
    <b>Managing Editor: </b><rss:channelManagingEditor feedId="cnnworld"/><br>
    <b>Pub Date: </b><rss:channelPubDate feedId="cnnworld"/><br>
    <b>Skip Days: </b><rss:channelSkipDays feedId="cnnworld"/><br>
    <b>Skip Hours: </b><rss:channelSkipHours feedId="cnnworld"/><br>
    <b>TTL: </b><rss:channelTTL feedId="cnnworld"/><br>
    <ul>
      <rss:forEachItem feedId="cnnworld" startIndex="1" endIndex="4">
        <li><rss:itemDescription feedId="cnnworld"/><br><br>
      </rss:forEachItem>
    </ul>

    This is an example of parsing an RSS 2.0 feed. It is a pretty exciting and robust example. Note that the articles are printed out starting with the first article and ending at the fourth. RSS feeds can get lengthy, depending on how many articles are included. We might want to limit the display to only four articles for each feed we parse. The tutorial provided by Oliveira shows examples of three versions of RSS feeds using the parser.
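
    If we parse feeds programmatically instead of through the tags, we can apply the same kind of limit ourselves. The sketch below is plain Java and is not part of Sun's library. It assumes the indices count from one and include both ends, which matches the "first through fourth" behavior described above, but how the tag itself handles out-of-range values is not something this article verifies. The names ItemWindowSketch and slice are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ItemWindowSketch {
    // Returns items startIndex..endIndex (1-based, inclusive), clamped to the
    // bounds of the list so that short feeds do not cause an exception.
    static <T> List<T> slice(List<T> items, int startIndex, int endIndex) {
        int from = Math.max(startIndex, 1) - 1;       // convert to 0-based
        int to = Math.min(endIndex, items.size());    // exclusive upper bound
        if (from >= to) {
            return new ArrayList<>();
        }
        return new ArrayList<>(items.subList(from, to));
    }
}
```

    Calling slice(items, 1, 4) on the item collection of a long feed would keep only the first four articles, while a feed with two articles would simply yield both.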


    There is no doubt that Sun's RSS utilities are easy to use. We were up and going with this example in minutes. The only drawback to this library is the inability to create a new feed of our own; we can only manipulate other feeds. If all you are looking for is a reader, then this library may be a good fit for your application, but with everyone wanting to have their own RSS feed, it may not provide enough functionality. Let's take our website example. We'd create an RSS feed aggregator for our users, but with no RSS feeds of our own. This might make sense, considering we could always display our site's news without generating RSS, since our users are coming to our site for RSS aggregation. It would also be very simple to create an RSS feed from a JSP page that generates XML, of course, but supporting the different versions of RSS could be difficult. If development on this tool continues, we'd expect future releases to be more robust and allow greater flexibility. The tool is fairly new and is only at version 1.1.
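
    To give a feel for how little is involved in producing a feed by hand, here is a minimal sketch that builds an RSS 2.0 document as a string. It is not something Sun's library provides, it targets RSS 2.0 only, and it emits just the basic elements. The names FeedWriterSketch, feed, element, and escape are made up for illustration, and each item is assumed to be a two-element array holding a title and a link.

```java
import java.util.List;

class FeedWriterSketch {
    // Escapes the characters that are not allowed in XML element text.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds <name>text</name> with the text escaped.
    static String element(String name, String text) {
        return "<" + name + ">" + escape(text) + "</" + name + ">";
    }

    // Builds a minimal RSS 2.0 document. Each entry in items is {title, link}.
    static String feed(String title, String link, String description, List<String[]> items) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<rss version=\"2.0\">\n<channel>\n");
        xml.append(element("title", title)).append("\n");
        xml.append(element("link", link)).append("\n");
        xml.append(element("description", description)).append("\n");
        for (String[] item : items) {
            xml.append("<item>")
               .append(element("title", item[0]))
               .append(element("link", item[1]))
               .append("</item>\n");
        }
        xml.append("</channel>\n</rss>\n");
        return xml.toString();
    }
}
```

    A JSP page or servlet could write the returned string to the response with an XML content type. Supporting older RSS versions or Atom as well would mean a separate writer for each format, which is where a fuller library earns its keep.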

    Sun and Oliveira have given us the tools we need to start parsing RSS feeds with very little ramp-up time. Only time will tell which of the RSS libraries will emerge as the dominant one. Other libraries include the aforementioned ROME, RSS4J, and RSSLibJ, so you might stick with one or mix and match depending on your needs; after all, there may be more than one way to "skin a cat," but that doesn't mean the ways are mutually exclusive.