Learn about JavaFX's APIs for Reading RSS and Atom Newsfeeds




    JavaFX 1.2 introduced many interesting APIs, including APIs for reading RSS and Atom newsfeeds. If you haven't worked with these APIs, you'll discover that they greatly simplify the task of integrating a newsfeed reader into a JavaFX application.

    This article introduces you to the RSS and Atom APIs. You first explore their common foundation, and then tour each API's key classes. Finally, you gain insight into how these APIs work by exploring the FeedTask class's newsfeed-polling implementation.

    Common Foundation

    The RSS and Atom APIs are offshoots of a common foundation that's rooted in the abstract javafx.async.Task class. This class makes it possible to start, stop, and track an activity (task) that runs on a background thread.

    Task provides onStart and onDone variables that identify functions to be invoked at the start and end of the task, and other variables that report task progress and disposition (success or failure). This class also provides abstract start(): Void and stop(): Void functions to initiate and terminate task execution.

    The abstract javafx.data.feed.FeedTask class extends Task. In addition to inheriting Task's variables and overriding its start() and stop() functions, FeedTask provides the following functions and variables:

    • poll(): Void: Poll the newsfeed location for updated content, which is fetched, parsed, and delivered to the application.
    • update(): Void: Poll the newsfeed location. All content, whether or not it has changed, is fetched, parsed, and delivered to the application.
    • headers (of type javafx.io.http.HttpHeader[]) identifies a sequence of HTTP request headers that are to be sent to location each time this newsfeed is polled. This variable defaults to null.
    • interval (of type javafx.lang.Duration) specifies the amount of time that must elapse before the newsfeed is once more polled for updates. You must specify a positive value for this variable, which defaults to 0.0. (I wonder if it wouldn't be better to choose a positive value, such as 60s, to be the polling default, and perhaps allow 0.0 to indicate that polling isn't desired.)
    • location (of type String) specifies the newsfeed's address. This variable defaults to the empty string ("").
    • onException (of type function(:Exception):Void) identifies a function that's invoked when an exception occurs during the current poll. This variable defaults to null.
    • onForeignEvent (of type function(:javafx.data.pull.Event):Void) identifies a function that's invoked to handle extension elements, which are newsfeed elements whose namespace URI is not Atom or RSS. For example, given an Atom newsfeed whose feed element's start tag is specified as <feed xmlns="http://www.w3.org/2005/Atom" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">, parsing a subsequent <opensearch:totalResults>1911</opensearch:totalResults> element results in three foreign events (for the start tag, text, and end tag) because the namespace for totalResults is http://a9.com/-/spec/opensearch/1.1/ (as specified by the opensearch: prefix) instead of http://www.w3.org/2005/Atom. This variable defaults to null.
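
    To make the three-event count concrete, here is a minimal Java sketch that walks the same feed fragment with the JDK's StAX reader and counts the events that fall outside the Atom namespace. It is not JavaFX's own parser: the ForeignEventSketch class and its countForeignEvents() helper are hypothetical names introduced only for illustration, and FeedTask's real event dispatch may differ in detail.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; not part of the JavaFX API.
class ForeignEventSketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Counts start-tag, text, and end-tag events that belong to elements
    // whose namespace URI is not the Atom namespace.
    static int countForeignEvents(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Deliver adjacent text as a single CHARACTERS event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader =
                factory.createXMLStreamReader(new StringReader(xml));
            int foreignDepth = 0;   // > 0 while inside a foreign element
            int foreignEvents = 0;
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (foreignDepth > 0
                                || !ATOM_NS.equals(reader.getNamespaceURI())) {
                            foreignDepth++;
                            foreignEvents++;
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (foreignDepth > 0) {
                            foreignEvents++;
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (foreignDepth > 0) {
                            foreignDepth--;
                            foreignEvents++;
                        }
                        break;
                    default:
                        break;
                }
            }
            return foreignEvents;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse feed fragment", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<feed xmlns=\"http://www.w3.org/2005/Atom\" "
            + "xmlns:opensearch=\"http://a9.com/-/spec/opensearch/1.1/\">"
            + "<opensearch:totalResults>1911</opensearch:totalResults>"
            + "</feed>";
        System.out.println(countForeignEvents(xml)); // prints 3
    }
}
```

    The feed element itself is in the Atom namespace, so only the start tag, text, and end tag of totalResults are counted.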

    The common foundation is also rooted in the abstract javafx.data.feed.Base class, which is the base class for RSS and Atom classes that describe various newsfeed elements. RSS's RSS and Atom's Feed top-level element classes are examples of Base subclasses.

    Base provides a namespaces variable (of type javafx.data.Pair[]) that contains the namespace definitions in effect for the element. The name member of each Pair specifies the namespace prefix; the value member specifies the namespace URI.

    Base also provides a parent variable (of type Base) that identifies the parent (enclosing) element. For example, the parent variable of Atom's Entry element class refers to its containing Feed instance. If there's no parent (as is the case with Feed), this variable contains null.

    Finally, Base provides several functions that are useful when you need to create a custom feed parser. Because this task is beyond the scope of this article, I refer you to Rakesh Menon's Custom Feed Parsers blog post for more information and an example.

    RSS API Overview

    The RSS (Resource Description Framework Site Summary, Really Simple Syndication, Rich Site Summary) API consists of 10 classes that are located in the javafx.data.feed.rss package. Central to this package is the RssTask class.

         
    RSS versions supported by the API
    The RSS API handles newsfeeds that conform to versions 0.91 (with non-optional item elements) through 2.0.11 (the most recent version at time of writing) of the RSS specification.

    The RssTask entry-point class extends FeedTask, and provides the following variables for installing a custom factory, for reporting the newsfeed's channel element's non-item content, and for reporting the content of each of the channel element's item elements:

    • factory (of type Factory) identifies the factory that's used to create objects that represent newsfeed elements. You only need to install your own factory when creating a custom feed parser.
    • onChannel (of type function(:Channel):Void) identifies a function that's invoked to report the channel element's non-item elements -- the RSS channel element contains item and non-item elements, and is itself contained within the top-level rss element. This variable defaults to null.
    • onItem (of type function(:Item):Void) identifies a function that's invoked to report the current item element. This variable defaults to null.

    The Channel class extends the abstract RSS class, which represents the top-level rss element, and which provides members for accessing the factory that's creating objects, for accessing the task that's parsing the newsfeed, and more. In turn, RSS extends Base.

    Channel also provides the following variables for accessing channel-oriented (non-item-specific) content:

    • categories (of type Category[]) identifies the categories (in terms of domains and text values) to which this channel belongs.
    • copyright (of type String) specifies a copyright notice for channel content.
    • description (of type String) presents a phrase or sentence that describes this channel.
    • docs (of type String) specifies a URL that points to documentation for the format used in the RSS file. This might simply be a pointer to a Web page, and is useful for letting people who encounter this RSS file in the future understand the file's purpose (much like code comments).
    • generator (of type String) identifies the program that was used to generate this channel.
    • image (of type Image) identifies an image (in terms of description, height, link, title, URL, and width) that can be displayed with the channel content.
    • language (of type String) identifies the language in which the channel was written.
    • lastBuildDate (of type javafx.date.DateTime) specifies when this channel's content was last changed.
    • link (of type String) provides the URL to the Website that corresponds to this channel.
    • pubDate (of type DateTime) identifies the date when this channel was published.
    • title (of type String) provides this channel's title.
    • ttl (of type Duration) provides the number of minutes in which the news-reader can cache this channel before it must poll the newsfeed to refresh channel content.
         
    Unsupported channel elements
    For whatever reason, the RSS API doesn't support the channel element's cloud, textInput, skipHours, and skipDays elements. These elements are not represented by javafx.data.xml.QName constants in the RSS class, and they are not represented by variables in the Channel class.

    As with Channel, the Item class, which describes one of the channel's item elements, extends RSS. It provides the following variables:

    • author (of type String) provides the email address of this item's author.
    • categories (of type Category[]) identifies the categories to which this item belongs.
    • comments (of type String) specifies the URL of a Web page containing comments about this item.
    • description (of type String) provides a description of this item.
    • enclosure (of type Enclosure) describes a media object (in terms of length, MIME type, and URL) that's attached to this item.
    • guid (of type Guid) specifies, for this item, a globally unique identifier (in terms of text and an indicator of whether or not this text permanently points to the full item described by this item).
    • link (of type String) provides this item's URL.
    • pubDate (of type DateTime) identifies the date when this item was published.
    • source (of type Source) identifies the originating channel (in terms of the name of the channel and an XMLization of that channel) for this item.
    • title (of type String) provides this item's title.

    I've created a NetBeans RSSDemo project whose Main.fx source code demonstrates RssTask in terms of its interval, location, onStart, onChannel, onItem, onException, onForeignEvent, and onDone variables.

    /*
     * Main.fx
     */

    package rssdemo;

    import java.lang.Exception;

    import javafx.data.feed.rss.Channel;
    import javafx.data.feed.rss.Item;
    import javafx.data.feed.rss.RssTask;

    import javafx.data.pull.Event;

    def MAX_POLLS = 3;

    var counter = 0;

    def task:RssTask = RssTask
    {
        interval: 15s

        // The following location demonstrates a basic RSS newsfeed.

        location: "http://javajeff.mb.ca/rss/javajeff.xml"

        // The following location demonstrates onException().

        // location: "http://developers.sun.com/rss/sdn_features.xml"

        // The following location demonstrates onForeignEvent().

        // location: "http://feeds.dzone.com/javalobby/frontpage?format=xml"

        // The following location demonstrates IllegalArgumentException (must use
        // AtomTask for Atom feeds).

        // location: "http://feeds.sophos.com/en/atom1_0-sophos-company-news.xml"

        onStart: function (): Void
        {
            println ("Task is starting");
            if (++counter > MAX_POLLS)
            {
                task.stop ();
                FX.exit ()
            }
        }

        onChannel: function (c: Channel): Void
        {
            println ("Channel: {c}")
        }

        onItem: function (i: Item): Void
        {
            println ("Item: {i}")
        }

        onException: function (e: Exception): Void
        {
            println ("Exception: {e}");
            task.stop ();
            FX.exit ()
        }

        onForeignEvent: function (e: Event): Void
        {
            println ("Event: {e}")
        }

        onDone: function (): Void
        {
            println ("Completed poll #{counter}")
        }
    }

    task.start ()
    

    The source code introduces a constant that specifies the maximum number of times to poll the newsfeed, and a variable that counts the number of polls that have been made so far. The idea is to limit the number of times the newsfeed is polled so that the application won't run indefinitely.

    After invoking the RssTask instance's start() function, which starts the newsfeed-polling operation, the newsfeed located at the address assigned to location is polled every 15 seconds. The onStart() callback is invoked at the start of each poll.

    This callback tests to see if the counter has exceeded the maximum number of polls. If so, stop() is invoked to stop the polling, and FX.exit() is invoked to kill the background thread that's associated with the RssTask instance, allowing the application to exit.

    Perhaps you're wondering why I placed if (++counter > MAX_POLLS) in onStart(), as opposed to onDone's callback. I did this because onDone() isn't always called at the end of each poll. (You'll discover why this happens later in the article.)

    It's possible that an exception might be thrown as a result of the newsfeed being read or parsed. If this happens, the onException() callback invokes stop() to stop the polling task, and then invokes FX.exit() to kill the background thread and terminate the application.

    This simple framework serves as a starting point for exploring the RSS API. As an exercise, expand onChannel() and onItem() to output the values of their Channel and Item arguments' various variables.

    Atom API Overview

    In contrast to RSS, the Atom API consists of 12 classes that are located in the javafx.data.feed.atom package. Central to this package is the AtomTask class.

         
    Atom versions supported by the API
    The Atom API handles newsfeeds that conform to version 1.0 (the most recent version at time of writing) of the Atom specification.

    The AtomTask entry-point class extends FeedTask, and provides the following variables for installing a custom factory, for reporting the newsfeed's feed element's non-entry content, and for reporting the content of each of the feed element's entry elements:

    • factory (of type Factory) identifies the factory that's used to create objects that represent newsfeed elements. You only need to install your own factory when creating a custom feed parser.
    • onFeed (of type function(:Feed):Void) identifies a function that's invoked to report the feed element's non-entry elements -- the Atom feed element contains entry and non-entry elements, and is itself the top-level element. This variable defaults to null.
    • onEntry (of type function(:Entry):Void) identifies a function that's invoked to report the current entry element. This variable defaults to null.

    The Feed class extends the abstract Atom class (inheriting members for accessing the newsfeed's base URI, for accessing the factory that's creating objects, and more), which extends Base.

    Feed also provides the following variables for accessing feed-oriented (non-entry-specific) content:

    • authors (of type Person[]) identifies the authors (in terms of email address, name, additional person-specific text, and the Internationalized Resource Identifier (IRI) associated with the person) of this feed.
    • categories (of type Category[]) identifies the categories (in terms of a human-readable label, category name, and categorization scheme IRI) to which this feed belongs.
    • contributors (of type Person[]) identifies the persons who have contributed to this feed.
    • generator (of type Generator) identifies the program (in terms of human-readable name, program URI, and program version number) that was used to generate this feed. This information can be used to debug an Atom newsfeed.
    • icon (of type Id) identifies this feed's iconic image (in terms of a URI to the image).
    • id (of type Id) specifies a universally unique and permanent identifier (in terms of a URI) for this feed.
    • links (of type Link[]) specifies links (in terms of href, hreflang, length, rel, title, and type XML attributes, and text associated with the link) from this feed to Web resources.
    • logo (of type Id) identifies this feed's non-iconic image.
    • rights (of type Content) specifies the rights (in terms of src, text, and type XML attributes) held in and over this feed.
    • subtitle (of type Content) provides this feed's subtitle.
    • title (of type Content) provides this feed's title.
    • updated (of type Date) specifies when this feed's content was last changed.

    As with Feed, the Entry class, which describes one of the feed's entry elements, extends Atom. In addition to sharing most of the same variables as Feed, Entry provides the following unique variables:

    • content (of type Content) specifies this entry's content.
    • published (of type Date) specifies when this entry was published.
    • source (of type Feed) identifies this entry's feed source.
    • summary (of type Content) specifies a short summary, abstract, or excerpt for this entry.

    I've created an AtomDemo NetBeans project for demonstrating AtomTask. This project's Main.fx source code is very similar to RSSDemo's Main.fx source code.

    /*
     * Main.fx
     */

    package atomdemo;

    import java.lang.Exception;

    import javafx.data.feed.atom.AtomTask;
    import javafx.data.feed.atom.Entry;
    import javafx.data.feed.atom.Feed;

    import javafx.data.pull.Event;

    def MAX_POLLS = 3;

    var counter = 0;

    def task:AtomTask = AtomTask
    {
        interval: 15s

        // The following location demonstrates a basic Atom newsfeed.

        location: "http://photos.dailycamera.com/hack/feed.mg?Type=gallery&Data=9573834_9ysrR&format=atom10"

        // The following location demonstrates onForeignEvent().

        // location: "http://blogsearch.google.com/blogsearch/feeds?bc_lang=en&hl=en&output=atom"

        // The following location demonstrates IllegalArgumentException (must use
        // RssTask for RSS feeds).

        // location: "http://javajeff.mb.ca/rss/javajeff.xml"

        onStart: function (): Void
        {
            println ("Task is starting");
            if (++counter > MAX_POLLS)
            {
                task.stop ();
                FX.exit ()
            }
        }

        onFeed: function (f: Feed): Void
        {
            println ("Feed: {f}")
        }

        onEntry: function (e: Entry): Void
        {
            println ("Entry: {e}")
        }

        onException: function (e: Exception): Void
        {
            println ("Exception: {e}");
            task.stop ();
            FX.exit ()
        }

        onForeignEvent: function (e: Event): Void
        {
            println ("Event: {e}")
        }

        onDone: function (): Void
        {
            println ("Completed poll #{counter}")
        }
    }

    task.start ()
    

    This simple framework serves as a starting point for exploring the Atom API. Consider expanding onFeed() and onEntry() to output the values of their Feed and Entry arguments' various variables.

    Behind the Scenes with FeedTask

    The important task of polling an RSS or Atom newsfeed occurs in FeedTask and a related class. I recently decompiled these classes to explore how newsfeeds are polled, and share my findings in this section to deepen your understanding of RssTask and AtomTask.

    FeedTask creates an instance of the java.util.Timer class in its static initializer. This instance starts a background thread and works with an instance of FeedTask's nested SubscriptionTask class (a java.util.TimerTask subclass) to support newsfeed-polling.

    FeedTask's overridden start() function schedules the SubscriptionTask instance for execution by invoking Timer's public void schedule(TimerTask task, long delay, long period) method with the following arguments:

    • The SubscriptionTask instance is passed to task.
    • The long integer 0L is passed to delay.
    • The value of FeedTask's interval variable is passed to period.
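
    The scheduling call described above can be sketched in plain Java. This is only an approximation of what the decompiled start() function appears to do: PollSchedulingSketch and runPolls() are hypothetical names, the TimerTask merely counts down a latch instead of polling a newsfeed, and the timer is created as a daemon so that the sketch can exit on its own.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not FeedTask's actual source code.
class PollSchedulingSketch {
    // Schedules a repeating task with no initial delay and waits until it
    // has run the requested number of times. Returns true if it did.
    static boolean runPolls(int polls, long periodMillis) {
        Timer timer = new Timer(true);  // daemon thread (an assumption)
        final CountDownLatch remaining = new CountDownLatch(polls);
        TimerTask subscription = new TimerTask() {
            @Override
            public void run() {
                remaining.countDown();  // stands in for doPoll(true)
            }
        };
        // Same argument order as in the list above: task, delay, period.
        timer.schedule(subscription, 0L, periodMillis);
        try {
            return remaining.await(polls * periodMillis + 5000L,
                                   TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            subscription.cancel();
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println(runPolls(3, 100L));
    }
}
```

    Because delay is 0L, the first run happens almost immediately, which matches the demos' behavior of reporting "Task is starting" as soon as start() is invoked.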

    Approximately every period milliseconds, the SubscriptionTask instance's public void run() method is invoked. This method invokes the SubscriptionTask-specific doPoll() method with a true argument.

    The doPoll() method first resets FeedTask's inherited started, stopped, failed, and done Boolean variables to false. It also nulls out the inherited causeOfFailure variable, and assigns -1 to the inherited progress and maxProgress variables.

    doPoll() next instantiates the javafx.io.http.HttpRequest class, which is the vehicle used to obtain newsfeed content, and initializes the following HttpRequest variables prior to executing this task:

    • location: The value of FeedTask's location variable is assigned to this variable.
    • onStarted: A function is assigned to this variable, and is invoked when the request starts to execute. The function is responsible for invoking onStart().
    • onResponseHeaders: A function is assigned to this variable to retrieve and save the values of the HTTP ETag and Last-Modified response headers. These values are needed to ensure that only changed newsfeed content will be returned in the next poll request.
    • onToRead: A function is assigned to this variable to obtain the total number of bytes to read, which is assigned to maxProgress.
    • onRead: A function is assigned to this variable to obtain the number of bytes read so far, which is assigned to progress.
    • onInput: A function is assigned to this variable to parse the request content via an internal parse(is) method call (where is is onInput()'s java.io.InputStream argument). If parsing results in a thrown exception, the exception object is assigned to causeOfFailure, true is assigned to failed, and onException() is invoked. Finally, true is assigned to done, and onDone() is invoked. (The onInput() function isn't invoked, and hence onDone() isn't invoked, when only changed content is requested but that content isn't available.)
    • onException: A function is assigned to this variable to report a problem with the request itself (and not parsing). If the request fails, the exception object is assigned to causeOfFailure, true is assigned to failed, and onException() is invoked.
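
    The callback order in this list explains why the demos count polls in onStart() rather than onDone(). The following Java sketch is a simplified simulation of a single poll, not the real HttpRequest machinery: PollCallbackSketch, simulatePoll(), and the two Boolean parameters are hypothetical, and the sketch records callback names instead of invoking functions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of one poll's callback sequence.
class PollCallbackSketch {
    // contentAvailable: false models a poll for changed content only,
    //                   where nothing has changed since the last poll.
    // parseFails:       true models parse(is) throwing an exception.
    static List<String> simulatePoll(boolean contentAvailable,
                                     boolean parseFails) {
        List<String> fired = new ArrayList<String>();
        fired.add("onStart");          // onStarted always leads to onStart()
        if (!contentAvailable) {
            return fired;              // onInput never runs, so no onDone()
        }
        if (parseFails) {
            fired.add("onException");  // parsing problem is reported first
        }
        fired.add("onDone");           // done is set after parsing either way
        return fired;
    }

    public static void main(String[] args) {
        System.out.println(simulatePoll(true, false));   // [onStart, onDone]
        System.out.println(simulatePoll(false, false));  // [onStart]
        System.out.println(simulatePoll(true, true));
    }
}
```

    Only onStart appears in every sequence, so it is the one callback that reliably marks each poll.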

    Continuing, doPoll() ensures that only updated newsfeed content is returned by setting the request's If-Modified-Since and If-None-Match headers to the previously saved Last-Modified and ETag values, respectively.

         
    Obtaining a newsfeed's updated versus entire content
    When true is passed to doPoll(), which happens when this method is called from SubscriptionTask's run() method or FeedTask's poll() function, doPoll() sets If-Modified-Since and If-None-Match so that only updated content is returned. In contrast, when you invoke FeedTask's update() function, which invokes doPoll() with a false argument, those request headers will not be set, and the entire content will be returned.
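
    The header bookkeeping behind poll() and update() can be sketched in Java as follows. This is an illustration under stated assumptions rather than FeedTask's code: ConditionalPollSketch and its two methods are hypothetical, and plain maps stand in for the HTTP response and request headers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of conditional-request bookkeeping.
class ConditionalPollSketch {
    private String savedETag;
    private String savedLastModified;

    // Mirrors the onResponseHeaders step: remember the validators that
    // the server sent with the most recent response.
    void rememberValidators(Map<String, String> responseHeaders) {
        savedETag = responseHeaders.get("ETag");
        savedLastModified = responseHeaders.get("Last-Modified");
    }

    // onlyUpdates == true models poll(); false models update().
    Map<String, String> requestHeaders(boolean onlyUpdates) {
        Map<String, String> headers = new LinkedHashMap<String, String>();
        if (onlyUpdates) {
            if (savedLastModified != null) {
                headers.put("If-Modified-Since", savedLastModified);
            }
            if (savedETag != null) {
                headers.put("If-None-Match", savedETag);
            }
        }
        return headers;
    }

    public static void main(String[] args) {
        ConditionalPollSketch sketch = new ConditionalPollSketch();
        Map<String, String> response = new LinkedHashMap<String, String>();
        response.put("ETag", "\"v1\"");
        response.put("Last-Modified", "Sat, 01 Aug 2009 12:00:00 GMT");
        sketch.rememberValidators(response);
        System.out.println(sketch.requestHeaders(true));
        System.out.println(sketch.requestHeaders(false));
    }
}
```

    On the very first poll no validators have been saved, so the sketch sends no conditional headers and the server would return the entire newsfeed.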

    doPoll() now iterates over FeedTask's headers variable, assigning each stored HttpHeader instance to the HttpRequest instance by invoking the latter instance's setHeader() function.

    Finally, doPoll() invokes the HttpRequest instance's start() function to execute this task, resulting in retrieved and parsed content. doPoll() then returns to the run() method. If doPoll() throws an exception, run() invokes onException().

         
    A parsing tidbit
    For brevity, I don't discuss parsing beyond the parse(is) method call. However, if you decide to explore the parsing implementation, here's a tidbit to save you some head-scratching: The parse(InputStream) method initializes the javafx.data.pull.PullParser instance's impl_skippedElements variable to the qualified names of Atom's summary, content, rights, title, and subtitle elements, and RSS's description, title, and copyright elements, to ensure that the parser treats any HTML or other markup that's embedded in these elements as literal text.

    At some point, you'll probably invoke FeedTask's overridden stop() function. This function invokes the SubscriptionTask instance's inherited public boolean cancel() method to cancel the newsfeed-polling task (but does not kill the Timer instance's background thread).
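
    The difference between cancelling a TimerTask and cancelling its Timer is easy to check in plain Java, and it suggests why the demos follow stop() with FX.exit(). In this minimal sketch, TimerCancelSketch and its methods are hypothetical names, the tasks do nothing, and the timers are daemons so that the sketch itself can exit; whether FeedTask's own Timer is a daemon is not something this article establishes.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical illustration of TimerTask.cancel() versus Timer.cancel().
class TimerCancelSketch {
    private static TimerTask noOp() {
        return new TimerTask() {
            @Override
            public void run() {
                // Placeholder for a newsfeed poll.
            }
        };
    }

    // Cancelling a task leaves the timer (and its thread) usable.
    static boolean acceptsWorkAfterTaskCancel() {
        Timer timer = new Timer(true);
        TimerTask subscription = noOp();
        timer.schedule(subscription, 60000L, 60000L);
        subscription.cancel();
        try {
            timer.schedule(noOp(), 60000L);
            return true;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            timer.cancel();
        }
    }

    // Cancelling the timer itself makes it reject further tasks.
    static boolean acceptsWorkAfterTimerCancel() {
        Timer timer = new Timer(true);
        timer.cancel();
        try {
            timer.schedule(noOp(), 60000L);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsWorkAfterTaskCancel());   // prints true
        System.out.println(acceptsWorkAfterTimerCancel());  // prints false
    }
}
```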

    Conclusion

    Enough theory! Now that you've gained knowledge of JavaFX's RSS and Atom APIs, you might want to create your own newsfeed reader. To help you with this task, I present a practical example that handles RSS and Atom newsfeeds in my forthcoming companion to this article.
