Getting Groovy with XML

Version 2





              
                              

    Contents
    Java Straight to the DOM
    Simplifying with XPath
    Groovy Take One
    Groovy and the DOM
    Wrapping the XML DOM
    Groovy and Object Orientation
    Conclusions
    Resources

    XML sucks. Oh, wait, XML rocks. Well, it actually does a lot of both. It rocks because of all of the editors, validators, and tools written for it. XML has all but replaced any notion of a new custom text-based data language. But it also sucks because it's hard to use. Using a DOM to read and manipulate XML is a pain, and SAX is even worse. XPath helps a little, but even XSLT, the ultimate XML processing tool, is hard to learn, follows an uncommon functional programming paradigm, and is overkill for small problems.

    Is there something that we can do to take the pain out of XML? The E4X committee, which is made up of representatives from a bunch of big companies (Mozilla.org, Microsoft, Macromedia, etc.), seems to think so. They have proposed an extension to JavaScript (ECMAScript) that would make building and accessing an XML DOM as easy as working with objects using the "dot" notation.

    Here is the example XML data file that I will use for all of the examples in this article:

    <transactions>
      <account id="a">
        <transaction amount="500" />
        <transaction amount="1200" />
      </account>
      <account id="b">
        <transaction amount="600" />
        <transaction amount="800" />
        <transaction amount="2000" />
      </account>
    </transactions>
    

    Wouldn't it be great if you had this XML document attached to a variable named doc? You could say this:

    var id = doc.transactions.account[0].id
    

    And have the id variable set to "a". That is what E4X is all about: making XML access simple and easy to understand. The only problem is that E4X hasn't been approved yet and isn't shipping. So can we make XML simpler today?
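    For comparison, here is roughly what that one E4X expression costs with the standard W3C DOM in Java. This is a minimal sketch, not code from the article: the class name FirstAccountId and the inline SAMPLE string (the article's data file with the whitespace removed) are assumptions made so the example runs without a test_data.xml file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class: the plain-DOM version of doc.transactions.account[0].id
class FirstAccountId {
    // The article's sample data, inlined (assumption) so that no file is needed
    static final String SAMPLE =
        "<transactions>"
        + "<account id=\"a\"><transaction amount=\"500\"/><transaction amount=\"1200\"/></account>"
        + "<account id=\"b\"><transaction amount=\"600\"/><transaction amount=\"800\"/>"
        + "<transaction amount=\"2000\"/></account>"
        + "</transactions>";

    static String firstAccountId(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The document element plays the role of "doc.transactions"
            Element transactions = doc.getDocumentElement();
            // "account[0]": the first <account> descendant in document order
            Element firstAccount = (Element) transactions.getElementsByTagName("account").item(0);
            // ".id": attributes need an explicit getAttribute call
            return firstAccount.getAttribute("id");
        } catch (Exception e) {
            throw new IllegalStateException("could not read the XML", e);
        }
    }
}
```

    That is a factory, a builder, a parse, a tag-name search, a cast, and exception handling, all for what E4X expresses as a single path.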

    When I asked myself that question, I thought of applying my new favorite embedded language, Groovy, to the task. You can judge for yourself how far I have simplified the task, but I hope in the meantime that you will learn something about XML and a lot about Groovy.

    What would those simplifications look like? First, we will use a dot notation for traversing the DOM tree instead of calling accessors. We will also have any node access default to the first matching child of that node in the tree, so you won't have to say which child you want when there is only one. And finally, we will make XPath access simpler through a native method on every node.
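    The "default to the first matching child" rule can be pictured as a small helper. This is a hypothetical sketch (the class and method names are mine, not part of any library or of the article's code) that walks a node's direct children and returns the first element with a given tag name, skipping whitespace text nodes along the way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper illustrating the "first matching child" default
class FirstChild {
    // Returns the first direct child element named tagName, or null if there is none
    static Element firstChild(Element parent, String tagName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Only element nodes count; whitespace text nodes are skipped
            if (n.getNodeType() == Node.ELEMENT_NODE && tagName.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

    Chaining firstChild( root, "account" ) and then getAttribute( "id" ) is the long-hand form of what the dot notation would hide.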

    Twelve-step programs have you admit your addiction in order to begin to deal with it. In the same way, to understand how bad using the DOM really is, we need to start with a hand-coded example.

    Java Straight to the DOM

    The example I will use throughout this article takes the original XML data file and adds up all of the transaction amounts by account. We will call this function calculateAmounts, and it should return a hash table or map that has an entry for each account with its total. In this case, that means 1700 for account a (500 + 1200) and 3400 for account b (600 + 800 + 2000).

    The most direct way to do this is to use the DOM from Java:

    
    import java.util.Hashtable;
    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;

    public class GroovyXML1 {
        public static Hashtable calculateAmounts( String fileName )
            throws Exception {
    

    We first read in the XML file:

            // Read the XML
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse( new File( fileName ) );

            // Initialize the list of account values
            Hashtable accountValues = new Hashtable();
    

    Next we iterate through the account nodes:

            // Get the initial account nodes
            NodeList accountNodes = doc.getChildNodes().item(0).getChildNodes();
            for( int accountNodeIndex = 0; accountNodeIndex < accountNodes.getLength(); accountNodeIndex++ ) {
    

    One of the problems with DOM access is that we have to account for the white-space nodes that are in the tree. This conditional ensures that we only look at the element nodes.

                // Go only through the account Element nodes
                if ( accountNodes.item( accountNodeIndex ).getNodeType() == Node.ELEMENT_NODE ) {
    

    Because Node doesn't have a convenient accessor for attributes, we need to cast the Node to an Element before we can get the account id.

                    Element accountElement = (Element)accountNodes.item( accountNodeIndex );
                    // Get the account ID
                    String accountID = accountElement.getAttribute( "id" );
    

    Now we need to iterate through the transaction nodes to add up the amounts.

                    // Go through the transaction nodes
                    // within the account node
                    int amount = 0;
                    NodeList transactionNodes = accountElement.getChildNodes();
                    for( int transIndex = 0; transIndex < transactionNodes.getLength(); transIndex++ ) {
                        // Go through just the elements
                        if ( transactionNodes.item( transIndex ).getNodeType() == Node.ELEMENT_NODE ) {
                            // Add the amount to the amount counter
                            Element transaction = (Element)transactionNodes.item( transIndex );
                            Integer value = new Integer( transaction.getAttribute( "amount" ) );
                            amount += value.intValue();
                        }
                    }
    

    And the final step in the processing is to add the amount to the hash table. Because hash tables only take objects, we need to wrap the total in an Integer object before we can add it to the output.

                    // Add the account total to the hash table
                    accountValues.put( accountID, new Integer( amount ) );
                }
            }
            return accountValues;
        }
    

    With the results in hand, we can output the results to see if we did our math correctly.

     
        public static void main( String[] args ) throws Exception {
            System.out.println( "Using XML DOM" );
            Hashtable out = calculateAmounts( "test_data.xml" );
            System.out.println( "a = " + out.get( "a" ) );
            System.out.println( "b = " + out.get( "b" ) );
        }
    }
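    The getNodeType() checks in the listing above exist because of whitespace. As a small illustration (the class name and the pretty-printed XML string are assumptions), the following sketch parses one account element formatted with newlines and indentation and counts its children. A non-validating parser with no DTD cannot tell that the whitespace between tags is insignificant, so it keeps each run of it as a text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WhitespaceNodes {
    // One account from the sample data, pretty-printed (assumed formatting)
    static final String ACCOUNT_XML =
        "<account id=\"a\">\n"
        + "  <transaction amount=\"500\" />\n"
        + "  <transaction amount=\"1200\" />\n"
        + "</account>";

    // Returns {total child nodes, element child nodes} of the root element
    static int[] countChildren(String xml) {
        try {
            Element account = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            NodeList children = account.getChildNodes();
            int elements = 0;
            for (int i = 0; i < children.getLength(); i++) {
                if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                    elements++;
                }
            }
            return new int[] { children.getLength(), elements };
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

    For this input the account has five children but only two of them are elements; the other three are whitespace text nodes, which is why the DOM loops must check the node type before casting to Element.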
    

    Of the twenty-odd lines involved in getting the results, only two are the actual algorithm. So it's no surprise that we can't see the algorithm forest for all of the infrastructure trees.
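    If you can use Java 5 or later, generics and autoboxing absorb a little of that ceremony. The following self-contained sketch is an adaptation rather than the article's code: the class name and inline sample string are assumptions, it uses Integer.parseInt and a Map<String, Integer> instead of Hashtable and explicit Integer wrapping, and it starts from getDocumentElement(), which still finds the root if a comment or DOCTYPE precedes it (doc.getChildNodes().item(0) would return that node instead).

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative, self-contained variant of GroovyXML1.calculateAmounts
class DomAccountTotals {
    static final String SAMPLE =
        "<transactions>\n"
        + "  <account id=\"a\"><transaction amount=\"500\"/><transaction amount=\"1200\"/></account>\n"
        + "  <account id=\"b\"><transaction amount=\"600\"/><transaction amount=\"800\"/>"
        + "<transaction amount=\"2000\"/></account>\n"
        + "</transactions>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    static Map<String, Integer> calculateAmounts(Document doc) {
        Map<String, Integer> accountValues = new HashMap<>();
        NodeList accountNodes = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < accountNodes.getLength(); i++) {
            Node accountNode = accountNodes.item(i);
            // Skip the whitespace text nodes between <account> elements
            if (accountNode.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element account = (Element) accountNode;
            int amount = 0;
            NodeList transactions = account.getChildNodes();
            for (int j = 0; j < transactions.getLength(); j++) {
                Node transaction = transactions.item(j);
                if (transaction.getNodeType() == Node.ELEMENT_NODE) {
                    amount += Integer.parseInt(((Element) transaction).getAttribute("amount"));
                }
            }
            // Autoboxing wraps the int total in an Integer for the map
            accountValues.put(account.getAttribute("id"), amount);
        }
        return accountValues;
    }
}
```

    Even so, the two lines that add up the amounts are still surrounded by node-type checks and casts.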

    Perhaps things would be better if we used XPath.

    Simplifying with XPath

    We need to bring in the XPath API, which here is the XPathAPI class from Apache Xalan:

     
    import org.apache.xpath.XPathAPI;
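    XPathAPI is part of Xalan rather than the standard JDK. Since Java 5, JAXP's javax.xml.xpath package provides the same capability, so as a swapped-in alternative (not the article's code) here is a sketch of just the account-selection step with that API. The class name, the accountIds method, and the indented inline sample are assumptions. Even though the input contains whitespace text nodes, the node set for /transactions/account holds only the account elements, so each result can be cast to Element without a node-type check.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JaxpAccountSelection {
    // The sample data, indented so that whitespace text nodes are present
    static final String SAMPLE =
        "<transactions>\n"
        + "  <account id=\"a\">\n    <transaction amount=\"500\" />\n"
        + "    <transaction amount=\"1200\" />\n  </account>\n"
        + "  <account id=\"b\">\n    <transaction amount=\"600\" />\n"
        + "    <transaction amount=\"800\" />\n    <transaction amount=\"2000\" />\n  </account>\n"
        + "</transactions>";

    // Returns the id of every account, in document order
    static List<String> accountIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList accountNodes =
                    (NodeList) xpath.evaluate("/transactions/account", doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < accountNodes.getLength(); i++) {
                // Only <account> elements match, so no getNodeType() check is needed
                Element account = (Element) accountNodes.item(i);
                ids.add(account.getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }
}
```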
    

    Then we can make some changes to the calculateAmounts method:

     
        public static Hashtable calculateAmounts( String fileName )
            throws Exception {
            // Read the XML
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse( new File( fileName ) );

            // Initialize the list of account values
            Hashtable accountValues = new Hashtable();
    

    We also make the code more robust, because XPath now handles finding the account elements:

            // Get the initial account nodes
            NodeList accountNodes = XPathAPI.selectNodeList( doc, "/transactions/account" );
            for( int accountNodeIndex = 0; accountNodeIndex < accountNodes.getLength(); accountNodeIndex++ ) {
                Element accountElement = (Element)accountNodes.item( accountNodeIndex );
                // Get the account ID
                String accountID = accountElement.getAttribute( "id" );
    

    Note that we no longer need the conditional that checks whether we are looking at an element, because the XPath expression only matches account elements.

    We can also replace the amount fetcher with an XPath search from the account node down to get the amounts:

                // Go through the transaction nodes within the account node
                int amount = 0;
                NodeList amountNodes = XPathAPI.selectNodeList( accountElement, "transaction/@amount" );
                for( int amountIndex = 0; amountIndex < amountNodes.getLength(); amountIndex++ ) {
                    // Add the amount to the amount counter
                    String nodeValue = amountNodes.item( amountIndex ).getNodeValue();
                    amount += Integer.valueOf( nodeValue ).intValue();
                }
    

    Notice that we are getting just the amount attributes by using the @amount specifier in the XPath. The XPath code is more robust than the original DOM code because it only looks at transaction elements within the account node. The original code would read an amount attribute from any child element of the account, whatever its tag name. If we added new types of elements to the account node, we would be in trouble.

                // Add the account total to the hash table
                accountValues.put( accountID, new Integer( amount ) );
            }
            return accountValues;
        }
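    XPath can take over even more of the work. XPath 1.0 includes a sum() function, so each account's total can be computed inside the expression itself. The sketch below uses the JDK's javax.xml.xpath API in place of Xalan's XPathAPI; the class and method names and the inline sample are assumptions, and because sum() returns an XPath number, the result arrives as a Double that is converted back to an int.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSumTotals {
    static final String SAMPLE =
        "<transactions>"
        + "<account id=\"a\"><transaction amount=\"500\"/><transaction amount=\"1200\"/></account>"
        + "<account id=\"b\"><transaction amount=\"600\"/><transaction amount=\"800\"/>"
        + "<transaction amount=\"2000\"/></account>"
        + "</transactions>";

    static Map<String, Integer> calculateAmounts(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList accounts =
                    (NodeList) xpath.evaluate("/transactions/account", doc, XPathConstants.NODESET);
            Map<String, Integer> totals = new HashMap<>();
            for (int i = 0; i < accounts.getLength(); i++) {
                Element account = (Element) accounts.item(i);
                // sum() adds the numeric values of the selected amount attributes
                Double total = (Double) xpath.evaluate(
                        "sum(transaction/@amount)", account, XPathConstants.NUMBER);
                totals.put(account.getAttribute("id"), total.intValue());
            }
            return totals;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }
}
```

    The trade-off is that the arithmetic now lives inside a string, where the compiler cannot check it.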
    

    We have taken about five lines out of the code with XPath and made the whole processing system more robust, but the algorithm is still pretty obscured. Can we make it any cleaner using Groovy? I thought you'd never ask.

    Groovy Take One

    In the spirit of XP and refactoring, I'll keep changing the code until I get the right blend of algorithm and infrastructure. To use Groovy in this process, the first step is to move the calculateAmounts method into the Groovy engine, as illustrated in Figure 1:

      
    Figure 1. Moving the logic into Groovy

    The application will create a Groovy scripting shell instance, load it up with our script, and then run a calculateAmounts closure. The Java for this starts here:

     
    import groovy.lang.GroovyShell;
    import groovy.lang.Binding;
    import groovy.lang.Closure;
    import java.io.File;
    import java.util.Map;

    public class GroovyXML3 {
        public static Map calculateAmounts( String fileName )
            throws Exception {
            GroovyShell shell = new GroovyShell( new Binding() );
            shell.evaluate( new File( "GroovyXML3.groovy" ) );
    

    Here we create the Groovy shell and load the script.

            // Get the calculateAmounts closure
            Closure calc = (Closure)shell.getVariable( "calculateAmounts" );
    

    We then get the calculateAmounts closure.

            // Run the closure on the file name
            return (Map)calc.call( fileName );
        }
    

    We call that closure with the file name string and cast the return value back to a Map. Why the change from a Hashtable to a Map? Because Groovy's native datatype for an associative array (hash table) is a Map, not a java.util.Hashtable.

     
        public static void main( String[] args ) throws Exception {
            System.out.println( "Using Groovy with a file name" );
            Map out = calculateAmounts( "test_data.xml" );
            System.out.println( "a = " + out.get( "a" ) );
            System.out.println( "b = " + out.get( "b" ) );
        }
    }
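    The shell.getVariable( "calculateAmounts" ) and calc.call( fileName ) pair above behaves, loosely, like looking a function up in a map by name and then applying it. Purely as an analogy (this is not how Groovy implements Binding or Closure), here is that idea with Java 8 lambdas; the names ClosureAnalogy, sumAmounts, and callByName are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class ClosureAnalogy {
    // Stand-in for a script Binding: values looked up by variable name
    static final Map<String, Function<List<Integer>, Integer>> binding = new HashMap<>();

    static {
        // Roughly like a script assigning "sumAmounts = { amounts -> ... }"
        binding.put("sumAmounts", amounts -> {
            int total = 0;
            for (int amount : amounts) {
                total += amount;
            }
            return total;
        });
    }

    // Like shell.getVariable( name ) followed by closure.call( argument )
    static int callByName(String name, List<Integer> argument) {
        return binding.get(name).apply(argument);
    }
}
```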
    

    The corresponding Groovy is:

     
    import java.io.File; 
    import javax.xml.parsers.DocumentBuilderFactory; 
    import org.apache.xpath.XPathAPI;
    

    Yep, Groovy can directly import Java classes.

    calculateAmounts = { fileName ->
    

    Here is where we define the calculateAmounts closure. A closure is a lot like a function; for the purposes of this article, you can think of it as a function that takes arguments and can be assigned to a variable. In this case, the argument is a file name. If you are curious about what closures are, you might want to read up on functional programming languages such as Haskell, or on languages such as Lisp or Scheme.

    The rest of the Groovy code looks like our original Java algorithm, except that there are no types and the for loops are different:

        factory = DocumentBuilderFactory.newInstance();
        builder = factory.newDocumentBuilder();
        doc = builder.parse( new File( fileName ) );
        accountValues = [:];
        accountNodes = XPathAPI.selectNodeList( doc, "/transactions/account" );
        for( accountNodeIndex in 0..(accountNodes.getLength()-1) ) {
    

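    The for loop in Groovy has several different variations. I'm using the variation that iterates over a range of values: in this case, from zero to the number of nodes in the list minus one. One caveat: if the list is empty, 0..-1 is a reversed range that still visits 0 and -1, so the exclusive range 0..<n is the safer choice.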

            accountID = accountNodes.item( accountNodeIndex ).getAttribute( "id" );
            amount = 0;
            amountNodes = XPathAPI.selectNodeList( accountNodes.item( accountNodeIndex ), "transaction/@amount" );
            for( amountIndex in 0..(amountNodes.getLength()-1) ) {
                nodeValue = amountNodes.item( amountIndex ).getNodeValue();
                amount += Integer.valueOf( nodeValue ).intValue();
            }
            accountValues.put( accountID, amount );
    

    Another nice thing about Groovy is that I don't have to handle the int to Integer conversions. I can just concentrate on the math.

        }
        return accountValues;
    };
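    Java has since closed part of the int-to-Integer gap mentioned above: since Java 5, autoboxing converts between int and Integer automatically, and Java 8's Map.merge can add to a running total in a single call. A minimal sketch, assuming the account ids and amounts have already been extracted from the XML into parallel arrays (a simplification for illustration; the class name is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

class AutoboxTotals {
    // ids[i] is the account of the transaction whose amount is amounts[i]
    static Map<String, Integer> totals(String[] ids, int[] amounts) {
        Map<String, Integer> totals = new HashMap<>();
        for (int i = 0; i < ids.length; i++) {
            // merge() stores the amount for a new key, or adds it to the existing total;
            // the int is boxed to an Integer automatically
            totals.merge(ids[i], amounts[i], Integer::sum);
        }
        return totals;
    }
}
```

    Called with the ids a, a, b, b, b and the amounts 500, 1200, 600, 800, 2000, it returns 1700 for a and 3400 for b.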
    

    We're down to about 13 lines, with two being the algorithm. Better, but still not very good.

    One problem I have had with this approach from the beginning is that the calculateAmounts function also handles reading the XML data. That doesn't make a lot of sense: parsing the file has nothing to do with adding up the amounts.

    Groovy and the DOM

    Can we pass a DOM object into Groovy, as Figure 2 shows?

      
    Figure 2. Injecting the DOM into Groovy

    Let's make some changes to the Java file to have it do the DOM reading and then pass it to Groovy:

     
        public static Map calculateAmounts( Document doc )
            throws Exception {
            GroovyShell shell = new GroovyShell( new Binding() );
            shell.evaluate( new File( "GroovyXML4.groovy" ) );
            // Get the calculateAmounts closure
            Closure calc = (Closure)shell.getVariable( "calculateAmounts" );
            // Run the closure with the document node
            return (Map)calc.call( doc );
        }
    

    We can call the closure with the Document, just as we did with the file name.

     
        public static void main( String[] args ) throws Exception {
            System.out.println( "Using Groovy with a DOM" );
            // Read the XML
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse( new File( "test_data.xml" ) );
            Map out = calculateAmounts( doc );
            System.out.println( "a = " + out.get( "a" ) );
            System.out.println( "b = " + out.get( "b" ) );
        }
    }
    

    Here is the updated Groovy code that no longer does the DOM-reading work.

     
    import org.apache.xpath.XPathAPI;

    calculateAmounts = { doc ->
        accountValues = [:];
        accountNodes = XPathAPI.selectNodeList( doc, "/transactions/account" );
        for( accountNodeIndex in 0..(accountNodes.getLength()-1) ) {
            accountID = accountNodes.item( accountNodeIndex ).getAttribute( "id" );
            amount = 0;
            amountNodes = XPathAPI.selectNodeList( accountNodes.item( accountNodeIndex ), "transaction/@amount" );
            for( amountIndex in 0..(amountNodes.getLength()-1) ) {
                nodeValue = amountNodes.item( amountIndex ).getNodeValue();
                amount += Integer.valueOf( nodeValue ).intValue();
            }
            accountValues.put( accountID, amount );
        }
        return accountValues;
    };
    

    This small change gets us down to 10 lines of code and two lines of algorithm. That's starting to get into the reasonable range. But at the beginning of the article, I talked about creating an easier non-DOM syntax for reading XML.

    Wrapping the XML DOM

    I wonder if we could wrap the DOM nodes in something that would make them a little easier to use, as in Figure 3:

      
    Figure 3. Wrapping the DOM with our own proxy object


    It would be great if we could access the node attributes just as we do properties, like this:

    accountNode.id
    

    Then we could execute an XPath search on a node just by calling an xpath method on that node. Let's start by creating a class that implements the GroovyObject interface and wraps a DOM node:

     
    import groovy.lang.GroovyObject;
    import groovy.lang.MetaClass;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.apache.xpath.XPathAPI;
    import java.util.Vector;

    public class DOMNodeGroovyObject implements GroovyObject {
        private Node _node;
    

    Our constructor will take a reference to the node:

        DOMNodeGroovyObject( Node node ) { _node = node; }
    

    Then we will listen for method invocations such as getValue, which will return the text of the node, and xpath, which will run an XPath query from this node down.

     
        public Object invokeMethod( String name, Object args ) {
            // Compare method names with equals(); == only compares references
            if ( "getValue".equals( name ) )
                return _node.getNodeValue();
            if ( "xpath".equals( name ) )
                return xpath( ((Object[])args)[0].toString() );
            return null;
        }
    

    We will also override getProperty to return the value of any attributes on the node we are wrapping.

     
        public Object getProperty( String name ) {
            if ( _node.getNodeType() == Node.ELEMENT_NODE ) {
                Element elem = (Element)_node;
                if ( elem.hasAttribute( name ) )
                    return elem.getAttribute( name );
            }
            return null;
        }

        public void setProperty( String name, Object value ) { }

        public MetaClass getMetaClass() { return null; }

        public void setMetaClass( MetaClass metaClass ) { }
    

    The xpath method will return a Vector of resulting nodes, each of which will be wrapped with our class instead of being just a basic Node.

     
        private Vector xpath( String path ) {
            Vector children = new Vector();
            try {
                NodeList nodes = XPathAPI.selectNodeList( _node, path );
                for( int child = 0; child < nodes.getLength(); child++ ) {
                    Node node = nodes.item( child );
                    children.add( new DOMNodeGroovyObject( node ) );
                }
            }
            catch( Exception e ) {
                // Errors are silently ignored, so an invalid path yields an empty Vector
            }
            return children;
        }
    }
    

    In order to hook it all up, we need to make a small change to the original Java file to pass the closure a Groovy DOM node and not just a DOM node:

            // Run the closure with the DOM node wrapper
            return (Map)calc.call( new DOMNodeGroovyObject( doc ) );
        }
    

    Now the Groovy looks like this:

    calculateAmounts = { doc ->
        accountValues = [:];
        accountNodes = doc.xpath( "/transactions/account" );
        for( accountNode in accountNodes ) {
            amount = 0;
            amountNodes = accountNode.xpath( "transaction/@amount" );
            for( amountNode in amountNodes ) {
                amount += Integer.valueOf( amountNode.getValue() ).intValue();
            }
            accountValues.put( accountNode.id, amount );
        }
        return accountValues;
    };
    

    Wow! This code is almost becoming readable. We are down to nine lines, two of which are the algorithm. We don't have any Java imports. We are using an xpath method on the node, which makes more sense than calling XPath directly. And we can use the safer version of for that iterates over a Vector, so there is no more "minus one" arithmetic in our for loops.

    But this procedural stuff doesn't feel right. We're Java programmers, right? We should be object-oriented!

    Groovy and Object Orientation

    Can we make a Groovy object and access it with Java, as in Figure 4?

      
    Figure 4. Referencing a Groovy class directly

    How cool would that be? The first step would be to make an interface to which the Groovy code must conform:

     
    public interface GroovyNodeIterator {
        public Object process( DOMNodeGroovyObject doc );
    }
    

    Our simple interface has one method, which takes a DOM wrapper object and returns some sort of object. With this in hand, we can make some final changes to the Java code:

     
    import groovy.lang.GroovyClassLoader;
    ...
        public static Map calculateAmounts( Document doc )
            throws Exception {
            // Create a class loader
            GroovyClassLoader groovyLoader = new GroovyClassLoader();
            // Create the shell
            GroovyShell shell = new GroovyShell( groovyLoader, new Binding() );
    

    We need to build a GroovyClassLoader, because we will be asking it for one of our Groovy classes. We pass this new class loader to the shell.

            // Load the AmountAdder Groovy class
            Class adderClass = groovyLoader.loadClass( "AmountAdder" );
    

    We then ask the class loader to load up our AmountAdder class.

            // Create an instance of it
            GroovyNodeIterator obj = (GroovyNodeIterator)adderClass.newInstance();
    

    And we create an instance of that class. No kidding. Seriously. We are creating an object that looks and feels just like a Java object, but it's really been written in Groovy.

            // Run the process method
            return (Map)obj.process( new DOMNodeGroovyObject( doc ) );
        }
    

    We then call the process method and get the return value. We can call this Groovy object just as we would any Java object.

    The Groovy code for the AmountAdder class is shown below:

     
    class AmountAdder implements GroovyNodeIterator {
        public Object process( DOMNodeGroovyObject doc ) {
            def accountValues = [:];
            for( accountNode in doc.xpath( "/transactions/account" ) ) {
                accountValues[ accountNode.id ] = 0;
                for( transNode in accountNode.xpath( "transaction" ) ) {
                    accountValues[ accountNode.id ] += Integer.valueOf( transNode.amount );
                }
            }
            return accountValues;
        }
    }
    

    The process method has to be declared exactly as it would be in Java, with the signature the interface specifies; otherwise, the class will not have implemented everything in the interface, and calling it from Java can fail with an AbstractMethodError.

    Inside the method, we have simplified the algorithm as much as we can. It's down to six lines, two of which are the algorithm. That's 33 percent, which is really good. The logic of the algorithm is certainly more visible than it was before.

    But the most important thing we have learned is that Groovy is really cool. It can implement Java interfaces and compiles to bytecode that Java can execute directly, which means you can hold a reference to a Groovy object that looks and works exactly like a Java object.
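    The pattern at work here, in which a class is loaded by name, instantiated, and then used only through an interface it implements, is ordinary Java plugin plumbing. For comparison, here is a plain-Java sketch of it: Class.forName stands in for GroovyClassLoader.loadClass, and getDeclaredConstructor().newInstance() replaces Class.newInstance(), which has been deprecated since Java 9. NodeProcessor, JavaAmountAdder, and PluginLoader are hypothetical stand-ins for GroovyNodeIterator, AmountAdder, and the loading code, and the processor takes a plain DOM Document rather than the Groovy wrapper.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical counterpart of GroovyNodeIterator
interface NodeProcessor {
    Object process(Document doc);
}

// Hypothetical counterpart of AmountAdder, written in Java
class JavaAmountAdder implements NodeProcessor {
    public Object process(Document doc) {
        Map<String, Integer> totals = new HashMap<>();
        NodeList transactions = doc.getElementsByTagName("transaction");
        for (int i = 0; i < transactions.getLength(); i++) {
            Element transaction = (Element) transactions.item(i);
            // Each transaction's parent element is its account
            String id = ((Element) transaction.getParentNode()).getAttribute("id");
            totals.merge(id, Integer.parseInt(transaction.getAttribute("amount")), Integer::sum);
        }
        return totals;
    }
}

class PluginLoader {
    static final String SAMPLE =
        "<transactions>"
        + "<account id=\"a\"><transaction amount=\"500\"/><transaction amount=\"1200\"/></account>"
        + "<account id=\"b\"><transaction amount=\"600\"/><transaction amount=\"800\"/>"
        + "<transaction amount=\"2000\"/></account>"
        + "</transactions>";

    // Loads a class by name and hands it back only as the interface type
    static NodeProcessor load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return (NodeProcessor) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

    Because the caller only knows the interface, the implementation class could just as well be one compiled from a Groovy script by GroovyClassLoader, which is what makes the AmountAdder approach work.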

    Conclusions

    XML is pretty cool stuff, and the more we use it, the more we need to think about how to make it easier to use. We need to think about ways to reduce the XML infrastructure work for ourselves and our customers.

    Now, I don't seriously think that people will start replacing XSLT with Groovy code. But I do think that this article should spur your imagination with ideas about how you can use a flexible scripting language like Groovy to simplify your application and to make it more extensible. When integrating Groovy into your application is as simple as creating an interpreter and asking the class loader for a class, you have to think about all of the things you can do in the script layer as part of rapid prototyping, things that can then be brought back into the Java layer for efficiency.

    Adding dynamic scripting-language support to your application is at your fingertips. Give it a go!

    Resources

      