Extensible Code Generation with Java, Part 1

Version 2



    The Basics
    The Problem
    Subclassing Solution
    Next Time: Safe Zones

    Code generation is a key new trend in engineering, one that you need to understand well. The reason is simple: today's frameworks are extremely code-intensive. Using a code generator to build the code for you can save you a lot of time, both in writing the code and in fixing the inevitable bugs that spring from swaths of hand-written code.

    There are a number of code generation options for Java development. These range from canned solutions that generate whole applications (e.g. Compuware's OptimalJ), to open source generators (e.g. XDoclet), to custom-built solutions. This article focuses on using XSLT to build custom generators. More information about XSLT is presented below.

    Building a custom generator is an easy, fun, and cheap (free) way to understand how generation works. With a new understanding about code generation in hand, you will be able to evaluate off-the-shelf tools as well as have the ability to write something yourself.

    Let's start with the basics of code generation.

    The Basics

    Code generation is using one application to build code for another application. In this case, XSLT will be our generator application. Input for a code generator can come in many forms (source code, database schemas, XML models, etc.). Regardless of the source, we call the input the model because it represents (models) what is to be built. On the other side are the templates. The templates render the model into code, or other artifacts such as documentation. Figure 1 illustrates this process.

    Figure 1. Flow chart for basic XSLT-based code generation

    There are two types of code generation: passive and active. In the passive model, you generate the code once and then tweak it by hand. In the active model, you generate the code continuously (often as part of a build process): whenever the model changes, the generator is run again and new code is created.

    It's easy to think about how to use code generation on new projects that build new code, but what about an existing code base? We all love new code, but most of us get paid to work on old code. Is there a way we can use code generation to aid us in extending and maintaining existing applications?

    The Problem

    There are several problems with applying code generation to existing code bases. You may have custom homebrew APIs that need custom work. These APIs may have an inconsistent interface, which makes coding to them difficult. You may have, as is often the case, a bunch of classes that could probably be generated, but that require small changes in each case. Some examples would be custom error-checking or validation within the class, or cross-dependency special cases where two classes depend on each other in a unique, one-off way.

    With either legacy code or new code, the problem with code generation often comes down to, "Can you generate 100 percent of the code, or only 90 percent? And if only 90 percent, then what happens to that other 10 percent?" There are two basic solutions. The first is to generate partial classes: base classes that you then subclass to build custom behaviors. The second is to generate code with safe zones: when you want to extend the functionality, you add your code into these safe zones, and it is preserved between generation cycles.

    This article works through the first solution, subclassing; safe zones are covered in part two of this series.

    Subclassing Solution

    We start with the input model that will be used for both generators. This model is copied from an article on IOM-based generation by Giuseppe Naccarato.

    <?xml version="1.0" encoding="UTF-8"?>
    <Content>
      <Class name="Customer" stereotype="">
        <Attribute name="code" type="integer"/>
        <Attribute name="description" type="string"/>
      </Class>
      <Class name="Order" stereotype="">
        <Attribute name="number" type="integer"/>
        <Attribute name="date" type="date"/>
        <Operation name="getTotalAmount" returnType="double">
          <Parameter name="calculateVAT" type="boolean"/>
        </Operation>
      </Class>
      <Association name="">
        <Role class="Order" multiplicity="*" name="" type="start"/>
        <Role class="Customer" multiplicity="1" name="" type="end"/>
      </Association>
    </Content>

    There are two important structures: the Class tag, which contains enough information to create a single class, and the Association tag, which creates a connection between two classes. In the example case, we are creating two classes, Customer and Order, and associating them. One Customer can have multiple Orders.

    Each Class tag has some sub-tags and attributes that define it in more detail. For example, the Attribute tags in the Order class specify that there should be both a number and a date on the order. In addition, there should be an operation (a method) called getTotalAmount that takes a value-added tax Boolean as a parameter. The model doesn't specify what getTotalAmount should do; that's up to us as engineers to fill in. We do that either by subclassing or by adding code into a safe zone, as we will see.
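
    To get a feel for how the templates will address this model, it can help to query it directly. The following is a minimal sketch, assuming a cut-down copy of the model held in a string and the XPath API bundled with the JDK (not Saxon); the class name ModelQuery and its query helper are hypothetical and exist only for this illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Queries an abbreviated copy of the article's model with the JDK's XPath API. */
final class ModelQuery {

    // Abbreviated model: same element and attribute names as the article's input.
    static final String MODEL =
        "<Content>"
      + "<Class name='Customer'><Attribute name='code' type='integer'/></Class>"
      + "<Class name='Order'>"
      + "<Attribute name='number' type='integer'/>"
      + "<Attribute name='date' type='date'/>"
      + "<Operation name='getTotalAmount' returnType='double'>"
      + "<Parameter name='calculateVAT' type='boolean'/>"
      + "</Operation>"
      + "</Class>"
      + "<Association>"
      + "<Role class='Order' multiplicity='*' type='start'/>"
      + "<Role class='Customer' multiplicity='1' type='end'/>"
      + "</Association>"
      + "</Content>";

    /** Evaluates an XPath expression against MODEL and returns its string value. */
    static String query(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(MODEL)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The operation declared on the Order class.
        System.out.println(query("/Content/Class[@name='Order']/Operation/@name"));
        // The class at the "end" of the association.
        System.out.println(query("/Content/Association/Role[@type='end']/@class"));
        // The type of Order's second attribute.
        System.out.println(query("/Content/Class[2]/Attribute[2]/@type"));
    }
}
```

    The same path expressions (Content/Class, Role[@type='end'] and so on) are the kind of selectors the XSLT templates in this article use.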

    The next step is to use XSLT to generate Java source code files from the model. An XSLT template is an XML file that contains both processing directives and template output. You need an XSLT processor (such as Michael Kay's excellent Saxon XSLT processor) to take the XML and the XSLT as input and to create one or more files as output. The flow is illustrated in Figure 2.

    Figure 2. Flow for Saxon-based XSLT code generation

    Our XSLT template file is called main.xsl. Note that the excerpts in this article have some line breaks added for web-page formatting -- part two of this series will have downloadable source, in which you can examine the original formatting. At any rate, main.xsl starts with a little preamble:

    <?xml version="1.0" encoding="UTF-8" ?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.1">
      <xsl:output method="text" />

    Because we are generating Java source code (and not XML), we set the output mode to text.

    The next thing to do is write a template that matches the root node of the XML input tree. This template will build a file for each of the Java class tags:

    <!-- The main template loops over all of the classes and creates a file for each one -->
    <xsl:template match="/">
      <xsl:for-each select="Content/Class">
        <xsl:variable name="filename" select="concat('output/',@name,'Base.java')" />
        <xsl:message>Creating <xsl:value-of select="$filename" /></xsl:message>
        <xsl:document href="{$filename}" method="text">
          <xsl:call-template name="java-class">
            <xsl:with-param name="class" select="." />
            <xsl:with-param name="associations" select="/Content" />
          </xsl:call-template>
        </xsl:document>
      </xsl:for-each>
    </xsl:template>

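    The for-each statement iterates over all of the Class tags. The variable statement then builds the file name for the Java class file, the message tag reports that name on the console, and the document tag (an XSLT 1.1 feature that Saxon supports) directs the enclosed output to that file. The call-template tag then invokes a template that will build the Java to go in the file.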
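
    The loop and the file-name expression can be tried in isolation. The following is a minimal sketch, assuming the XSLT processor bundled with the JDK (reached through javax.xml.transform) rather than Saxon, and a cut-down XSLT 1.0 stylesheet that is not the article's main.xsl. It deliberately avoids the document tag, which comes from the XSLT 1.1 draft and may not be available outside Saxon, so it returns the file names as text instead of writing files. The class name FileNameDemo and its transform helper are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Runs a tiny XSLT 1.0 stylesheet that lists the file name for each Class tag. */
final class FileNameDemo {

    // Abbreviated model: only the Class names matter for this sketch.
    static final String MODEL =
        "<Content><Class name='Customer'/><Class name='Order'/></Content>";

    // Same for-each and concat() as the root template, minus file output.
    static final String STYLESHEET =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='Content/Class'>"
      + "<xsl:value-of select=\"concat('output/',@name,'Base.java')\"/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the model and returns the text output. */
    static String transform(String stylesheet, String model) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(model)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints one generated file name per line.
        System.out.print(transform(STYLESHEET, MODEL));
    }
}
```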

    Now that we are cycling over the model to pick out the classes and to build the Java files, we need to build the actual contents. That is handled in the next template block:

    <!-- The main template for a Java class -->
    <xsl:template name="java-class">
      <xsl:param name="class" />
      <xsl:param name="associations" />
    /* This file has been generated */
    import java.util.*;
    public class <xsl:value-of select="$class/@name" />Base {

    Here you can start to see the beginnings of the Java class. The value-of tag is where we put in the name of the class, followed by the literal suffix Base.

    The next step is to build the attributes:

    <!-- Builds the attributes -->
    <xsl:for-each select="$class/Attribute">
      <xsl:if test="@type='integer'">
    private int <xsl:value-of select="@name" />;
    public int get<xsl:value-of select="@name" />() {return this.<xsl:value-of select="@name" />;}
    public void set<xsl:value-of select="@name" /> (int <xsl:value-of select="@name" />) {this.<xsl:value-of select="@name" />=<xsl:value-of select="@name" />;}
      </xsl:if>
      <xsl:if test="@type='date'">
    private Date <xsl:value-of select="@name" />;
    public Date get<xsl:value-of select="@name" />() {return this.<xsl:value-of select="@name" />;}
    public void set<xsl:value-of select="@name" /> (Date <xsl:value-of select="@name" />) {this.<xsl:value-of select="@name" />=<xsl:value-of select="@name" />;}
      </xsl:if>
    </xsl:for-each>

    It looks pretty hairy, but it's really quite simple. The for-each statement iterates through all of the attributes, and each if statement checks for one type of attribute. The code within the if builds the field and accessor methods for that data type.
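
    For readers more at home in Java than in XSLT, the same decision logic can be written as a lookup from model type to Java type. This is only a sketch of what the excerpt does, not part of the generator; AttributeRenderer is a hypothetical name, and, like the two if branches shown, it emits nothing for a type it does not recognize.

```java
import java.util.Map;

/** Java rendering of the attribute logic in the template excerpt. */
final class AttributeRenderer {

    // Model type -> Java type, mirroring the two xsl:if branches in the excerpt.
    private static final Map<String, String> JAVA_TYPES =
        Map.of("integer", "int", "date", "Date");

    /** Returns a field plus getter and setter, or "" when no branch matches. */
    static String render(String name, String modelType) {
        String javaType = JAVA_TYPES.get(modelType);
        if (javaType == null) {
            return ""; // no matching branch: emit nothing, as the excerpt does
        }
        return "private " + javaType + " " + name + ";\n"
             + "public " + javaType + " get" + name + "() {return this." + name + ";}\n"
             + "public void set" + name + "(" + javaType + " " + name + ") "
             + "{this." + name + "=" + name + ";}\n";
    }

    public static void main(String[] args) {
        System.out.print(render("number", "integer"));
        System.out.print(render("date", "date"));
    }
}
```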

    XSLT has a reputation for being very complex and verbose. I can't agree with the complexity part, but it does score heavily in the verbosity department, as this section shows. My recommendation to any XSLT programmer is to get an editor to help with the grunt work. I use oXygen. It has excellent intellisense for XSLT and with version 4.0, it now includes an XSLT debugger. Suffice it to say, with the right tools, XSLT can be a joy.

    Back to the task at hand, the next segment deals with the associations:

    <!-- Builds the locals for any associated classes -->
    <xsl:for-each select="$associations/Association">
      <xsl:variable name="startrole" select="./Role[@type='start']"/>
      <xsl:variable name="endrole" select="./Role[@type='end']"/>
      <xsl:variable name="endvar" select="translate($endrole/@class,'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz')"/>
      <xsl:if test="$startrole/@class=$class/@name">
    private <xsl:value-of select="concat( $endrole/@class, ' ', $endvar )"/>;
    public <xsl:value-of select="$endrole/@class"/> get<xsl:value-of select="$endrole/@class"/>() {return this.<xsl:value-of select="$endvar" />;}
    public void set<xsl:value-of select="$endrole/@class"/> (<xsl:value-of select="concat( $endrole/@class, ' ', $endvar )"/> ) {this.<xsl:value-of select="$endvar" />=<xsl:value-of select="$endvar" />;}
      </xsl:if>
    </xsl:for-each>

    This is pretty complex, but it boils down to iterating through all of the associations in the model, then finding out if any are relevant to this class. If they are, then we build a little code to add the associated fields into the class and create some accessors for them.
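
    The association logic reads more easily in Java. The sketch below is an illustration only, with the hypothetical name AssociationRenderer: it checks whether the association starts at the class being generated and, if so, derives the field name by lowercasing the class at the other end, which is what the translate() call does in the template.

```java
import java.util.Locale;

/** Java rendering of the association logic in the template. */
final class AssociationRenderer {

    /**
     * Returns the field and accessors for an association that starts at
     * currentClass, or "" when the association starts at another class.
     */
    static String render(String currentClass, String startClass, String endClass) {
        if (!startClass.equals(currentClass)) {
            return ""; // not relevant to the class being generated
        }
        // Equivalent of translate(..., 'ABC...', 'abc...') in the template.
        String endVar = endClass.toLowerCase(Locale.ROOT);
        return "private " + endClass + " " + endVar + ";\n"
             + "public " + endClass + " get" + endClass + "() {return this." + endVar + ";}\n"
             + "public void set" + endClass + "(" + endClass + " " + endVar + ") "
             + "{this." + endVar + "=" + endVar + ";}\n";
    }

    public static void main(String[] args) {
        // Order is the start role, so OrderBase gets a Customer field.
        System.out.print(render("Order", "Order", "Customer"));
        // Customer is the end role, so nothing is generated for CustomerBase.
        System.out.print(render("Customer", "Order", "Customer"));
    }
}
```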

    The final segment builds the operations of the class:

    <!-- Builds the operations -->
    <xsl:for-each select="$class/Operation">
    public <xsl:value-of select="concat( @returnType, ' ', @name)" /> (
      <xsl:call-template name="operation-params">
        <xsl:with-param name="operation" select="." />
      </xsl:call-template> )
    {
      <xsl:variable name="zoneid" select="concat( $class/@name, '-', @name)" />
      <xsl:if test="@returnType='double'">return 0.0;</xsl:if>
    }
    </xsl:for-each>
    }
    </xsl:template>

    <!-- A helper template to build the parameters for an operation -->
    <xsl:template name="operation-params">
      <xsl:param name="operation" />
      <xsl:for-each select="$operation/Parameter">
        <xsl:value-of select="concat( @type, ' ', @name )"/>
        <!-- Separate parameters with a comma, but add none after the last one -->
        <xsl:if test="position() != last()">, </xsl:if>
      </xsl:for-each>
    </xsl:template>
    </xsl:stylesheet>

    The XSLT code iterates through the operations and builds a stub method for each. The little helper template just makes it easier to build comma-separated lists of arguments.
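
    The one subtle point in building an argument list is that the separator goes between items but not after the last one. As a point of comparison, here is a minimal Java sketch of that rule; ParameterList is a hypothetical name, and the second parameter in the demo is invented for illustration because the model's only operation takes a single argument.

```java
import java.util.StringJoiner;

/** Builds a Java argument list from alternating type and name strings. */
final class ParameterList {

    /** render("boolean", "calculateVAT") yields "boolean calculateVAT". */
    static String render(String... typesAndNames) {
        if (typesAndNames.length % 2 != 0) {
            throw new IllegalArgumentException("Expected type/name pairs");
        }
        // StringJoiner inserts ", " between items only, never after the last.
        StringJoiner joined = new StringJoiner(", ");
        for (int i = 0; i < typesAndNames.length; i += 2) {
            joined.add(typesAndNames[i] + " " + typesAndNames[i + 1]);
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("boolean", "calculateVAT"));
        // "String currency" is a hypothetical second parameter.
        System.out.println(render("boolean", "calculateVAT", "String", "currency"));
    }
}
```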

    To apply the template to the input file, we run the following at the command prompt:

    java com.icl.saxon.StyleSheet input.xml main.xsl
    Creating output/CustomerBase.java
    Creating output/OrderBase.java

    The new OrderBase.java file contains:

    /* This file has been generated */
    import java.util.*;

    public class OrderBase {
        private int number;
        public int getnumber() {return this.number;}
        public void setnumber(int number) {this.number=number;}

        private Date date;
        public Date getdate() {return this.date;}
        public void setdate(Date date) {this.date=date;}

        private Customer customer;
        public Customer getCustomer() {return this.customer;}
        public void setCustomer(Customer customer ) {this.customer=customer;}

        public double getTotalAmount( boolean calculateVAT ) { return 0.0; }
    }

    Not a work of beauty, by any means. But that has less to do with the technology and more to do with the brevity and lack of tuning in the template. In your templates you should add Javadoc and error checking, and try to keep the formatting as clean as possible.

    Let's step back to take a look at the big picture. The class that was generated is suitable for subclassing. In fact, the example will not compile as it stands: OrderBase refers to Customer, but only CustomerBase has been generated. So we must implement a derived class, if only to get the system to compile!
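
    What the hand-written side might look like is sketched below. The two Base classes are abbreviated copies of the generated output; Customer and Order are the derived classes we write ourselves. The model says nothing about how a total is calculated, so the netAmount field, the constructor, and the 20 percent VAT rate are assumptions invented purely for this illustration, as is the SubclassDemo driver.

```java
/** Small driver showing the generated and hand-written classes together. */
final class SubclassDemo {
    public static void main(String[] args) {
        Order order = new Order(100.0);
        order.setCustomer(new Customer());
        System.out.println(order.getTotalAmount(false));
        System.out.println(order.getTotalAmount(true));
    }
}

/** Generated by the template (abbreviated); never edited by hand. */
class CustomerBase {
    private int code;
    public int getcode() {return this.code;}
    public void setcode(int code) {this.code=code;}
}

/** Hand-written: exists so that OrderBase's reference to Customer compiles. */
class Customer extends CustomerBase {
}

/** Generated by the template (abbreviated); never edited by hand. */
class OrderBase {
    private Customer customer;
    public Customer getCustomer() {return this.customer;}
    public void setCustomer(Customer customer) {this.customer=customer;}
    public double getTotalAmount(boolean calculateVAT) { return 0.0; }
}

/** Hand-written: supplies the behavior that the model leaves open. */
class Order extends OrderBase {
    // Hypothetical values: the model defines neither an amount nor a VAT rate.
    private static final double VAT_RATE = 0.20;
    private final double netAmount;

    Order(double netAmount) {
        this.netAmount = netAmount;
    }

    /** Replaces the generated stub that always returns 0.0. */
    @Override
    public double getTotalAmount(boolean calculateVAT) {
        return calculateVAT ? netAmount * (1 + VAT_RATE) : netAmount;
    }
}
```

    Because the custom code lives only in Customer and Order, the generator can overwrite CustomerBase.java and OrderBase.java on every run without touching it.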

    Next Time: Safe Zones

    The other approach to extending generated code is through the use of safe zones. In part two of this series, we'll look at what safe zones are and jump through some hoops to get them to work.

    Learn More About XSLT

    If you are still hesitant about XSLT, consider that there is a three-fold benefit of learning the XSLT you will need to use the techniques in this article. First, you will be learning a valuable tool for converting XML from one form to another, or into a web page, or into text. Second, you will be learning the related XPath technology, which is a very powerful system for specifying nodes in a tree. And last, you will have learned the functional programming model, which is at the heart of XSLT.

    For more information about XSLT I recommend that you read Michael Kay's "XSLT: For Programmers." It is a great introduction to XSLT from someone who is a key leader in the field.