Extensible Code Generation with Java, Part 2

Version 2



    Using Safe Zones
    One More Alternative
    The Downsides

    In part 1 of this series, we looked at the idea of generated code, which is code written not by hand but by another application. The appeal of generated code is that it can eliminate drudgery (and mistakes) in some kinds of coding. A primary problem with this approach is getting the generator-created code to integrate with existing code bases.

    We introduced a workflow in which an XML document was used to describe desired classes, whose source was then generated with XSLT, via the Saxon XSLT processor. One downside of this approach is the need for subclassing some parts of the generated code to add the functionality that can't be generated for us. An alternative is to use "safe zones".

    Using Safe Zones

    A safe zone is a comment-delimited section of code where you can add any custom code you require. Here is a safe zone in getTotalAmount, from the example in part 1:

    public double getTotalAmount( boolean calculateVAT ) {
    // START-SAFE(Order-getTotalAmount)
    System.out.println("Hello");
    // END-SAFE
        return 0.0;
    }

    You can make any changes you like to the interior of the highlighted section and they will be preserved between code generation cycles. Of course, you can have as many safe zones as you like, but you should use them judiciously to demonstrate where you would like to see the code extended, and not just anywhere.

    Implementing safe zones is a little tricky because we have to scan the original output files and store the safe-zone contents. Then we need to feed the stored safe-zone contents back into the template so they can be reintegrated into the new output.

    So, how do we do that? Certainly XSLT can't take on that task alone. The approach I will use in this article is to wrap the Saxon XSLT engine in a Java application. This application will read the input XML, then look through the original output files for safe zones. The application then merges the safe-zone material back into the original model as XML tags. It then invokes a modified version of the XSLT template that we used in the original example. The flow is illustrated in Figure 1.

    Figure 1. Flow for preserving safe zones in generated code

    You could argue that mixing the safe-zone code into the model is blasphemy. XSLT can read from more than one data source, so the safe zones could live in a second file instead. Using two files is something to think about, but it's nice to know that a single-file version works as well.

    Let's start with the Java code for the code generator. We start with the ridiculously long preamble:

    import javax.xml.transform.Source;
    import javax.xml.transform.Result;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.FactoryConfigurationError;
    import javax.xml.parsers.ParserConfigurationException;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    import org.w3c.dom.DOMException;
    import java.util.Hashtable;
    import java.util.Enumeration;
    import java.io.File;
    import java.io.FileReader;
    import java.io.BufferedReader;
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;

    Wow. That's a lot of stuff, but it's all necessary for the example. The first thing is to read in the input XML file.

    public class SafeGen {
        private String _inputFile;
        private String _templateFile;
        private Document _document;

        public SafeGen( String inputFile, String templateFile ) {
            _inputFile = inputFile;
            _templateFile = templateFile;
        }

        private void readInputFile() throws Exception {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            _document = builder.parse( new File( _inputFile ) );
        }

    The parse section is where we read in the input XML and store it in a member variable in DOM form.

    Next, we scan the original output files for the safe zones:

        private void scanForSafeZones() throws Exception {
            // Creates a temporary hash of safe zones where
            // the key is the ID of the zone and the value
            // is the code within the safe zone
            Hashtable safeZones = new Hashtable();

            // The start and end patterns
            Pattern startPattern = Pattern.compile( "\\/\\/ START-SAFE\\((.*?)\\)" );
            Pattern endPattern = Pattern.compile( "\\/\\/ END-SAFE" );

            // Goes through the XML looking for classes
            Element root = _document.getDocumentElement();
            NodeList nodes = root.getElementsByTagName( "Class" );
            for( int index = 0; index < nodes.getLength(); index++ ) {
                Element elem = (Element)nodes.item( index );
                String className = elem.getAttributeNode( "name" ).getNodeValue();

                // Open up the file for the class name
                File file = new File( "output/" + className + ".java" );
                BufferedReader reader = new BufferedReader( new FileReader( file ) );

                // Set up the state machine variables
                boolean inSafeZone = false;
                String zoneName = "";
                StringBuffer safeBuffer = new StringBuffer();

                // Go through the Java code a line at
                // a time looking for safe zones
                String line;
                while( ( line = reader.readLine() ) != null ) {
                    if ( inSafeZone ) {
                        if ( endPattern.matcher( line ).matches() ) {
                            safeZones.put( zoneName, safeBuffer.toString() );
                            inSafeZone = false;
                            safeBuffer.setLength( 0 );
                        } else {
                            safeBuffer.append( line + "\n" );
                        }
                    } else {
                        Matcher match = startPattern.matcher( line );
                        if ( match.matches() ) {
                            inSafeZone = true;
                            zoneName = match.group( 1 );
                        }
                    }
                }
                reader.close();
            }

            // For each zone we find add a node
            // in the input DOM
            for ( Enumeration sz = safeZones.keys(); sz.hasMoreElements(); ) {
                String zoneName = ( String )sz.nextElement();
                String zoneValue = ( String )safeZones.get( zoneName );
                Element elem = _document.createElement( "safezone" );
                elem.setAttribute( "id", zoneName );
                elem.appendChild( _document.createCDATASection( zoneValue ) );
                root.appendChild( elem );
            }
        }

    The important parts are where we create and use the start and end regular expressions. We use these to find the start comment and to pick out the safe-zone ID. The heart of the reader is a state machine. It starts off outside the safe zone, then switches state inside the safe zone, and back out when the safe-zone ends. Each time we reach the end of a safe zone we push the zone into a hash table of zones using the zone ID as a key.
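
    To see the state machine on its own, here is a minimal sketch that runs the same logic over a list of lines instead of a file. The SafeZoneScanner class and its scan method are hypothetical names, not part of SafeGen. One deliberate difference: Matcher.matches() has to consume the whole line, so these patterns also allow whitespace around the markers, on the assumption that your generated code might be indented.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the article's SafeGen.
final class SafeZoneScanner {
    // Assumption: markers may be indented, so allow surrounding whitespace.
    // matches() must consume the entire line, hence the \s* on both ends.
    private static final Pattern START =
        Pattern.compile("\\s*// START-SAFE\\((.*?)\\)\\s*");
    private static final Pattern END =
        Pattern.compile("\\s*// END-SAFE\\s*");

    /** Returns zone ID -> zone body, in the order the zones appear. */
    static Map<String, String> scan(List<String> lines) {
        Map<String, String> zones = new LinkedHashMap<>();
        StringBuilder body = new StringBuilder();
        String zoneName = null; // null means "outside a safe zone"

        for (String line : lines) {
            if (zoneName != null) {
                if (END.matcher(line).matches()) {
                    // Leaving the zone: store what we collected
                    zones.put(zoneName, body.toString());
                    zoneName = null;
                    body.setLength(0);
                } else {
                    body.append(line).append('\n');
                }
            } else {
                Matcher start = START.matcher(line);
                if (start.matches()) {
                    // Entering a zone: remember its ID
                    zoneName = start.group(1);
                }
            }
        }
        return zones;
    }
}
```

    Feeding it the lines of the getTotalAmount example should give a single entry keyed by Order-getTotalAmount, with the println line (plus its newline) as the value.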

    Once all of the files have been read, we augment the input XML DOM with some extra tags that contain CDATA sections for the safe-zone code.

    That was really the majority of the work. The next step is to run Saxon with the template file:

        private void runTransform() throws Exception {
            Source xmlSource = new DOMSource( _document );
            TransformerFactory transFact = TransformerFactory.newInstance();
            Transformer trans = transFact.newTransformer(
                new StreamSource( new File( _templateFile ) ) );
            trans.transform( xmlSource, new StreamResult( System.out ) );
        }

    This code uses the TransformerFactory, so you could plug in any XSLT engine. For the example, however, I used Saxon 6.5.3.
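
    Because everything goes through the standard javax.xml.transform API, the calling code does not name an engine at all. The sketch below is an illustration under assumptions: the TransformSketch class and its tiny inline stylesheet are made up for the demo (the real main.xsl is much larger), and it runs on whatever engine TransformerFactory.newInstance() finds. JAXP lets you choose a different engine by setting the javax.xml.transform.TransformerFactory system property to that engine's factory class.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; not part of the article's SafeGen.
final class TransformSketch {
    // A tiny stand-in stylesheet that lists the classes in the model.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/Model'>"
      + "<xsl:for-each select='Class'>"
      + "class <xsl:value-of select='@name'/>;"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stand-in stylesheet over the given XML and returns the text. */
    static String transform(String xml) {
        try {
            // newInstance() picks the engine; nothing here is engine-specific
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }
}
```

    Calling TransformSketch.transform with a small Model document containing Class elements should return one "class Name;" entry per class, regardless of which engine is installed.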

    The rest of the code just runs the methods in the proper order:

        public void run() {
            try {
                readInputFile();
                scanForSafeZones();
                runTransform();
            } catch( Exception e ) {
                System.out.println( e );
            }
        }

        static public void main( String args[] ) {
            SafeGen gen = new SafeGen( "input.xml", "main.xsl" );
            gen.run();
        }
    }

    To finish off the generator with the safe zones we need to make some minor modifications to the XSLT template from the previous example. To start with, we need to build Order.java instead of OrderBase.java. So this line:

    <xsl:variable name="filename" select="concat('output/',@name,'Base.java')" />

    becomes:

    <xsl:variable name="filename" select="concat('output/',@name,'.java')" />

    More importantly, this section:

    <xsl:for-each select="$class/Operation">
    public <xsl:value-of select="concat( @returnType, ' ', @name)" />( <xsl:call-template
        name="operation-params"><xsl:with-param name="operation" select="." /></xsl:call-template> ) {
    <xsl:variable name="zoneid" select="concat( $class/@name, '-', @name)" />
    <xsl:if test="@returnType='double'">return 0.0;</xsl:if>
    }
    </xsl:for-each>

    becomes:

    <xsl:for-each select="$class/Operation">
    public <xsl:value-of select="concat( @returnType, ' ', @name)" />( <xsl:call-template
        name="operation-params"><xsl:with-param name="operation" select="." /></xsl:call-template> ) {
    <xsl:variable name="zoneid" select="concat( $class/@name, '-', @name)" />
    // START-SAFE(<xsl:value-of select="$zoneid" />)
    <xsl:value-of select="$safezones/safezone[@id=$zoneid]" />// END-SAFE
    <xsl:if test="@returnType='double'">return 0.0;</xsl:if>
    }
    </xsl:for-each>

    The really tricky part here is to make sure that you can run the template multiple times without adding any extra returns or spaces. This means that the template must be absolutely symmetric: what the scanner reads out of a safe zone must be exactly what the template writes back in. This version is symmetric because the // END-SAFE text is on the same line as the xsl:value-of tag.
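
    One way to convince yourself that a template is symmetric is to regenerate from its own output and check that nothing changes. The sketch below does that in plain Java. It is only an illustration: RoundTripDemo is a hypothetical class, its render method is a hand-written stand-in for the XSLT template, and the extraction uses a single regular expression rather than the line-by-line scanner.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical round-trip check; render() stands in for the XSLT template.
final class RoundTripDemo {
    // Group 1 is the zone ID; group 2 is the body, which starts after the
    // START line's newline and runs up to (not including) the END marker.
    private static final Pattern ZONE = Pattern.compile(
        "// START-SAFE\\((.*?)\\)\\n(.*?)// END-SAFE", Pattern.DOTALL);

    /** Emits one method with a safe zone, mimicking the template's layout. */
    static String render(Map<String, String> zones) {
        String id = "Order-getTotalAmount";
        return "public double getTotalAmount( boolean calculateVAT ) {\n"
             + "// START-SAFE(" + id + ")\n"
             + zones.getOrDefault(id, "")  // body already ends with '\n'
             + "// END-SAFE\n"             // no newline added before the marker
             + "return 0.0;\n"
             + "}\n";
    }

    /** Pulls every safe-zone body out of previously generated source. */
    static Map<String, String> extract(String source) {
        Map<String, String> zones = new HashMap<>();
        Matcher m = ZONE.matcher(source);
        while (m.find()) {
            zones.put(m.group(1), m.group(2));
        }
        return zones;
    }

    /** True when regenerating from the extracted zones changes nothing. */
    static boolean isStable(String source) {
        return render(extract(source)).equals(source);
    }
}
```

    If render put a newline between the zone body and the // END-SAFE marker, each cycle would carry one more blank line into the zone and isStable would return false. That is the failure the same-line trick in the template avoids.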

    There are a few extras that aren't accounted for by the example:

    • The output files should be backed up before they are overwritten.
    • If you have any safe-zone regions that were not used in the output those should be reported to the user.
    • The generator should not write the file if the contents haven't changed. This will save you expensive re-compiles.
    • There need to be sections of comments at the top and the bottom of the output that say that the file was generated, that it should not be modified, and how it could be regenerated.
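
    The first three items are straightforward to sketch. What follows is one possible shape, not the way the example generator works (it writes nothing to disk itself). OutputWriter, writeIfChanged, unusedZones, and the ".bak" suffix are all hypothetical names and conventions chosen for the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the article's SafeGen does not include any of this.
final class OutputWriter {
    /**
     * Writes content to target only when it differs from what is on disk.
     * An existing file is first copied to "<name>.bak" (an assumed naming
     * convention). Returns true if the file was written.
     */
    static boolean writeIfChanged(Path target, String content) {
        try {
            byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
            if (Files.exists(target)) {
                if (Arrays.equals(Files.readAllBytes(target), bytes)) {
                    return false; // unchanged: leave the timestamp alone
                }
                Path backup =
                    target.resolveSibling(target.getFileName() + ".bak");
                Files.copy(target, backup, StandardCopyOption.REPLACE_EXISTING);
            }
            Files.write(target, bytes);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Zone IDs found by the scanner that the new source no longer emits. */
    static Set<String> unusedZones(Set<String> scannedIds, String newSource) {
        Set<String> unused = new TreeSet<>();
        for (String id : scannedIds) {
            if (!newSource.contains("// START-SAFE(" + id + ")")) {
                unused.add(id);
            }
        }
        return unused;
    }
}
```

    A generator could call unusedZones after the transform and print a warning for each ID it returns, so that hand-written code is never dropped silently when an operation is renamed or removed from the model.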

    One More Alternative

    There is another alternative for code generation combined with custom code in the same file.

    You can invert the safe-zone model by having sections of the code where the generator is able to add in code. These sections are delimited by comments that the generator looks for in the input file. Here is an example of what it might look like in the context of our example:

    public class Order { 
        // START-GENERATED
        // <Attribute name="number" type="integer"/>
        // <Attribute name="date" type="date"/>
        private int number;
        public int getnumber() {return this.number;}
        public void setnumber(int number) {this.number=number;}
        private Date date;
        public Date getdate() {return this.date;}
        public void setdate(Date date) {this.date=date;}
        // END-GENERATED

        private Customer customer;
        public Customer getCustomer() {return this.customer;}
        public void setCustomer(Customer customer ) {this.customer=customer;}

        public double getTotalAmount( boolean calculateVAT ) {
            return 0.0;
        }
    }

    The generator would look for the code block bracketed by the START-GENERATED and END-GENERATED comments and update that with new code as required.
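
    The replacement step can be sketched in a few lines. This is an assumption-laden illustration rather than a full generator: GeneratedBlockUpdater is a hypothetical class, it handles a single block per file, and it expects the caller to supply the new block text (including any // <Attribute .../> comment lines it wants to keep), ending with a newline.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the inverted (generated-zone) model.
final class GeneratedBlockUpdater {
    // Group 1: the START marker line. Group 2: the generator-owned body.
    // Group 3: the END marker, including any indentation in front of it.
    private static final Pattern BLOCK = Pattern.compile(
        "(// START-GENERATED[^\\n]*\\n)(.*?)([ \\t]*// END-GENERATED)",
        Pattern.DOTALL);

    /** Replaces the generated region; code outside the markers is untouched. */
    static String update(String source, String generated) {
        Matcher m = BLOCK.matcher(source);
        if (!m.find()) {
            throw new IllegalArgumentException("no generated block found");
        }
        // Splice the new body in between the two marker lines
        return source.substring(0, m.start(2))
             + generated
             + source.substring(m.end(2));
    }
}
```

    Everything outside the markers, such as the hand-written customer accessors and getTotalAmount in the Order example, passes through unchanged, and running update twice with the same generated text gives the same result as running it once.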

    Microsoft uses this technique in its developer tools that aid in using the Microsoft Foundation Classes. In particular, they manage blocks of event routing code within special commented sections.

    The Downsides

    Let's step back a little here at the end, starting with some downsides and things to look out for, and then finishing up with more advantages of generation.

    Code generation is controversial. The primary concern is that code generation is a design smell... meaning that if you have to write so much code for a platform, then the platform itself is probably bad. Surprisingly, I agree in full with this argument. I think a platform that requires an excessive amount of coding is a design problem. Unfortunately, we are often stuck with platforms that are, at best, not optimal, like J2EE. Code generation, in that case, is not the design smell, but the solution to the design smell that you are stuck with.

    There are other arguments against code generation, far too many to put in this article. But there are lots of positives, as we have seen.


    I started this article by saying that code generation was important and it was something that you need to understand. Why is that? It's not just because today's frameworks are code-intensive. It's also because the code that generators build is far more consistent in form and quality than hand-written code.

    Using code generation also raises the level of abstraction. As you can see in this example, the business model, and some of the logic, is actually in an XML model and not in the code. That means you can port your business model to other languages and technologies much more easily than you could port source code.

    Using the two techniques I have presented here can give you a smoother approach to generating sections of your code than you would have had if it were an all-or-nothing proposition.