From Writing Programs to Creating Compilers Blog

Version 2



    Start Here: A Readymade Gra mmar for Java
    Modifying the Java Grammar
    Producing Output
    Going Deeper, Changing ClassDeclaration
    Finally, the AJ Specialty: TaskDeclaration
    We're Ready to Test!
    Life Beyond AJ

    In this article we build a simple compiler that augments Java with tasks (independent blocks of code that execute in parallel), thus creating a new language called AJ that well supports the programming of systems with concurrent activities. This is not to imply that Java does not support multithreading, but to create a syntax that enables the clear expression of concurrency requirements. (AJ programs do, in fact, compile into Java programs that use java.lang.Thread objects to support tasks.) You will see how to use a completely visual, user-friendly parser generator, VisualLangLab, to quite easily build a compiler for AJ.

    Start Here: A Readymade Grammar for Java

    VisualLangLab is a freely downloadable (GPL) project. To install it, just unzip the distribution file ( into a directory. The visual parser generator (see Figure 1) is started by clicking on VisualLangLab.jar. The only other thing to keep in mind is that VisualLangLab.jar must be on the classpath when compiling or executing the generated parser.

    Figure 1. The VisualLangLab GUI
    Figure 1. The VisualLangLab GUI

    The heart of any compiler project is the grammar--a typically cryptic piece of text that serves as a specification for the language. Since we wish to augment the capabilities of an existing language (Java), it is simpler to start with an existing grammar for the base language. The starting point for this exercise will therefore be a Java 1.4 grammar included in the VisualLangLab distribution (the file java14C.lkv). Well-written grammars, like well-written programs, are divided into smaller units calledrules. A few sample rules from the Java 1.4 grammar (as displayed by the VisualLangLab GUI) are shown in Figure 2 (IfStatement), Figure 3 (SwitchStatement) and Figure 4 (UnaryExpression).

    Figure 2. Rule IfStatement in Java 1.4 grammar
    Figure 2. The rule IfStatement in Java 1.4 grammar

    Figure 3. The rule <code>SwitchStatement</code> in Java 1.4 grammar
    Figure 3. The rule SwitchStatement in Java 1.4 grammar

    Figure 4. The rule <code>UnaryExpression</code> in Java 1.4 grammar
    Figure 4. The rule UnaryExpression in Java 1.4 grammar

    Note that unlike other parser generator tools, a VisualLangLab grammar is visual (trees with intuitive icons that can be viewed and edited using interactive menus and commands). The .lkv files produced and used by VisualLangLab are not text files, and therefore not human-readable. These files contain everything: token definitions, parser rules, and any semantic actions defined in the grammar. A grammar file (such as java14C.lkv) can be opened using File -> Open. When a grammar file is opened, the top-level rule (or start symbol) is displayed first (see Figure 5). The combo box at the top of the tree display (or the Goto item on the grammar tree's pop-up menu) can be used to display other rules. Refer to the documentation (docs/index.html in the distribution kit) for more details.

    Modifying the Java Grammar

    Readers unfamiliar with the terms and technology used here can find help in the books cited below, but most should be able to follow the discussion even without a deep background in compiler technology. Running VisualLangLab and actually performing the steps described will help. The VisualLangLab GUI allows readers to find their way around quite easily. Obviously, having the documentation (docs/index.html in the distribution kit) around also helps.

    All grammars have a start symbol (equivalent to a Java application's main function) that is the entry point to the grammar. The Java 1.4 grammar's start symbol is calledCompilationUnit, and appears as shown in Figure 5. As you can see, the start symbol's root node is distinguished by rendering it on a green background.

    Figure 5. CompilationUnit (the Java 1.4 start symbol)
    Figure 5. CompilationUnit (the Java 1.4 start symbol)

    Before we move on, we must handle a fundamental issue--the grammar in java14C.lkv has no semantic actions (application-specific code to be executed when a rule or rule element matches the input). This means that a parser created from it will verify that its input complies with the grammar, but do nothing else (not even produce any output). So the first change we will make is to add semantic actions that output the requiredgenerated code.

    Our approach to generating output will be based on the two following guiding principles:

    • Identify sub-trees (at high levels of the grammar) that will remain unchanged in AJ. Since the AJ compiler's generated code is plain (sans-"task") Java, the generated code for such sub-trees can be produced by a semantic action that merely outputs a copy of the part of the input text that matches the sub-tree.
    • Identify sub-trees (at high levels) that will change in AJ. These sub-trees are left without semantic actions--allowing code-generation responsibility to devolve (recursively) to the lowest possible levels in the rule tree. Custom code-generating logic (as semantic actions associated with the rule elements) is added to suitable sub-trees at the lower levels of the grammar tree to translate AJ constructs into the required plain Java code.

    Further, to keep things simple, all generated-code is just written out to the standard output (java.lang.System.out). Command-line I/O redirection, as described below, can be used to get the generated code into a file.

    Producing Output

    Before we begin, let's discuss the apex rule (CompilationUnit depicted in Figure 5) in more detail. The locomotive icon at the root indicates that the child nodes are a sequence (they must follow each other in the given order). The blue arrows (used for PackageDeclaration,ImportDeclaration, and TypeDeclaration) represent references to other grammar rules, and the orangeslotted disk (EOF) represents a token (anatomic piece of the input (e.g., if, 3.25,while, and +=, which a parser never has to break down any further). EOF is a pseudo-token representing the end of the input stream. What this means is that a unit of Java (a file containing Java source) contains an optional (notice the?) PackageDeclaration, followed by 0 or more (notice the *) ImportDeclarations, followed by 0 or more TypeDeclarations.

    Since the grammars for PackageDeclaration andImportDeclaration remain unchanged in AJ, we can produce the required output by merely copying the input to the output. (Remember, the AJ compiler's generated code is in plain Java.) A semantic action to copy the text of anyPackageDeclaration to the output is shown in Figure 6. (Semantic actions are snippets of Java mixed with special separators and macros entered in the Code tab below the grammar tree.) Semantic actions can be associated with each node of a grammar rule, and are executed when the input text matches the pattern specified by the associated node.

    Figure 6. Semantic actions for PackageDeclaration reference
    Figure 6. Semantic actions for PackageDeclarationreference

    To understand how this works, we need to know the function of%%, $B$ and $Z$. Each%% is a separator, thus splitting the code into up to four sections (of which we only use the second and fourth here). The second section (int begin = $B$;) is executed (by the generated parser) when the first element of aPackageDeclaration (the keyword package, in this case) is found, and the fourth section (int end = $B$; System.out.print( $Z$.substring( begin, end ) );) is executed after the last element of the declaration (the; at the end of the package declaration, in this case) has been found and processed. Each $B$ translates into a source-code offset at that point (so the variablebegin will be initialized to the position of the word "package" in the source, while the variable end will be initialized to the beginning of whatever token follows the; at the end of the package declaration). The$Z$ represents a String containing the entire source code. It is therefore obvious that theprintln will output all of the source code of the package declaration (including any white space and comments preceding the next token in the input stream).

    The following additional remarks also apply toCompilationUnit (see Figure 6). The same copy verbatim semantic action (described in the preceding paragraph) should also be used for theImportDeclaration, reference as the grammar of that rule also does not change in AJ. However, we let the reference toTypeDeclaration remain without any semantic action, to be handled at lower levels of the grammar rule tree (where AJ differs from Java).

    Going Deeper, ChangingClassDeclaration

    Having completed CompilationUnit, we move on toTypeDeclaration, which is depicted in Figure 7. TheInterfaceDeclaration reference and the tokenSEMI will remain unchanged in AJ, and so are also equipped with the same copy verbatim semantic action described above. Again, we leave alone theClassDeclaration reference, to be handled at still deeper levels of the grammar tree.

    Figure 7. TypeDeclaration (unchanged from java14C.lkv)
    Figure 7. TypeDeclaration (unchanged fromjava14C.lkv)

    The special locomotive symbol (with a cyan disk superimposed) represents a syntactic lookahead. The first child node of such a sequence is a rule that exists only to check the structure of the input text at that point--and is required to disambiguate the grammar when LL(k) logic is not strong enough. An important thing to note about such lookahead rules is that neither the reference itself nor any element within such a rule should carry a semantic predicate.

    Moving on to ClassDeclaration (shown in Figure 8), we see that this is where the AJ grammar starts to differ from plain Java. For reasons the reader can easily discover by probing around in the java14C.lkv grammar (and this is left as an exercise), the ClassDeclaration rule for AJ is most effective when revised to that shown in Figure 9.

    Figure 8. Original ClassDeclaration (as in java14C.lkv)
    Figure 8. Original ClassDeclaration (as injava14C.lkv)

    Figure 9. New ClassDeclaration (modified for AJ)
    Figure 9. New ClassDeclaration (modified forAJ)

    The semantic actions for this rule are spread out in several different locations (the first of which is the rule's root node, as depicted in Figure 9). The second piece is associated with the token LCURLY, as shown in Figure 10. (Observe that the tabbed pane below the rule-tree displays information about the selected node.)

    Figure 10. Semantic action for LCURLY in ClassDeclaration
    Figure 10. Semantic action for LCURLY inClassDeclaration

    The two semantic actions described above (associated with the root node of ClassDeclaration and theLCURLY node respectively) result in the copying out of everything from the beginning of any ClassDeclarationup to the { preceding the member declarations (since this part of the Java specification is unchanged in AJ). The third piece of ClassDeclaration's semantic action is associated with the ClassBodyDeclaration reference, as depicted in Figure 11.

    Figure 11. Semantic action for ClassBodyDeclaration reference
    Figure 11. Semantic action forClassBodyDeclaration reference

    Note that this code is our old faithful copy verbatim semantic action. We can use this semantic action here because the rule ClassBodyDeclaration (inherited from java14C.lkv) describes syntactic structures in Java that remain valid in AJ (and source code matching that rule can therefore be copied over unchanged).

    Another interesting thing to note inClassBodyDeclaration (Figure 11) is the reference toTaskDeclaration. As will be explained below, this rule specifies the structure of a task, and represents our new addition to the Java grammar. The two references,ClassBodyDeclaration and TaskDeclaration, have been combined using a choice element, so, in effect, the task construct adds on to the structural options permitted by ClassBodyDeclaration at this point in the grammar for ClassDeclaration. (Check out the definition of ClassBodyDeclaration to understand how this works.)

    Finally, the AJ Specialty:TaskDeclaration

    So, we finally reach TaskDeclaration--our very own addition to AJ. In the discussion of ClassDeclarationabove, we mentioned that the reference toTaskDeclaration is not equipped with any semantic action--thus relegating responsibility for code generation to the rule itself, as described below.

    The semantic action for TaskDeclaration is spread over four locations, as described below. The first location is the rule's root node itself (as in Figure 12), and the code here helps to define required variables.

    Figure 12. Semantic action for TaskDeclaration root node
    Figure 12. Semantic action for TaskDeclarationroot node

    The second location is the node IDENTIFIER (as in Figure 13), and that code helps to capture the name of the task being defined. (The $$, used in Figures 13 and 14, translates into a type-safe value notation for the current token.)

    Figure 13. Semantic action for IDENTIFIER node of TaskDeclaration
    Figure 13. Semantic action for the IDENTIFIER node of TaskDeclaration

    The third location is the node INT (as in Figure 14), and the associated code helps to capture the priority value.

    Figure 14. Semantic action for INT node of TaskDeclaration
    Figure 14. Semantic action for the INT node ofTaskDeclaration

    The fourth (and last) location is the nodeBlockStatement, as depicted in Figure 15. The code in this semantic action uses the information gathered by the preceding sibling nodes, and uses all of it to instantiate ajava.lang.Thread object (with the task's name) to run the code in the BlockStatement within itsrun() method.

    Figure 15. Semantic action for BlockStatement node of TaskDeclaration
    Figure 15. Semantic action for the BlockStatementnode of TaskDeclaration

    Careful application of the principles described above (the function and use of %%, $B$, and$Z$) will help you understand the text transformation performed by this semantic action. The following two code snippets illustrate the transformation of task blocks.

    /* * Sample input 
    task code */ task incrAge priority 5 { while (true) { age += 10; try { Thread.sleep( 3000 ); } catch (Exception e) {} } }
    /* * Generated (
    plain Java) output code */ private Thread 
    incrAge = new Thread() { public void run() { Thread.currentThread().setPriority( 
    5 ); 
    while (true) { 
    age += 10; 
    try { 
    Thread.sleep( 3000 ); 
    } catch (Exception e) {} 
    } }

    The underlined portions of the output text are generated using information directly obtained from the AJ source. The lineThread.currentThread().setPriority( 5 ); is, of course, only generated if a priority value is explicitly provided in the TaskDeclaration.

    We're Ready to Test!

    All that remains now is to generate, compile, and test the AJ compiler. Generate the compiler by selecting the File -> Write code menu choice (which should produce a file called AJ.javain the same directory as the grammar file). Then use the Java compiler: javac (and remember to keepVisualLangLab.jar on the classpath). To make sure that all's well with the grammar, use the file AJ.lkv from thedemos/AJ directory of the distribution. Finally, to test the compiler, try to process the following AJ program (fileaj00.aj in the distribution).

    /* * Sample AJ program (
    aj00.aj) */ public class aj00 { task aa { for (int i = 0; i < 5; ++i) { ++count; try { Thread.sleep(2000); } catch (Exception e) {} } } task b priority 7 { for (int i = 0; i < 10; ++i) { System.out.println( count ); try { Thread.sleep( 2000 ); } catch (Exception e) {} } } int count = 0; public static void main( String a[] ) { aj00 me = new aj00(); me.aa.start(); me.b.start(); } }

    To process the file, use the command line java > AJ -f aj00.aj (with VisualLangLab.jaron the classpath). Note that the output has been redirected, so that file receives the generated Java code, and should be compiled (as a Java program: javac Finally, running the program (using java aj00) should produce the output shown below.

    1 1 2 3 4 5 5 5 5 5

    Life Beyond AJ

    Java, and other third-generation languages, are used as building blocks for solutions in a wide range of domains. Since these languages do not support many domain-specific concepts, the resulting code sometimes looks low-level, unintuitive, or even contrived. The approach described here can be used to augment programming languages (or create new ones) to produce more expressive (and therefore more elegant, compact, and supportable) code.

    For many situations, a simple tool (such as VisualLangLab) may be adequate by itself. And for situations where it isn't, this article should have provided a firm basis on which to build the capability to use more sophisticated tools.


    1. Programming Language Processors in Java, David A. Watt and Deryck F. Brown, Pearson Education (2000).
    2. Modern Compiler Implementation in Java, A.W. Appel, Cambridge University Press (1997).
    3. Compilers: Principles, Techniques and Tools, A.V. Aho, R. Sethi, and J.D. Ullman, Addison Wesley (1986).