Tackling Java Performance Problems with Janino



    Performance is important. For the types of large-scale multi-user applications that Java is commonly used to develop, it can be vital. Unfortunately, identifying performance problems before they occur is very difficult and fixing them afterwards is usually very costly. In this article I outline an approach to boosting the performance of Java code using the embedded compiler library Janino. It can be applied to some performance problems even after all other approaches have been exhausted.

    It is a simple observation that software cannot be faster than the layer beneath it. It follows that improving a program's performance may require "dropping" to a lower layer to write performance-critical sections of code. This is a standard technique that has evolved with each new generation of languages. C developers revert to assembly language; Java developers to C; and even JavaScript developers to Java.

    This is a bitter situation for Java developers because one of Java's main benefits is the homogeneous environment it provides to insulate applications from the host system. Revert to a lower-level language and that is lost. However, it turns out that Java developers can play this game by compiling their application logic directly into Java. This effectively moves the evaluation of domain-specific languages to lower-level Java bytecode, but it's important to see this approach in the larger context.

    Performance: Traditional Approaches

    The implications of building slow Java applications go beyond the observed performance of the delivered system. Slow code also affects developer productivity, deployment costs, and the composability of the application into larger systems. So what are the options for companies that have developed a Java application but need better performance?

    Employ better algorithms.
    In my experience, this is the safest initial approach to improving performance. Profiling an application and then changing poorly performing objects usually requires only local code changes. These are generally easy to test for compatibility with the previous implementation.
    Deploy better hardware.
    This is often the option that provides the greatest initial return in performance. Doubling the power of a single server is often cheap compared to the developer hours to make software faster. But this does not take into account many complicating factors including the needs of developers, the possibility that the application will be deployed multiple times, and the scope for making hardware upgrades.
    Depend on faster software.
    Some Java code (in the form of libraries, application servers, database drivers, etc.) performs better than other code, so ensuring that your application depends only on well-performing external code is a sound strategy. Unfortunately, this is difficult to determine beforehand, since the performance characteristics of any library depend on its operating environment, and that is usually uncertain until the late phases of development.
    Improve the application architecture.
    This is often the last resort due to the human resources it requires. Often a system will need to be partially rewritten and then fully reevaluated after each architectural change. For sufficiently large projects, localized architectural changes are possible, but then in these circumstances the risks are commensurately higher.

    It's frightening how quickly you can exhaust the obviously available approaches, especially on complex systems where the room for independent maneuver is small. So what's left?

    The Janino Library

    Janino is an open source (BSD-licensed) embedded Java compiler. It is not a compiler that a developer would use to build an application. Instead, it's designed to be used within Java applications to compile Java code on the fly.

    Any Java developer who has used JSP will be familiar (if only indirectly) with the technique of dynamically compiling Java source code (which is embedded within web pages in the case of JSP) into classes "on demand." They might also be unfortunate enough to be familiar with the complications that can arise from it. The reason that this technique has proven awkward in the past is that any code that needs to use this technique requires a JDK to be installed (not just a JRE). This can introduce licensing issues because the JDK is not (yet) freely distributable. In addition, platform-specific configuration is needed to locate an appropriate JDK and to identify the application's class files. Portable implementations also require that compilation occurs in a separate system process, a detail that can cause performance issues of its own.

    Despite these problems, open source libraries such as Jasper have provided excellent implementations based on this approach. The Java-based build tool Ant also needs to cope with these complications when compiling Java code. Nevertheless, for the reasons outlined above, the dynamic compilation of Java code is usually regarded as a last resort (in the case of JSP, the specification requires it) and certainly not as generally sound practice within an application.

    Janino improves on this situation in three different ways:

    • Instead of passing the work onto javac (or an equivalent Java compiler), Janino is a Java library that runs in the same JVM as your application. No extra configuration or JDK installation is necessary.

    • Rather than requiring access to your application's class files and .jar files, Janino obtains classes directly from the JVM. This means that there are no problems concerning file permissions or build path configurations.

    • Janino provides easy-to-use primitives for compiling expressions, scripts, and classes. Developers using the library don't need to concern themselves with any of the technicalities of loading the dynamically generated code: it's done for them. At the simplest level, they can pass in a string containing Java code and get back an object.
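
    As a rough illustration of the expression primitive, the sketch below uses Janino's ExpressionEvaluator class. It assumes a reasonably recent Janino release on the classpath (the setter-style calls shown here may differ in older versions); the class name ExpressionDemo, its helper methods, and the sample values are hypothetical and exist only for this example.

```java
import org.codehaus.janino.ExpressionEvaluator;

// Hypothetical demo class; assumes a recent Janino release is on the classpath.
class ExpressionDemo {

    // Compiles an expression over two double parameters named x and y.
    static ExpressionEvaluator compile(String expression) {
        try {
            ExpressionEvaluator ee = new ExpressionEvaluator();
            ee.setParameters(new String[] { "x", "y" },
                             new Class<?>[] { double.class, double.class });
            ee.setExpressionType(double.class);
            ee.cook(expression); // parse and compile inside this JVM
            return ee;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot compile: " + expression, e);
        }
    }

    // Evaluates an already compiled expression; arguments are passed boxed.
    static double evaluate(ExpressionEvaluator ee, double x, double y) {
        try {
            Object result = ee.evaluate(new Object[] { Double.valueOf(x), Double.valueOf(y) });
            return ((Double) result).doubleValue();
        } catch (Exception e) {
            throw new IllegalStateException("Evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Compile once, then reuse the compiled expression for many inputs.
        ExpressionEvaluator ee = compile("(x + y) / (x - y)");
        System.out.println(evaluate(ee, 3.0, 1.0));
        System.out.println(evaluate(ee, 10.0, 5.0));
    }
}
```

    The point of splitting compile from evaluate is that the relatively expensive compilation step happens once, while the compiled expression can be reused for every subsequent set of inputs.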

    The broader implication of Janino's availability is that compiling source code within Java applications, which was once difficult, messy, and platform-specific, is now none of those things. It makes possible the targeting of performance hotspots through the technique of dynamically compiling code.

    Applying the Technique

    Much of the inefficiency in large applications arises from the need to make them as flexible as possible. This means that much of the program's functionality is devolved to individually configured units. To take an example, an application that is responsible for crunching data reports may depend upon a whole range of factors that all affect the final formatting of a report. These might include the user's locale, the server's locale, current data set, user preferences, user permissions, and so on. This means that most program actions need to be delegated many times. For complex operations over large datasets, this translates to poor performance. Dynamic code compilation can tackle problems like these by compiling the rules once instead of evaluating them for each operation.

    The effectiveness of this approach for a given application depends in general on the answers to two questions:

    • To what degree is the application configured at runtime?
    • To what degree is the performance problem localized in the configured code?

    If both are significant, then the application can probably benefit from dynamic compilation; otherwise, any gains are likely to be marginal. Naturally, the overhead of compilation is relatively high, so the application will need to be performing a significant amount of work to benefit. Specialized libraries that provide "opaque" services (e.g. searching, pattern matching, or numerical calculation) are also likely to be good candidates.
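
    The trade-off between compilation overhead and per-evaluation savings can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the class name BreakEven and all three timings are hypothetical figures chosen for the example, not measurements.

```java
// Hypothetical helper: estimates how many evaluations are needed before a
// one-off compilation cost is repaid by faster per-evaluation times.
class BreakEven {

    static long evaluationsToBreakEven(long compileNanos, long interpretedNanos, long compiledNanos) {
        long savedPerCall = interpretedNanos - compiledNanos;
        if (savedPerCall <= 0) {
            throw new IllegalArgumentException("Compiled code must be faster per call to break even");
        }
        // Smallest n with n * savedPerCall >= compileNanos (ceiling division).
        return (compileNanos + savedPerCall - 1) / savedPerCall;
    }

    public static void main(String[] args) {
        // Assumed figures, not measurements: 50 ms to compile,
        // 700 ns per interpreted evaluation, 100 ns per compiled evaluation.
        System.out.println(evaluationsToBreakEven(50_000_000L, 700L, 100L)); // prints 83334
    }
}
```

    With these assumed numbers, compilation only pays for itself after tens of thousands of evaluations, which is why the amount of work performed on the compiled code matters so much.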

    To evaluate this technique for my own applications, I produced a factory class called BasicEvalFactory that can pre-parse a simple language for performing basic arithmetic over a set of numeric variables (multiplication, division, addition, and subtraction). The implementation is intentionally very basic but care was taken to ensure that evaluation was efficient and pre-optimized for good performance; remember, we want to compare Janino to code that is already optimized because that is the scenario we are in.

    The code for the BasicEvalFactory class is 400 lines long and contains a large number of inner classes. It is too long to list here, but the source is freely available for download together with all the code used in the evaluation (see the Resources section). All of this code exists to implement this simple interface:

    public interface Evaluator {
        public double evaluate(Variables vars);
    }

    The evaluate method calculates the value of an expression based on the variables obtained from this interface:

    public interface Variables {
        double getVariable(String name);
    }
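
    To give a feel for what a pre-parsed, interpreted evaluator looks like behind these two interfaces, here is a minimal sketch. It is not the BasicEvalFactory from the download: the node classes, MapVariables, and TreeEvalDemo are hypothetical names invented for this example, and the expression tree is built by hand rather than by a parser.

```java
import java.util.HashMap;
import java.util.Map;

// The two interfaces as given in the article.
interface Variables { double getVariable(String name); }
interface Evaluator { double evaluate(Variables vars); }

// Hypothetical Map-backed Variables implementation.
class MapVariables implements Variables {
    private final Map<String, Double> values = new HashMap<>();

    MapVariables set(String name, double value) {
        values.put(name, value);
        return this;
    }

    public double getVariable(String name) {
        Double value = values.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Unknown variable: " + name);
        }
        return value;
    }
}

// Hypothetical pre-parsed nodes: every node is itself an Evaluator.
class Constant implements Evaluator {
    private final double value;
    Constant(double value) { this.value = value; }
    public double evaluate(Variables vars) { return value; }
}

class VariableRef implements Evaluator {
    private final String name;
    VariableRef(String name) { this.name = name; }
    public double evaluate(Variables vars) { return vars.getVariable(name); }
}

class BinaryOp implements Evaluator {
    private final char op;
    private final Evaluator left;
    private final Evaluator right;

    BinaryOp(char op, Evaluator left, Evaluator right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    public double evaluate(Variables vars) {
        double l = left.evaluate(vars);
        double r = right.evaluate(vars);
        switch (op) {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return l / r;
            default: throw new IllegalStateException("Unknown operator: " + op);
        }
    }
}

class TreeEvalDemo {
    // Hand-built tree for (x + y) / (x - y) * 100; a real factory would use a parser.
    static Evaluator sample() {
        Evaluator x = new VariableRef("x");
        Evaluator y = new VariableRef("y");
        Evaluator ratio = new BinaryOp('/', new BinaryOp('+', x, y), new BinaryOp('-', x, y));
        return new BinaryOp('*', ratio, new Constant(100));
    }

    public static void main(String[] args) {
        Variables vars = new MapVariables().set("x", 3).set("y", 1);
        System.out.println(sample().evaluate(vars)); // prints 200.0
    }
}
```

    Even in this small form, every evaluation walks the tree through interface calls and looks up each variable by name every time it is referenced. That per-evaluation overhead is what a compiled implementation has the opportunity to remove.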

    I then produced a second implementation using Janino. Unlike the first, this one is short enough to be listed in its entirety, because I was able to simply translate the expressions I want to evaluate into Java expressions. The power of this technique is that Janino does the work of parsing the expression and the JVM does the work of evaluating it, so much less code is needed. It provides a straightforward example of how to use Janino:

    package janinotest.eval;

    import java.io.StringReader;
    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.codehaus.janino.SimpleCompiler;

    public class JaninoEvalFactory {

        // Every run of letters in the expression is treated as a variable name.
        private static Pattern PATTERN = Pattern.compile("([a-zA-Z]+)");
        private static SimpleCompiler compiler = new SimpleCompiler();

        public static Evaluator fromString(String string) {
            // Generate one local variable declaration per distinct name.
            StringBuffer varCode = new StringBuffer();
            Matcher matcher = PATTERN.matcher(string);
            Set names = new HashSet();
            while (matcher.find()) {
                String name = matcher.group(0);
                if (names.contains(name)) continue;
                varCode.append("double " + name + " = vars.getVariable(\"" + name + "\");");
                names.add(name);
            }

            // Wrap the declarations and the expression in an Evaluator class.
            String source =
                "package janinotest.eval;\n" +
                "public class JaninoEvaluator implements Evaluator {\n" +
                "\tpublic double evaluate(Variables vars) {\n" +
                "\t\t" + varCode + "\n" +
                "\t\treturn " + string + ";\n" +
                "\t}\n" +
                "}\n";

            try {
                // Compile the source, then load and instantiate the class.
                compiler.cook(new StringReader(source));
                Class clss = compiler.getClassLoader().loadClass(
                    "janinotest.eval.JaninoEvaluator");
                Evaluator eval = (Evaluator) clss.newInstance();
                return eval;
            } catch (Exception e) {
                throw new IllegalArgumentException(e.getMessage());
            }
        }
    }

    The single static method in this class does the following work:

    1. The variables in the expression are identified using PATTERN and are used to generate a String containing Java code to assign them values from a Variables instance that is to be supplied.

    2. This is combined with the supplied expression and a basic class definition to produce the Java source for an implementation of the Evaluator interface.

    3. The source is supplied to a Janino SimpleCompiler object that cooks (compiles) the source code.

    4. The class is then loaded via the compiler's class loader and instantiated using reflection.

    5. The finished object is returned.

    For example, when supplied with the string "(x + y)/(x - y) * 100/(x*y)", the resulting source is:

    public class JaninoEvaluator implements Evaluator {
        public double evaluate(Variables vars) {
            double x = vars.getVariable("x");
            double y = vars.getVariable("y");
            return (x + y)/(x - y) * 100/(x*y);
        }
    }

    Those adopting Janino should be aware that there are some limitations on the Java compilation Janino performs, though very few. Key among these is the lack of support for any Java 1.5 language features, such as generics and the new for-loop syntax.

    Comparing Performance

    I compared the performance of these two implementations using three expressions of varying complexity, and the results are charted below in Figure 1. Java was invoked without any flags. The command line used to launch the test is given below. To run the performance test yourself you will naturally need to adjust the classpath appropriately.

    java -cp janino.jar;. janinotest.eval.JaninoTestEval

    As one might expect, the benefits of compilation become more apparent with increasing complexity. The evaluation of the moderate expression shows a 7x speed improvement. These figures are not provided as a benchmark (nor do they prove that speed increases will be realized), but they are a useful indication of the degree of improvement that developers can expect in some scenarios. The three expressions tested were:

    • 0 (trivial)
    • 100 * x + 20 / 2 (simple)
    • (x + y)/(x - y) * 100/(x * y) (moderate)

    Figure 1. Chart demonstrating superiority of compiled execution

    It is vital that anyone looking to adopt Janino for their own projects perform their own evaluations; optimizations on established applications rarely produce simple wins. In an earlier evaluation, I measured how well Janino could speed up the substitution of tokens like ${token-name} within Java strings. Figure 2 below shows a chart of the results. I compared the following implementations:

    • A simple but efficient implementation (naive-map) using Java's regular expression parsing to replace tokens with strings from a map.
    • An optimized implementation of naive-map that pre-stores the decomposition of the tokenized string.
    • An implementation that pre-compiles the code to generate the string from map values.
    • An optimized implementation that uses reflection to draw token values directly from a POJO (Plain Old Java Object).
    • An implementation that pre-compiles the code to generate the string from map values.
    • To benchmark the best possible time, an implementation that draws values directly from a specific Java class.
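
    For readers unfamiliar with the problem, the simplest of these strategies can be sketched with nothing but java.util.regex. This is not the code from the download: the class name TokenSubstitution, the token pattern, and the decision to leave unknown tokens untouched are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regex-based substitution of ${token-name} tokens from a map.
class TokenSubstitution {

    // Matches tokens such as ${token-name}; group 1 is the token name.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String template, Map<String, String> values) {
        Matcher matcher = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.get(matcher.group(1));
            if (value == null) {
                value = matcher.group(0); // assumption: unknown tokens are left as-is
            }
            // quoteReplacement stops '$' and '\' in values being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("user-name", "Ada");
        values.put("count", "3");
        System.out.println(substitute("Hello ${user-name}, you have ${count} messages", values));
        // A compiled variant would instead generate code equivalent to:
        //   return "Hello " + map.get("user-name") + ", you have " + map.get("count") + " messages";
    }
}
```

    The regex version rescans the template on every call, whereas the pre-decomposed and pre-compiled variants do that work once per template. How the resulting string concatenation is compiled then becomes the deciding factor.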

    Figure 2. Chart demonstrating inferiority of compiled execution

    To my surprise, Java's reflective method invocation turns out to be so fast (in this instance) that the JDK-compiled code outperforms Janino's. The optimized map implementation is also faster than the compiled version. These results arise from optimizations that are made by Sun's Java compiler but not by Janino. In particular, the strategy for implementing the string concatenation operator differs between the compilers. The good news is that in response to this evaluation, Janino's string handling has since been optimized and its performance is now very competitive with that of javac.

    Anyone interested in running this performance test for themselves can do so by downloading the source (see Resources), compiling it, and executing it with the command line below (again with appropriate modifications to the class path).

    java -cp janino.jar;. janinotest.sub.JaninoSubTest

    The lesson to draw from this result is that the technique of optimizing code through dynamic compilation is valuable, but should only be applied in instances where performance gains can be proved. As ever, there is no silver bullet.

    Some Closing Ideas

    There are many situations that lend themselves to this approach. This list might provide some ideas.

    • Use of the Java Proxy class can be replaced by dynamically compiled classes that avoid all reflection, thereby providing a significant performance improvement.

    • Requests on the application that include user-defined functions (which would normally need to be parsed and then evaluated as a domain-specific language) can be compiled and then evaluated as Java code much more quickly.

    • Data records in which the fields are fixed, but not known at compile time, are typically implemented using HashMaps. Using Janino, the known fields can be used to construct a specific private field for each record value. With some extra work, the fields can still be exposed through the Map interface. If the number of record types is low as a proportion of the number of records, the memory savings can be very significant.
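
    To make the last idea a little more tangible, the sketch below generates the Java source for such a record class from a list of field names. Everything in it is hypothetical: the class name RecordSourceGenerator, the get/put accessor shape, and the assumption that field names are valid Java identifiers. The generated source avoids Java 1.5 features and would still need to be handed to an embedded compiler such as Janino; exposing the full Map interface is left out for brevity.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical generator: one private field per known record field,
// plus name-based get/put accessors in place of a HashMap lookup.
class RecordSourceGenerator {

    // Assumes className and every field name are valid Java identifiers.
    static String generate(String className, List<String> fields) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(className).append(" {\n");
        for (String field : fields) {
            src.append("    private Object ").append(field).append(";\n");
        }
        src.append("    public Object get(String name) {\n");
        for (String field : fields) {
            src.append("        if (\"").append(field).append("\".equals(name)) return ")
               .append(field).append(";\n");
        }
        src.append("        return null;\n");
        src.append("    }\n");
        src.append("    public void put(String name, Object value) {\n");
        for (String field : fields) {
            src.append("        if (\"").append(field).append("\".equals(name)) { ")
               .append(field).append(" = value; return; }\n");
        }
        src.append("        throw new IllegalArgumentException(\"Unknown field: \" + name);\n");
        src.append("    }\n");
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        System.out.println(generate("PriceRecord", Arrays.asList("symbol", "price")));
    }
}
```

    Each instance of the generated class carries only its field references rather than a hash table and its entries, which is where the memory saving described above would come from.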


    There are many more ways that Janino could be used in applications and I recommend that anyone who needs better-performing Java code investigate this approach.