Updated Feb. 22nd 2011: after some very good feedback from (among others) Mark Wielaard, Florian Weimer, Roman Divacky and Chris Lattner himself, I decided to re-run my tests with new compiler versions (Clang trunk rev. 125563 and GCC 4.5.2) and an improved Clang configuration which now finally fully enables precompiled header support for the Clang build.

At FOSDEM 2011 I heard Chris Lattner's very nice "LLVM and Clang" keynote. The claims he made in his talk were very impressive: he spoke about Clang being a "production quality" "drop-in replacement" for GCC with superior code generation and improved compile speed. Already during the talk I decided that it would be interesting to verify his claims on the HotSpot VM, which in general is not known as the world's simplest C++ project. Below you can find my experiences with Clang and a new Clang patch for the OpenJDK, in case you want to do some experiments with Clang yourself.

GCC compatibility

GCC is the standard C/C++ compiler on Linux and is available on virtually any Unix platform. Any serious challenger should therefore have at least a GCC compatibility mode to ease its adoption. Clang claims to be fully GCC compatible, so I created a new Clang configuration by changing some files and deriving some new, Clang-specific ones from their corresponding GCC counterparts:

> hg status -ma
M make/linux/makefiles/buildtree.make
M src/os_cpu/linux_x86/vm/os_linux_x86.cpp
M src/share/vm/adlc/output_c.cpp
M src/share/vm/utilities/globalDefinitions.hpp
A make/linux/makefiles/clang.make
A make/linux/platform_amd64.clang
A src/share/vm/utilities/globalDefinitions_clang.hpp

and started a new build (for a general description of the HotSpot build process see either the README-builds file or the more detailed but slightly outdated explanation in one of my previous blog posts):

> ALT_BOOTDIR=/share/software/Java/jdk1.6.0_20 \
  ALT_OUTPUTDIR=../output_x86_64_clang_dbg \
  make jvmg USE_CLANG=true

One of the very first observations is the really HUGE number of warnings issued by the compiler. Don't get me wrong here - I really regard this as a major feature of Clang, especially the clear and well-arranged fashion in which the warnings are presented (e.g. syntax colored, with macros nicely expanded). But for the current HotSpot code base this is simply too much. In particular, issue "6889002: CHECK macros in return constructs lead to unreachable code" leads to a bunch of repeated warnings for every single compilation unit, which makes the compilation output nearly unreadable. So before starting to eliminate the warnings step by step, I decided to turn them off altogether in order to get a first impression of the overall compatibility and performance:

> ALT_BOOTDIR=/share/software/Java/jdk1.6.0_20 \
  ALT_OUTPUTDIR=../output_x86_64_clang_dbg \

Except for the -fcheck-new option, Clang seems to understand all the other compiler options used during the HotSpot build process. For -fcheck-new it issues a warning stating that the option will be ignored, so I simply removed it from make/linux/makefiles/clang.make. I also removed obvious workarounds for some older GCC versions from the new Clang files which were derived from their GCC counterparts. The following compiler options were used in the dbg and opt builds respectively:

dbg-options: -fPIC -fno-rtti -fno-exceptions -m64 -pipe -fno-omit-frame-pointer -g -MMD -MP -MF
opt-options: -fPIC -fno-rtti -fno-exceptions -m64 -pipe -fno-omit-frame-pointer -O3 -fno-strict-aliasing -MMD -MP -MF
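For context on the dropped -fcheck-new option: according to the GCC manual, it makes GCC check that the pointer returned by operator new is non-null before touching the allocated storage, and this is normally unnecessary because the compiler already performs that check for allocation functions declared not to throw. The following sketch shows that language-level check with a noexcept class-specific operator new. The class MayFailAlloc and its fail_next switch are hypothetical; this is not HotSpot code.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class whose allocation function signals failure by returning
// null instead of throwing.
struct MayFailAlloc {
  static bool fail_next;
  int value;
  MayFailAlloc() : value(7) {}

  // Declared noexcept: for a non-throwing allocation function the new-expression
  // itself checks for null and skips the constructor, which is the check that
  // -fcheck-new would otherwise force for every operator new.
  static void* operator new(std::size_t size) noexcept {
    if (fail_next) return nullptr;
    return ::operator new(size, std::nothrow);
  }
  static void operator delete(void* p) noexcept { ::operator delete(p); }
};

bool MayFailAlloc::fail_next = false;
```

With fail_next set, new MayFailAlloc yields a null pointer and the constructor is never run, so no extra compiler flag is needed for this pattern.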

Besides this, I only had to change the source code of two files to make HotSpot compilable by Clang. The first change was necessary only because the ADLC part of the build does not honor the general warning settings of the HotSpot build and always runs with -Werror. Here's the small patch which prevents a warning about an assignment being used as a Boolean value:

--- a/src/share/vm/adlc/output_c.cpp    Tue Nov 23 13:22:55 2010 -0800
+++ b/src/share/vm/adlc/output_c.cpp    Wed Feb 09 16:39:30 2011 +0100
@@ -3661,7 +3661,7 @@
     // Insert operands that are not in match-rule.
     // Only insert a DEF if the do_care flag is set
-    while ( comp = comp_list.post_match_iter() ) {
+    while ( (comp = comp_list.post_match_iter()) ) {
       // Check if we don't care about DEFs or KILLs that are not USEs
       if ( dont_care && (! comp->isa(Component::USE)) ) {
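The warning fixed by this patch is the usual "assignment used as a condition" diagnostic, which both GCC and Clang report under -Wparentheses. The following sketch, with a hypothetical ToyList standing in for ADLC's component list, shows the idiom the patch uses: the extra pair of parentheses tells the compiler that the assignment is intentional without changing the loop's behavior.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for ADLC's component list: post_match_iter() returns
// the next element, or nullptr once the list is exhausted.
struct Component {
  int id;
};

class ToyList {
 public:
  ToyList(Component* items, std::size_t count) : items_(items), count_(count) {}
  Component* post_match_iter() {
    return pos_ < count_ ? &items_[pos_++] : nullptr;
  }

 private:
  Component* items_;
  std::size_t count_;
  std::size_t pos_ = 0;
};

// Sums the ids of all components. The doubled parentheses mark the
// assignment in the loop condition as intentional, silencing -Wparentheses.
int sum_ids(ToyList& comp_list) {
  int sum = 0;
  Component* comp;
  while ((comp = comp_list.post_match_iter())) {
    sum += comp->id;
  }
  return sum;
}
```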

Updated Feb. 22nd 2011: I decided to leave the file output_c.cpp untouched and instead changed the ADLC make file adlc.make to use the same warning flags as the main HotSpot make instead of using -Werror.

--- a/make/linux/makefiles/adlc.make    Wed Feb 16 11:24:17 2011 +0100
+++ b/make/linux/makefiles/adlc.make    Tue Feb 22 12:59:37 2011 +0100
@@ -60,7 +60,7 @@
 # CFLAGS_WARN holds compiler options to suppress/enable warnings.
 # Compiler warnings are treated as errors
-CFLAGS_WARN = -Werror

The second change was necessary because of a strange inline assembler syntax which was used to assign the value of a register directly to a variable:

diff -r f95d63e2154a src/os_cpu/linux_x86/vm/os_linux_x86.cpp
--- a/src/os_cpu/linux_x86/vm/os_linux_x86.cpp  Tue Nov 23 13:22:55 2010 -0800
+++ b/src/os_cpu/linux_x86/vm/os_linux_x86.cpp  Wed Feb 09 16:45:40 2011 +0100
@@ -101,6 +101,10 @@
   register void *esp;
   __asm__("mov %%"SPELL_REG_SP", %0":"=r"(esp));
   return (address) ((char*)esp + sizeof(long)*2);
+#elif CLANG
+  intptr_t* esp;
+  __asm__ __volatile__ ("movq %%"SPELL_REG_SP", %0":"=r"(esp):);
+  return (address) esp;
   register void *esp __asm__ (SPELL_REG_SP);
   return (address) esp;
@@ -183,6 +187,9 @@
   register intptr_t **ebp;
   __asm__("mov %%"SPELL_REG_FP", %0":"=r"(ebp));
+#elif CLANG
+  intptr_t **ebp;
+  __asm__ __volatile__ ("movq %%"SPELL_REG_FP", %0":"=r"(ebp):);
   register intptr_t **ebp __asm__ (SPELL_REG_FP);
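For experimenting with this construct outside of HotSpot, here is a minimal sketch of reading the stack pointer with extended inline assembly, the style used in the Clang branch of the patch. It assumes an x86-64 target and hardcodes %rsp where HotSpot uses its SPELL_REG_SP macro; the function name current_sp and the __builtin_frame_address fallback for other targets are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Returns an approximation of the current stack pointer.
static inline std::uintptr_t current_sp() {
#if defined(__x86_64__)
  std::uintptr_t sp;
  // Copy %rsp into whatever register the compiler chose for the "=r" output.
  __asm__ __volatile__("movq %%rsp, %0" : "=r"(sp));
  return sp;
#else
  // Fallback for other targets: the current frame address is close enough.
  return reinterpret_cast<std::uintptr_t>(__builtin_frame_address(0));
#endif
}
```

The "=r" output constraint lets the compiler pick any general-purpose register for the result, so the code does not depend on binding a variable to one specific register.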

Updated Feb. 22nd 2011: to compile the newest HotSpot tip revision, another small change was necessary to overcome a problem with the look-up of a non-dependent method name in a dependent base class (see M. Cline's C++ FAQ 35.19 for a nice explanation). This was wrongly accepted by GCC (see GCC bug 47752) but is correctly rejected by Clang. The problem is tracked as bug 7019689 and will hopefully be fixed soon in the HotSpot code base:

diff -r 55b9f498dbce -r c83e921b1bf7 src/share/vm/utilities/hashtable.hpp
--- a/src/share/vm/utilities/hashtable.hpp      Thu Feb 10 16:24:29 2011 -0800
+++ b/src/share/vm/utilities/hashtable.hpp      Wed Feb 16 11:09:16 2011 +0100
@@ -276,7 +276,7 @@
   int index_for(Symbol* name, Handle loader) {
-    return hash_to_index(compute_hash(name, loader));
+    return this->hash_to_index(compute_hash(name, loader));
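The rule behind this change can be reproduced in a few lines. ToyBasicTable and ToyTable are hypothetical classes loosely modeled on hashtable.hpp; the unqualified call is shown only as a comment because a conforming compiler rejects it.

```cpp
#include <cassert>

// Hypothetical base class template; its members are only known once T is fixed.
template <typename T>
class ToyBasicTable {
 protected:
  int table_size() const { return 8; }
  int hash_to_index(unsigned int hash) const {
    return static_cast<int>(hash % static_cast<unsigned int>(table_size()));
  }
};

template <typename T>
class ToyTable : public ToyBasicTable<T> {
 public:
  int index_for(unsigned int hash) const {
    // return hash_to_index(hash);  // error: the name is non-dependent, so it
    //                              // is not looked up in the dependent base
    // Qualifying with this-> makes the name dependent, so it is looked up
    // in ToyBasicTable<T> when the template is instantiated.
    return this->hash_to_index(hash);
  }
};
```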

In summary, the overall compatibility can be rated as very good. Taking into account that the newly built VM could successfully run the SPECjbb2005 * benchmark, it seems that the code generation also went mostly well, although more in-depth tests are probably required to ensure full correctness (well - at least the same level of correctness known from GCC).

Compilation performance and code size

After the build succeeded, I started to do some benchmarking. I measured the time needed for full debug and opt builds with one and three parallel build threads respectively. As you can see in Table 1, the results are very clear: Clang 2.8 is always significantly (between two and three times) slower than GCC 4.4.3:

Table 1: Resulting code size and user (wall) time for a complete HotSpot server (C2) build compared to GCC 4.4.3
                 | GCC 4.4.3 ¹ | GCC 4.5.2 ¹ | Clang 2.8 ²   | Clang trunk ³ | Clang trunk ⁴ || GCC 4.4.3 ¹ | GCC 4.5.2 ¹ | Clang 2.8 ²  | Clang trunk ³ | Clang trunk ⁴
opt              | 5m04s       | 4m55s (97%) | 10m45s (212%) | 3m10s (63%)   |               || 3m05s       | 3m03s (99%) | 6m12s (201%) | 2m01s (65%)   |
libjvm.so size ⁵ |

Honestly speaking, these numbers were somewhat disappointing for me, especially after Chris Lattner's talk at FOSDEM. I haven't done more in-depth research into the reasons, but I suspect the shiny results presented at the conference are mainly based on the fact that they focus more on Objective-C than on C++ and that they were measured against the older 4.0 and 4.2 versions of GCC. This assumption was also confirmed after looking at the Clang Performance page.

Updated Feb. 22nd 2011: I wrote the previous paragraph under the impression of my first measurements. It turned out, however, that the Clang build was not using precompiled headers properly. This is because Clang is not fully GCC compatible with respect to precompiled header files. GCC transparently searches for a precompiled version of directly included header files, whereas Clang only considers a precompiled version for headers which are included explicitly on the command line as prefix headers with the -include option (see the Precompiled Headers section of the Clang Users Manual). The HotSpot project uses a precompiled header file which is directly included in most of the source files, but for the reasons just mentioned this has no effect with Clang: it just uses the bare header file instead of the precompiled version.

To successfully enable PCH support for Clang, I had to change the Clang configuration such that it emits a corresponding "-include precompiled.hpp" compiler flag for exactly those files which include precompiled.hpp directly. This didn't work correctly with Clang 2.8, where it led to strange errors during compilation, but with a brand new trunk version from SVN (rev. 125563) the problems were gone. As you can see in the columns labeled "Clang trunk³", this roughly doubled the compilation speed of the debug build and made the opt build more than three times faster! Compared to GCC 4.4.3, this still ranks Clang at about 150% for the debug build, but for the opt build Clang now considerably outperforms GCC and needs only 65% of the time required by GCC for the full build.

Another point that concerned me during the first measurements was the size of the resulting shared library. While the size was basically the same for the opt build, the Clang debug build produces a huge, ~700MB file which is nearly seven times larger than the one produced by GCC. I haven't looked into this more deeply either - perhaps some Clang/LLVM wizard can comment on this topic? It turned out that this was a known problem which can be partially worked around by using the -flimit-debug-info flag. As you can see in the columns labeled "Clang trunk⁴", this not only reduces the size of the resulting shared library by about 50%, it also makes the debug build up to 15% faster compared to the corresponding GCC build.

Runtime performance

After I had successfully compiled HotSpot, I decided to run some benchmarks to see how good the code generated by Clang is. Because I know that for the SPEC JVM98 benchmark the VM spends most of its time (about 98%, given a proper warm-up phase) in JIT-compiled code, I decided to use SPECjbb2005 * instead, which at least does a lot of garbage collection, and the GC is implemented in C++ in the HotSpot VM.

For the tests I used an early access version of JDK 7 (b122) with a recent HotSpot 20 from http://hg.openjdk.java.net/jdk7/jdk7/hotspot. The exact version of the JDK I used is:

> java -version
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b122)
Java HotSpot(TM) 64-Bit Server VM (build 20.0-b03, mixed mode)
> java -Xinternalversion
Java HotSpot(TM) 64-Bit Server VM (20.0-b03) for linux-amd64 JRE (1.7.0-ea-b122),
built on Dec 16 2010 01:03:29 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)

As you can see, the original HotSpot was compiled with GCC 4.3.0, while I used 4.4.3 on my local machine. The SPECjbb2005 benchmark was configured to use 16 warehouses. I compared the scores of the two versions I compiled with GCC and Clang respectively against the score achieved by the original HotSpot version from the early access binary package:

Table 2: SPECjbb2005 score
JDK 1.7.0-ea-b122, HotSpot 64-Bit Server VM (20.0-b03)
GCC 4.3.0 | GCC 4.4.3 | GCC 4.5.2 | Clang 2.8 | Clang trunk

Again, the Clang-compiled code loses against its GCC counterpart: it is approximately 4% slower. One feature which was actively promoted in the FOSDEM presentation was link time optimization (LTO). Unfortunately I couldn't get it running with Clang 2.8 on my Linux box. I searched the web a bit and found the following interesting blog post: "Using LLVM's link-time optimization on Ubuntu Karmic". However, it only describes how to get LTO working with llvm-gcc, which is a GCC front end based on LLVM. Clang itself only seems to support LTO on Mac OS X out of the box.

Updated Feb. 22nd 2011: I also did performance measurements for the two new compiler versions, but here the results didn't change significantly, so I have just added the new numbers for reference. (Notice that the results oscillated by +/-1% during benchmarking, so the actual differences shouldn't be taken too seriously.)


Updated Feb. 22nd 2011: While the overall GCC compatibility is excellent and the Clang compile times are impressive, the performance of the generated code still lags behind a recent GCC version. Nevertheless, Clang has an excellent C/C++ front end which produces very comprehensive warnings and error messages. If you are developing macro-intensive C or heavily templatized C++ code, this feature alone can save you much more time than you lose through longer compile times. Taking into consideration Clang's nice design and architecture and the fact that it must still be considered quite new, I think it may become a serious challenger to the good old GCC in the future.


Please note that the SPECjbb2005 results published on this page come from non-compliant benchmark runs and should be considered as published under the "Research and Academic Usage" paragraph of the "SPECjbb2005 Run and Reporting Rules".