Well, it turned out not to be as bad as I thought. With a little help and donated code from Robert Field, I was able to create a small native library called java_crw_demo that does some basic BCI. HPROF in JDK 5.0+ uses this library when doing BCI.
It turns out that this little library was very handy when it came to writing some demos of JVM TI and BCI. The source to this library is provided in the demo/jvmti directory, so people are free to browse the C source. The java_crw_demo native library is a primitive classfile transformation library that inserts bytecodes at selected and limited locations in methods, returning a new classfile image. It is important that you understand the classfile layout as described in the Java Virtual Machine Specification.
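A good first step into the classfile layout is reading its fixed header. The following is a minimal sketch, not code from java_crw_demo, and the names `ClassHeader`, `read_u2`, `read_u4`, and `read_class_header` are hypothetical. It relies only on facts from the JVM specification: classfiles are big-endian, they start with the magic number 0xCAFEBABE, and major version 49 corresponds to JDK 5.0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The fixed fields at the start of every classfile (JVM spec, ClassFile). */
typedef struct {
    uint32_t magic;               /* always 0xCAFEBABE */
    uint16_t minor_version;
    uint16_t major_version;       /* 49 == JDK 5.0 */
    uint16_t constant_pool_count; /* highest valid index + 1 */
} ClassHeader;

/* Classfile data is big-endian. */
static uint16_t read_u2(const unsigned char *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_u4(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the image is too short or is not a classfile. */
static int read_class_header(const unsigned char *image, size_t len, ClassHeader *out) {
    assert(out != NULL);
    if (image == NULL || len < 10) return -1;
    out->magic = read_u4(image);
    if (out->magic != 0xCAFEBABEu) return -1;
    out->minor_version = read_u2(image + 4);
    out->major_version = read_u2(image + 6);
    out->constant_pool_count = read_u2(image + 8);
    return 0;
}
```

A BCI processor could check `major_version` at this point and refuse classfile versions newer than the ones it knows how to rewrite.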
WARNING: Do not take on a task like BCI lightly. I wrote java_crw_demo because I couldn't find a C version of a BCI library that met my needs. I would highly recommend you investigate the freely available BCI libraries out there before taking on the task of writing a new one. Having said that, I know a few people like myself suffer from insanity at times and will attempt this effort, so I've accumulated a few tips for the beginner.
I haven't yet mentioned how or when you get the classfile images; there are a variety of ways, including:
- Just modify the class images on disk and change the classpath setting.
- Capture the class image in memory with the JVM TI ClassFileLoadHook event and return a modified class image. This is what HPROF and the demo agents in JDK 5.0+ do.
- Redefine the class on the fly with JVM TI RedefineClasses.
Some of the common issues you may encounter doing classfile transformations are:
- Additions to the constant pool will be needed. It is easiest to add these at the end of the constant pool. Don't forget to set the constant pool count correctly in the classfile header (it is one more than the highest valid index, and long and double entries each take two slots). If you do change the constant pool order, or if you exceed 256 constant pool entries, watch out for the ldc bytecodes: ldc has only a one-byte index, so some may need to be changed to ldc_w bytecodes. I highly recommend just adding to the end of the constant pool. If you need to push a constant that doesn't fit in 16 bits onto the stack, you will need a constant pool entry for it.
- Adding bytecodes can also cause bytecodes that precede the instrumented location to need to change, because a branch that jumps over the inserted code now has farther to go. For example, an "ifeq" may need to be changed into an "ifne" plus a "goto_w" if the added code pushes its target more than 16 bits (32767 bytes) away, since the conditional branches have no wide form.
- Adding bytecodes can likewise cause some of the bytecodes that follow to need different forms to deal with the changed ranges, e.g., a jsr vs. a jsr_w instruction. When reconstructing the new Code attribute, take special care that all the bytecodes, both inserted and original, use the correct wide and '*_w' forms.
- Changes to the bytecodes mean that the offsets in the Code attribute's exception table and in the "LineNumberTable", "LocalVariableTable", and (new for JDK 5.0) "LocalVariableTypeTable" attributes will be invalid. You will need to adjust all these offsets. Be careful with offset 0: you may want 0 to remain 0 for the local variables or the line table, but maybe not for the exception table.
- If the inserted bytecode causes or could cause the maximum stack size to increase, the "Code" attribute will need the max_stack field adjusted along with the code_length. If you add local variables, you will also need to adjust max_locals.
- Insertion of bytecodes at the very beginning needs to be done carefully; consider a jump to offset 0 in the original bytecodes. You need to decide whether the inserted bytecodes at 0 will be executed only once on entry, or also whenever the method jumps to offset 0. So inserting bytecodes for method entry is not the same as inserting bytecodes at offset 0.
- Insertion before return bytecodes can also be tricky. If in the original bytecodes this return bytecode is a target of a jump, do you want the inserted bytecodes to be executed? (Hint: Answer is yes. :^). You need to make sure that the old jumps to the return bytecode now jump to the inserted bytecodes.
- The special case of the new bytecode is a less-than-obvious problem. Objects that have not been initialized cannot be passed to ANY Java methods, so the obvious injection of a dup and an invokestatic right after the new bytecode will not work when bytecode verification is on. The object must be initialized first. So if you wish to capture newly allocated objects, the best place to catch them is in the java.lang.Object.<init> method. Once the object makes it there, you can pass it around (or, of course, you could run Java with the verifier off, but that generally isn't a good idea). An alternative is to try to find the appropriate <init> call that follows the new bytecode and insert after it. The newarray bytecode doesn't have this problem, and in fact the only way to capture these objects is by inserting bytecodes immediately after the newarray bytecode (don't forget the anewarray and multianewarray bytecodes).
- It is best to insert as few bytecodes as possible. In java_crw_demo, the insertion is limited to pushing a few items on the stack and making a static method call to a class found in the boot classpath. In our demos, this so-called tracker class has tracker methods written in Java that grab the current Thread with a call to java.lang.Thread.currentThread and pass all the arguments, plus the current thread, to a native method belonging to the class. The agent will have registered the natives for this class, and those native functions are actually static functions inside the agent library.
- Any inserted bytecode needs to be careful about the state of the VM when it runs, e.g., has the VM started, or has the VM been initialized? The downside to the JNI call in the Tracker class is that you can't make the JNI call until the VM has started and the natives have been registered, and the downside to calling currentThread is that before the VM is initialized, this thread reference could be null.
- Once past the VM initialization, any JNI usage and all the native code in the agent library needs to watch out for JVMTI_EVENT_VM_DEATH, making sure the code can recover cleanly.
- You may wish to be selective in what classes or methods you apply BCI to, depending on what information you are after. It is always best to limit the intrusion of BCI on the application.
- There is the idea of a system class in java_crw_demo, probably badly named. These system classes are treated specially: <init> methods of length 1, finalize methods of length 1, <clinit> methods, and the java.lang.Thread.currentThread() method will not have BCI applied to them. This may not be necessary, and the <clinit> part may be masking allocations. Fortunately, HPROF tells java_crw_demo that a class is a system class for only a few classes, fewer than a dozen, and only when the class is being loaded during or before VM initialization.
- Creating too much new bytecode can cause stack overflow errors in the VM prior to VM initialization, so for the early loaded classes you need to take special care about what inserted bytecode is executed and what that bytecode is doing when the VM is not fully initialized yet.
- It's possible at ClassFileLoadHook time that you won't have a classname. This is a pain if you need to track what has been BCI'd. You need to dig the name out first using java_crw_demo_classname(), which parses the classname out of the classfile. This does not happen often in the real world.
- As newer JDKs create new classfile versions, those classfiles may contain attributes that need to be adjusted due to bytecode offset changes. I haven't figured out a clean way to handle this. In fact, java_crw_demo currently ignores the classfile version number, which is really a bug that should be filed. Any BCI processor should probably be well aware of which classfile versions it can work on.
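The constant pool advice above boils down to a few encoding rules: a new entry appended at the end takes the old constant_pool_count as its index, ldc only has a one-byte index, and only values in the 16-bit signed range can be pushed inline. This is a minimal sketch of those rules, not java_crw_demo's code; `append_cp_slot`, `emit_ldc`, and `emit_push_int` are hypothetical names, and the caller is assumed to have already appended a CONSTANT_Integer entry when the value is too large.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    OPC_ICONST_0 = 0x03, /* iconst_m1 is 0x02, iconst_5 is 0x08 */
    OPC_BIPUSH   = 0x10,
    OPC_SIPUSH   = 0x11,
    OPC_LDC      = 0x12,
    OPC_LDC_W    = 0x13
};

/* Appends a slot at the end of the pool and returns its index. Valid indexes
 * run from 1 to constant_pool_count - 1, so the new entry gets the old count.
 * CONSTANT_Long and CONSTANT_Double entries occupy two indexes. */
static uint16_t append_cp_slot(uint16_t *cp_count, int is_long_or_double) {
    assert(cp_count != NULL);
    uint16_t index = *cp_count;
    *cp_count = (uint16_t)(*cp_count + (is_long_or_double ? 2 : 1));
    return index;
}

/* Loads an int/float/String/Class entry. ldc has a one-byte index, so entries
 * above 255 need the two-byte ldc_w. Returns the number of bytes written. */
static size_t emit_ldc(unsigned char *out, uint16_t cp_index) {
    if (cp_index <= 0xFF) {
        out[0] = OPC_LDC;
        out[1] = (unsigned char)cp_index;
        return 2;
    }
    out[0] = OPC_LDC_W;
    out[1] = (unsigned char)(cp_index >> 8);
    out[2] = (unsigned char)(cp_index & 0xFF);
    return 3;
}

/* Emits the shortest push for value. Outside the 16-bit signed range, cp_index
 * must name a CONSTANT_Integer entry the caller already added to the pool. */
static size_t emit_push_int(unsigned char *out, int32_t value, uint16_t cp_index) {
    assert(out != NULL);
    if (value >= -1 && value <= 5) {
        out[0] = (unsigned char)(OPC_ICONST_0 + value);
        return 1;
    }
    if (value >= INT8_MIN && value <= INT8_MAX) {
        out[0] = OPC_BIPUSH;
        out[1] = (unsigned char)(int8_t)value;
        return 2;
    }
    if (value >= INT16_MIN && value <= INT16_MAX) {
        uint16_t bits = (uint16_t)(int16_t)value;
        out[0] = OPC_SIPUSH;
        out[1] = (unsigned char)(bits >> 8);
        out[2] = (unsigned char)(bits & 0xFF);
        return 3;
    }
    assert(cp_index != 0);
    return emit_ldc(out, cp_index);
}
```

Appending only at the end keeps every existing ldc index valid, so only the newly emitted loads ever need the ldc_w form.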
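For the branch-range problem described in the list, jsr simply becomes jsr_w and goto becomes goto_w, but conditional branches have no wide form, so an out-of-range "ifeq target" has to become "ifne +8; goto_w target". A minimal sketch of that rewrite, with the hypothetical helper names `invert_branch` and `emit_wide_cond_branch`; the opcode values are from the JVM specification, and pc and target are assumed to already be offsets in the new code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OPC_GOTO_W = 0xc8 };

/* Returns the opposite conditional branch opcode, or -1 if op is not one. */
static int invert_branch(unsigned char op) {
    if (op >= 0x99 && op <= 0xa6)        /* ifeq..if_acmpne come in opposite pairs */
        return ((op - 0x99) % 2 == 0) ? op + 1 : op - 1;
    if (op == 0xc6) return 0xc7;         /* ifnull -> ifnonnull */
    if (op == 0xc7) return 0xc6;         /* ifnonnull -> ifnull */
    return -1;
}

/* Writes "if<!cond> +8; goto_w target" at offset pc. The inverted branch skips
 * over the 5-byte goto_w (3 + 5 = 8 bytes total). Returns bytes written, or 0
 * if op is not a conditional branch. */
static size_t emit_wide_cond_branch(unsigned char *out, unsigned char op,
                                    int32_t pc, int32_t target) {
    int inv = invert_branch(op);
    if (inv < 0) return 0;
    assert(out != NULL);
    out[0] = (unsigned char)inv;
    out[1] = 0x00;                       /* 16-bit offset +8, relative to pc */
    out[2] = 0x08;
    int32_t rel = target - (pc + 3);     /* goto_w is relative to its own offset */
    uint32_t bits = (uint32_t)rel;
    out[3] = OPC_GOTO_W;
    out[4] = (unsigned char)(bits >> 24);
    out[5] = (unsigned char)(bits >> 16);
    out[6] = (unsigned char)(bits >> 8);
    out[7] = (unsigned char)bits;
    return 8;
}
```

Note that the rewrite itself grows the code by 5 bytes, which can push other branches out of range, so a real rewriter has to iterate until the layout stops changing.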
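The offset-table, offset 0, and return-bytecode items in the list all come down to one mapping from old offsets to new ones, plus a decision about what an offset that coincides with an insertion point should mean. A minimal sketch, assuming insertions are recorded as (original offset, length) pairs and that no original instruction changes size (so it ignores the '*_w' widening above). `Insertion`, `map_offset`, and `remap_branch` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One run of new bytes placed in front of the original instruction at old_pc. */
typedef struct {
    uint32_t old_pc;
    uint32_t length;
} Insertion;

/* Maps an original offset to its offset in the rewritten code. Bytes inserted
 * below old_pc always shift it. For bytes inserted exactly at old_pc:
 *   before_inserted != 0 -> result is the start of the inserted bytes
 *   before_inserted == 0 -> result is the original instruction, after them */
static uint32_t map_offset(uint32_t old_pc, int before_inserted,
                           const Insertion *ins, size_t n) {
    assert(ins != NULL || n == 0);
    uint32_t shift = 0;
    for (size_t i = 0; i < n; i++) {
        if (ins[i].old_pc < old_pc ||
            (ins[i].old_pc == old_pc && !before_inserted))
            shift += ins[i].length;
    }
    return old_pc + shift;
}

/* Branch offsets are relative to the branch instruction, which always maps to
 * its own original instruction (after any bytes inserted in front of it). */
static int32_t remap_branch(uint32_t branch_pc, int32_t old_rel, int target_runs_inserted,
                            const Insertion *ins, size_t n) {
    uint32_t target = (uint32_t)((int32_t)branch_pc + old_rel);
    return (int32_t)map_offset(target, target_runs_inserted, ins, n) -
           (int32_t)map_offset(branch_pc, 0, ins, n);
}
```

With method-entry code inserted at offset 0, a jump to 0 or an exception table start_pc of 0 would use before_inserted = 0, while a LocalVariableTable start_pc of 0 would use 1. For code inserted before a return, jumps to that return use 1 so the inserted bytes run.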
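The "system class" rules described in the list can be expressed as a single predicate. This sketch only restates those rules and is not java_crw_demo's actual code; `skip_system_method` is a hypothetical name, class names are assumed to be in the internal slash-separated form, and code_length is the Code attribute's length in bytes.

```c
#include <assert.h>
#include <string.h>

/* Returns nonzero if BCI should be skipped for this method. The rules only
 * apply when the caller has flagged the class as a "system" class. */
static int skip_system_method(int is_system, const char *class_name,
                              const char *method_name, unsigned code_length) {
    assert(class_name != NULL && method_name != NULL);
    if (!is_system) return 0;
    if (strcmp(method_name, "<clinit>") == 0) return 1;
    /* A length-1 body is just a return instruction. */
    if (code_length == 1 &&
        (strcmp(method_name, "<init>") == 0 || strcmp(method_name, "finalize") == 0))
        return 1;
    if (strcmp(class_name, "java/lang/Thread") == 0 &&
        strcmp(method_name, "currentThread") == 0)
        return 1;
    return 0;
}
```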
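When ClassFileLoadHook hands you no class name, java_crw_demo_classname() digs it out of the image. The sketch below is not that function's code, only an illustration of the same idea: because constant pool entries vary in length, you have to walk the whole pool, then follow this_class to its CONSTANT_Class entry and that entry's Utf8 name. All names here are hypothetical; entry sizes come from the JVM specification (including tags added after JDK 5.0), and the name bytes are copied as-is, so non-ASCII names stay in modified UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static uint16_t u2_at(const unsigned char *p) { return (uint16_t)((p[0] << 8) | p[1]); }

/* Size of a constant pool entry body (after the tag byte), or 0 if unknown. */
static size_t cp_body_size(const unsigned char *body, unsigned char tag) {
    switch (tag) {
    case 1:  return 2 + (size_t)u2_at(body);       /* Utf8: u2 length + bytes */
    case 7: case 8: case 16: case 19: case 20: return 2;
    case 15: return 3;                             /* MethodHandle */
    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18: return 4;
    case 5: case 6: return 8;                      /* Long, Double */
    default: return 0;
    }
}

/* Records where each entry starts; returns the offset just past the pool,
 * or 0 if the pool is truncated or contains an unknown tag. */
static size_t index_pool(const unsigned char *img, size_t len, unsigned count, size_t *offs) {
    size_t pos = 10;                               /* magic + versions + count */
    for (unsigned i = 1; i < count; i++) {
        if (pos + 3 > len) return 0;               /* tag plus at least a u2 */
        unsigned char tag = img[pos];
        size_t body = cp_body_size(img + pos + 1, tag);
        if (body == 0 || pos + 1 + body > len) return 0;
        offs[i] = pos;
        pos += 1 + body;
        if (tag == 5 || tag == 6) i++;             /* Long/Double take two slots */
    }
    return pos;
}

/* Offset of entry `index` if it exists and has the given tag, else 0. */
static size_t entry_with_tag(const unsigned char *img, const size_t *offs,
                             unsigned count, unsigned index, unsigned char tag) {
    if (index == 0 || index >= count || offs[index] == 0) return 0;
    return img[offs[index]] == tag ? offs[index] : 0;
}

/* Copies the internal class name into name; returns 0 on success, else -1. */
static int classfile_name(const unsigned char *img, size_t len, char *name, size_t cap) {
    assert(name != NULL && cap > 0);
    if (img == NULL || len < 10 || u2_at(img) != 0xCAFE || u2_at(img + 2) != 0xBABE)
        return -1;
    unsigned count = u2_at(img + 8);
    size_t *offs = calloc(count + 1, sizeof *offs);
    if (offs == NULL) return -1;
    int rc = -1;
    size_t end = index_pool(img, len, count, offs);
    if (end != 0 && end + 4 <= len) {
        unsigned this_class = u2_at(img + end + 2);     /* skip access_flags */
        size_t cls = entry_with_tag(img, offs, count, this_class, 7);
        size_t utf = cls ? entry_with_tag(img, offs, count, u2_at(img + cls + 1), 1) : 0;
        if (utf != 0) {
            size_t n = u2_at(img + utf + 1);
            if (n < cap) {
                memcpy(name, img + utf + 3, n);
                name[n] = '\0';
                rc = 0;
            }
        }
    }
    free(offs);
    return rc;
}
```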
I'll add to this entry as new issues come up.