One of the common issues that JAXB 2.0 users face is this. When they compile their favorite schema, XJC reports back the following scary-looking errors and refuses to compile it:

parsing a schema...
[ERROR] Property "MiOrMoOrMn" is already defined.
  line 132 of file:/C:/kohsuke/Sun/JAXB/jaxb-unit/schemas/individual/MathML2/presentation/scripts.xsd

[ERROR] The following location is relevant to the above error
  line 138 of file:/C:/kohsuke/Sun/JAXB/jaxb-unit/schemas/individual/MathML2/presentation/scripts.xsd

A year ago, I explained why this is not a bug in JAXB but something users need to fix themselves. The truth is, though, that this reduces JAXB's ease of use. Users are always lazy and in a hurry, so they just want their schema to compile. If anyone is interested, I can probably spend a good hour explaining why it was difficult and undesirable to handle this out of the box, but at the end of the day, who cares? I know you just want your schema to compile.

I take pride in JAXB, so I didn't like the idea that it slows down the productivity of some people by issuing errors. I've therefore been thinking about doing a better job on this ever since. I wanted to improve XJC so that it works like magic and you can get your job done quickly. To this end, I implemented a new algorithm that hopefully does a better job on schemas like this.

Since this change was made after EA3, to try it out today you need to download a JAXB RI nightly build (it will be part of the upcoming JAXB RI 2.0 release).

Once you download the RI, also download this "schemalet", then compile your favorite schema like this (the additional option and file are there to give you a mental barrier to cross, because this is still experimental and subject to change):

$ xjc -extension simpleBinding.schemalet abc.xsd def.xsd ...

That's it. This puts XJC into a new binding mode, where it reports fewer compilation errors. It works more magically.

Now, let me brag a bit more about how I did this.

The JAXB 2.0 spec binds XML content models inductively on the structure of the content model definition, which makes the binding sensitive to the way the definition is written. For example, A,A* and A+ represent the same content model, but because they are written differently, the JAXB spec binds them to different things (the former fails with an error, while the latter works just fine).
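To make the "same content model" claim concrete, here is a minimal sketch that treats both definitions as regular expressions over one-letter element names and checks that they accept exactly the same sequences up to a given length. The class and method names are made up for illustration, and using java.util.regex as a stand-in for a content model is an assumption of this sketch, not how XJC works internally.

```java
import java.util.regex.Pattern;

class ContentModelEquivalence {
    // Content models written as regexes over single-letter element names (illustrative only).
    static final Pattern A_THEN_A_STAR = Pattern.compile("AA*"); // A,A*
    static final Pattern A_PLUS = Pattern.compile("A+");         // A+

    /** Returns true if both patterns agree on every sequence over {A, B} of length <= maxLen. */
    static boolean sameLanguageUpTo(int maxLen) {
        return agree("", maxLen);
    }

    // Enumerates all sequences that extend 'prefix' by at most 'remaining' elements.
    private static boolean agree(String prefix, int remaining) {
        boolean first = A_THEN_A_STAR.matcher(prefix).matches();
        boolean second = A_PLUS.matcher(prefix).matches();
        if (first != second) {
            return false;
        }
        if (remaining == 0) {
            return true;
        }
        return agree(prefix + "A", remaining - 1) && agree(prefix + "B", remaining - 1);
    }

    public static void main(String[] args) {
        System.out.println(sameLanguageUpTo(8));
    }
}
```

The two definitions differ only in how they are written, so a binding that looks at the accepted sequences rather than the syntax has no reason to treat them differently.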

A more common occurrence of this problem is when a schema author wants to define a content model that can't be naturally captured by W3C XML Schema and ends up writing a convoluted content model. Here are a few simple real-world examples:

  1. You want to say "a sequence of length > 1 where A and B can only appear as the head", and you end up writing (A|B,C*)|C+
  2. You want to say "there has to be at least one of A, B, or C, but it can also have any number of Xs in between", and you end up writing X*, ((A|B|C), X*)+

When the binding algorithm is defined in terms of the structure of the content model, it breaks down quickly when it faces these convoluted content models. In other words, it was not a good idea to base the binding on the structure. That eventually led me to a new algorithm that borrows a few concepts from graph theory, such as strongly connected components and cut sets. The new algorithm focuses on the (possibly infinite) set of element sequences that the content model allows, which more closely reflects the schema author's intention. It builds an acceptor graph where elements are nodes, then decomposes it into a set of strongly connected components. It then checks whether each of them forms a cut set of the original graph. The result is then turned into a set of Java properties in an obvious way :-)
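Here is a rough sketch of how those steps could fit together, using the content model (A,B)|(B,C) as input. It is one plausible reading of the description above, not the actual XJC code: the class name, the START/END marker nodes, the hand-built edges, and the interpretation "a component that is a cut set between START and END is mandatory, and a component containing a cycle is repeated" are all assumptions of this sketch. Components are found by plain mutual reachability, which is simple rather than efficient.

```java
import java.util.*;

class AcceptorGraphSketch {
    // Hypothetical marker nodes for "before the first element" and "after the last element".
    static final String START = "^", END = "$";

    final Map<String, Set<String>> edges = new LinkedHashMap<>();

    /** Records that element 'to' may directly follow element 'from'. */
    void edge(String from, String to) {
        edges.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
        edges.computeIfAbsent(to, k -> new LinkedHashSet<>());
    }

    /** Nodes reachable from 'from' in one or more steps without entering any 'removed' node. */
    Set<String> reachable(String from, Set<String> removed) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(from);
        while (!todo.isEmpty()) {
            Set<String> successors = edges.getOrDefault(todo.pop(), Collections.<String>emptySet());
            for (String next : successors) {
                if (!removed.contains(next) && seen.add(next)) {
                    todo.push(next);
                }
            }
        }
        return seen;
    }

    /** Strongly connected components over the element nodes (START and END are skipped). */
    List<Set<String>> components() {
        Set<String> none = Collections.emptySet();
        List<Set<String>> result = new ArrayList<>();
        Set<String> assigned = new HashSet<>();
        for (String node : edges.keySet()) {
            if (node.equals(START) || node.equals(END) || assigned.contains(node)) {
                continue;
            }
            Set<String> component = new TreeSet<>();
            component.add(node);
            for (String other : reachable(node, none)) {
                // Same component if each node can reach the other; END reaches nothing.
                if (reachable(other, none).contains(node)) {
                    component.add(other);
                }
            }
            assigned.addAll(component);
            result.add(component);
        }
        return result;
    }

    /** True if removing the component leaves no path from START to END. */
    boolean isCutSet(Set<String> component) {
        return !reachable(START, component).contains(END);
    }

    /** True if the component contains a cycle, so its elements may occur more than once. */
    boolean repeats(Set<String> component) {
        String any = component.iterator().next();
        return reachable(any, Collections.<String>emptySet()).contains(any);
    }

    /** Hand-built graph for (A,B)|(B,C), with both occurrences of B merged into one node. */
    static AcceptorGraphSketch example() {
        AcceptorGraphSketch g = new AcceptorGraphSketch();
        g.edge(START, "A"); g.edge("A", "B");
        g.edge(START, "B"); g.edge("B", "C");
        g.edge("B", END);   g.edge("C", END);
        return g;
    }

    public static void main(String[] args) {
        AcceptorGraphSketch g = example();
        for (Set<String> c : g.components()) {
            System.out.println(c + " required=" + g.isCutSet(c) + " repeated=" + g.repeats(c));
        }
    }
}
```

For this input the sketch finds three single-element components, and only the one containing B is a cut set, so B would come out as a required property while A and C stay optional. Note that merging the two occurrences of B makes the graph slightly more permissive than the original content model (it also admits A,B,C); that trade-off is part of this sketch, and the real algorithm may handle it differently.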

I don't expect you to care about the theory and the details, but as a result, when it sees a content model like (A,B)|(B,C), it can bind this to the following Java class (it's smart enough to figure out that B is effectively mandatory):

class Foo {
  String a;
  String b;
  String c;
}

I'm confident that this greatly reduces the likelihood of our users hitting any schema compilation issues.

This new binding algorithm is part of my bigger "simple and better binding mode" effort. I briefly talked about one aspect of this mode a week ago. I plan to talk about other improvements in this mode in the near future.