There's a major difference between XML11EntityScanner and the older XMLEntityScanner that I don't understand; maybe someone can explain it to me. It causes some strange behavior in the Java SAX parser that took me a couple of days to track down. Both classes are in com.sun.org.apache.xerces.internal.impl. XMLEntityScanner is used when parsing XML 1.0 documents, and XML11EntityScanner when parsing XML 1.1 documents.
Both scanners use a character buffer to hold the chunk of the document they are currently working on. As they scan and find attribute values, they store the values as instances of XMLString, which simply point into the buffer with an offset and a length rather than copying the characters. So if the scanner is in the middle of parsing an element when the buffer is consumed and refilled, some fixup has to occur; otherwise the values of the attributes found up to that point will be overwritten by the new buffer content.
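To make the problem concrete, here is a minimal, hypothetical Java sketch (not Xerces code). It records an attribute value only as a buffer plus an offset and a length, the way XMLString refers into the scanner's buffer, and it assumes the refill reuses the same char[] and writes the next chunk starting at index 0. The class name and the two document chunks are made up for illustration.

```java
/**
 * Hypothetical illustration, not Xerces code: an attribute value recorded
 * only as (buffer, offset, length) changes when the buffer is refilled
 * in place.
 */
public class StaleViewDemo {

    // Reads the attribute value through the offset/length view, optionally
    // after the buffer has been overwritten with the next chunk.
    static String readValue(boolean refillFirst) {
        char[] buffer = "<a id=\"first\" ".toCharArray();
        int offset = 7; // index of 'f' in "first"
        int length = 5;

        if (refillFirst) {
            // Assumption: the refill reuses the same char[] and writes the
            // next chunk of the document starting at index 0.
            char[] next = "name=\"second\">".toCharArray();
            System.arraycopy(next, 0, buffer, 0, next.length);
        }
        return new String(buffer, offset, length);
    }

    public static void main(String[] args) {
        System.out.println(readValue(false)); // first
        System.out.println(readValue(true));  // econd
    }
}
```

Nothing in the view itself notices the reload; the value silently becomes whatever characters now occupy those positions, which is why some component has to copy the value out before the buffer is refilled.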
Therefore, before XMLEntityScanner refills the buffer, it always notifies its listeners so that they can perform the fixup. Here's a typical piece of code from XMLEntityScanner#peekChar() that refills the buffer:
// load more characters, if needed
if (fCurrentEntity.position == fCurrentEntity.count) {
    invokeListeners(0); // tells listeners to cache work in progress
    load(0, true);      // refills the buffer
}
But here's the corresponding code from XML11EntityScanner#peekChar():
// load more characters, if needed
if (fCurrentEntity.position == fCurrentEntity.count) {
    load(0, true); // refills the buffer; no invokeListeners(0) call first
}
Notice that in the second fragment the listeners are not notified, so no fixup happens. In fact, XML11EntityScanner doesn't do any fixup at all when it reloads the character buffer. As a result, attribute values can be corrupted if a buffer reload happens to fall in the middle of scanning an element. This leads to all kinds of strange XML errors when parsing a version 1.1 file that is larger than the buffer size of 8192 characters. Those errors don't happen when parsing a version 1.0 file.
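The effect of the missing notification can be modeled with a small, hypothetical Java sketch. ToyScanner, RefillListener and AttrValue are invented names, not the Xerces API; peekChar() only mirrors the shape of the two quoted fragments, with a flag choosing whether listeners are asked to cache their values before the buffer is reloaded. The sketch assumes the reload overwrites the start of the same char[] in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical toy model, not Xerces code: a scanner whose peekChar()
 * refills its buffer when position == count, with or without first
 * notifying listeners (compare the two quoted peekChar fragments).
 */
public class RefillDemo {

    // Hypothetical listener: asked to cache anything that points into the buffer.
    interface RefillListener {
        void beforeRefill();
    }

    // Offset/length view into the scanner's buffer (a stand-in for XMLString).
    static final class AttrValue implements RefillListener {
        private char[] ch;
        private int offset;
        private final int length;

        AttrValue(char[] ch, int offset, int length) {
            this.ch = ch;
            this.offset = offset;
            this.length = length;
        }

        // Fixup: copy the characters out so the value survives the reload.
        @Override
        public void beforeRefill() {
            ch = Arrays.copyOfRange(ch, offset, offset + length);
            offset = 0;
        }

        @Override
        public String toString() {
            return new String(ch, offset, length);
        }
    }

    static final class ToyScanner {
        final char[] buffer = new char[16];
        final List<RefillListener> listeners = new ArrayList<>();
        int position;
        int count;
        private final String[] chunks;
        private final boolean notifyListeners;
        private int nextChunk;

        ToyScanner(boolean notifyListeners, String... chunks) {
            this.notifyListeners = notifyListeners;
            this.chunks = chunks;
            load();
        }

        // Overwrites the start of the same buffer with the next chunk.
        private void load() {
            char[] next = chunks[nextChunk++].toCharArray();
            System.arraycopy(next, 0, buffer, 0, next.length);
            position = 0;
            count = next.length;
        }

        int peekChar() {
            if (position == count) {
                if (notifyListeners) {
                    // 1.0-style: let listeners cache work in progress first.
                    for (RefillListener l : listeners) {
                        l.beforeRefill();
                    }
                }
                load();
            }
            return buffer[position];
        }
    }

    // Records id="first" as a view, marks the buffer as fully consumed, then
    // peeks, which forces a reload; returns what the view reads afterwards.
    static String scan(boolean notifyListeners) {
        ToyScanner scanner = new ToyScanner(notifyListeners,
                "<a id=\"first\" ", "name=\"second\">");
        AttrValue id = new AttrValue(scanner.buffer, 7, 5);
        scanner.listeners.add(id);
        scanner.position = scanner.count; // buffer fully consumed
        scanner.peekChar();
        return id.toString();
    }

    public static void main(String[] args) {
        System.out.println(scan(true));  // first
        System.out.println(scan(false)); // econd
    }
}
```

With notification the attribute keeps its value across the reload; without it, the value reads back as characters from the next chunk, which matches the kind of corruption described above for 1.1 documents.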
So here's my question: why is there this difference between XMLEntityScanner and XML11EntityScanner?
Thanks
Alan