Customizing Lexer for XML constructs
Is it feasible to customize the Oracle Text lexer so that it recognizes certain XML constructs and changes how it identifies tokens? Does anyone have experience with this? Specifically, I have some XML entities that I would like to be treated as significant within tokens (e.g., "doesn't", where the apostrophe is stored as an XML entity). I would also like Oracle Text to ignore XML tag names, attribute names, and attribute values when identifying tokens (basically, ignore everything between "<" and ">"). I am using Oracle Text against VARCHAR2 data, with no use of the available XML datatypes.
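To make the desired behavior concrete, here is a rough Python sketch of the tokenization I am after. It is only an illustration and not Oracle Text code: the function name `desired_tokens`, the sample string, and the choice of `&apos;` as the entity are all assumptions made for the example.

```python
import re

# Anything between "<" and ">" (tag names, attribute names, attribute values).
# Assumption: attribute values never contain a literal ">".
TAG = re.compile(r"<[^>]*>")

# A token is a run of alphanumerics and/or XML entity references,
# so an entity such as &apos; stays inside the surrounding word.
TOKEN = re.compile(r"(?:[A-Za-z0-9]|&#?[A-Za-z0-9]+;)+")


def desired_tokens(text):
    """Return the tokens I would like the lexer to produce (hypothetical helper)."""
    without_tags = TAG.sub(" ", text)  # drop markup, keep a separator in its place
    return TOKEN.findall(without_tags)


sample = '<p class="note">It doesn&apos;t work</p>'
print(desired_tokens(sample))  # ['It', 'doesn&apos;t', 'work']
```

In other words, "p", "class", and "note" should never be indexed, and the entity should not split "doesn&apos;t" into separate tokens.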
Any help and insights are greatly appreciated!