Customizing Lexer for XML constructs

Aug 24, 2009 8:37AM

I was wondering whether it is feasible to customize the Oracle Text lexer so that it recognizes certain XML constructs when identifying tokens, and whether anyone has experience doing so. Specifically, I have some XML entities that I would like to be significant within tokens (e.g., "doesn't", where the apostrophe is encoded as an XML entity). I would also like Oracle Text to ignore XML tag names, attribute names, and attribute values when identifying tokens (basically, ignore everything between "<" and ">"). I am using Oracle Text against VARCHAR2 data, with no use of the available XML datatypes.

Any help and insights are greatly appreciated!

Oracle Text (MOSC)
