2 Replies Latest reply: Jan 21, 2013 1:50 PM by sperkmandl

    CTX_DOC_TOKENS offsets do not match text returned by CTX_DOC_FILTER

    sperkmandl
      Hi all, the offsets returned by CTX_DOC_TOKENS do not seem to correspond to the text block returned by CTX_DOC_FILTER.
      I indexed the Oracle Text Reference manual (a PDF) and, using 'Release' as the query term, I get:

      CTX_DOC_FILTER (I replaced newlines with 'n'):

      nnnnnnnnOracle® Text Referencennnn11g Release 2 (11.2) E24436-01nnnnAugust 2011nnnnnOracle Text Reference, 11g Release 2 (11.2) ....

      CTX_DOC_TOKENS:

      120 6 ORACLE
      129 4 TEXT
      184 9 REFERENCE
      248 6 11G
      259 7 RELEASE ???
      267 1 2
      270 4 11.2
      316 6 E24436

      CTX_DOC_HIGHLIGHT:

      40: 7 !!!!
      113: 7
      4575: 7
      4698: 7
      88521: 7
      88716: 7

      CTX_DOC_MARKUP:

      nnnnnnnnOracle® Text Referencennnn11g '<<<'Release'>>>' 2 (11.2) E24436-01 ...

      It appears that the token offsets are unrelated to the text ('Release' -> 259), while the highlight offsets look correct ('Release' -> 40).
      Any comments are welcome.
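
      For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of checking (offset, length, token) rows against a text block. It is not Oracle code: the function name check_offsets, the base parameter, and the sample text are all hypothetical. It assumes offsets are 1-based character positions into the same text the tokens were taken from, which may not hold (offsets could be byte-based, or refer to a different rendering of the document).

```python
def check_offsets(text, spans, base=1):
    """Compare each (offset, length, token) row against the text.

    `base` is the index of the first character (assumed 1 here).
    Returns a list of (offset, length, token, found, ok) tuples, where
    `found` is the substring actually located at that offset and `ok`
    says whether it matches the token, ignoring case.
    """
    results = []
    for offset, length, token in spans:
        start = offset - base
        found = text[start:start + length]
        ok = found.upper() == token.upper()
        results.append((offset, length, token, found, ok))
    return results


# Synthetic data for illustration only, not the output from the post.
sample = "Oracle Text Reference\n11g Release 2"
spans = [
    (1, 6, "ORACLE"),
    (8, 4, "TEXT"),
    (27, 7, "RELEASE"),
    (5, 7, "RELEASE"),  # deliberately wrong offset
]

for offset, length, token, found, ok in check_offsets(sample, spans):
    status = "match" if ok else "MISMATCH"
    print(f"{offset:>6} {length:>3} {token:<10} {found!r:<12} {status}")
```

      Running the same check with base=0, or against the UTF-8 encoded bytes instead of the character string, is a quick way to see whether a set of offsets is merely shifted or genuinely refers to some other text.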

      Renzo