1 Reply Latest reply on Nov 4, 2010 8:35 PM by Barbara Boehmer

    Obtaining which Oracle Text tokens belong to which "file"... is it possible?


      I posted this on the Oracle 10g XE forum and was told that this forum might be able to help.

      I am using Oracle 10g and I have a table with a BLOB column that gets indexed and is retrievable via an Oracle Text search:


      The storage, search, and retrieval of the BLOB info (each BLOB represents the contents of a file) works fine. The issue is that I need to be able to find all of the tokens created during the indexing phase for any given BLOB object. This is needed for verification purposes.

      For example, we index a PDF file named my_data.pdf. I later go back to search the index and use a term that I know is in the my_data.pdf file, but no results are returned. I need to be able to find out all of the tokens created for the file my_data.pdf when the file content was indexed. I can get each token name and count (i.e., how many files have this token) from the DR$IDX_TXT_BODY$I table, but I can't figure out how to list the files (or actually the ID of the RPT_SOURCE row that the BLOB is part of) that are associated with a given token. For example, if there is the token "MOUNTAIN" that has a count of 3, I would like to find something that will give me:

      MOUNTAIN --> 6 (for row ID 6 of the RPT_SOURCE table, i.e., record 6 in the RPT_SOURCE table has a BLOB which contains the token MOUNTAIN)
      MOUNTAIN --> 32
      MOUNTAIN --> 89
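
      For context, the token names and counts mentioned above can be listed with a query along these lines. This is only a sketch: it assumes the standard Oracle Text $I table columns (TOKEN_TEXT, TOKEN_TYPE, TOKEN_COUNT) and is not necessarily the exact query that was run. It shows how many rows contain each token, but not which rows.

```sql
-- Sketch only: lists each text token and the number of indexed rows containing it.
-- Assumes the standard $I layout; a token can span several $I rows, hence the SUM.
SELECT token_text,
       SUM(token_count) AS doc_count
FROM   dr$idx_txt_body$i
WHERE  token_type = 0          -- 0 = ordinary text tokens
GROUP  BY token_text
ORDER  BY token_text;
```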

      I see that there are other index tables, such as DR$IDX_TXT_BODY$[K,N,R], but I can't find the link that gives me what I am looking for.

      Any thoughts on whether this is possible?

      Thanks - Peter