1 reply. Latest reply: Nov 4, 2010 1:35 PM by Barbara Boehmer

Finding which Oracle Text tokens belong to which "file": is it possible?

251503 Newbie
Hello,

I posted this on the Oracle 10g XE forum, but got a response that this forum might be able to help.

I am using Oracle 10g, and I have a table with a BLOB column that is indexed and retrievable via an Oracle Text search:

CREATE INDEX idx_txt_body ON RPT_SOURCE(BODY)
INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS
('FILTER CTXSYS.AUTO_FILTER format column BODY_FORMAT');

Storing, searching, and retrieving the BLOB data (each BLOB holds the contents of one file) works fine. The issue is that, for verification purposes, I need to be able to find all of the tokens created during indexing for any given BLOB. For example, we index a PDF file named my_data.pdf. Later I search the index using a term that I know is in my_data.pdf, but no results are returned. I need to list all of the tokens that were created for my_data.pdf when its content was indexed.

I can get each token's text and count (i.e., how many files contain that token) from the DR$IDX_TXT_BODY$I table, but I can't figure out how to list the files (or rather, the ID of the RPT_SOURCE row each BLOB belongs to) that are associated with a given token. For example, if the token "MOUNTAIN" has a count of 3, I would like something that gives me:

MOUNTAIN --> 6 (for the row with ID 6 in RPT_SOURCE, i.e., record 6 has a BLOB that contains the token MOUNTAIN)
MOUNTAIN --> 32
MOUNTAIN --> 89
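
The two lookups being asked for (all tokens of one file, and all files for one token) are the two directions of an inverted index. The following is a minimal Python sketch of that idea only, not of Oracle Text internals: the `documents` sample data and the `tokenize`, `build_index`, and `tokens_for_doc` helpers are hypothetical, and the tokenizer is a crude stand-in for what AUTO_FILTER plus the lexer actually produce.

```python
from collections import defaultdict

# Hypothetical stand-in for the filtered text of each RPT_SOURCE row, keyed by ID.
documents = {
    6: "mountain trail map",
    32: "Mountain weather report",
    89: "mountain lake survey",
    90: "river survey",
}

def tokenize(text):
    """Crude tokenizer (assumption): split on whitespace and upper-case each word."""
    return [word.upper() for word in text.split()]

def build_index(docs):
    """Map each token to the sorted list of row IDs whose text contains it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}

def tokens_for_doc(index, doc_id):
    """Reverse lookup: every token whose posting list includes doc_id."""
    return sorted(token for token, ids in index.items() if doc_id in ids)

index = build_index(documents)

# Per-token document count, roughly what the poster reads from the $I table.
print("MOUNTAIN count:", len(index["MOUNTAIN"]))
for doc_id in index["MOUNTAIN"]:
    print(f"MOUNTAIN --> {doc_id}")
print("Tokens for row 6:", tokens_for_doc(index, 6))
```

The reverse lookup here simply scans every posting list, which is fine for a sketch but is exactly the expensive direction in a real inverted index.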

I see that there are other index tables, such as DR$IDX_TXT_BODY$[K,N,R], but I can't find the link that gives me what I am looking for.
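
The missing link is a mapping from Oracle Text's internal document IDs back to RPT_SOURCE rows, which the $K table is generally described as providing (DOCID to the base table's ROWID). In the real $I table, however, the document IDs are packed inside the binary TOKEN_INFO column rather than stored as a plain column, so a simple join like the one below does not work directly against Oracle. The sketch therefore models the idea in SQLite, assuming a hypothetical flattened `postings` table (one row per token and docid) and a hypothetical `doc_key` table standing in for $K; none of these names are Oracle's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# RPT_SOURCE as in the question (BODY omitted; only the ID matters here).
cur.execute("CREATE TABLE rpt_source (id INTEGER PRIMARY KEY, name TEXT)")
# Hypothetical flattened postings: one row per (token, internal docid).
cur.execute("CREATE TABLE postings (token_text TEXT, docid INTEGER)")
# Hypothetical docid -> source row map, playing the role of the $K table.
cur.execute("CREATE TABLE doc_key (docid INTEGER PRIMARY KEY, source_id INTEGER)")

cur.executemany("INSERT INTO rpt_source VALUES (?, ?)",
                [(6, "my_data.pdf"), (32, "notes.doc"), (89, "survey.pdf"), (90, "river.txt")])
cur.executemany("INSERT INTO doc_key VALUES (?, ?)",
                [(1, 6), (2, 32), (3, 89), (4, 90)])
cur.executemany("INSERT INTO postings VALUES (?, ?)",
                [("MOUNTAIN", 1), ("MOUNTAIN", 2), ("MOUNTAIN", 3), ("RIVER", 4)])

def rows_for_token(token):
    """Return (token, source id, file name) for every source row containing token."""
    return cur.execute(
        """
        SELECT p.token_text, s.id, s.name
        FROM postings p
        JOIN doc_key k ON k.docid = p.docid
        JOIN rpt_source s ON s.id = k.source_id
        WHERE p.token_text = ?
        ORDER BY s.id
        """,
        (token,),
    ).fetchall()

def tokens_for_row(source_id):
    """Return every token recorded for one source row, sorted alphabetically."""
    rows = cur.execute(
        """
        SELECT p.token_text
        FROM postings p
        JOIN doc_key k ON k.docid = p.docid
        WHERE k.source_id = ?
        ORDER BY p.token_text
        """,
        (source_id,),
    ).fetchall()
    return [r[0] for r in rows]

for token, source_id, name in rows_for_token("MOUNTAIN"):
    print(f"{token} --> {source_id} ({name})")
```

The point of the model is that the docid-to-row table is the join key in both directions; how the docids are actually decoded from TOKEN_INFO in Oracle is not shown here.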

Any thoughts on whether this is possible?

Thanks - Peter
