Do I have to use some kind of preference?
Don't you have to synchronize your ctxsys.context index after DML changes?
You have used sync(on commit) in your index creation and committed within your procedure, so additional synchronization is not necessary, but you need to make sure that you wait long enough for it to finish before searching.
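A minimal sketch for checking whether the sync has finished before you search, assuming the index name used in this thread (created unquoted, so Oracle stores it in uppercase). Rows that were changed but not yet indexed are listed in the CTX_USER_PENDING view; if any remain, CTX_DDL.SYNC_INDEX can force a manual sync.

```sql
-- Rows changed but not yet synchronized into the index.
-- Assumes the index name from this thread; unquoted names are stored in uppercase.
SELECT pnd_index_name, pnd_rowid, pnd_timestamp
  FROM ctx_user_pending
 WHERE pnd_index_name = 'CONTENT_ORATEXT_BLOBINDEX';

-- Only needed if the query above still returns rows: force a manual sync.
BEGIN
  ctx_ddl.sync_index('CONTENT_ORATEXT_BLOBINDEX');
END;
/
```

With SYNC (ON COMMIT) the pending query should normally come back empty shortly after the commit.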
You need to add ctxsys.auto_filter to your index parameters:
CREATE INDEX Content_ORATEXT_BLOBIndex
ON CONTENT_ORATEXT_BLOB (pdf_file)
INDEXTYPE IS ctxsys.context
PARAMETERS ('FILTER ctxsys.auto_filter SYNC (ON COMMIT)');
You also need to check that the PDF version and features are supported for your Oracle version and operating system.
Also look in the CTX_USER_INDEX_ERRORS view to see whether any filtering or indexing errors were reported for the PDF documents.
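For example, a query along these lines lists the most recent errors for indexes owned by the current user. This is a sketch; ERR_TEXTKEY holds the ROWID (or primary key) of the failing document, so it can be matched back to a row in CONTENT_ORATEXT_BLOB.

```sql
-- Most recent Oracle Text indexing/filtering errors for the current user's indexes.
SELECT err_index_name, err_timestamp, err_textkey, err_text
  FROM ctx_user_index_errors
 ORDER BY err_timestamp DESC;
```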
It's possible that the version of PDF you're using is too new for the filters available in your database version, or that the PDF documents are corrupted. Some PDF creation programs generate invalid PDFs which can be opened by some tools (typically Adobe Reader, since that's what they're tested against) but not by our filters.
Finally, certain types of PDF documents cannot be filtered. The most obvious examples are image-only files, such as those generated by scanning a document and saving it as a PDF, but there are also problems with PDF files that use "custom fonts".
I've checked and found no errors.
Trying the solution provided by Barbara.
Any luck?
It shouldn't be necessary to specify AUTO_FILTER explicitly, since you're indexing a BLOB; AUTO_FILTER should be the default for this type.
Have you tried a variety of PDF files from different sources?
No luck so far.
I'm now experimenting with different filters on maintenance.
I will let you know if I'm able to find a solution.
By the way, I've found that the following tables return no rows:
select token_text, token_count from Dr$content_Oratext_Blobindex$i; -- 0 records
select docid, textkey from Dr$content_Oratext_Blobindex$k; -- 0 records
select nlt_docid from dr$content_Oratext_Blobindex$n; -- 0 records
I hope this helps us find the solution.
The following query, however, returns some rows (21):
select r.row_no from dr$content_Oratext_Blobindex$r r;
What puzzles me is the output of the next query. It returns the 21 row numbers, but the substring of r.data is empty:
select r.row_no, dbms_lob.substr(r.data, 1000) from dr$content_Oratext_Blobindex$r r;
and this one shows a length of 0:
select r.row_no, dbms_lob.getlength(r.data) from dr$content_Oratext_Blobindex$r r;
Hope this helps.
What is your Oracle version and edition?
What is your operating system and version?
What is your PDF version?
Can you index a very simple PDF file with no special fonts or anything else, just some brief text?
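One way to test a single simple PDF is to run the index's own filter on that one document and look at the text it produces. The sketch below assumes the table and index names from this thread and simply takes the first row it finds; replace that SELECT with the ROWID of a known simple PDF. CTX_DOC.FILTER with plaintext => TRUE writes the extracted text into a CLOB, so a length of 0 would suggest the filter itself is extracting nothing.

```sql
SET SERVEROUTPUT ON
DECLARE
  l_rowid VARCHAR2(18);
  l_text  CLOB;
BEGIN
  -- Use ROWIDs as document keys for the CTX_DOC calls below.
  ctx_doc.set_key_type('ROWID');

  -- Assumption: the first row found; substitute the ROWID of a known simple PDF.
  SELECT ROWIDTOCHAR(ROWID) INTO l_rowid
    FROM content_oratext_blob
   WHERE ROWNUM = 1;

  dbms_lob.createtemporary(l_text, TRUE);

  -- Run the index's filter on that document; plaintext => TRUE requests plain text, not HTML.
  ctx_doc.filter(index_name => 'CONTENT_ORATEXT_BLOBINDEX',
                 textkey    => l_rowid,
                 restab     => l_text,
                 plaintext  => TRUE);

  dbms_output.put_line('Filtered length: ' || dbms_lob.getlength(l_text));
  -- Print only the first 100 characters (DBMS_OUTPUT lines are limited in 10.1).
  dbms_output.put_line(dbms_lob.substr(l_text, 100, 1));

  dbms_lob.freetemporary(l_text);
END;
/
```

If the filter fails outright on the document, the call raises an error instead, which is also useful information.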
Oracle version and edition: Oracle Database 10g Enterprise Edition Release 10.1.0.4.0 (64-bit)
OS: HP-UX 11i
PDF version: 1.5
Even a PDF with a single line of text and no special fonts does not work.
Thanks & regards,
I believe Oracle 10.1 is no longer supported; you should upgrade to at least 10.2. I wasn't even able to access the old 10.1 documentation through Oracle. The following copy lists which platforms and document formats were supported in 10.1. I know there have been many patches and major changes to the filters since then.
Is your OS HP-UX Itanium or PA-RISC?