Hi, I'm using CTXSYS.CONTEXT with URL_DATASTORE. All other parameters are left unspecified (defaults).
While plain text docs appear properly indexed, PDFs are not. They appear unfiltered: the index contains only PDF keywords.
I understood that by default auto_filter should come into play in this case and PDF docs should be recognized as such.
What's wrong? Thanks.
According to the documentation, ctxsys.auto_filter is the default, so you should not have to specify it. However, I have found that in some versions, explicitly specifying it as an index parameter makes a difference.
You need to make sure that the auto_filter supports your PDF versions, your operating system and its version, and your Oracle version and edition. You also need to make sure the PDF is not password protected and does not use any special PDF features that prevent filtering. These things are listed in the documentation and differ between versions.
There have been a lot of changes to filtering and there are some patches.
What is your Oracle version and edition?
What is your operating system and version?
What is/are the PDF versions?
Can you filter a simple PDF document without anything special, just a single sentence for testing?
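If it helps with answering the PDF version and password questions, here is a rough sketch in Python that reads the version from the file header and looks for an /Encrypt entry. The function name inspect_pdf is just something I made up for illustration, and the /Encrypt check is only a heuristic (the string could also occur inside document content), so treat the result as a hint, not a verdict.

```python
import re
import sys


def inspect_pdf(data: bytes) -> dict:
    """Report the PDF header version and whether an /Encrypt entry is present.

    Hypothetical helper for illustration; it does not parse the PDF structure.
    """
    # The "%PDF-x.y" header is expected near the start of the file.
    header = re.search(rb"%PDF-(\d+\.\d+)", data[:1024])
    version = header.group(1).decode("ascii") if header else None
    # Encrypted PDFs reference an encryption dictionary through an /Encrypt key.
    # Heuristic only: a plain substring search can give false positives.
    encrypted = b"/Encrypt" in data
    return {"version": version, "encrypted": encrypted}


if __name__ == "__main__":
    # Usage: python inspect_pdf.py file1.pdf file2.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, inspect_pdf(f.read()))
```

A version of None means no PDF header was found in the first 1024 bytes, which would suggest the file is not what the filter expects.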
I'm using Oracle 11gR2. I tried several PDFs, including the Oracle Text Application Developer's Guide (E24435-01).
Only PDF keywords appear indexed (e.g. "trailer"), as well as content recognizable as plain text (e.g. "getting started").
According to the documentation in the link below, it looks like ctxsys.auto_filter is the default only with bfiles or file_datastore, so that must be why you have to specify it with your url_datastore. So, I take it your problem is solved?