This discussion is archived
5 Replies Latest reply: Apr 23, 2012 1:19 AM by sperkmandl

No pdf indexing

sperkmandl Newbie
Hi, I'm using CTXSYS.CONTEXT with URL_DATASTORE. All other parameters are left unspecified (defaults).
Plain-text documents are indexed properly, but PDFs are not: they appear to be indexed unfiltered, so the index contains only PDF keywords.
I understood that by default AUTO_FILTER should kick in here and that PDF documents should be recognized as such.
What's wrong? Thanks.
  • 1. Re: No pdf indexing
    Barbara Boehmer Oracle ACE
    According to the documentation, ctxsys.auto_filter is the default, so you should not have to specify it. However, I have found that in some versions explicitly specifying it as an index parameter makes a difference.

    Make sure that the auto_filter supports your PDF versions, your operating system and its version, and your Oracle version and edition. Also make sure the PDFs are not password-protected and do not use any special PDF features that prevent filtering. These restrictions are listed in the documentation and differ between versions.

    There have been a lot of changes to filtering and there are some patches.

    What is your Oracle version and edition?
    What is your operating system and version?
    What is/are the PDF versions?
    Can you filter a simple PDF document without anything special, just a single sentence for testing?
  • 2. Re: No pdf indexing
    sperkmandl Newbie
    I'm using Oracle 11g Release 2. I tried several PDFs, among them the Oracle Text Application Developer's Guide (E24435-01).
    Only PDF keywords appear to be indexed (e.g. "trailer"), along with content recognizable as plain text (e.g. "getting started").
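
    One way to check which tokens actually ended up in the index is to query the index's token table and the indexing error view. This is only a sketch: the index name docs_demo_idx is hypothetical, and the DR$...$I table name has to match your own index name.

    ```sql
    -- Assumption: the CONTEXT index is named DOCS_DEMO_IDX (hypothetical name).
    -- Oracle Text stores indexed tokens in the table DR$<index_name>$I.
    SELECT token_text, token_count
      FROM dr$docs_demo_idx$i
     ORDER BY token_text;

    -- Documents that failed during indexing or filtering are logged here.
    SELECT err_textkey, err_text
      FROM ctx_user_index_errors
     WHERE err_index_name = 'DOCS_DEMO_IDX';
    ```

    If the token list is dominated by PDF structure keywords such as "trailer", the documents were most likely indexed without being filtered.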
  • 3. Re: No pdf indexing
    sperkmandl Newbie
    Btw, specifying the parameter "FILTER CTXSYS.AUTO_FILTER" at index creation seems to force proper pdf indexing.
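
    For reference, an index definition along these lines might look like the sketch below. The table, column, and index names are hypothetical; only the CTXSYS.CONTEXT index type, the URL datastore, and the FILTER CTXSYS.AUTO_FILTER parameter come from this thread, and CTXSYS.URL_DATASTORE is assumed to be the datastore preference in use.

    ```sql
    -- Hypothetical table: each row holds the URL of a document to index.
    CREATE TABLE docs_demo (
      id  NUMBER PRIMARY KEY,
      url VARCHAR2(2000)
    );

    -- Name the filter explicitly instead of relying on the default.
    CREATE INDEX docs_demo_idx ON docs_demo (url)
      INDEXTYPE IS CTXSYS.CONTEXT
      PARAMETERS ('DATASTORE CTXSYS.URL_DATASTORE FILTER CTXSYS.AUTO_FILTER');
    ```

    An existing index created without the FILTER clause would presumably need to be dropped and recreated (or rebuilt with the new preference) for the change to take effect.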
  • 4. Re: No pdf indexing
    Barbara Boehmer Oracle ACE
    According to the documentation linked below, it looks like ctxsys.auto_filter is only the default with bfiles or file_datastore, so that must be why you have to specify it with your url_datastore. So, I take it your problem is solved?

    http://docs.oracle.com/cd/E11882_01/text.112/e24436/cdatadic.htm#CCREF2076
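
    To see which filter an existing index actually uses, rather than inferring it from the defaults, the data dictionary view CTX_USER_INDEX_OBJECTS can be queried. A minimal sketch, assuming a hypothetical index named DOCS_DEMO_IDX:

    ```sql
    -- Lists the object chosen for each class (DATASTORE, FILTER, LEXER, ...)
    -- of the hypothetical index DOCS_DEMO_IDX.
    SELECT ixo_class, ixo_object
      FROM ctx_user_index_objects
     WHERE ixo_index_name = 'DOCS_DEMO_IDX'
     ORDER BY ixo_class;
    ```

    If the FILTER row shows something other than AUTO_FILTER, binary formats such as PDF are probably not being filtered.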
  • 5. Re: No pdf indexing
    sperkmandl Newbie
    Solved, thanks.
