1 Reply. Latest reply: May 29, 2012 4:39 PM by 940440

OCR with WebCenter Content and Imaging

927550 Newbie

I'm looking for information about OCR with WebCenter Content and Imaging.

I've seen that there are two solutions:
- WebCenter Capture
- WebCenter Forms Recognition

But I can't find out whether either of these products uses lexical post-correction of OCR results. Does anyone have this information?

  • 1. Re: OCR with WebCenter Content and Imaging
    940440 Newbie

    Both products use printed (as opposed to handwritten) character recognition to identify "tokens" in the image of a text document. Capture focuses mainly on zonal recognition of information, which I would call structured forms processing. Forms Recognition is for capturing information from semi-structured documents such as invoices, where you know that many fields are present but their location differs from one example to the next. It can use patterns, text locators, etc. to find a field. Both benefit when a reference DB of acceptable values exists, but that is not a requirement.
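
To illustrate the idea of finding a field by pattern rather than by position, and then checking it against reference values, here is a minimal Python sketch. It is not the Forms Recognition API. The sample texts, the `INVOICE_NO` pattern, the `KNOWN_INVOICES` set, and the `find_invoice_number` function are all hypothetical names made up for this example.

```python
import re

# Hypothetical OCR output from two invoices: same field, different location.
INVOICE_A = "ACME Corp\nInvoice No: INV-10231\nTotal: 418.50"
INVOICE_B = "Total due 99.00\nACME Corp    Invoice # INV-10417"

# Hypothetical pattern: a label, optional punctuation, then the value.
INVOICE_NO = re.compile(
    r"Invoice\s*(?:No|Number|#)?\s*[:.]?\s*(INV-\d+)", re.IGNORECASE
)

# Hypothetical stand-in for a reference DB of acceptable values.
KNOWN_INVOICES = {"INV-10231", "INV-10417"}


def find_invoice_number(text):
    """Return (value, validated); (None, False) if the pattern is not found."""
    match = INVOICE_NO.search(text)
    if match is None:
        return None, False
    value = match.group(1).upper()
    # The reference set is optional: a miss only means "not validated".
    return value, value in KNOWN_INVOICES


if __name__ == "__main__":
    for sample in (INVOICE_A, INVOICE_B):
        print(find_invoice_number(sample))
```

The same pattern finds the field in both layouts, and the reference set only raises confidence in a value; extraction still works without it.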

    Lexical correction, as I understand the term, is not a feature of either product. Neither will try to validate extracted tokens based on language analysis. I would use them to extract all of the tokens and then add a separate tool to do the lexical analysis. You could do that as a separate post-OCR process, or see whether you can fit it into an FR post-extraction EP.
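
As a rough idea of what a separate post-OCR lexical step could look like, here is a minimal Python sketch that snaps each extracted token to the closest word in a lexicon using the standard library's `difflib.get_close_matches`. The lexicon, the 0.75 cutoff, and the function names `correct_token` and `correct_text` are assumptions chosen for illustration; a real setup would need a proper dictionary or domain vocabulary and a tuned threshold.

```python
import difflib

# Hypothetical lexicon; a real one would be a dictionary or domain vocabulary.
LEXICON = ["invoice", "number", "total", "amount", "payment", "due", "date"]


def correct_token(token, lexicon=LEXICON, cutoff=0.75):
    """Return the closest lexicon word, or the token unchanged if none is close."""
    # Leave tokens without letters (amounts, dates, IDs) untouched.
    if not any(ch.isalpha() for ch in token):
        return token
    lowered = token.lower()
    if lowered in lexicon:
        return lowered
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_text(text):
    """Correct a whitespace-separated string of OCR tokens."""
    return " ".join(correct_token(tok) for tok in text.split())


if __name__ == "__main__":
    print(correct_text("inv0ice tota1 418.50"))
```

Tokens that have no close match are passed through unchanged, so names and codes that are not in the lexicon are not "corrected" into something wrong. Note that this sketch lowercases corrected words and does not use any language context, which a real lexical analysis tool would.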

    Might I ask what business problem you are trying to solve?


