I'm looking for information about OCR with WebCenter Content and Imaging.
I've seen that there are two solutions:
- WebCenter Capture
- WebCenter Forms Recognition
But I can't find out whether either of these products uses lexical post-correction of OCR results. Does anyone have this information?
Both products use printed (vs. handwritten) character recognition to identify "tokens" from the image of a document with text. Capture focuses mainly on zonal recognition of information - what I would call structured forms processing. Forms Recognition is for capturing information from semi-structured documents - say invoices, where you know that many fields are present, but their location differs from example to example. It can use patterns, text locators, etc. to find each field. Both benefit when there is a reference DB of acceptable values, but that is not a requirement.
Lexical correction (as I understand the term) is not a feature of either product. They are not going to try to validate extracted tokens based on language analysis. I would use them to extract all of the tokens and then add a separate tool to do the lexical analysis. You could run that as a separate, post-OCR process, or see whether you can fit it into an FR post-extraction EP.
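To give an idea of what such a separate post-OCR step could look like, here is a minimal sketch in Python. It is not part of Capture or Forms Recognition and does not use any Oracle API: it simply assumes you already have the extracted tokens as a list of strings, and it uses the standard library's difflib to snap each token to the closest entry in a small word list. The LEXICON contents, the function names and the 0.75 cutoff are all made up for illustration; a real lexicon would come from your own reference DB or a language dictionary.

```python
import difflib

# Hypothetical vocabulary, for illustration only. In practice this could be
# loaded from a reference DB of acceptable values or a language dictionary.
LEXICON = ["invoice", "total", "amount", "number", "date", "supplier"]


def correct_token(token, lexicon=LEXICON, cutoff=0.75):
    """Return the closest lexicon entry, or the token unchanged if none is close."""
    lowered = token.lower()
    if lowered in lexicon:
        # Already a known word: keep it exactly as extracted.
        return token
    # get_close_matches ranks candidates by difflib's similarity ratio and
    # drops anything scoring below the cutoff.
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_tokens(tokens, lexicon=LEXICON, cutoff=0.75):
    """Apply correct_token to every token extracted by the OCR step."""
    return [correct_token(t, lexicon, cutoff) for t in tokens]


if __name__ == "__main__":
    ocr_tokens = ["lnvoice", "t0tal", "am0unt", "12345"]
    print(correct_tokens(ocr_tokens))
```

Tokens that have no close match (numbers, names, codes) are passed through untouched, so the cutoff controls how aggressive the correction is. A simple similarity match like this knows nothing about context or word frequency, so treat it as a starting point rather than a full lexical analysis.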
Might I ask what business problem you are trying to solve?