PDF is not structured data - it is formatted text (unstructured) data.
So I would say that the basic concept of using PDF as a data entry format for structured data makes as much sense as fitting doors to a motorcycle.
The basic process you describe is known as a workflow. Workflow systems have been around since the mid-90s. Nothing new.
So either a workflow system is needed, or an Apex web application that supports the basic processing steps (using database entry web forms instead of PDF) can be put together in a couple of hours with minimal effort (and little cost, as Oracle Apex is free).
after googling for over 4 hours I came up empty on converting PDF to XML.
Glad I'm not paying you by the hour! You might need to switch to decaf.
It only took a few seconds for me using 'convert pdf to xml'
Right there on the first page was a link on how to use Adobe Acrobat to do the conversion.
So then you use 'adobe acrobat convert pdf to xml' and it's amazing - there are links to the Adobe site. One of them is about an XML Plug-in for Windows.
And an Adobe forum question about it.
Maybe try searching again after the caffeine wears off a little?
If it can be done with PDF, Adobe has the tools to do it.
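Once a converter has produced XML, turning it into rows for a database load is the easy part. A minimal Python sketch of that step, assuming a made-up export: the element and attribute names (`minutes`, `meeting`, `item`, `owner`) and the sample values are hypothetical, and whatever the real converter emits will look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical export. Real converter output will use other element names.
SAMPLE_XML = """\
<minutes>
  <meeting project="Time Tracking" date="2012-05-14">
    <item owner="Alice">draft the schema</item>
    <item owner="Bob">review the workflow</item>
  </meeting>
</minutes>
"""


def xml_to_rows(xml_text):
    """Flatten each <item> into a (project, date, owner, task) tuple."""
    root = ET.fromstring(xml_text)
    rows = []
    for meeting in root.iter("meeting"):
        for item in meeting.iter("item"):
            rows.append((
                meeting.get("project"),
                meeting.get("date"),
                item.get("owner"),
                (item.text or "").strip(),
            ))
    return rows
```

Each tuple maps onto one row of a table, so the result can be handed to whatever insert mechanism the application already uses.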
Now, my problem is, HOW ON EARTH TO READ A "PDF"???
One option is Oracle Text, as shown e.g. in Re: Read from a file. Though again, you'll be confronted with the problem of how to structure unstructured data, unless the resulting plain text file is quite easy to parse ...
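To make the "easy to parse" case concrete: once the PDF has been filtered to plain text by whatever tool, the remaining work is mapping lines to fields. A rough Python sketch, assuming a hypothetical minutes layout with `Label: value` header lines and `- owner: task` action items. The layout, the field names and the sample data are all assumptions, not something taken from the original post.

```python
import re

# Hypothetical plain text, as it might look after a PDF is filtered to text.
# The layout ("Label: value" headers, "- owner: task" items) is an assumption.
SAMPLE = """\
Project: Time Tracking
Date: 2012-05-14
Attendees: Alice, Bob

Action items
- Alice: draft the schema
- Bob: review the workflow
"""

HEADER = re.compile(r"^(\w[\w ]*?):\s*(.+)$")
ACTION = re.compile(r"^-\s*(\w+):\s*(.+)$")


def parse_minutes(text):
    """Split filtered text into header fields and a list of action items."""
    headers, actions = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        action = ACTION.match(line)
        if action:
            actions.append({"owner": action.group(1), "task": action.group(2)})
            continue
        header = HEADER.match(line)
        if header:
            headers[header.group(1).lower()] = header.group(2)
        # Anything else (free-form text, section titles) is silently skipped.
    return headers, actions
```

Free-form minutes will not fit a pattern like this, which is exactly the "structuring unstructured data" problem: the sketch only works if the authors stick to an agreed layout.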
user12240205 wrote:
I'd probably investigate Oracle Text, the free document/text indexing capability of the database. It is described at http://docs.oracle.com/cd/E11882_01/text.112/e24435/ind.htm#i1004902
We have a requirement like this: We have a Time Tracking and Project Monitoring System whose DB is Oracle 10g R2. We want to automate our "Meeting Minute" processing.
But that's just me ...