12 Replies. Latest reply: Dec 25, 2007 12:58 AM by 807603

    Reading a PDF file as a String and creating another pdf file

    807603
      Hi,

      I have a program that reads a PDF file and converts it into a String. I am supposed to write this String out as a PDF file in another location.
      When I write it, the file is not copied completely; most of the contents are missing. Please help me solve this problem.

      I am using a FileInputStream to read the contents as bytes and converting them into a String, then converting this String back to bytes and writing them to a PDF file using FileOutputStream.

      If I don't convert the bytes to a String and write them directly to a PDF file, it works correctly. But I have a requirement to convert the contents to a String to pass them to the server, which stores them as a PDF file in a location.

      Thanks,
      cmbl
        • 1. Re: Reading a PDF file as a String and creating another pdf file
          807603
          PDF files are binary files, so they must be copied as bytes, not characters. As you found out, treating them as text results in corruption.
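A minimal sketch of the byte-for-byte copy described above, assuming Java 7 or later for try-with-resources. The class name `ByteCopy` and the command-line paths are illustrative and not from the original post.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class ByteCopy {
    // Copies source to target byte for byte; no character decoding happens anywhere.
    // Returns the number of bytes copied.
    static long copy(String source, String target) throws IOException {
        long total = 0;
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                // Write only the bytes actually read in this pass.
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ByteCopy input.pdf output.pdf
        System.out.println(copy(args[0], args[1]) + " bytes copied");
    }
}
```

Because the data never passes through a String or a Reader/Writer, the target file should be identical to the source.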
          • 2. Re: Reading a PDF file as a String and creating another pdf file
            796440
            Are you just reading the whole file and converting all the bytes, including markup, to a String? Or are you parsing the file and extracting the text? If the former, that's your problem. A String is just a sequence of textual characters. A PDF file has a bunch of bytes in it that are not part of the text and that probably do not decode to legal characters in the charset used for the conversion.
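To illustrate the point about bytes that are not legal characters, here is a small sketch that pushes a few non-text bytes through a String and back. It assumes UTF-8 as the charset (the original poster did not say which one was in use); the class name `RoundTripDemo` and the sample bytes are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {
    // Decodes bytes as UTF-8 text, then encodes that text back to bytes.
    static byte[] viaString(byte[] raw) {
        String text = new String(raw, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%PDF" followed by four high bytes that do not form valid UTF-8 sequences.
        byte[] raw = {0x25, 0x50, 0x44, 0x46,
                      (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3};
        byte[] back = viaString(raw);
        // Malformed input is replaced during decoding, so the original bytes are lost.
        System.out.println("identical after round trip: " + Arrays.equals(raw, back));
    }
}
```

The ASCII part survives, but the high bytes are replaced with a substitution character while decoding, so the bytes written back out are no longer the bytes that were read.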
            • 3. Re: Reading a PDF file as a String and creating another pdf file
              807603
              1) Does that mean I need to keep PDF files as bytes until I write them to another file? Is there any way I can do the conversion to a String and still make sure the data is not corrupted?

              2) I get a similar error when I copy Word and Excel documents. The data is copied, but I get an error saying "Some data may be lost". Why is this?
              • 4. Re: Reading a PDF file as a String and creating another pdf file
                rahul_akkina
                Well, PDF is not a plain text format. Its data is encoded according to a specified format.
                You might consider using a Java library (API) to parse and read the content of the PDF document. The links below list commercial and open source solutions for doing this in Java.

                http://java-source.net/open-source/pdf-libraries
                http://schmidt.devlib.org/java/libraries-pdf.html
                http://www.etymon.com/

                You might choose one of them according to your requirements.

                Hope this helps :)

                REGARDS,
                RaHuL
                • 5. Re: Reading a PDF file as a String and creating another pdf file
                  807603
                  I am reading the file into bytes and converting them into a String. If this is not the correct way to do it, is there any way we can convert the PDF file into a String?
                  • 6. Re: Reading a PDF file as a String and creating another pdf file
                    796440
                    cmbl wrote:
                    1) Does that mean I need to keep PDF files as bytes until I write them to another file?
                    Yes. Why would you not want to do that?
                    Is there any way I can do the conversion to a String and still make sure the data is not corrupted?
                    Did you not read what I wrote? A PDF file is more than a bunch of letters.
                    2) I get a similar error when I copy Word and Excel documents. The data is copied, but I get an error saying "Some data may be lost". Why is this?
                    :headdesk:

                    Same thing.
                    • 7. Re: Reading a PDF file as a String and creating another pdf file
                      796440
                      cmbl wrote:
                      I am reading the file into bytes and converting them into a String. If this is not the correct way to do it, is there any way we can convert the PDF file into a String?
                      Do you just want the text, without any formatting, pictures, etc.? Then google for "java pdf library" or something to that effect.

                      Note that if you do this and then write that String out as-is to a .pdf file, it will NOT be a valid PDF doc.

                      Edited by: jverd on Dec 18, 2007 11:07 AM
                      • 8. Re: Reading a PDF file as a String and creating another pdf file
                        807603
                        Thanks to all of you for the replies. I will try out the options you have posted.

                        Thanks,
                        cmbl
                        • 9. Re: Reading a PDF file as a String and creating another pdf file
                          rahul_akkina
                          Well, if you need a ready-made solution for extracting text from a PDF document, please try the following library:

                          http://www.lowagie.com/iText/tutorial/ch13.html
                          • 10. Re: Reading a PDF file as a String and creating another pdf file
                            807603
                            One more query related to this question.

                            How do I send the contents of a PDF in an XML file?
                            How do I send binary data in an XML file?

                            Thanks,
                            cmbl
                            • 11. Re: Reading a PDF file as a String and creating another pdf file
                              rahul_akkina
                              -->Create an XSD, if the PDF files you are reading share a general format.

                              -->Parse the content of the PDF documents using the APIs mentioned above. Map the extracted content to Java persistent objects.

                              -->Now create XML content based on that XSD, using the data in the Java persistent objects.

                              -->Then save the content accordingly.

                              Hope that answers your question

                              REGARDS,
                              RaHuL
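On the separate question of carrying raw binary data inside XML, one common approach (not mentioned in the replies above) is to Base64-encode the bytes, since the Base64 alphabet contains no characters that need escaping in XML. The sketch below assumes Java 8 or later for `java.util.Base64`; the `<document>` element name and the class name `Base64Xml` are hypothetical, and real code would normally use an XML parser rather than string searching.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64Xml {
    // Wraps binary data in a hypothetical <document> element as Base64 text.
    static String toXml(byte[] data) {
        return "<document encoding=\"base64\">"
                + Base64.getEncoder().encodeToString(data)
                + "</document>";
    }

    // Pulls the element text back out and decodes it to the original bytes.
    // String searching keeps the sketch short; it is not a general XML parser.
    static byte[] fromXml(String xml) {
        int start = xml.indexOf('>') + 1;
        int end = xml.lastIndexOf("</document>");
        return Base64.getDecoder().decode(xml.substring(start, end));
    }

    public static void main(String[] args) {
        // Sample bytes standing in for the contents of a PDF file.
        byte[] pdfBytes = {0x25, 0x50, 0x44, 0x46, (byte) 0xE2, (byte) 0xFF, 0x00};
        String xml = toXml(pdfBytes);
        System.out.println(xml);
        System.out.println("round trip intact: " + Arrays.equals(pdfBytes, fromXml(xml)));
    }
}
```

Unlike a plain bytes-to-String conversion, Base64 is reversible for any byte sequence, at the cost of roughly one third more data on the wire.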
                              • 12. Re: Reading a PDF file as a String and creating another pdf file
                                807603
                                Hi,


                                I am facing the same problem. Please help me out. I just want to read a PDF file as bytes from one location and write it as another PDF file in some other location, with a dialog box prompting to open it or save it wherever we want.

                                I executed the following code:



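                                // Assumes java.io.* is imported and `response` is the servlet's HttpServletResponse.
                                try {
                                    File report = new File(location);
                                    // Opening the file first means a missing file is reported before any headers are set.
                                    BufferedInputStream in = new BufferedInputStream(new FileInputStream(report));
                                    response.setContentType("application/x-download");
                                    // Quote the file name so names containing spaces are not truncated.
                                    response.setHeader("Content-Disposition", "attachment; filename=\"" + report.getName() + "\"");
                                    OutputStream outs = response.getOutputStream();
                                    try {
                                        // Copy raw bytes only; no Reader, Writer or String is involved.
                                        byte[] buffer = new byte[8192];
                                        int readlen;
                                        while ((readlen = in.read(buffer)) != -1) {
                                            outs.write(buffer, 0, readlen);
                                        }
                                        outs.flush();
                                    } finally {
                                        // Release both streams even if the copy fails part-way.
                                        in.close();
                                        outs.close();
                                    }
                                    // The setStatus(SC_OK) call is dropped: 200 is the default, and the response is already committed here.
                                } catch (FileNotFoundException fileNotFoundException) {
                                    PrintWriter out = response.getWriter();
                                    out.print("<center><Font color = 'RED'><b>" + PxDSLUtils.getApplicationProperty("label.error.CTM_E017") + "</b></Font></center>");
                                }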


                                Though it prompts with an open/save dialog box, when I try to open the file directly, or save it somewhere locally and then open it, I get the following message: "File is repaired or damaged. Operation failed." Any idea what can be done? It's very urgent. Please suggest.

                                I am not converting to a String, just reading and writing the bytes themselves.


                                Thanks in advance,
                                Mani
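One way to narrow down a "damaged file" report like this is to check whether the downloaded copy is byte-identical to the original on the server, and if not, where the two first diverge. The sketch below is a diagnostic aid only, not a fix; the class name `FileDiff` and the command-line paths are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class FileDiff {
    // Returns the offset of the first differing byte, or -1 if the arrays are identical.
    // If one array is a prefix of the other, returns the length of the shorter one.
    static long firstDifference(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : n;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java FileDiff original.pdf downloaded.pdf
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        byte[] downloaded = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("sizes: " + original.length + " vs " + downloaded.length);
        System.out.println("first difference at offset: " + firstDifference(original, downloaded));
    }
}
```

A difference at offset 0 or a downloaded file that is longer than the original would suggest that something other than the file bytes is being written to the response; a shorter file would suggest the copy was cut off.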