2 Replies Latest reply: Jul 16, 2007 8:16 AM by 807605

    Reading worddoc file

    807605
      Hi,

      I am reading a 3000-page Word document (a 17 MB file) using POI 3, and it throws the exception below. Please help.


      Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

      Below is the code:
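      import java.io.File;
      import java.io.FileInputStream;
      import java.io.FileNotFoundException;
      import java.io.IOException;
      import java.io.Reader;
      import java.io.StringReader;
      import org.apache.poi.hwpf.extractor.WordExtractor;

      /**
       * Extracts the text of a Word document and returns it as a Reader.
       * Returns null if the file cannot be opened or parsed.
       */
      public final static Reader getDOCContent(final File f) {
          Reader reader = null;
          FileInputStream fis = null;

          try {
              fis = new FileInputStream(f);
              // The extractor parses the whole document in memory.
              WordExtractor we = new WordExtractor(fis);
              // getText() returns the entire document text as one String.
              String contents = we.getText();
              reader = new StringReader(contents);
          }
          catch (FileNotFoundException e) {
              System.out.println("file not found exception in doc " + e.getClass());
              e.printStackTrace();
          }
          catch (Exception e) {
              System.out.println("exception in doc " + e.getClass());
              e.printStackTrace();
          }
          finally {
              // Always release the file handle, even after a failure.
              if (fis != null) {
                  try {
                      fis.close();
                  }
                  catch (IOException e1) {
                      System.out.println("io exception in doc " + e1.getClass());
                      e1.printStackTrace();
                  }
              }
          }
          return reader;
      }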
        • 1. Re: Reading worddoc file
          800282
          Hi..

          i am reading a worddoc of 3000 pages(17MB file)
          using poi 3, its giving me the below exception.plz
          help me
          OMG. I can't imagine MS-Word can open such a file itself without going flat on its ass!

          Exception in thread "main"
          java.lang.OutOfMemoryError: Java heap space
          - increase your heap (hint: Google);
          and/or
          - post a message to the Jakarta-POI mailing list.
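
The "increase your heap" advice refers to the JVM's `-Xmx` option, which sets the maximum heap size. As a minimal sketch (the class name `HeapInfo` and the helper `toMiB` are illustrative assumptions, not something from this thread), the following prints the heap limits the running JVM actually has, which can help confirm whether the option took effect:

```java
public class HeapInfo {
    /** Converts a byte count to whole mebibytes (rounds down). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): upper bound the heap may grow to (set by -Xmx)
        System.out.println("max heap:   " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): heap currently reserved by the JVM
        System.out.println("total heap: " + toMiB(rt.totalMemory()) + " MiB");
        // freeMemory(): unused part of the currently reserved heap
        System.out.println("free heap:  " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Running it as `java -Xmx512m HeapInfo` should report a maximum close to the requested size; the 512 MB figure is only an example value, and the right setting for a 17 MB document has to be found by experiment.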
          • 2. Re: Reading worddoc file
            807605
            Hi,

            The program reads up to 1800 pages, and after that it throws this exception.

            I am writing a desktop search program using Lucene, so it should read all the content from a file, whatever its size.
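
Since `getDOCContent` hands back a `Reader`, the consuming side can at least avoid building a second full copy of the text. The sketch below is only an illustration under assumptions: the class `ChunkedReaderDemo` and the method `countChars` are hypothetical names, and counting characters stands in for whatever the indexer would do with each chunk. It does not reduce the memory that the extraction step itself needs, because `getText()` still returns the whole document as one `String`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ChunkedReaderDemo {
    /**
     * Drains the reader through a fixed-size buffer and returns the number
     * of characters seen. This method holds at most bufferSize chars at once.
     */
    public static long countChars(Reader reader, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        char[] buffer = new char[bufferSize];
        long total = 0;
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
            // a real consumer would process buffer[0..n) here
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the Reader returned by getDOCContent
        Reader reader = new StringReader("some extracted document text");
        try {
            System.out.println(countChars(reader, 8) + " chars read");
        } finally {
            reader.close();
        }
    }
}
```

The buffer size of 8 is deliberately tiny to show several read cycles; a real consumer would use a few kilobytes.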