11 Replies Latest reply: May 18, 2009 1:22 PM by 807588

    How do I read specified lines in a text file?

    807588
      Hi!

      I would like to read an HTML file and print all the text between every <td> and </td> tag.

      My code already reads and prints the entire file, but how can I make it print only the text between those tags? Should I use String.indexOf()?

      Please give me a hint.
        • 1. Re: How do I read specified lines in a text file?
          807588
          String.indexOf() is certainly an option. If you are a newbie, it is probably the easiest to understand and explain.
          You could also try creating a regular expression to match the desired text.
          There may also be some HTML parsing packages out there somewhere.
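A minimal sketch of the String.indexOf() approach, in case it helps. It assumes lowercase <td> tags without attributes and no nested cells. The class and method names (TdIndexOf, cellTexts) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class TdIndexOf {
    // Collects the text between each <td> and the next </td>.
    // Assumption: tags are lowercase, carry no attributes, and are not nested.
    static List<String> cellTexts(String html) {
        List<String> cells = new ArrayList<>();
        final String open = "<td>";
        final String close = "</td>";
        int from = 0;
        while (true) {
            int start = html.indexOf(open, from);
            if (start < 0) break;              // no more opening tags
            start += open.length();            // skip past "<td>"
            int end = html.indexOf(close, start);
            if (end < 0) break;                // unclosed cell: stop
            cells.add(html.substring(start, end));
            from = end + close.length();       // continue after "</td>"
        }
        return cells;
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>first</td></tr><tr><td>second</td></tr></table>";
        for (String cell : cellTexts(html)) {
            System.out.println(cell);
        }
    }
}
```

The loop stops quietly on an unclosed cell instead of throwing, which seemed the gentler choice for messy input.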
          • 2. Re: How do I read specified lines in a text file?
            807588
            Use an XML/HTML parser. SAX might be appropriate here.
            • 3. Re: How do I read specified lines in a text file?
              807588
              es5f2000 wrote:
              Use an XML/HTML parser. SAX might be appropriate here.
              Can you parse HTML with SAX? I thought that SAX could only handle well formed XML.
              Lots of (most???) HTML does not qualify as well formed XML.
              Note: HTML, not XHTML which at least is supposed to be XML.
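To illustrate the point about well-formedness, here is a rough sketch using the SAX parser that ships with the JDK. It works only when the input happens to be well-formed XML; anything else (an unclosed <br>, for example) makes the parser throw. The names TdSax and cellTexts are hypothetical, and wrapping the failure in IllegalArgumentException is just a choice made for this sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TdSax {
    // Returns the text of every <td> element.
    // Throws IllegalArgumentException if the markup is not well-formed XML.
    static List<String> cellTexts(String markup) {
        final List<String> cells = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a <td>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("td".equals(qName)) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) current.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("td".equals(qName) && current != null) {
                    cells.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(cellTexts("<table><tr><td>a</td></tr><tr><td>b</td></tr></table>"));
    }
}
```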
              • 4. Re: How do I read specified lines in a text file?
                807588
                Hence "might be". :-)
                • 5. Re: How do I read specified lines in a text file?
                  807588
                  es5f2000 wrote:
                  Hence "might be". :-)
                  I missed the ever-useful weasel words <g>.
                  • 6. Re: How do I read specified lines in a text file?
                    807588
                    Let's say the HTML file consists of these lines:
                    <table>
                    <tr>
                    <td>A simple text 12345 and some more text.</td>
                    </tr>
                    <tr>
                    <td>A second simple text 54321and some more text.</td>
                    </tr>
                    <tr>
                    <td>A third simple text 99999 and some more text.</td>
                    </tr>
                    </table>
                    Now I just want to pick out the text inside the <td> and </td> tags.

                    I have read all the html text into a String.

                    What is the simplest way to loop through the String to check the <td> tags?
                    • 7. Re: How do I read specified lines in a text file?
                      807588
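                      If your HTML is well formed, you could do
                      // IOUtils comes from Apache Commons IO (org.apache.commons.io.IOUtils)
                      String html = IOUtils.toString(new FileInputStream(filename));
                      String[] parts = html.split("</?td>");
                      // Odd-indexed parts are the text between <td> and </td>
                      for(int i=1;i<parts.length;i+=2)
                        System.out.println(parts[i]);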
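The regular-expression route mentioned earlier in the thread could look roughly like this. Instead of relying on the alternating parts produced by split, it captures each cell's content directly. This is only a sketch: it assumes <td> tags without attributes and no nested tables, and the names TdRegex and cellTexts are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TdRegex {
    // Reluctant (.*?) stops at the first closing tag; DOTALL lets a cell span lines.
    private static final Pattern TD =
            Pattern.compile("<td>(.*?)</td>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    static List<String> cellTexts(String html) {
        List<String> cells = new ArrayList<>();
        Matcher m = TD.matcher(html);
        while (m.find()) {
            cells.add(m.group(1)); // group 1 is the text between the tags
        }
        return cells;
    }

    public static void main(String[] args) {
        String html = "<tr><td>one</td></tr>\n<tr><td>two</td></tr>";
        for (String cell : cellTexts(html)) {
            System.out.println(cell);
        }
    }
}
```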
                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                      • 8. Re: How do I read specified lines in a text file?
                        807588
                        Simplest is in the eye of the coder.
                        But since you mentioned String.indexOf(), why not try that?
                        • 9. Re: How do I read specified lines in a text file?
                          YoungWinston
                          johndjr wrote:
                          Can you parse HTML with SAX? I thought that SAX could only handle well formed XML.
                          Correct. However, there is a wonderful little tool called HtmlTidy out there on the Net, written, a long time ago, by one of the guys at CERN. I've stuck a few gigabytes through it in my time; have yet to have a problem. There may even be a Java version now.

                          Winston
                          • 10. Re: How do I read specified lines in a text file?
                            807588
                            Thanks!

                            To go a step further, how can I just select the numbers in each <td> tag?

                            I have tried a bit but I do not succeed:
                            String[] parts = str.split("</?td>");     // The Strings between the <td> and </td> tags
                            char c;
                            for(int i=1; i < parts.length; i += 2)    // Loop through each String
                            {
                                for(int j=0; j < parts[i].length(); j++) // Loop through each character in the String
                                {
                                    c = parts[i].charAt(j);
                                    if (Character.isDigit(c)) // Check if the character is numeric
                                    {
                                        System.out.print(c);
                                        if(i/5==0) // Make a new line each 5th character
                                            System.out.println();
                                    }
                                }
                            }

                            The output looks very strange; it does not print 5 numeric characters on each line.
                            What am I doing wrong?
                            • 11. Re: How do I read specified lines in a text file?
                              807588
                              Re-read your code. Especially the part where you do the printing.
                              Print out some of the important values to ensure that they have the values you expected.
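Following up on that hint with one possible fix, offered as a sketch and not the only way to do it. In the posted loop, i is the index into parts, not a count of digits, and i/5 is integer division, so i/5==0 is true for every i from 0 to 4. Counting the digits actually printed and testing the count with the remainder operator (%) gives the intended line breaks. The names TdDigits, digitsInRows and perLine are made up for this example.

```java
public class TdDigits {
    // Returns the digits found in text, with a newline after every perLine digits.
    static String digitsInRows(String text, int perLine) {
        StringBuilder out = new StringBuilder();
        int printed = 0; // how many digits have been emitted so far
        for (int j = 0; j < text.length(); j++) {
            char c = text.charAt(j);
            if (Character.isDigit(c)) {
                out.append(c);
                printed++;
                // Remainder, not division: true on the 5th, 10th, 15th... digit
                if (printed % perLine == 0) {
                    out.append('\n');
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(digitsInRows("A simple text 12345 and some more text.", 5));
    }
}
```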