4 Replies Latest reply: Apr 2, 2008 9:42 PM by 807591

    Pattern matching

    631403
      I need to extract a value from an input stream. For example, from the input below I want to get "ABCDE". Note that the input also contains newlines.

      --------
      <table align=center cellspacing=10 cellpadding=0 style="font-size: 100%;" class=dformDisplay>
      <tr>
      <td>
      <table cellspacing=2 cellpadding=0 class=LabelsLeft>
      <tr>
      <td><IMG SRC='/i/clear2x2.gif' width=10 height=0 alt=""></td>
      <td class=lc id='tdl_0'>Requestor </td>
      <td class=cc id='tdf_0'>ABCDE</td>
      --------

      Should I use pattern matching to get this value, or is there a better way?
        • 1. Re: Pattern matching
          807591
          HTMLParser is a better way; I blogged about it a while back:

          http://notetodogself.blogspot.com/2007/11/parse-xml-with-jdom-and-xpath.html

          I also blogged about doing the same thing with regex, which turned out to be slow:

          http://notetodogself.blogspot.com/2007/10/extract-links-from-webpage.html
          • 2. Re: Pattern matching
            631403
            The HTML in my case is not well-formed XML and probably will not even parse successfully. Any suggestions?

            Also, I am trying to retrieve ABCDE from the following input using pattern matching, but it doesn't seem to work. Am I doing something wrong?
            1. Input
            -----
            <table align=center cellspacing=10 cellpadding=0 style="font-size: 100%;" class=dformDisplay>
            <tr>
            <td>
            <table cellspacing=2 cellpadding=0 class=LabelsLeft>
            <tr>
            <td><IMG SRC='/i/clear2x2.gif' width=10 height=0 alt=""></td>
            <td class=lc id='tdl_0'>Requestor </td>
            <td class=cc id='tdf_0'>ABCDE</td>
            -----
            2. Code
            Pattern p = Pattern.compile("^.*silk>(.*)<.*$");
            Matcher matcher;
            tok = theClient.getRecordAsHTML(tbid, m.get("Reject request ID#")).split("\\\n"); // This is the HTML input

            for (String s : tok) {
                if (s.matches(".*popupUserWin.*slk>.*")) {
                    matcher = p.matcher(s);
                    System.out.println(s);
                    while (matcher.find()) {
                        System.out.println(matcher.start());
                        System.out.println(matcher.groupCount());
                    }
                }
            }
            • 3. Re: Pattern matching
              807591
              Sorry, I gave you the wrong link there.

              This is the HTML parser link:

              http://notetodogself.blogspot.com/2007/11/extract-links-using-htmlparser.html

              You should be able to adapt that code to get your text; see the HTMLParser API.
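
              For reference, here is a minimal sketch of the parser-based approach. It does not use the HTMLParser library from the link above; it swaps in the lenient HTML parser that ships with the JDK (javax.swing.text.html) so that it runs without extra jars. The class name CellParserDemo, the method extractCell, and the choice to locate the cell by its id attribute (tdf_0 in the sample) are assumptions made for illustration, not something taken from the original posts.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class CellParserDemo {

    // Hypothetical helper: collects the text inside every <td> whose id
    // attribute equals cellId. Returns an empty string if no cell matches.
    static String extractCell(String html, final String cellId) {
        final StringBuilder text = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private boolean inTarget = false;

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TD) {
                    // Start collecting only when this cell carries the wanted id.
                    inTarget = cellId.equals(attrs.getAttribute(HTML.Attribute.ID));
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (inTarget) {
                    text.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TD) {
                    inTarget = false;
                }
            }
        };

        try {
            // The Swing parser tolerates unquoted attributes and unclosed tags.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        // Sample input copied from the question (truncated there as well).
        String html =
                "<table cellspacing=2 cellpadding=0 class=LabelsLeft>\n"
              + "<tr>\n"
              + "<td><IMG SRC='/i/clear2x2.gif' width=10 height=0 alt=\"\"></td>\n"
              + "<td class=lc id='tdl_0'>Requestor </td>\n"
              + "<td class=cc id='tdf_0'>ABCDE</td>\n";
        System.out.println(extractCell(html, "tdf_0"));
    }
}
```

              A parser callback like this does not depend on how the markup happens to be split across lines, which is the main advantage over matching line by line.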
              • 4. Re: Pattern matching
                807591
                If you really want a pure regex solution, this should work:
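                // Capture the text between "silk>" and the next "<".
                Pattern p = Pattern.compile("silk>([^<]*+)<");
                String html = theClient.getRecordAsHTML(tbid, m.get("Reject request ID#"));
                // Named "matcher" so it does not clash with the existing variable m.
                Matcher matcher = p.matcher(html);
                while (matcher.find()) {
                  System.out.println(matcher.group(1));
                }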
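
                Here is a self-contained sketch of the same regex idea, run against the sample input from the question. The sample as posted does not contain the "silk>" marker, so this version assumes the cell can be located by its single-quoted id attribute (tdf_0) instead. The class name CellRegexDemo and the method extractCell are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CellRegexDemo {

    // Hypothetical helper: returns the trimmed text of the first <td> whose
    // single-quoted id attribute equals cellId, or null if nothing matches.
    static String extractCell(String html, String cellId) {
        // [^>]* stays inside the opening tag; [^<]*+ possessively grabs the
        // cell text up to the next tag. Matching is case-insensitive, so
        // <TD> also matches (the id is compared case-insensitively too).
        Pattern p = Pattern.compile(
                "<td[^>]*\\bid='" + Pattern.quote(cellId) + "'[^>]*>([^<]*+)<",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = p.matcher(html);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Sample input copied from the question; it spans several lines,
        // and the pattern is applied to the whole string without splitting.
        String html =
                "<table cellspacing=2 cellpadding=0 class=LabelsLeft>\n"
              + "<tr>\n"
              + "<td><IMG SRC='/i/clear2x2.gif' width=10 height=0 alt=\"\"></td>\n"
              + "<td class=lc id='tdl_0'>Requestor </td>\n"
              + "<td class=cc id='tdf_0'>ABCDE</td>\n";
        System.out.println(extractCell(html, "tdf_0")); // prints ABCDE
    }
}
```

                Note that find() with group(1) returns the captured text, whereas groupCount() only reports how many capturing groups the pattern defines.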