1 Reply Latest reply: Jan 30, 2009 5:49 PM by 3004

    Parse HTML (a href) using regex


      I would like to extract all the URLs from a web page that appear in anchor tags of the form <a href="url to extract">.

      I already have a regex, which is:

      String regex = "< *a.*href *= *['|\"]";
      Could you please advise which method of the Pattern or Matcher classes I should use to get as output
      *only* the URL inside the quotation marks?
      I have already tried the start() and end() methods, which return indexes, but I don't get the desired result.
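
      For illustration, here is a minimal sketch of one common approach: wrap the URL part of the pattern in a capturing group and read it back with Matcher.group(int) inside a Matcher.find() loop. The class name HrefExtractor, the method extractHrefs, the sample HTML, and the adjusted pattern are all assumptions made for this example, not something from the original post. The pattern differs from the one above because a | inside [...] is a literal character and a greedy .* can run past the first tag. A regex like this will still miss unquoted attributes and other HTML edge cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, written only to illustrate capturing groups.
class HrefExtractor {
    // Group 1 captures the opening quote (' or "); group 2 captures the URL,
    // lazily, up to the same quote character again (backreference \1).
    private static final Pattern HREF = Pattern.compile(
            "<\\s*a\\b[^>]*?\\bhref\\s*=\\s*(['\"])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns every quoted href value found in <a ...> tags, in document order.
    static List<String> extractHrefs(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {          // advance to the next match
            urls.add(m.group(2));   // text of group 2 only, without the quotes
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        String html = "<a href=\"http://example.com/a\">A</a> "
                    + "<A HREF='b.html'>B</A> <a name=\"top\">no link</a>";
        for (String url : extractHrefs(html)) {
            System.out.println(url);
        }
    }
}
```

      The start() and end() methods also accept a group number, so html.substring(m.start(2), m.end(2)) would give the same text as m.group(2).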

      Thanks in advance.

      P.S. I have also tried HtmlParser, but I found it difficult to use, so I would prefer to use regex.