2 Replies Latest reply: Dec 15, 2006 5:53 PM by 807607

    Good Java web crawlers

      I am writing a search engine implementation using Lucene.
      You may skip this part:
      To do this I need to crawl the site to get the content I need.
      I found some example code showing how to do it. It uses HTTPParser and Apache HTTPClient to do the job. The problem is that the code isn't very good, and it opens a bunch of new sessions while crawling.

      I can't find the documentation I need to figure out how to NOT open new sessions.
      The question:
      Do you guys know of any good, tested open source Java crawlers?
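
For reference, the session problem described above often comes from creating a fresh client (and so a fresh, empty cookie store) for every page, so the server never sees its session cookie again and starts a new session each time. Here is a minimal sketch of the usual fix: one client and one cookie store shared by every request. It assumes Java 11+ and the JDK's own `java.net.http.HttpClient`, not the Apache HTTPClient used in the example code, and the class name `SessionReusingFetcher` is hypothetical. The same principle (reuse one client instance for the whole crawl) should carry over to Apache HTTPClient, but that is an assumption and not something checked here.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: every fetch goes through the same client and cookie store.
public class SessionReusingFetcher {
    // One cookie store for the whole crawl; session cookies set by the
    // server are kept here and sent back on later requests.
    private final CookieManager cookies =
            new CookieManager(null, CookiePolicy.ACCEPT_ALL);

    // One client for the whole crawl, wired to the shared cookie store.
    private final HttpClient client = HttpClient.newBuilder()
            .cookieHandler(cookies)
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Fetches a page body as a string using the shared client.
    public String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Exposes the shared cookie manager, e.g. for inspection while debugging.
    public CookieManager cookieManager() {
        return cookies;
    }

    // Number of unexpired cookies currently held for the crawl.
    public int storedCookieCount() {
        return cookies.getCookieStore().getCookies().size();
    }
}
```

The crawler would create a single `SessionReusingFetcher` up front and call `fetch` for each URL, instead of building a new client inside the crawl loop.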