I'm trying to crawl documents residing on an FTP server into an Endeca data source. The documents can be of many types: PDF, Word, XLS, etc. The CAS Web Crawler Guide says that the current version of the Endeca Web Crawler does not support crawling FTP sites. Can anyone recommend a way to do this within Integrator, and CAS if necessary?
Since the contents of the documents have to be "brought back across the wire" anyway, could you script an FTP get to pull them down locally, consume them via Integrator or CAS, and then remove the local copies?
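Something along these lines might work as the staging script. This is only a rough sketch in Python using the standard ftplib module. The host, credentials, remote folder, extension list, and the `ingest` callback are all placeholders I made up; `ingest` stands for whatever kicks off your Integrator graph or CAS crawl against the local staging folder.

```python
import shutil
import tempfile
from ftplib import FTP
from pathlib import Path

# Assumed list of document types to fetch; adjust to what the crawl should cover.
DOC_EXTENSIONS = {".pdf", ".doc", ".docx", ".xls", ".xlsx"}


def stage_ftp_files(ftp, remote_dir, local_dir, extensions=DOC_EXTENSIONS):
    """Download matching files from remote_dir into local_dir; return local paths."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    ftp.cwd(remote_dir)
    staged = []
    for name in ftp.nlst():
        # Some servers return full paths from NLST; keep only the file name.
        base = Path(name).name
        if Path(base).suffix.lower() not in extensions:
            continue
        target = local_dir / base
        with open(target, "wb") as fh:
            ftp.retrbinary("RETR " + name, fh.write)
        staged.append(target)
    return staged


def crawl_via_staging(host, user, password, remote_dir, ingest):
    """Pull the files local, hand the folder to `ingest`, then remove the copies."""
    staging = Path(tempfile.mkdtemp(prefix="ftp_stage_"))
    try:
        with FTP(host) as ftp:
            ftp.login(user, password)
            stage_ftp_files(ftp, remote_dir, staging)
        # `ingest` is a hypothetical hook: it should start the Integrator
        # graph or CAS crawl that reads from the staging folder.
        ingest(staging)
    finally:
        # Remove the local copies whether or not the ingest succeeded.
        shutil.rmtree(staging, ignore_errors=True)
```

You would call it as `crawl_via_staging("ftp.example.com", "user", "secret", "/docs", run_my_crawl)`, where `run_my_crawl` is your own function. Note this only reads a single remote folder; subdirectories would need a recursive listing.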
Just a thought,