http://zillionics.com/resources/articles/NutchGuideForDummies.htm

3. Set up the crawler:

Create a directory called urls to hold a text file with URLs inside of it.
becomes:
Create a directory called seedlist to hold the text file with URLs inside of it.

In this directory, create the text file with any name you like. Put in the URLs, one per line. This is the crawler’s “shopping list”.
becomes:
c:\nutch-1.0\seedlist\seedlist.txt = http://www.neocodesoftware.com/

4. Edit the file conf/crawl-urlfilter.txt and replace the line containing MY.DOMAIN.NAME with
+^http://([a-z0-9]*\.)*neocodesoftware.com/
(the domain here must match the seed URL above, otherwise the filter rejects the seed and nothing is fetched)

Edit the file conf/nutch-site.xml. Insert at minimum the following properties into it and edit in proper values for the properties:
http.agent.name = neocode
http.agent.description = neocodeagent
http.agent.url = http://www.neocodesoftware.com
http.agent.email = sales@neocodesoftware.com
searcher.dir = crawl

so
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
becomes
bin/nutch crawl s...
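
For reference, the nutch-site.xml properties listed in step 4 could be written out roughly as follows. This is a minimal sketch, assuming the standard Hadoop-style configuration/property/name/value layout that Nutch 1.0 uses for its conf files. The values are the ones from these notes, with the agent URL assumed to be spelled http://www.neocodesoftware.com. Any other properties already in the file are left out here.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: sketch of the minimum properties named in step 4 -->
<configuration>
  <!-- Name the crawler sends in its HTTP User-Agent header -->
  <property>
    <name>http.agent.name</name>
    <value>neocode</value>
  </property>
  <!-- Short description of the crawler -->
  <property>
    <name>http.agent.description</name>
    <value>neocodeagent</value>
  </property>
  <!-- Assumed spelling of the site URL; adjust if the real one differs -->
  <property>
    <name>http.agent.url</name>
    <value>http://www.neocodesoftware.com</value>
  </property>
  <!-- Contact address for site owners who see the crawler in their logs -->
  <property>
    <name>http.agent.email</name>
    <value>sales@neocodesoftware.com</value>
  </property>
  <!-- Directory the search webapp reads the index from; should match the -dir value given to bin/nutch crawl -->
  <property>
    <name>searcher.dir</name>
    <value>crawl</value>
  </property>
</configuration>
```

Settings in nutch-site.xml override the defaults in conf/nutch-default.xml, so only the properties that differ need to appear here.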