
Nutch - what I did differently

http://zillionics.com/resources/articles/NutchGuideForDummies.htm

3. Set up the crawler:
Create a directory called urls to hold a text file with URLs inside of it.
becomes:
Create a directory called seedlist to hold the text file with URLs inside of it.

In this directory, create the text file with any name you like. Put the URLs in it, one per line. This is the crawler’s “shopping list”.

becomes:
c:\nutch-1.0\seedlist\seedlist.txt, containing the single line http://www.neocodesoftware.com/
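If you would rather script this step, here is a minimal Python sketch that creates the seed directory and writes one URL per line. The helper name write_seedlist and its parameters are my own (hypothetical, not part of Nutch), and the Nutch home path is whatever your install uses. Keep in mind that the crawl command is given the directory, not the .txt file inside it.

```python
from pathlib import Path


def write_seedlist(nutch_home, urls, dir_name="seedlist", file_name="seedlist.txt"):
    """Create <nutch_home>/<dir_name>/<file_name> holding one URL per line.

    write_seedlist is a hypothetical helper for illustration, not a Nutch tool.
    """
    seed_dir = Path(nutch_home) / dir_name
    seed_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    seed_file = seed_dir / file_name
    # One URL per line (surrounding whitespace stripped), newline after the last one
    seed_file.write_text("".join(url.strip() + "\n" for url in urls), encoding="utf-8")
    return seed_file
```

Calling it with r"c:\nutch-1.0" and ["http://www.neocodesoftware.com/"] produces the seedlist.txt file described above; running it again simply overwrites the file.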

4. Edit the file conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME so that the line reads:
+^http://([a-z0-9]*\.)*neocodesoftware.com/
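To sanity-check the filter before a long crawl, here is a Python approximation of how, as I understand it, Nutch's regex URL filter applies these rules: a line starting with + includes, - excludes, # is a comment, the first rule whose pattern is found in the URL decides, and a URL matching no rule is dropped. Nutch evaluates the patterns with Java regex; Python's re treats this simple pattern the same way. The rule list is a hypothetical stand-in for the whole file, with a catch-all "-." at the end (I believe the stock file ends with a similar "skip everything else" rule). It also shows why the domain in the filter has to match the seed URL's domain, neocodesoftware.com: a filter for a different domain rejects the seed itself.

```python
import re

# Hypothetical stand-in for conf/crawl-urlfilter.txt: the domain rule
# from step 4 plus a catch-all exclude at the end.
FILTER_LINES = [
    "# accept hosts in neocodesoftware.com",
    r"+^http://([a-z0-9]*\.)*neocodesoftware.com/",
    "# skip everything else",
    "-.",
]


def parse_rules(lines):
    """Turn '+regex' / '-regex' lines into (include, compiled_pattern) pairs."""
    rules = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def url_allowed(rules, url):
    """The first rule whose pattern is found in the URL decides; no match means reject."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False


RULES = parse_rules(FILTER_LINES)
```

With these rules, http://www.neocodesoftware.com/ passes, while http://www.example.com/ is rejected because only the catch-all matches. Note that https:// URLs are also rejected, since the pattern is anchored on http://.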

Edit the file conf/nutch-site.xml. Insert at least the following properties inside its <configuration> element and fill in proper values:

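<property>
  <name>http.agent.name</name>
  <value>neocode</value>
</property>
<property>
  <name>http.agent.description</name>
  <value>neocodeagent</value>
</property>
<property>
  <name>http.agent.url</name>
  <value>http://www.neocodesoftware.com</value>
</property>
<property>
  <name>http.agent.email</name>
  <value>sales@neocodesoftware.com</value>
</property>
<property>
  <name>searcher.dir</name>
  <value>crawl</value>
</property>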
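The property block can also be generated. This is a minimal Python sketch using the standard library's xml.etree.ElementTree, assuming you want a nutch-site.xml that contains only these five properties. build_nutch_site is a hypothetical helper; if your nutch-site.xml already holds other properties, merge by hand instead of overwriting it.

```python
import xml.etree.ElementTree as ET

# Values from this post; adjust for your own crawler identity.
PROPERTIES = {
    "http.agent.name": "neocode",
    "http.agent.description": "neocodeagent",
    "http.agent.url": "http://www.neocodesoftware.com",
    "http.agent.email": "sales@neocodesoftware.com",
    "searcher.dir": "crawl",
}


def build_nutch_site(properties):
    """Return nutch-site.xml text: one <property> per name/value pair inside <configuration>."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print with two-space indentation (Python 3.9+)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"
```

Writing the returned string to conf/nutch-site.xml gives the same properties as listed above.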

So the crawl command
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
becomes
bin/nutch crawl seedlist -dir crawl -depth 3 -topN 50
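For reference, a small Python sketch that assembles the same command. In the Nutch 1.x one-step crawl, -dir is the output directory, -depth is the number of generate/fetch/update rounds and -topN caps how many URLs are selected for fetching in each round, so this run fetches at most 3 × 50 = 150 pages. crawl_args, max_pages and run_crawl are hypothetical helpers; run_crawl assumes it is started from the Nutch install directory with a Unix-like shell available, because bin/nutch is a shell script (on Windows that usually means Cygwin).

```python
import subprocess


def crawl_args(seed_dir="seedlist", out_dir="crawl", depth=3, top_n=50):
    """Argument list for the one-step crawl command used in this post."""
    return [
        "bin/nutch", "crawl", seed_dir,  # seed_dir is the directory, not the .txt file
        "-dir", out_dir,                 # where crawldb, linkdb, segments and the index go
        "-depth", str(depth),            # number of generate/fetch/update rounds
        "-topN", str(top_n),             # max URLs selected for fetching per round
    ]


def max_pages(depth, top_n):
    """Upper bound on pages fetched: at most top_n per round, depth rounds."""
    return depth * top_n


def run_crawl(nutch_home, **options):
    """Run the crawl from the Nutch install directory; raises if it exits non-zero."""
    return subprocess.run(crawl_args(**options), cwd=nutch_home, check=True)
```

With the defaults, crawl_args() reproduces the seedlist command above, and crawl_args(seed_dir="urls") gives the guide's original version.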

Plus I made this batch file, which stops Tomcat, deletes the old crawl directory and starts Tomcat again:
cd \nutch
net stop "apache tomcat 6"
rmdir /s /q crawl
net start "apache tomcat 6"
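The same reset can be written in Python. The sketch below shells out to net stop and net start with the service name from the batch file and deletes the crawl directory with shutil.rmtree. reset_crawl is a hypothetical helper, the run parameter exists only so the service commands can be replaced for testing, and a real run needs Windows plus the rights to control services. Like the batch file it carries on if the service was already stopped; the try/finally makes sure Tomcat is started again even if the delete fails.

```python
import shutil
import subprocess
from pathlib import Path

TOMCAT_SERVICE = "apache tomcat 6"  # service name used in the batch file


def reset_crawl(nutch_dir, service=TOMCAT_SERVICE, run=subprocess.run):
    """Stop Tomcat, delete <nutch_dir>/crawl, then start Tomcat again."""
    # Like the batch file, keep going even if the service was already stopped
    run(["net", "stop", service], check=False)
    try:
        crawl_dir = Path(nutch_dir) / "crawl"
        if crawl_dir.exists():
            shutil.rmtree(crawl_dir)  # equivalent of: rmdir /s /q crawl
    finally:
        # Bring Tomcat back up even if the delete raised
        run(["net", "start", service], check=False)
```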
