
Posts

Showing posts from March, 2010

nutch - what i did different

http://zillionics.com/resources/articles/NutchGuideForDummies.htm

3. Setup the crawler:

Create a directory called urls to hold the a text file with urls inside of it.
becomes:
Create a directory called seedlist to hold the text file with urls inside of it.

In this directory, create the text file with any name you like. Put any URL’s line by line. This is the crawler’s “shopping list”.
becomes:
c:\nutch-1.0\seedlist\seedlist.txt = http://www.neocodesoftware.com/

4. Edit the file conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME with
+^http://([a-z0-9]*\.)*neocodesoftware.org/

Edit the file conf/nutch-site.xml. Insert at minimum the following properties into it and edit in proper values for the properties:

http.agent.name: neocode
http.agent.description: neocodeagent
http.agent.url: http://www.neocodeosftware.com
http.agent.email: sales@neocodesoftware.com
searcher.dir: crawl

so
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
becomes
bin/nutch crawl s...
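
Before running the crawl, it can help to check that the seed URL actually passes the filter line from step 4. Below is a minimal Python sketch, not Nutch's own code. It assumes the semantics described in the comments of crawl-urlfilter.txt: each rule is a regex prefixed with + (include) or - (exclude), the first rule that matches anywhere in the URL decides, and a URL that matches nothing is ignored. The helper names parse_rules and is_accepted are made up for illustration, and Python's re module stands in for Java's regex engine, which behaves the same way for this simple pattern.

```python
import re


def parse_rules(lines):
    """Turn '+regex' / '-regex' lines into (include, compiled_pattern) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line[0] not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def is_accepted(rules, url):
    """Assumed semantics: the first matching rule wins; no match means ignored."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False


# The filter line from step 4, copied as written in the post.
RULES = parse_rules([r"+^http://([a-z0-9]*\.)*neocodesoftware.org/"])

if __name__ == "__main__":
    for url in ("http://www.neocodesoftware.org/",
                "http://www.neocodesoftware.com/"):
        print(url, is_accepted(RULES, url))
```

Under these assumed semantics, a seed on the .com domain does not pass a filter written for .org, so it is worth confirming that the seed list and the filter line name the same domain.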

Hadoop - it says highly fault tolerant but does it ACT highly fault tolerant?

I have copied and pasted snippets from: http://hadoop.apache.org/common/docs/current/hdfs_design.html

Note: I added my proprietary, copyrighted, handcrafted and uniquely Vancouver, BC, tags. Each tag is created with a physical serialized twin linked using the Imaginary Quantum Entanglement DRM. In addition to the tags I also sell premium tags. Email joshua@neocodesoftware.com to order yours today. (Note: don't be fooled by the "counterfeit" tags - these are open source, not imaginarily quantumly entangled, and not commercially supported knock-offs made by my clone.)

Hadoop - it says highly fault tolerant but does it ACT highly fault tolerant?

1. Introduction

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware... HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware.

2. NameNode and DataNodes

HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, ...