
Posts

Showing posts from 2010

what's my story?

learn from my mistakes

25+ years experience solving problems with computers and computer problems

I know how to
identify the problem
analyze the problem
design the solution
implement the solution
maintain the solution

the process tools i use are
user centered design
iterative
occam's razor
design first, code second
milestones
client approval at every step

alchemist

it director with team network admins - lamp programmers - lamp, filemaker designers servers - 50+ data centre and bandwidth projects - new, renovation, analysis, roi

process:
less is more
old is gold
lead to gold
respect
wireframing
inspiration
unintended consequences
the problem with computers

nutch - what i did different

http://zillionics.com/resources/articles/NutchGuideForDummies.htm

3. Setup the crawler:

Create a directory called urls to hold a text file with urls inside of it.
becomes:
Create a directory called seedlist to hold the text file with urls inside of it.

In this directory, create the text file with any name you like. Put any URLs line by line. This is the crawler’s “shopping list”.
becomes:
c:\nutch-1.0\seedlist\seedlist.txt = http://www.neocodesoftware.com/

4. Edit the file conf/crawl-urlfilter.txt and replace MY.DOMAIN.NAME with
+^http://([a-z0-9]*\.)*neocodesoftware.org/

Edit the file conf/nutch-site.xml. Insert at minimum the following properties into it and edit in proper values for the properties:

http.agent.name = neocode
http.agent.description = neocodeagent
http.agent.url = http://www.neocodesoftware.com
http.agent.email = sales@neocodesoftware.com
searcher.dir = crawl

so
bin/nutch crawl urls -dir crawl -depth 3 -topN 50
becomes
bin/nutch crawl s
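The filter line in step 4 is a regular expression with a leading "+" meaning "include". A minimal Python sketch of how that one line treats a URL is below. It assumes the filter is applied as an unanchored search, which makes no difference here because the pattern starts with ^. The helper name accepts is made up for illustration and is not part of Nutch, and Python's re stands in for the Java regex engine Nutch uses.

```python
import re

# Filter line copied from step 4; the leading "+" marks an include rule.
FILTER_LINE = r"+^http://([a-z0-9]*\.)*neocodesoftware.org/"


def accepts(filter_line: str, url: str) -> bool:
    """Hypothetical helper: True if an include ("+") line matches the URL."""
    sign, pattern = filter_line[0], filter_line[1:]
    if sign != "+":
        raise ValueError("this sketch only handles include (+) lines")
    return re.search(pattern, url) is not None


if __name__ == "__main__":
    for url in (
        "http://www.neocodesoftware.org/",
        "http://www.neocodesoftware.com/",
    ):
        print(url, accepts(FILTER_LINE, url))
```

Under these assumptions the .org address passes and the .com address does not, so it is worth checking that the seed list URL and the filter line name the same domain before crawling.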

Hadoop - it says highly fault tolerant but does it ACT highly fault tolerant?

I have copied and pasted snippets from: http://hadoop.apache.org/common/docs/current/hdfs_design.html

Note: I added my proprietary, copyrighted, handcrafted and uniquely Vancouver, BC, < fail > < /fail > tags. Note each tag is created with a physical serialized twin linked using the Imaginary Quantum Entanglement DRM. In addition to the < fail > < /fail > tags I also sell premium < FAIL > < /FAIL > tags. Email joshua@neocodesoftware.com to order yours today.

(note: don't be fooled by the "counterfeit" < Fail > < /Fail > tags - these are open source, not imaginarily quantum entangled, and not commercially supported knock offs made by my clone.)

1. Introduction

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware... HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardwa