Patterns of Hadoop Deployment
Hadoop only makes sense when deployed on a cluster, which means that you have to:
- keep the code on a whole set of machines up to date
- keep the Hadoop configuration consistent across every node
- push out the cluster configuration to everyone who can submit jobs
- lock down the LAN to keep out untrusted users: the Hadoop filesystem offers no more security than NFS, as it is based on trust. You can configure every host to accept requests only from trusted IP addresses, but that still leaves the risk of ARP spoofing and forged IP addresses.
- monitor the health of the entire cluster
- pay close attention to the NameNode, as it is the Single Point of Failure for the filesystem
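As an illustration of the configuration-consistency problem above, here is a minimal Python sketch (the function name and data shape are hypothetical, not part of Hadoop) that compares checksums of a config file collected from each node and flags any machine that has drifted from the majority version:

```python
import hashlib


def find_config_drift(configs):
    """Given {hostname: config_file_bytes}, return the set of hosts
    whose configuration differs from the majority version."""
    digests = {host: hashlib.sha256(data).hexdigest()
               for host, data in configs.items()}
    # Treat the most common digest as the canonical configuration.
    counts = {}
    for digest in digests.values():
        counts[digest] = counts.get(digest, 0) + 1
    canonical = max(counts, key=counts.get)
    return {host for host, digest in digests.items() if digest != canonical}
```

For example, `find_config_drift({"node1": b"cfg-a", "node2": b"cfg-a", "node3": b"cfg-b"})` returns `{"node3"}`, the one node whose file disagrees with the rest of the cluster. In practice you would fetch the actual `*-site.xml` files from each node before comparing them.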