Patterns of Hadoop Deployment

Contents

Patterns of Hadoop Deployment

Hadoop only makes sense deployed onto a cluster, which means that you have to

  1. keep a whole set of machines up to date with code
  2. keep the hadoop cluster configuration consistent across the cluster
  3. push out the cluster configuration to everyone who can submit jobs
  4. lock down the LAN to keep out untrusted people (there is no more security in the Hadoop filesystem than NFS: it is based on trust). You can set up every host to only allow requests from trusted IPs, but there is still the risk of ARP corruption and forged IP addresses.
  5. monitor the health of the entire cluster
  6. pay a lot of attention to the NameNode, as that is the Single Point of Failure for the filesystem
Get SmartFrog at SourceForge.net. Fast, secure and Free Open Source software downloads