Pattern - Manual Hadoop Cluster Setup
It is technically possible to set up a Hadoop cluster by hand.
Features
- The system administrators install Hadoop by hand.
Advantages
- The system administrators get to learn what steps are needed to bring up a Hadoop cluster.
Disadvantages
- The system administrators repeat the steps needed to bring up a Hadoop node for every node in the cluster.
- Installation costs scale as O(N), where N is the number of nodes.
- Maintenance costs scale as O(N), where N is the number of nodes.
- If any machine is misconfigured, work run on that machine could produce a different outcome, or data could be lost.
- There's no formal means of monitoring cluster health.
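The misconfiguration risk above can be reduced a little with a simple consistency check. The following is a minimal sketch, not part of the manual process itself: it assumes the administrator has already collected the text of each node's hadoop-site.xml into a dictionary by some means (the collection step is left out), and the function name find_config_drift is hypothetical. It groups hosts by the SHA-256 digest of their configuration and reports the hosts that differ from the most common copy.

```python
import hashlib
from collections import defaultdict


def find_config_drift(configs):
    """Group hosts by the SHA-256 digest of their config text.

    `configs` maps a host name to the text of its hadoop-site.xml.
    Returns (reference_hosts, drifted_hosts): the largest group of hosts
    with identical content, and every host outside that group.
    """
    by_digest = defaultdict(list)
    for host, text in configs.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        by_digest[digest].append(host)
    if not by_digest:
        return [], []
    # Assumption: the most common copy is the intended configuration.
    reference = max(by_digest.values(), key=len)
    drifted = [
        host
        for hosts in by_digest.values()
        if hosts is not reference
        for host in hosts
    ]
    return sorted(reference), sorted(drifted)
```

A check like this only detects that copies differ; it cannot tell which copy is correct, so the majority rule used here is a heuristic.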
Process
- Install the OS images; bring them up to date.
- Make sure that DNS is live, or edit every host's /etc/hosts file so that the entries are consistent across the cluster.
- Install the Java runtime.
- Expand the Hadoop tar files.
- Write your hadoop-site.xml with all site-specific configuration options.
- Copy it out to every node in the cluster.
- Decide which machines will be namenodes, datanodes, job trackers and task trackers.
- On the namenode, set the namenode script to run when the system boots.
- On each datanode, set the datanode script to run when the system boots.
- On the job tracker, set the job tracker script to run when the system boots.
- On any task tracker machines, set the task tracker script to run when the system boots.
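Two of the steps above, keeping /etc/hosts consistent and writing hadoop-site.xml, amount to producing the same text for every node. As an illustration only, here is a minimal sketch that renders both from a single description of the cluster. The host names, IP addresses and port numbers are hypothetical, and the helper names render_hadoop_site and render_hosts_entries are invented for this example. The property names fs.default.name and mapred.job.tracker are the ones typically set in hadoop-site.xml, but check them against the documentation for the Hadoop release in use.

```python
import xml.etree.ElementTree as ET


def render_hadoop_site(properties):
    """Return hadoop-site.xml text for a dict of property name -> value."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0"?>\n' + body + "\n"


def render_hosts_entries(addresses):
    """Return /etc/hosts lines for a dict of host name -> IP address."""
    return "".join(f"{ip}\t{host}\n" for host, ip in sorted(addresses.items()))


# Hypothetical cluster layout, for illustration only.
site_xml = render_hadoop_site({
    "fs.default.name": "hdfs://namenode.example.org:8020",
    "mapred.job.tracker": "jobtracker.example.org:8021",
})
hosts_text = render_hosts_entries({
    "namenode.example.org": "10.0.0.1",
    "jobtracker.example.org": "10.0.0.2",
})
```

Generating the files once and copying the result to each node does not remove the O(N) copying work, but it does mean every node starts from byte-identical text.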