Pattern - Hadoop Cluster Setup with Hadoop files under SCM
Features
- An SCM tool such as ClearCase or SVN is used to store the Hadoop binaries and configuration files.
- The nodes in the cluster are configured to update their local cache of these files before starting Hadoop.
- The nodes then boot off this local cache.
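
As an illustration only, the update-then-boot sequence might be sketched in Python as below. The function names, the timeout, and the warning text are assumptions made for this sketch and are not part of any Hadoop or SCM tooling. The update and start commands are passed in by the caller, so nothing here depends on a particular SCM tool. If the update fails, the node keeps its cached copy, which is the fallback behaviour listed under Advantages.

```python
import subprocess
from pathlib import Path


def refresh_cache(cache_dir, update_cmd, timeout_s=60):
    """Try to update the local working copy; return True if the update succeeded.

    Hypothetical helper. On any failure (tool missing, server unreachable,
    timeout, non-zero exit status) it returns False and does not touch the
    cache itself, so the node can still boot from the files it already has.
    """
    try:
        result = subprocess.run(
            update_cmd, cwd=cache_dir, timeout=timeout_s, capture_output=True
        )
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


def boot_node(cache_dir, update_cmd, start_cmd):
    """Refresh the cache if possible, then start Hadoop from the local cache.

    Returns True if the node booted from a freshly updated cache and False
    if it fell back to the previously cached copy.
    """
    if not Path(cache_dir).is_dir():
        # Without any cached copy there is nothing to fall back to.
        raise RuntimeError(f"no local cache at {cache_dir}")

    fresh = refresh_cache(cache_dir, update_cmd)
    if not fresh:
        print(f"warning: SCM update failed, booting from cached copy in {cache_dir}")

    # Start Hadoop from the local cache; raise if the start command fails.
    subprocess.run(start_cmd, cwd=cache_dir, check=True)
    return fresh
```

With SVN and a DataNode, a call could look like `boot_node("/opt/hadoop", ["svn", "update", "--non-interactive"], ["/opt/hadoop/bin/hadoop-daemon.sh", "start", "datanode"])`, where the `/opt/hadoop` path is an assumed location for the working copy.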
Advantages
- Nodes can fall back to their cached copies if the server is not responding, which eliminates the SCM repository as a point of failure for the live cluster.
- All changes are now under SCM, so they can be checkpointed and rolled back. Furthermore, all changes to the configuration files can be viewed and audited.
- Maintenance costs are now O(1) in the number of nodes: a change is committed once to the repository rather than applied by hand to each node.
- Different branches of Hadoop and the configuration files can now be experimented with.
Disadvantages
- For very large clusters, there will be significant load on the SCM repository during boot-up. This will create network congestion and could overload the server.
- If the nodes fall back to their cached copies, there is a risk that under load, some parts of the cluster will fall out of sync with the rest.
- Experimentation with different hadoop-site.xml configurations and Hadoop versions must be managed carefully, so as not to corrupt or accidentally upgrade the HDFS filesystem.
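
One way to notice the out-of-sync risk described above is to compare the revision each node is actually running. The sketch below assumes that every node can report the revision of its local cache as a string (for an SVN working copy, the `svnversion` command prints this) and that these reports have already been collected into a dictionary. The function names and the sample host names are hypothetical.

```python
from collections import defaultdict


def group_by_revision(node_revisions):
    """Group node names by the cache revision they report.

    node_revisions maps node name -> revision string. More than one key in
    the result means the cluster is running mixed revisions.
    """
    groups = defaultdict(list)
    for node, revision in node_revisions.items():
        groups[revision].append(node)
    return {revision: sorted(nodes) for revision, nodes in groups.items()}


def stale_nodes(node_revisions, expected_revision):
    """Return the sorted names of nodes whose cache is not at the expected revision."""
    return sorted(
        node
        for node, revision in node_revisions.items()
        if revision != expected_revision
    )
```

For example, `stale_nodes({"node1": "1042", "node2": "1042", "node3": "1038"}, "1042")` returns `["node3"]`, identifying a node that booted from an older cached copy.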