Pattern - Hadoop Cluster Setup with Hadoop files under SCM

Features

  • An SCM tool such as ClearCase or SVN is used to store the Hadoop binaries and configuration files.
  • The nodes in the cluster are configured to update their local cache of these files before starting Hadoop.
  • The nodes then boot off this local cache.
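
The update-then-boot sequence above might be scripted roughly as follows. This is a minimal sketch, assuming the local cache is an SVN working copy and that the node is started by a command passed in by the caller. The names refresh_cache and boot_node, the 60-second timeout and the example start command are illustrative assumptions, not part of any existing tool.

```python
import subprocess
from pathlib import Path


def refresh_cache(cache_dir, timeout=60, run=subprocess.run):
    """Try to update the local working copy from the SCM server.

    Returns True if the update succeeded, False if the node has to
    fall back to whatever is already cached. Assumes an SVN working
    copy; another SCM tool would need a different command.
    """
    try:
        run(["svn", "update", "--non-interactive"],
            cwd=str(cache_dir), check=True, timeout=timeout)
        return True
    except (subprocess.CalledProcessError,
            subprocess.TimeoutExpired, OSError):
        # Server unreachable, too slow, or svn missing: keep the cached copy.
        return False


def boot_node(cache_dir, start_cmd, run=subprocess.run):
    """Refresh the cache if possible, then start Hadoop from it.

    start_cmd is supplied by the caller (hypothetical example:
    ["bin/hadoop-daemon.sh", "start", "datanode"]).
    Returns True if the node booted from a freshly updated cache.
    """
    cache = Path(cache_dir)
    if not cache.is_dir():
        # Without any cached copy there is nothing to fall back to.
        raise FileNotFoundError(f"no local cache at {cache}")
    fresh = refresh_cache(cache, run=run)
    run(list(start_cmd), cwd=str(cache), check=True)
    return fresh
```

The return value lets the caller log whether a node started from a stale cache, which matters for the synchronization risk listed under Disadvantages.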

Advantages

  • It is possible to have the nodes fall back to their cached copies if the server is not responding, which removes the SCM repository as a single point of failure for the live cluster.
  • All changes are now under SCM, so they can be rolled back and checkpointed. Furthermore, all changes to the configuration files can be viewed and audited.
  • Maintenance costs are now O(1) in the number of nodes: a change is committed once to the repository instead of being applied to each node by hand.
  • Different branches of Hadoop and the configuration files can now be experimented with.

Disadvantages

  • For very large clusters, there will be significant load on the SCM repository during boot-up. This will create network congestion and could overload the server.
  • If the nodes fall back to their cached copies, there is a risk that under load, some parts of the cluster will fall out of sync with the rest.
  • Experimentation with different hadoop-site.xml configurations and Hadoop versions must be managed carefully, so as not to corrupt or accidentally upgrade the HDFS filesystem.
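
One way to spot the out-of-sync risk described above is to compare the revision each node booted from. The sketch below assumes every node can report its working-copy revision as an integer (for an SVN cache, the svnversion command prints it) and that some separate mechanism collects these reports. The function name find_stale_nodes and the host names in the usage lines are hypothetical.

```python
def find_stale_nodes(revisions):
    """Return (latest_revision, sorted list of hosts behind it).

    revisions maps host name -> working-copy revision number.
    The highest revision reported is treated as the target, which
    assumes at least one node managed to update successfully.
    """
    if not revisions:
        return None, []
    latest = max(revisions.values())
    stale = sorted(host for host, rev in revisions.items() if rev < latest)
    return latest, stale


# Usage with made-up host names and revision numbers.
latest, stale = find_stale_nodes({"node1": 120, "node2": 118, "node3": 120})
print(latest, stale)
```

Nodes listed as stale booted from an older cached copy and would need to be updated and restarted once the repository is reachable again.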