Setting up a Hadoop test VM

This page documents how to bring up an RHEL5 or CentOS VMware image.

Getting VMware modules working

To get full VMware integration, you need to install the VMware tools. These come with prebuilt modules for many Linux releases, but after a couple of system updates the tools usually stop working and you need to reinstall them.

Run

vmware-config-tools.pl

This will try to rebuild and reinstall the modules. If it fails to find the kernel headers, install the kernel development tools and run it again:

yum install kernel-devel kernel-headers gcc
vmware-config-tools.pl

After configuring, reboot.
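
One possible reason for the header lookup to fail is that the installed kernel-devel package does not match the running kernel. The following is a minimal sketch, not part of the VMware tools, that builds the expected package name from uname -r and asks RPM whether it is installed. The function names are made up for illustration, and the package naming is an assumption that holds for the standard kernel only; variant kernels such as PAE or Xen use different package names.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of any tool.

# Print the kernel-devel package name expected for a kernel release.
# With no argument, the release of the running kernel (uname -r) is used.
kernel_devel_package() {
    release="${1:-$(uname -r)}"
    printf 'kernel-devel-%s\n' "$release"
}

# Succeed if that package is installed (RPM-based systems only).
have_matching_headers() {
    rpm -q "$(kernel_devel_package)" >/dev/null 2>&1
}
```

Running have_matching_headers || echo "install $(kernel_devel_package)" before vmware-config-tools.pl gives an early hint about what is missing.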

Installing Java

In theory, Hadoop runs with only the JRE, not the JDK. This means that all you need is the JRE RPM, which can be installed using Yum.

Run java -version to check that Java is present and that its version is Java 1.6 or later. (The cluster examples deploy components to test this, but checking early is convenient.) Hadoop requires you to run Sun's Java, although we have not encountered many problems with JRockit.
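
The version check can be scripted. This is a rough sketch under the assumption that java -version prints a line of the form java version "1.6.0_20" (it writes to stderr, hence the redirect in the usage line). The function names are hypothetical, and the check looks only at the version number, not at which vendor's JVM is installed.

```shell
#!/bin/sh
# Hypothetical helpers for checking the Java version.

# Read `java -version` output on stdin and print the quoted version string,
# e.g.  java version "1.6.0_20"  ->  1.6.0_20
java_version_string() {
    sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed if a version string such as 1.6.0_20 denotes Java 1.6 or later.
java_version_ok() {
    major="${1%%[!0-9]*}"      # digits before the first non-digit
    rest="${1#*.}"             # everything after the first dot
    minor="${rest%%[!0-9]*}"   # digits at the start of the remainder
    [ -n "$major" ] || return 1
    [ -n "$minor" ] || minor=0
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]; }
}
```

Usage: java_version_ok "$(java -version 2>&1 | java_version_string)" || echo "Java 1.6+ not found"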

Fixing the networking

  1. If the VM is to be run on a desktop, then DHCP and DNS configuration for the VM will let the host and VM talk to each other without any configuration.
  2. If the VM is to be used on a laptop, the only reliable way to maintain communications between host and VM is to run the VM on a private host-only network.
  3. Neither type of machine should run IPv6 until Hadoop and your network infrastructure support it.
  4. Neither type of machine may have any entry in /etc/hosts that maps the machine name to localhost, 127.0.0.1, ::1 or similar.
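
The /etc/hosts rule is easy to check mechanically. The sketch below assumes the usual hosts file layout (an address followed by one or more names, with # starting a comment); loopback_entries_for is a made-up name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print every line of a hosts file that maps NAME
# to a loopback address (127.x.x.x or ::1).
# Usage: loopback_entries_for NAME [HOSTS_FILE]
loopback_entries_for() {
    awk -v name="$1" '
        {
            sub(/#.*/, "")                       # drop comments
            if ($1 ~ /^127\./ || $1 == "::1")
                for (i = 2; i <= NF; i++)
                    if ($i == name) { print; break }
        }
    ' "${2:-/etc/hosts}"
}
```

On a correctly configured machine, loopback_entries_for "$(hostname)" should print nothing.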

Disabling IPv6

To disable IPv6, add two lines to /etc/modprobe.conf:

alias net-pf-10 off
alias ipv6 off

Then run update-modules and reboot.
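
If this step is scripted, it helps to make the edit repeatable so that running it twice does not duplicate the lines. A minimal sketch follows; ensure_line and ipv6_loaded are hypothetical helper names. The second function relies on /proc/net/if_inet6 being present only while the kernel's IPv6 support is loaded, which is an assumption worth verifying on your own kernel.

```shell
#!/bin/sh
# Hypothetical helpers for the IPv6 step.

# Append LINE to FILE unless an identical line is already there.
# Usage: ensure_line FILE LINE
ensure_line() {
    # -F fixed string, -x whole-line match, -q quiet
    grep -Fxq -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Succeed if the kernel currently has IPv6 support loaded (assumption: see above).
ipv6_loaded() {
    [ -e /proc/net/if_inet6 ]
}
```

As root, ensure_line /etc/modprobe.conf 'alias net-pf-10 off' and ensure_line /etc/modprobe.conf 'alias ipv6 off' apply the change; after the reboot, ipv6_loaded should fail.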

Connecting the host and test machines

  1. Pick a name for the test machine, e.g. "rosyth". Make sure it is not on the external network of the development machine.
  2. Pick an IP address on the machine-only subnet (the one assigned by DHCP is a good start). Here: 192.168.66.129
  3. Determine the IP address of the host on the subnet. Here: 192.168.66.1
  4. In the VM, set up the machine with a static IP address set to the chosen address, 192.168.66.129, and a subnet mask such as 255.255.255.0
  5. Edit its /etc/hosts table to contain both its own hostname to IP address mapping and that of the host computer

    $ cat /etc/hosts
    127.0.0.1 localhost.localdomain localhost
    192.168.66.129 rosyth
    192.168.66.1 morzine

  6. Reboot the VM
  7. Test: ping the host from the VM

    [smartfrog@rosyth ~]$ ping 192.168.66.1
    PING 192.168.66.1 (192.168.66.1) 56(84) bytes of data.
    64 bytes from 192.168.66.1: icmp_seq=1 ttl=64 time=0.946 ms
    64 bytes from 192.168.66.1: icmp_seq=2 ttl=64 time=0.066 ms
    64 bytes from 192.168.66.1: icmp_seq=3 ttl=64 time=0.181 ms

  8. To reach the VM from the host, add the entry for rosyth into the physical machine's /etc/hosts table
  9. Test: ping the virtual host (rosyth) from the physical one (morzine)

    morzine:~> ping rosyth
    PING rosyth (192.168.66.129) 56(84) bytes of data.
    64 bytes from rosyth (192.168.66.129): icmp_seq=1 ttl=64 time=0.678 ms
    64 bytes from rosyth (192.168.66.129): icmp_seq=2 ttl=64 time=0.174 ms
    64 bytes from rosyth (192.168.66.129): icmp_seq=3 ttl=64 time=0.147 ms

  10. Finally, start SmartFrog with the -d option for network diagnostics

    [smartfrog@rosyth ~]$ sfDaemon -d
    ...

    -------------------------------------------
    Network
    -------------------------------------------
    Network test localhost: hostname 'rosyth', ip '192.168.66.129', [Successful], 0ms
    Network test remotehost (http://www.smartfrog.org/): [Failed], Failed to resolve remote hostname 'www.smartfrog.org', 1ms, java.net.UnknownHostException: www.smartfrog.org
    ...

    This shows that the machine knows who it is, and is listening on a network port accessible by the host machine, but that it has no external network access, as not even DNS is working.
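
The hosts table and ping checks from the steps above can be combined into one repeatable test. This is only a sketch: hosts_ip_for and check_peer are made-up names, and the names and addresses in the usage line are the example values used on this page.

```shell
#!/bin/sh
# Hypothetical helpers for verifying host/VM connectivity.

# Print the first address that a hosts file assigns to NAME.
# Usage: hosts_ip_for NAME [HOSTS_FILE]
hosts_ip_for() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                       # drop comments
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Check that NAME maps to IP in the hosts file, then ping it three times.
# Usage: check_peer NAME IP [HOSTS_FILE]
check_peer() {
    name=$1 ip=$2 file=${3:-/etc/hosts}
    found=$(hosts_ip_for "$name" "$file")
    if [ "$found" != "$ip" ]; then
        echo "$name: expected $ip in $file, found '${found:-nothing}'" >&2
        return 1
    fi
    ping -c 3 "$name"
}
```

On the VM, check_peer morzine 192.168.66.1 covers steps 5 and 7; on the host, check_peer rosyth 192.168.66.129 covers steps 8 and 9.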

For troubleshooting SmartFrog, see the Troubleshooting page.
