Setting up a Hadoop test VM
This page documents how to bring up a RHEL5 or CentOS VMware image.
Getting VMware modules working
To get full VMware integration, you need to install the VMware tools. These come with prebuilt modules for many Linux releases, but after a couple of system updates the tools usually stop working and you need to reinstall them.
Run
vmware-config-tools.pl
This will try to rebuild and reinstall the modules. If it fails to find the kernel headers, install the kernel development tools and run it again:
yum install kernel-devel kernel-headers gcc
vmware-config-tools.pl
After configuring, reboot.
Installing Java
In theory, Hadoop runs with only the JRE, not the JDK. This means that all you need is the JRE RPM, which can be installed using Yum.
Run java -version to check that Java is present and that its version is 1.6 or later. (The cluster examples deploy components to test this, but checking early is convenient.) Hadoop requires you to run Sun's Java, although we have not encountered many problems with JRockit.
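The early version check can be scripted. The following is a minimal sketch, not part of Hadoop or SmartFrog: the function names java_version_ok and installed_java_version are hypothetical, and it assumes that the first line of the java -version output carries the version string in double quotes.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a version string such as "1.6.0_20"
# denotes Java 1.6 or later.
java_version_ok() {
    version=$1
    major=${version%%.*}        # text before the first dot
    rest=${version#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of the remainder
    # Reject empty or non-numeric components.
    case $major in ''|*[!0-9]*) return 1 ;; esac
    case $minor in '') return 1 ;; esac
    if [ "$major" -gt 1 ]; then
        return 0                # newer numbering scheme, e.g. "11.0.2"
    fi
    [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]
}

# Hypothetical helper: print the quoted version from the first line of
# `java -version`, which writes to stderr rather than stdout.
installed_java_version() {
    java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p'
}
```

Running java_version_ok "$(installed_java_version)" then returns a zero exit status only when a suitable Java is on the path.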
Fixing the networking
- If the VM is to be run on a desktop, then DHCP and DNS configuration for the VM will let the host and VM talk to each other without any conf
- If the VM is to be used on a laptop, the only reliable way to maintain communications between host and VM is to run the VM on a private host-only network.
- Neither type of machine should run IPv6 until Hadoop and your network infrastructure support it.
- Both types of machine must not have any entries in /etc/hosts mapping the machine name to localhost, 127.0.0.1, ::1 or similar.
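The /etc/hosts rule in the last point can be checked mechanically. This is a sketch only: hosts_file_ok is a hypothetical helper name, and it only looks for an exact match of the given name on lines whose address starts with 127. or is ::1.

```shell
#!/bin/sh
# Hypothetical helper: succeed if no loopback line in the given hosts
# file lists the given machine name.
hosts_file_ok() {
    hosts_file=$1
    name=$2
    awk -v name="$name" '
        { sub(/#.*/, "") }                  # strip trailing comments
        $1 ~ /^127\./ || $1 == "::1" {      # loopback addresses only
            for (i = 2; i <= NF; i++)
                if ($i == name) bad = 1
        }
        END { exit bad + 0 }                # non-zero exit if a match was seen
    ' "$hosts_file"
}
```

For example, hosts_file_ok /etc/hosts "$(hostname)" fails if the machine's own name appears on a loopback line.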
Disabling IPv6
To disable IPv6, add two lines to /etc/modprobe.conf:
alias net-pf-10 off
alias ipv6 off
Then run update-modules and reboot.
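To confirm that both alias lines are in place before rebooting, a check along these lines may help. It is a sketch: ipv6_disabled_in is a hypothetical helper name, and it only inspects the file contents, not the state of the running kernel.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given modprobe configuration file
# contains both "alias net-pf-10 off" and "alias ipv6 off".
ipv6_disabled_in() {
    conf=$1
    grep -Eq '^[[:space:]]*alias[[:space:]]+net-pf-10[[:space:]]+off[[:space:]]*$' "$conf" &&
    grep -Eq '^[[:space:]]*alias[[:space:]]+ipv6[[:space:]]+off[[:space:]]*$' "$conf"
}
```

Calling ipv6_disabled_in /etc/modprobe.conf returns a non-zero status if either line is missing.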
Connecting the host and test machines
- Pick a name for the test machine, e.g. "rosyth". Make sure it is not on the external network of the development machine.
- Pick an IP address on the host-only subnet (the one assigned by DHCP is a good start). Here: 192.168.66.129
- Determine the IP address of the host on the subnet. Here: 192.168.66.1
- In the VM, set up the machine with a static IP address set to the chosen address, 192.168.66.129, and a subnet mask such as 255.255.255.0
- Edit its /etc/hosts table to contain both its own hostname-to-IP-address mapping and that of the host computer
$ cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.66.129 rosyth
192.168.66.1 morzine
- Reboot the VM
- Test: ping the host from the VM
[smartfrog@rosyth ~]$ ping 192.168.66.1
PING 192.168.66.1 (192.168.66.1) 56(84) bytes of data.
64 bytes from 192.168.66.1: icmp_seq=1 ttl=64 time=0.946 ms
64 bytes from 192.168.66.1: icmp_seq=2 ttl=64 time=0.066 ms
64 bytes from 192.168.66.1: icmp_seq=3 ttl=64 time=0.181 ms
- To reach the VM from the host, add the entry for rosyth into the physical machine's /etc/hosts table
- Test: ping the virtual host (rosyth) from the physical one (morzine)
morzine:~> ping rosyth
PING rosyth (192.168.66.129) 56(84) bytes of data.
64 bytes from rosyth (192.168.66.129): icmp_seq=1 ttl=64 time=0.678 ms
64 bytes from rosyth (192.168.66.129): icmp_seq=2 ttl=64 time=0.174 ms
64 bytes from rosyth (192.168.66.129): icmp_seq=3 ttl=64 time=0.147 ms
- Finally, start SmartFrog with the -d option for network diagnostics
[smartfrog@rosyth ~]$ sfDaemon -d
...
-------------------------------------------
Network
-------------------------------------------
Network test localhost: hostname 'rosyth', ip '192.168.66.129', [Successful], 0ms
Network test remotehost (http://www.smartfrog.org/): [Failed], Failed to resolve remote hostname 'www.smartfrog.org', 1ms, java.net.UnknownHostException: www.smartfrog.org
...
This shows that the machine knows who it is, and is listening on a network port accessible by the host machine, but that it has no external network access, as not even DNS is working.
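For the static address step in the list above, RHEL5 and CentOS normally keep per-interface settings in /etc/sysconfig/network-scripts/. A sketch for the example address follows. It assumes the VM's host-only interface is eth0, which this page does not state; check the actual device name before copying it.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0
# Assumption: eth0 is the VM's host-only interface.
DEVICE=eth0
BOOTPROTO=static          # no DHCP; use the address below
ONBOOT=yes                # bring the interface up at boot
IPADDR=192.168.66.129     # the address chosen for the VM
NETMASK=255.255.255.0
```

The values take effect after the reboot described in the steps above.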
For help troubleshooting SmartFrog, see Troubleshooting.