Tuesday, March 16, 2004

MPICH Cluster Setup

This is a test of setting up a two-node cluster with my home machines.

Both machines run Red Hat 9.0 (kernel 2.4.20). Their /etc/hosts files contain:

192.182.1.2   node1   # Dell Inspiron 8100 (master node)
192.182.1.3   node2   # Dell Dimension L600 (secondary node)

Download MPICH 1.2.5.2 from http://www-unix.mcs.anl.gov/mpi/mpich/ and follow the instructions to install it on both machines. MPICH uses rsh or ssh to start processes on the other nodes; the default is rsh. To use ssh (secure shell) instead, run configure with the following parameters in the MPICH source directory:

[root]# ./configure --with-device=ch_p4 --prefix=/usr/local/mpich --rsh=ssh
[root]# make

After installation, add /mpich_install_dir/bin and /mpich_install_dir/util to your $PATH environment variable. To let the master node (the laptop, node1) know about the secondary nodes, list all nodes in the file /mpich_install_dir/util/machines/machines.LINUX:

[huang]$ cat machines.LINUX
# Change this file to contain the machines that you want to use
# to run MPI jobs on. The format is one host name per line, with either
# hostname
# or
# hostname:n
# where n is the number of processors in an SMP. The hostname should
# be the same as the result from the command "hostname"
#localhost.localdomain
node1
node2
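The format described in the file's own comments (one host per line, either hostname or hostname:n) is easy to parse when scripting around the cluster. Below is a minimal sketch; list_machines is a hypothetical helper, not part of MPICH. It prints each host with its processor count (1 when :n is omitted) and skips comment and blank lines.

```shell
# list_machines FILE: hypothetical helper, not part of MPICH.
# Prints "host nprocs" for every entry in an MPICH machines file.
# Entries are "hostname" (1 processor) or "hostname:n" (n processors in an SMP).
list_machines() {
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%%#*}                                  # strip comments
        line=$(printf '%s' "$line" | tr -d '[:space:]')   # strip whitespace
        [ -z "$line" ] && continue                        # skip blank lines
        case $line in
            *:*) printf '%s %s\n' "${line%%:*}" "${line#*:}" ;;
            *)   printf '%s 1\n' "$line" ;;
        esac
    done < "$1"
}

# Example:
#   list_machines /mpich_install_dir/util/machines/machines.LINUX
```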

To enable rsh (remote shell), edit /etc/xinetd.d/rsh and change the line "disable = yes" to "disable = no". For convenience, I also enabled the rlogin service (/etc/xinetd.d/rlogin) the same way. After the modification, restart the xinetd daemon:

[root]# /etc/rc.d/init.d/xinetd restart
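The edit itself can be scripted if you repeat it on several nodes. A minimal sketch, assuming GNU sed (for -i) and the usual "disable = yes" line in the service file; enable_xinetd_service is a hypothetical helper name. Run it as root on each node, then restart xinetd as above.

```shell
# enable_xinetd_service FILE: hypothetical helper.
# Rewrites "disable = yes" to "disable = no" in an xinetd service file,
# keeping the original indentation and spacing. Assumes GNU sed (-i).
enable_xinetd_service() {
    sed -i 's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\)yes[[:space:]]*$/\1no/' "$1"
}

# Usage (as root), followed by an xinetd restart:
#   enable_xinetd_service /etc/xinetd.d/rsh
#   enable_xinetd_service /etc/xinetd.d/rlogin
```

On Red Hat, chkconfig rsh on should have the same effect for xinetd-managed services, if you prefer not to edit the file directly.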

To let node1 (the master node) run programs on node2 without a password prompt, add a .rhosts file to the user's home directory on node2:

[huang]$ cat ~/.rhosts
node1 huang
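rsh is picky about this file: the rhosts check generally rejects a .rhosts that other users can write to, so it is worth creating it with owner-only permissions. A minimal sketch, run as the user on node2; setup_rhosts is a hypothetical helper, and node1/huang are the host and user from this setup.

```shell
# setup_rhosts HOST USER: hypothetical helper.
# Appends "HOST USER" to ~/.rhosts (if not already present) and makes the
# file readable and writable only by its owner, because the rhosts check
# typically refuses .rhosts files that others can modify.
setup_rhosts() {
    rhosts="$HOME/.rhosts"
    touch "$rhosts"
    grep -qxF "$1 $2" "$rhosts" || printf '%s %s\n' "$1" "$2" >> "$rhosts"
    chmod 600 "$rhosts"
}

# Usage on node2, as user huang:
#   setup_rhosts node1 huang
```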

The /etc/hosts.allow and /etc/hosts.deny files must also be configured to allow the rsh service. For simplicity, I added the following line to /etc/hosts.allow to accept all services between the two machines:

ALL: node1 node2 192.182.1.0/255.255.255.0, 127.0.0.1

To allow the superuser root to use the rsh and rlogin services, add the following entries to /etc/securetty, one per line:

rsh
rlogin
rexec
pts/0
pts/1
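Because /etc/securetty takes one entry per line, appending entries by hand can leave duplicates if the step is repeated. A minimal sketch that adds each name only when it is missing; add_securetty_entries is a hypothetical helper, and the names are the ones listed above.

```shell
# add_securetty_entries FILE NAME...: hypothetical helper.
# Appends each NAME to FILE on its own line unless an identical line exists.
add_securetty_entries() {
    file=$1
    shift
    for name in "$@"; do
        grep -qxF "$name" "$file" || printf '%s\n' "$name" >> "$file"
    done
}

# Usage (as root):
#   add_securetty_entries /etc/securetty rsh rlogin rexec pts/0 pts/1
```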

The PAM configuration file /etc/pam.d/rsh should also be modified. Mine looks like this:

[root]# cat /etc/pam.d/rsh
#%PAM-1.0
# For root login to succeed here with pam_securetty, "rsh" must be
# listed in /etc/securetty.
auth sufficient /lib/security/pam_nologin.so
auth optional /lib/security/pam_securetty.so
auth sufficient /lib/security/pam_env.so
auth sufficient /lib/security/pam_rhosts_auth.so
account sufficient /lib/security/pam_stack.so service=system-auth
session sufficient /lib/security/pam_stack.so service=system-auth

To verify the rsh service, run the following on the master node (node1):

[huang]$ rsh node2 "ps -ef"

If rsh works, the processes running on node2 are listed on node1. The cluster is now ready to run parallel programs. Some sample programs are in the /mpich_install_dir/examples/basic directory; enter that directory and compile them with make (on both machines). For example, cpi is an MPI program that computes the value of pi:

[huang]$ mpirun -np 2 cpi
Process 0 of 2 on node1
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.001943
Process 1 of 2 on node2
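Since cpi has to be built on both machines, the same command can be sent to every node from node1. A minimal sketch; run_on_nodes is a hypothetical helper that reads host names from a machines file in the format above (ignoring :n suffixes), and the RSH variable, also an assumption of this sketch, lets you substitute ssh or a test stub for rsh. Every host in the file, including node1 if it is listed, must accept rsh from node1 for this to work.

```shell
# run_on_nodes MACHINES_FILE COMMAND: hypothetical helper.
# Runs COMMAND on every host named in an MPICH machines file, using
# $RSH (default: rsh) as the remote shell. Returns non-zero if any host fails.
run_on_nodes() {
    rsh_cmd=${RSH:-rsh}
    status=0
    # Drop comments, ":n" processor counts and whitespace; blank lines vanish.
    hosts=$(sed -e 's/#.*//' -e 's/:.*//' -e 's/[[:space:]]//g' "$1")
    for host in $hosts; do
        echo "== $host"
        if ! $rsh_cmd "$host" "$2" < /dev/null; then
            echo "== $host: FAILED" >&2
            status=1
        fi
    done
    return $status
}

# Usage on node1 (the examples path is the placeholder used above):
#   run_on_nodes machines.LINUX "cd /mpich_install_dir/examples/basic && make cpi"
#   run_on_nodes machines.LINUX "ps -ef"
```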

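When using the default machines.LINUX configuration, make sure the executable is at the same path on every machine. Alternatively, we can pass mpirun a P4 procgroup file with -p4pg. Each line gives a host name, the number of processes to start on it, and the full path of the executable; on the first line, which names the local host, the count excludes the process that mpirun starts itself, hence the 0: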

[huang]$ cat my.conf
node1 0 /home/huang/cpi
node2 1 /huang/cpi
[huang]$ mpirun -p4pg my.conf cpi
Process 0 of 1 on node1
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.002097
Process 1 of 2 on node2
P4 procgroup file is my.conf.
[huang]$
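For more nodes, the procgroup file can be generated instead of typed. A minimal sketch, assuming one process per secondary node and the same executable path on every host; write_procgroup is a hypothetical helper, and its output follows the "host count path" layout of my.conf above, with 0 on the first (local) line.

```shell
# write_procgroup OUTFILE PROGRAM MASTER [NODE...]: hypothetical helper.
# Writes a P4 procgroup file: the master (local) host gets 0 extra processes,
# since mpirun already starts one there; each other NODE gets 1 process.
# Assumes PROGRAM is at the same absolute path on every host.
write_procgroup() {
    out=$1 prog=$2 master=$3
    shift 3
    {
        printf '%s 0 %s\n' "$master" "$prog"
        for node in "$@"; do
            printf '%s 1 %s\n' "$node" "$prog"
        done
    } > "$out"
}

# Usage:
#   write_procgroup my.conf /home/huang/cpi node1 node2
#   mpirun -p4pg my.conf cpi
```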

Enjoy the power of parallel computing!