Sunday, August 13, 2006
HPCBench Now Supports Linux Kernel 2.6.X
You can visit http://hpcbench.sourceforge.net for more information about HPCBench.
Friday, October 08, 2004
HPCBench Now Open Source
Overview
Hpcbench is a Linux-based network benchmark for evaluating high-performance networks such as Gigabit Ethernet, Myrinet and QsNet. Hpcbench measures network latency and achievable throughput between two end points. For each test, Hpcbench can log kernel information, including CPU and memory usage, interrupts, swapping, paging, context switches, and network card statistics.
Hpcbench consists of three independent packages that test UDP, TCP and MPI communication respectively. A kernel resource tracing tool, "sysmon", is also included; its output is similar to that of vmstat but carries more network statistics.
Programming language: C, MPI.
Recommended OS and compiler: Linux kernel 2.4 and gcc.
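On Linux, counters like these are exposed through the /proc filesystem. As a minimal illustration (not Hpcbench's or sysmon's actual code), the following C program reads the per-interface byte and packet counters from /proc/net/dev; sampling them before and after a test gives the network card statistics for that run:

/* sketch: dump per-interface traffic counters from /proc/net/dev,
   the same kernel source a tracing tool such as sysmon can sample */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[512], name[32];
    unsigned long long rx_bytes, rx_pkts, tx_bytes, tx_pkts;
    FILE *fp = fopen("/proc/net/dev", "r");

    if (fp == NULL) {
        perror("fopen /proc/net/dev");
        return 1;
    }
    fgets(line, sizeof(line), fp);    /* skip the two header lines */
    fgets(line, sizeof(line), fp);
    while (fgets(line, sizeof(line), fp) != NULL) {
        char *colon = strchr(line, ':');
        if (colon == NULL)
            continue;
        *colon = ' ';
        /* RX: bytes packets errs drop fifo frame compressed multicast, then TX: bytes packets ... */
        if (sscanf(line, "%31s %llu %llu %*llu %*llu %*llu %*llu %*llu %*llu %llu %llu",
                   name, &rx_bytes, &rx_pkts, &tx_bytes, &tx_pkts) == 5)
            printf("%-8s RX %llu bytes (%llu pkts)  TX %llu bytes (%llu pkts)\n",
                   name, rx_bytes, rx_pkts, tx_bytes, tx_pkts);
    }
    fclose(fp);
    return 0;
}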
UDP communication (see the sketch after this list):
- Microsecond resolution
- Roundtrip time test (UDP ping)
- Throughput test
- Unidirectional and Bidirectional test
- UDP traffic generator (can run in single mode)
- Fixed size and exponential test
- Log throughputs and process resource usage of each test
- Log system resource information of client and server (Linux only)
- Create plot configuration file for gnuplot
- Configurable message size
- Other tunable parameters:
- Port number
- Client and server's UDP socket buffer size
- Message size
- Packet (datagram) size
- Data size of each read/write
- QoS (TOS) type (six pre-defined levels)
- Test time
- Test repetition
- Maximum throughput restriction (unidirectional and UDP traffic generator modes)
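The UDP round-trip (ping) test boils down to timestamping a send and the matching receive with microsecond resolution. The sketch below illustrates that idea only; it is not Hpcbench's actual code and assumes a UDP echo service is already listening at the given address and port:

/* sketch: UDP round-trip time measured with gettimeofday(),
   assuming the peer echoes every datagram back */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in peer;
    struct timeval t1, t2;
    char buf[64];
    double usec;
    int i, sock, trials = 10;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <server-ip> <port>\n", argv[0]);
        return 1;
    }
    sock = socket(AF_INET, SOCK_DGRAM, 0);
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_addr.s_addr = inet_addr(argv[1]);
    peer.sin_port = htons(atoi(argv[2]));

    memset(buf, 'a', sizeof(buf));
    for (i = 0; i < trials; i++) {
        gettimeofday(&t1, NULL);
        sendto(sock, buf, sizeof(buf), 0, (struct sockaddr *)&peer, sizeof(peer));
        recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);   /* wait for the echo */
        gettimeofday(&t2, NULL);
        usec = (t2.tv_sec - t1.tv_sec) * 1e6 + (t2.tv_usec - t1.tv_usec);
        printf("round-trip %d: %.0f usec\n", i, usec);
    }
    close(sock);
    return 0;
}

The parameters listed above (message size, socket buffer size, QoS level, repetitions) are essentially knobs layered on top of this kind of loop.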
TCP communication (see the sketch after this list):
- Microsecond resolution
- Roundtrip Time test (TCP ping)
- Throughput test
- Unidirectional and Bidirectional test
- Blocking and non-blocking test
- Fixed size and exponential test
- Linux sendfile() test
- Log throughputs and process resource usage of each test
- Log system resource information of client and server (Linux only)
- Create plot configuration file for gnuplot
- Configurable message size
- Other tunable parameters:
- Port number
- Client and server's TCP socket buffer (window) size
- Message size
- Data size of each read/write
- Iteration of read/write
- MTU (MSS) setting
- TCP socket's TCP_NODELAY option setting
- TCP socket's TCP_CORK option setting
- QoS (TOS) type (six pre-defined levels)
- Test time
- Test repetition
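The unidirectional throughput test follows a similar pattern: write fixed-size messages over a connected TCP socket for a number of iterations and divide the bytes moved by the elapsed time. The client-side sketch below is only an illustration, not Hpcbench's code; it assumes a sink server at the given address that simply reads and discards the data:

/* sketch: one-way TCP throughput from the sender's point of view */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MSG_SIZE   65536          /* size of each write()        */
#define ITERATIONS 10000          /* number of messages to send  */

int main(int argc, char *argv[])
{
    struct sockaddr_in server;
    struct timeval t1, t2;
    static char buf[MSG_SIZE];
    long long total = 0;
    double secs;
    int i, sock;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <server-ip> <port>\n", argv[0]);
        return 1;
    }
    sock = socket(AF_INET, SOCK_STREAM, 0);
    memset(&server, 0, sizeof(server));
    server.sin_family = AF_INET;
    server.sin_addr.s_addr = inet_addr(argv[1]);
    server.sin_port = htons(atoi(argv[2]));
    if (connect(sock, (struct sockaddr *)&server, sizeof(server)) < 0) {
        perror("connect");
        return 1;
    }
    memset(buf, 'a', sizeof(buf));

    gettimeofday(&t1, NULL);
    for (i = 0; i < ITERATIONS; i++)
        total += write(sock, buf, sizeof(buf));   /* count bytes actually written */
    gettimeofday(&t2, NULL);

    secs = (t2.tv_sec - t1.tv_sec) + (t2.tv_usec - t1.tv_usec) / 1e6;
    printf("sent %lld bytes in %.3f s: %.2f Mbps\n",
           total, secs, total * 8.0 / secs / 1e6);
    close(sock);
    return 0;
}

Note that timing only the sender's writes ignores data still buffered in flight when the loop ends, which is why a careful benchmark has to synchronize both ends before reporting a result; this is one of the issues discussed in the thesis below (Section 3.5, Iteration Estimation and Communication Synchronization).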
MPI communication (see the sketch after this list):
- Microsecond resolution
- Roundtrip Time test (MPI ping)
- Throughput test
- Unidirectional and Bidirectional test
- Blocking and non-blocking test
- Fixed size and exponential test
- Log throughputs and process resource usage of each test
- Log system resource information of the two processes (nodes) (Linux only)
- Create plot configuration file for gnuplot
- Tunable parameters:
- Message size
- Test time
- Test repetition
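For MPI, the round-trip test is a pair of matched blocking send/receive calls timed with MPI_Wtime(). The sketch below is illustrative only (not Hpcbench's actual code); it measures the average round-trip time between ranks 0 and 1 and is launched with mpirun -np 2:

/* sketch: MPI round-trip time between rank 0 and rank 1 */
#include <stdio.h>
#include <mpi.h>

#define MSG_SIZE 1024
#define TRIALS   100

int main(int argc, char *argv[])
{
    char buf[MSG_SIZE];
    double t1, t2;
    int i, rank;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        t1 = MPI_Wtime();
        for (i = 0; i < TRIALS; i++) {
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        }
        t2 = MPI_Wtime();
        printf("average round-trip time: %.2f usec\n", (t2 - t1) * 1e6 / TRIALS);
    } else if (rank == 1) {
        for (i = 0; i < TRIALS; i++) {
            MPI_Recv(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, MSG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}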
Monday, June 21, 2004
Performance Analysis of High Performance Computing Networks
(Updated Oct. 2004.) The full thesis can be downloaded here.
Following is the table of contents of my work:
Network Performance Measurement and Analysis in High Performance Computing Environments
Chapter 1 Introduction.
Chapter 2 Background.
2.1 HPC history and Its Convergence to Cluster Computing.
2.2 HPC networking.
2.2.1 High Performance Network Technologies.
2.2.2 Networking of HPC Clusters.
2.3 Message Passing Interface (MPI).
2.3.1 MPI Introduction.
2.3.2 MPICH.
2.4 Job Management System.
2.4.1 Goals of JMS.
2.4.2 LSF (Load Sharing Facility).
2.5 File Systems in HPC Clusters
2.5.1 Storage Networking.
2.5.2 Cluster File Systems.
2.5.3 Network Storage in SHARCNET.
2.6 Test-bed specifications
Chapter 3 Implementation of Hpcbench.
3.1 A Survey of Network Measurement Tools.
3.2 Metrics.
3.3 Communication Model.
3.4 Timers and Timing.
3.5 Iteration Estimation and Communication Synchronization.
3.6 System Resource Tracing.
3.7 UDP Communication Measurement Considerations.
3.8 An Overview of Hpcbench.
Chapter 4 Investigation of Gigabit Ethernet in HPC Clusters.
4.1 A Closer Look at Gigabit Ethernet
4.1.1 Protocol Properties.
4.1.2 Interrupts Coalescence and Jumbo Frame Size.
4.1.3 Data Buffers and Zero-Copy Technique.
4.2 Network Performance Analysis of Gigabit Ethernet
4.2.1 Examining Network Protocol Communication Internals
4.2.1.1 Alpha SMP Architecture.
4.2.1.2 Intel Xeon SMP Architecture.
4.2.2 Network Performance vs. Computer Performance.
4.2.3 Blocking and Non-blocking Communication.
4.2.4 Local Communication.
4.2.5 Network Protocols Latency.
4.2.6 TCP/IP Communication Throughput
4.3 A Comparison with Myrinet and Quadrics Interconnects
Chapter 5 Conclusions and Future Work.
5.1 Summary and Conclusions
5.2 Future Work.
Reference
Tuesday, March 16, 2004
MPICH Cluster Setup
This is a test setup of a two-node cluster using my home machines.
The /etc/hosts entries (Red Hat 9.0 with kernel 2.4.20) on both machines:
Dell Inspiron 8100 (master node): 192.182.1.2 node1
Dell Dimension L600 (secondary node): 192.182.1.3 node2
Download MPICH 1.2.5.2 from http://www-unix.mcs.anl.gov/mpi/mpich/ and follow the instructions to install it on both machines. MPICH nodes communicate with each other over rsh or ssh; the default is rsh. If you prefer to use ssh (secure shell) instead, configure with the following parameters in the MPICH source directory:
[root]# ./configure --with-device=ch_p4 --prefix=/usr/local/mpich --rsh=ssh
[root]# make
After installation, add /mpich_install_dir/bin and /mpich_install_dir/util to your $PATH environment variable. To let the master node (laptop node1) know about the secondary nodes, list all nodes in the file /mpich_install_dir/util/machines/machines.LINUX:
[huang]$ cat machines.LINUX
# Change this file to contain the machines that you want to use
# to run MPI jobs on. The format is one host name per line, with either
# hostname
# or
# hostname:n
# where n is the number of processors in an SMP. The hostname should
# be the same as the result from the command "hostname"
#localhost.localdomain
node1
node2
To enable rsh (remote shell), edit /etc/xinetd.d/rsh and change the line "disable = yes" to "disable = no". For convenience, I also enabled the rlogin service the same way. After these modifications, restart the xinetd daemon:
[root]# /etc/rc.d/init.d/xinetd restart
To let node1 (the master node) run programs on node2 automatically without a password prompt, add a .rhosts file to the user's home directory on node2:
[huang] $ cat ~/.rhosts
node1 huang
Also, the /etc/hosts.allow and /etc/hosts.deny files must be configured correctly to allow the rsh service. For simplicity, I added the following line to /etc/hosts.allow to accept all services between the two machines:
ALL: node1 node2 192.182.1.0/255.255.255.0, 127.0.0.1
To allow the superuser root to use the rsh and rlogin services, add the following entries to /etc/securetty (one per line):
rsh
rlogin
rexec
pts/0
pts/1
The authentication file /etc/pam.d/rsh should also be modified:
[root]# cat /etc/pam.d/rsh
#%PAM-1.0
# For root login to succeed here with pam_securetty, "rsh" must be
# listed in /etc/securetty.
auth sufficient /lib/security/pam_nologin.so
auth optional /lib/security/pam_securetty.so
auth sufficient /lib/security/pam_env.so
auth sufficient /lib/security/pam_rhosts_auth.so
account sufficient /lib/security/pam_stack.so service=system-auth
session sufficient /lib/security/pam_stack.so service=system-auth
We can verify the rsh service from the master node (node1):
[huang]$ rsh node2 "ps -ef"
The running processes on node2 will then be shown on node1, and we are ready to run parallel programs. There are some sample programs in the /mpich_install_dir/examples/basic directory; enter that directory and compile the source code with make (on both machines). For example, cpi is an MPI program that computes the value of pi:
[huang]$ mpirun -np 2 cpi
Process 0 of 2 on node1
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.001943
Process 1 of 2 on node2
Make sure the executable files on each machine are in the same directory structure. We can also specify a configuration file instead of using the default machines.LINUX configuration:
[huang]$ cat my.conf
node1 0 /home/huang/cpi
node2 1 /huang/cpi
[huang]$ mpirun -p4pg my.conf cpi
Process 0 of 1 on node1
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.002097
Process 1 of 2 on node2
P4 procgroup file is my.conf.
[huang]$
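Beyond the bundled examples, a short MPI program of your own is a quick way to confirm that both nodes take part in a run. The hello.c below (a file name made up for this sketch) reports each process's rank and host name; compile it with mpicc on both machines, keep the executable in the same path, and launch it like cpi:

/* hello.c: every MPI process reports its rank and the host it runs on */
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    char host[64];
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    gethostname(host, sizeof(host));
    printf("Hello from process %d of %d on %s\n", rank, size, host);
    MPI_Finalize();
    return 0;
}

[huang]$ mpicc -o hello hello.c
[huang]$ mpirun -np 2 hello
With node1 and node2 listed in machines.LINUX, you should see one "Hello" line from each host.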
Enjoy the power of parallel computing!