Supercomputer
This document provides (or will provide) the following information:
- A very brief overview of some selected cluster platforms
- How I approached the task of building my own supercomputer using Kerrighed
- Detailed instructions to build your own supercomputer using Kerrighed
- Detailed instructions to benchmark your Kerrighed supercomputer
- Detailed instructions to do more advanced things with your Kerrighed supercomputer
This document does not answer the following questions:
- What is a supercomputer?
- What are the basic concepts of parallel programming?
- Why do you have so many computers stacked up in your basement?
- Where is the restroom?
Purpose
I have a bunch of older machines. There's nothing wrong with them, and they're not really all that old. I have a need for a multi-CPU system for some development work and for running Virtual Machines. Cluster computing and parallel processing have always interested me, so why not build a supercomputer in my basement?
Choosing A Cluster Platform
There are quite a few open source cluster platforms. My requirements include:
- Must be free (as in no monetary cost), preferably Open Source
- Must not require any special compilers, procedures, or programming languages - I want to run programs and have them execute somewhere in the cloud automatically, and transparently.
- Must support a graphical environment - I want to have a multi-monitor setup (starting with 2, maybe up to 6 at a later date) to fit all the programs I'm running in parallel and to impress the less technically savvy.
- Preferably Linux based - since I know it, and it costs nothing
- Preferably supports running virtual machines that can use cores from multiple cluster nodes simultaneously, or at least allows some way to specify that the virtual machine's process should consume all available resources on one cluster node
- Preferably using a recent kernel, if kernel patches are required
- Preferably supported by a reasonably active community, if open source
NOTE: The analysis below was undertaken in mid-November 2009. Things change over time. Do not let my findings discourage you from looking into the systems I have not used. Most of these are really neat, but do not meet my specific requirements for one reason or another.
Kerrighed
Kerrighed will be used for this project. Reasons why others were dismissed are discussed in each section.
Kerrighed is a Single System Image (SSI) system, meaning that all the nodes share the same file system. Kerrighed combines process management data from all nodes, which means we can use normal tools (ps, top) to see everything that is happening on the cluster as if it were all one single machine. Process IDs are unique throughout the cluster, all memory is shared, and all processors are visible from all nodes. Processes can migrate, or move, to other nodes automatically to balance the load.
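To give a sense of what this looks like in practice, here is roughly how a running Kerrighed cluster is used from any node (a sketch based on the Kerrighed documentation - the krgcapset syntax in particular should be verified against the version you install):
# all CPUs from all nodes show up as if they were local
cat /proc/cpuinfo
# cluster-wide process list, with cluster-unique PIDs
ps aux
# allow processes started from this shell to migrate between nodes
krgcapset -d +CAN_MIGRATE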
How it measures up against the requirements:
- It's free and open source
- Normal processes are migrated - there is no need for special compilers or toolkits
- Nothing seems to prevent running a graphical environment, but it may be difficult to set up only one or two nodes with high-powered graphics cards - this requires further investigation
- It's Linux based - Kerrighed is a set of kernel patches and modules with some supporting tools
- It seems like it's possible to run virtual machines, assuming the virtual machine server can be compiled on the same kernel version as Kerrighed. I'm not sure if it is possible to span a single VM across multiple physical systems yet - this requires further research and experimentation. Even if I can't, so long as VMs can be run, that is acceptable.
- Currently using kernel 2.6.20, which is fine
- Updates as recent as last month
OpenSSI
- Latest Release: August 2006
- Dismissed because it does not support recent kernels right now
openMosix
- Project terminated on March 1, 2008
MOSIX
- Not free or open source
- Free only for researchers and students
LinuxPMI
- Continuation of openMosix
- Work on updating for newer kernels is ongoing
- Dismissed because documentation is lacking, and it doesn't seem like the stuff for newer kernels is ready yet
PelicanHPC
- A Knoppix-based LiveCD for creating clusters quickly
- Dismissed because I want a full installation, not a LiveCD
- Seems like an active project, and uses a recent kernel
Chromium
- Dismissed because it is specific to graphics and rendering, not general purpose computing
Sun Grid Engine
- Dismissed because it doesn't support moving normal processes around the cluster - all "jobs" are MPI based, and must be explicitly started using a special tool
- Jobs can be moved around, and there is load balancing
- There are features to move jobs off of systems that are actively being used as workstations
SLURM
Simple Linux Utility for Resource Management
- Similar looking to Sun Grid Engine
OpenNebula
- A virtual machine based "cloud computing" system
- Dismissed because there doesn't seem to be a way to have a single VM use processors from more than one physical system simultaneously. (Meaning I would have a bunch of single-CPU virtual machines with no parallel computation features)
- Supports expanding resource usage to the Amazon Cloud
Eucalyptus
- Similar concept to OpenNebula
- More commercialized
Ubuntu Cloud
- Based on Eucalyptus
- Fairly commercialized product, minimal documentation
Scyld
- Beowulf style cluster
- Not free or open source
Implementation Plan
- Stage 1 - Proof of Concept (Started 14-Nov-2009)
- This stage will use minimal hardware (1 storage node, 2 cluster nodes, 10/100 networking) to demonstrate that the solution can meet the requirements set out above. An installation procedure and power up/down procedures will also be produced during this stage.
- Stage 2 - Basic Implementation
- This stage will use expanded hardware (maximum number of cluster nodes available - probably about 6, Gigabit networking) to demonstrate the full potential of the cluster. This stage will start with a fresh install, using the procedure created in the Proof-of-concept. The setup created here will be expanded into the final implementation. Procedures for managing processes throughout the cluster will be investigated and developed. A benchmark procedure will be investigated, developed, and demonstrated. Virtual machine host installation and usage procedures will be investigated, developed, and demonstrated.
- Stage 3 - Expanded Implementation
- This stage will build on the implementation from Stage 2 - no new computing hardware will be added. The graphical environment will be installed and tested. Multi-monitor support will be added. Scripts will be created to automatically power up all nodes, to initialize the cluster, to shutdown all nodes, and for other tasks.
- Stage 4 - Performance Evaluation
- Benchmark tests will be run to test the system's capability. If problems are found, they will be addressed and the tests will be run again. The affected procedures will be updated accordingly.
Procedures
Setup Overview and Network Configuration
There will be one storage node and an arbitrary number of cluster nodes. The cluster nodes will reside on their own private network, managed by the storage node.
- Storage node
- The storage node will not run a Kerrighed kernel. Its responsibilities are only to provide the nodes with network addresses, the PXE boot image, and the shared file system. This system will have 2 network cards - one connected to the external network, the other to the cluster network. DHCP and PXE services will only be provided to the cluster network. For my setup, the storage node will be named "frank". Its IP address on the external network will be assigned by DHCP somewhere in the 192.168.80.0/24 subnet. Its IP address on the cluster network will be 192.168.81.1.
- Cluster node
- Each cluster node must be capable of PXE booting. They should all be able to run x86/i386 operating systems, since that's what the storage node will be providing.
- External Network
- The external network must provide internet access - the installation process downloads packages with debootstrap and apt. The storage node will be the only system in the cluster that is connected directly to the external network switch. For my setup, the external network will use the 192.168.80.0/24 subnet.
- Cluster Network
- All cluster nodes will be connected to a private cluster network. The storage node will provide IP addresses to this network using DHCP. The storage node will act as a gateway to the external network for the cluster nodes. For my setup, the cluster network will use the 192.168.81.0/24 subnet.
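- If you want the cluster nodes to reach the internet through the storage node (e.g. for apt-get on the booted nodes), the storage node needs forwarding and NAT enabled. A minimal sketch, assuming eth0 is the external interface and the cluster uses the 192.168.81.0/24 subnet:
# enable IP forwarding and NAT the cluster subnet out the external interface
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s 192.168.81.0/24 -o eth0 -j MASQUERADE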
Installation and Configuration
There is quite a bit of assumed knowledge here. Read all the directions before you start, and make sure you understand what is going on.
Operating System for the Storage Node
- Install a Linux distribution - this step takes a while... Have a sandwich ready.
- I've used Debian Lenny, typical install (first CD only)
- It doesn't technically matter what we use, but these directions are tested with Lenny, and use Debian packages
- The cluster nodes will run Debian, regardless of what you pick to run the storage node with.
- Configure the network by making /etc/network/interfaces look something like this:
# The loopback network interface
auto lo
iface lo inet loopback

# The external network interface
allow-hotplug eth0
iface eth0 inet dhcp

# The cluster network interface
auto eth1
iface eth1 inet static
    address 192.168.81.1
    netmask 255.255.255.0
Get Software for the Storage Node
- Get all the packages we need by running the command below.
- DHCP Server to provide IP addresses and trigger PXE booting on the cluster network
- TFTP (Trivial File Transfer Protocol) to provide the boot images to the cluster nodes
- NFS (Network File System) to provide the shared file system, which will be used by all cluster nodes (uses portmap for RPC calls)
- syslinux, a bootloader for our cluster nodes
- debootstrap, a program that builds a minimal Debian installation from a package mirror URL
apt-get install dhcp3-server tftpd-hpa portmap syslinux nfs-kernel-server nfs-common debootstrap
Configure TFTP Server
- Make /etc/default/tftpd-hpa look like this (if it doesn't already):
RUN_DAEMON="yes"
OPTIONS="-l -s /var/lib/tftpboot"
- Make /etc/inetd.conf contain this line (if it doesn't already):
tftp dgram udp wait root /usr/sbin/in.tftpd /usr/sbin/in.tftpd -s /var/lib/tftpboot
- Copy the PXE bootloader to the TFTP directory by running:
cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot
- Make the configuration directory:
mkdir /var/lib/tftpboot/pxelinux.cfg
- Set up a default configuration
- This will be what is used if no other configurations apply - since we're not making any, this will be the case all the time
- Make /var/lib/tftpboot/pxelinux.cfg/default look like this (substituting in your storage node's IP, of course):
LABEL linux
    KERNEL vmlinuz-2.6.20-krg
    APPEND console=tty1 root=/dev/nfs nfsroot=192.168.81.1:/nfsroot/kerrighed ip=dhcp rw session_id=1
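- Optionally, once the TFTP daemon is running (see Power Up below), you can sanity-check it by fetching the bootloader with the command-line tftp client (the tftp-hpa client package is not part of the earlier apt-get line, so this is an extra step):
tftp 192.168.81.1 -c get pxelinux.0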
Configure DHCP Server
- Set the interface that the DHCP server will run on - we don't want to conflict with DHCP on the external network...
- Make /etc/default/dhcp3-server look something like this (substituting your cluster network interface):
INTERFACES="eth1"
- Configure the DHCP server to provide network configuration and boot images
- Make /etc/dhcp3/dhcpd.conf look like this:
option domain-name-servers 192.168.80.1;   # change this to your DNS server
default-lease-time 86400;
max-lease-time 604800;
authoritative;

subnet 192.168.81.0 netmask 255.255.255.0 {   # change this to whatever subnet you like
    range 192.168.81.101 192.168.81.200;      # change this to whatever range you like
    filename "pxelinux.0";                    # we copied this into the TFTP server earlier
    next-server 192.168.81.1;
    option subnet-mask 255.255.255.0;
    option broadcast-address 192.168.81.255;
    option routers 192.168.81.1;              # we'll come back to this later too...
}

# if you want to set specific IPs for certain machines, uncomment and modify this to your needs:
#host node1 {
#    fixed-address 192.168.81.101;            # pick an address
#    hardware ethernet FF:FF:FF:FF:FF:FF;     # put the machine's MAC address here
#}
Configure NFS
- Make a place for the shared file system
mkdir -p /nfsroot/kerrighed
- Add the following line to /etc/exports (substitute your cluster network subnet)
/nfsroot/kerrighed 192.168.81.0/255.255.255.0(rw,no_subtree_check,async,no_root_squash)
- Export the filesystem
exportfs -avr
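- To verify the export took, ask the NFS server what it is sharing (showmount is part of nfs-common):
showmount -e localhost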
Operating System for Cluster Nodes
- Make a Debian installation for the cluster nodes
debootstrap --arch i386 lenny /nfsroot/kerrighed http://ftp.us.debian.org/debian
- Copy the apt sources list into the new system so we can install new stuff on cluster nodes
cp /etc/apt/sources.list /nfsroot/kerrighed/etc/apt/sources.list
- chroot into the new system
chroot /nfsroot/kerrighed
- Set the root password
passwd
- Mount /proc inside the chroot
mount -t proc none /proc
- Avoid lots of annoying warnings
export LC_ALL=C
- Get all the stuff we'll need to talk to the storage node and let people use the cluster nodes
apt-get update && apt-get install dhcp3-common nfs-common nfsbooted openssh-server
- Make the "localhost" hostname work by putting the following into /etc/hosts
127.0.0.1 localhost
- Have the NFS filesystems mount automatically at boot
ln -sf /etc/network/if-up.d/mountnfs /etc/rcS.d/S34mountnfs
- Add a user so we can log in to the cluster nodes
adduser <username>
- Configure the network by making /etc/network/interfaces look like this:
auto lo
iface lo inet loopback

iface eth0 inet dhcp
Compile the Kerrighed Kernel
(Still chroot'ed into the cluster node image)
- Get the packages that we'll need to build the Kerrighed kernel
apt-get install automake autoconf libtool pkg-config gawk rsync bzip2 libncurses5 libncurses5-dev wget lsb-release xmlto patchutils xutils-dev build-essential
- Get in the source directory
cd /usr/src
- Get the Kerrighed source, decompress it, and make it easier to find later
wget http://gforge.inria.fr/frs/download.php/23356/kerrighed-2.4.1.tar.gz
gzip -dc kerrighed-2.4.1.tar.gz | tar xf -
ln -s kerrighed-2.4.1/ kerrighed
- Get the Linux 2.6.20 source (this is the ONLY version Kerrighed 2.4.1 works with), and extract it
wget -O /usr/src/linux-2.6.20.tar.bz2 http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.20.tar.bz2
tar jxf linux-2.6.20.tar.bz2
- Get into the Kerrighed source
cd kerrighed
- Run the configure script to configure things correctly
./configure
- Get into the kernel directory, and run defconfig
cd kernel
make defconfig
- Run menuconfig to set all the options that we need
make menuconfig
- While in menuconfig, do the following:
- Make sure any network cards that you need to support are built into the kernel - NOT as modules. You can find this under Device drivers --> Network device support
- Make sure NFS filesystem support is compiled into the kernel - NOT as a module. You can find this under File Systems --> Network File Systems
- Make sure Loadable module support --> Automatic kernel module loading is activated
- Exit menuconfig
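- For reference, the selections above correspond roughly to these .config symbols (an assumption based on the 2.6.20 option names - some may already be enabled by defconfig, and the exact NIC symbol depends on your hardware). The IP_PNP and ROOT_NFS entries are what the PXE APPEND line (root=/dev/nfs ip=dhcp) relies on:
CONFIG_NFS_FS=y
CONFIG_ROOT_NFS=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
CONFIG_KMOD=y
# plus the driver for your network card built in, e.g. CONFIG_8139TOO=y or CONFIG_E100=y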
- Go to the Kerrighed directory, and make the kernel
cd ..
make kernel
- Assuming there were no errors, it's time for the big one. You'll have time for a slice of pie...
make
- Once that's done, do the wrapup work to install everything in the right places
make kernel-install
make install
ldconfig
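- Before leaving the chroot, it doesn't hurt to confirm the kernel image ended up in /boot (it is needed in the next section) and to unmount the /proc we mounted earlier:
ls -l /boot/vmlinuz-2.6.20-krg
umount /proc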
- Exit the chroot
exit
Configure Kerrighed
- Make the Kerrighed kernel available via TFTP
cp /nfsroot/kerrighed/boot/vmlinuz-2.6.20-krg /var/lib/tftpboot/
- Make sure the following line appears in /nfsroot/kerrighed/etc/default/kerrighed
ENABLE=true
- Make /nfsroot/kerrighed/etc/kerrighed_nodes look like this:
session=1                # Value can be 1 - 254
nbmin=2                  # 2 nodes starting up with the Kerrighed kernel
192.168.81.101:1:eth0
192.168.81.102:2:eth0
Power Up
- If you just finished the installation procedure, either reboot, or set up the network and start all the servers - like this:
ifconfig eth1 192.168.81.1
/etc/init.d/tftpd-hpa start
/etc/init.d/dhcp3-server start
/etc/init.d/portmap start
/etc/init.d/nfs-kernel-server start
- Enable PXE boot on each cluster node (usually a BIOS setting)
- Make sure the nodes are connected to the correct switch
- Turn on the nodes - they should boot
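- To check that the nodes actually came up, you can watch the DHCP leases on the storage node and ping a node (the lease file path is the dhcp3-server default; adjust the address to whatever your node was given):
cat /var/lib/dhcp3/dhcpd.leases
ping 192.168.81.101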
Initialize Cluster
Power Down
Performance Benchmark
Hardinfo
FLOPS
http://linux.maruhn.com/sec/flops.html
Linpack
- Used by TOP500: http://www.top500.org/project/linpack
- http://www.netlib.org/lapack/
- http://www.netlib.org/benchmark/linpackjava/
HPC Challenge Benchmark
dhrystone
whetstone
Kerrighed test directory
http://lxr.kerlabs.com/kerrighed/source/tests/benchmark/?v=devel-kdfs
BogoMIPS
Script Installation
Journal
November 2009
- Started this page, outlined the plans
- Wrote up analysis of cluster platforms
- First cluster nodes PXE booted! Nov 17
- Started to add information about benchmarking software packages
References
- How to deploy kerrighed nodes massively using DRBL
- Ubuntu Clustering thread on forums
- Easy Ubuntu Clustering specification
- Ubuntu Kerrighed Cluster Guide (on edubuntu wiki)
- Kerrighed Installation @ In da Wok
- Kerrighed 2.4.0 Install guide
- How to Set Up a High Performance Cluster (HPC) Using Debian Lenny and Kerrighed
- Setting up a server for PXE network booting