Supercomputer

This document provides (or will provide) the following information:

  • A very brief overview of some selected cluster platforms
  • How I approached the task of building my own supercomputer using Kerrighed
  • Detailed instructions to build your own supercomputer using Kerrighed
  • Detailed instructions to benchmark your Kerrighed supercomputer
  • Detailed instructions to do more advanced things with your Kerrighed supercomputer

This document does not answer the following questions:

  • What is a supercomputer?
  • What are the basic concepts of parallel programming?
  • Why do you have so many computers stacked up in your basement?
  • Where is the restroom?

Purpose

I have a bunch of older machines. There's nothing wrong with them, and they're not really all that old. I have a need for a multi-CPU system for some development work and for running Virtual Machines. Cluster computing and parallel processing have always interested me, so why not build a supercomputer in my basement?

Choosing A Cluster Platform

There are quite a few open source cluster platforms. My requirements include:

  • Must be free (as in no monetary cost), preferably Open Source
  • Must not require any special compilers, procedures, or programming languages - I want to run programs and have them execute somewhere in the cloud automatically, and transparently.
  • Must support a graphical environment - I want to have a multi-monitor setup (starting with 2, maybe up to 6 at a later date) to fit all the programs I'm running in parallel and to impress the less technically savvy.
  • Preferably Linux based - since I know it, and it costs nothing
  • Preferably supports running virtual machines that can use cores from multiple cluster nodes simultaneously, or at least allows some way to specify that the virtual machine's process should consume all available resources on one cluster node
  • Preferably using a recent kernel, if kernel patches are required
  • Preferably supported by a reasonably active community, if open source

NOTE: The analysis below was undertaken in mid-November 2009. Things change over time. Do not let my findings discourage you from looking into the systems I have not used. Most of these are really neat, but do not meet my specific requirements for one reason or another.

Kerrighed

Kerrighed will be used for this project. Reasons why others were dismissed are discussed in each section.

Kerrighed is a Single System Image (SSI) system, meaning the whole cluster appears and behaves as one machine. All nodes share the same file system, and Kerrighed combines process management data from all nodes, which means we can use normal tools (ps, top) to see everything that is happening on the cluster as if it were all one single machine. Process IDs are unique throughout the cluster, all memory is shared, and all processors are visible from all nodes. Processes can migrate, or move, to other nodes automatically to balance the load.
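
To make "transparent" concrete: an ordinary, unmodified process started on one node can end up running on another with no changes to the program. A minimal illustration, assuming a running cluster with the CAN_MIGRATE capability set as described in the Initialize Cluster section below:

# start an ordinary CPU-bound process - nothing Kerrighed-specific about it
yes > /dev/null &

# ps and top show cluster-wide state, and the scheduler is free to
# migrate the process to a less loaded node
ps aux | grep yes
top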

How it measures up against the requirements:

  • It's free and open source
  • Normal processes are migrated - there is no need for special compilers or toolkits
  • Nothing seems to prevent running a graphical environment, but it may be difficult to set up only one or two nodes with high-powered graphics cards - this requires further investigation
  • It's Linux based - Kerrighed is a set of kernel patches and modules with some supporting tools
  • It seems like it's possible to run virtual machines, assuming the virtual machine server can be compiled on the same kernel version as Kerrighed. I'm not sure if it is possible to span a single VM across multiple physical systems yet - this requires further research and experimentation. Even if I can't, so long as VMs can be run, that is acceptable.
  • Currently using kernel 2.6.20, which is fine
  • Updates as recent as last month

OpenSSI

  • Latest Release: August 2006
  • Dismissed because it does not support recent kernels right now

openMosix

  • Project terminated on March 1, 2008

MOSIX

  • Not free or open source
  • Free only for researchers and students

LinuxPMI

  • Continuation of openMosix
  • Work on updating for newer kernels is ongoing
  • Dismissed because documentation is lacking, and it doesn't seem like the stuff for newer kernels is ready yet

PelicanHPC

  • A Knoppix-based LiveCD for creating clusters quickly
  • Dismissed because I want a full installation, not a LiveCD
  • Seems like an active project, and uses a recent kernel

Chromium

  • Dismissed because it is specific to graphics and rendering, not general purpose computing

Sun Grid Engine

  • Dismissed because it doesn't support moving normal processes around the cluster - all "jobs" are MPI based, and must be explicitly started using a special tool
  • Jobs can be moved around, and there is load balancing
  • There are features to move jobs off of systems that are actively being used as workstations

SLURM

Simple Linux Utility for Resource Management

  • Similar looking to Sun Grid Engine

OpenNebula

  • A virtual machine based "cloud computing" system
  • Dismissed because there doesn't seem to be a way to have a single VM use processors from more than one physical system simultaneously. (Meaning I would have a bunch of single-CPU virtual machines with no parallel computation features)
  • Supports expanding resource usage to the Amazon Cloud

Eucalyptus

  • Similar concept to OpenNebula
  • More commercialized

Ubuntu Cloud

  • Based on Eucalyptus
  • Fairly commercialized product, minimal documentation

Scyld

  • Beowulf style cluster
  • Not free or open source

Implementation Plan

Stage 1 - Proof of Concept (Started 14-Nov-2009, finished 22-Nov-2009)
This stage will use minimal hardware (1 storage node, 2 cluster nodes, 10/100 networking) to demonstrate that the solution can meet the requirements set out above. An installation procedure and power up/down procedures will also be produced during this stage.
Stage 2 - Basic Implementation (Started 22-Nov-2009)
This stage will use expanded hardware (maximum number of cluster nodes available - probably about 6, Gigabit networking) to demonstrate the full potential of the cluster. This stage will start with the installation used in the Proof of Concept stage. The setup created here will be expanded into the final implementation. Procedures for managing processes throughout the cluster will be investigated and developed. A benchmark procedure will be investigated, developed, and demonstrated.
Stage 3 - Expanded Implementation
This stage will build on the implementation from the Basic Implementation stage - no new computing hardware will be added. Virtual machine host installation and usage procedures will be investigated, developed, and demonstrated. The graphical environment will be installed and tested. Multi-monitor support will be added. Scripts will be created to automatically power up all nodes, to initialize the cluster, to shutdown all nodes, and for other tasks.
Stage 4 - Performance Evaluation
Benchmark tests will be run to test the system's capability. If problems are found, they will be addressed and the tests will be run again. The affected procedures will be updated accordingly.

Procedures

Setup Overview and Network Configuration

There will be one storage node and an arbitrary number of cluster nodes. The cluster nodes will reside on their own private network, managed by the storage node.

Storage node
The storage node will not run a Kerrighed kernel. Its responsibilities are only to provide the nodes with network addresses, the PXE boot image, and the shared file system. This system will have 2 network cards - one connected to the external network, the other to the cluster network. DHCP and PXE services will only be provided to the cluster network. For my setup, the storage node will be named "frank". Its IP address on the external network will be assigned by DHCP somewhere in the 192.168.80.0/24 subnet. The external network will be connected to the eth0 network adapter. The storage node's IP address on the cluster network will be 192.168.81.1. The cluster network will be connected to the eth1 network adapter.
Cluster node
Each cluster node must be capable of PXE booting. They should all be able to run x86/i386 operating systems, since that's what the storage node will be providing.
External Network
The external network must provide internet access - the installation procedure downloads packages from Debian mirrors (debootstrap, apt-get). The storage node will be the only system in the cluster that is connected directly to the external network switch. For my setup, the external network will use the 192.168.80.0/24 subnet.
Cluster Network
All cluster nodes will be connected to a private cluster network. The storage node will provide IP addresses to this network using DHCP. The storage node will act as a gateway to the external network for the cluster nodes. For my setup, the cluster network will use the 192.168.81.0/24 subnet.

Installation and Configuration

There is quite a bit of assumed knowledge here. Read all the directions before you start, and make sure you understand what is going on.

Most of this procedure happens on the storage node, as root. The only exception is when you use SSH to connect to a cluster node - even still, you will probably be at the keyboard of the storage node.

Operating System for the Storage Node

  1. Install a Linux distribution - this step takes a while... Have a sandwich ready.
    • I've used Debian Lenny, typical install (first CD only)
    • It doesn't technically matter what we use, but these directions are tested with Lenny, and use Debian packages
    • The cluster nodes will run Debian, regardless of what you pick to run the storage node with.
  2. Configure the network by making /etc/network/interfaces look something like this:
# The loopback network interface
auto lo
iface lo inet loopback

# The external network interface
allow-hotplug eth0
iface eth0 inet dhcp

# The cluster network interface
auto eth1
iface eth1 inet static
	address 192.168.81.1
	netmask 255.255.255.0
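
After saving the file, bring the cluster interface up and check that both interfaces match the overview above (the interface names are the ones used in this guide and may differ on your hardware):

ifup eth1          # bring up the cluster-facing interface
ifconfig eth0      # should show a DHCP-assigned address on the external network
ifconfig eth1      # should show 192.168.81.1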

Get Software for the Storage Node

  1. Get all the packages we need by running the command below.
    • DHCP Server to provide IP addresses and trigger PXE booting on the cluster network
    • TFTP (trivial FTP) to provide the boot images to the cluster nodes
    • NFS (Network File System) to provide the shared file system, which will be used by all cluster nodes (uses portmap for RPC calls)
    • syslinux, a bootloader for our cluster nodes
    • debootstrap, a program that builds a minimal Debian installation into a directory from a package mirror URL
apt-get install dhcp3-server tftpd-hpa portmap syslinux nfs-kernel-server nfs-common debootstrap

Configure TFTP Server

  1. Make /etc/default/tftpd-hpa look like this (if it doesn't already):
RUN_DAEMON="yes"
OPTIONS="-l -s /var/lib/tftpboot"
  2. Make /etc/inetd.conf contain this line (if it doesn't already):
    • tftp           dgram   udp     wait    root  /usr/sbin/in.tftpd /usr/sbin/in.tftpd -s /var/lib/tftpboot
  3. Copy the PXE bootloader to the TFTP directory by running:
    • cp /usr/lib/syslinux/pxelinux.0 /var/lib/tftpboot
  4. Make the configuration directory:
    • mkdir /var/lib/tftpboot/pxelinux.cfg
  5. Set up a default configuration
    • This will be what is used if no other configurations apply - since we're not making any, this will be the case all the time
    • Make /var/lib/tftpboot/pxelinux.cfg/default look like this (substituting in your storage node's IP, of course):
LABEL linux
KERNEL vmlinuz-2.6.20-krg
APPEND console=tty1 root=/dev/nfs nfsroot=192.168.81.1:/nfsroot/kerrighed ip=dhcp rw session_id=1
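
It's worth confirming that the TFTP server actually hands out the bootloader before moving on. One way to check from the storage node itself, assuming you install a TFTP client (the package name varies - on Debian it is typically tftp or tftp-hpa):

apt-get install tftp       # any TFTP client will do
tftp 192.168.81.1
tftp> get pxelinux.0       # should download without errors
tftp> quit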

Configure as a Gateway

Install the Network Gateway script, following the instructions in the Network Gateway section under Script Installation below.

Configure DHCP Server

  1. Set the interface that the DHCP server will run on - we don't want to conflict with DHCP on the external network...
    • Make /etc/default/dhcp3-server look something like this (substituting your cluster network interface):
    • INTERFACES="eth1"
  2. Configure the DHCP server to provide network configuration and boot images
    • Make /etc/dhcp3/dhcpd.conf look like this:
option domain-name-servers 192.168.80.1; # change this to your DNS server
default-lease-time 86400;
max-lease-time 604800;
authoritative;

subnet 192.168.81.0 netmask 255.255.255.0 { # change this to whatever subnet you like
        range 192.168.81.101 192.168.81.200; # change this to whatever range you like
        filename "pxelinux.0"; # we copied this into the TFTP server earlier
        next-server 192.168.81.1;
        option subnet-mask 255.255.255.0;
        option broadcast-address 192.168.81.255;
        option routers 192.168.81.1; # This means all the cluster nodes will use the storage node as a gateway
}

# if you want to set specific IPs for certain machines, uncomment and modify this to your needs:
#host node1 {
#        fixed-address 192.168.81.101; # pick an address
#        hardware ethernet FF:FF:FF:FF:FF:FF; # put the machine's MAC address here
#}
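
After editing the configuration, test it and restart the server. The config-test flag below is a standard ISC dhcpd feature, but the binary may be named dhcpd3 or dhcpd depending on the package version:

dhcpd3 -t -cf /etc/dhcp3/dhcpd.conf    # syntax check only
/etc/init.d/dhcp3-server restart
tail -f /var/log/syslog                # DHCPDISCOVER/DHCPOFFER lines will appear here when nodes boot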

Configure NFS

  1. Make a place for the shared file system
    • mkdir -p /nfsroot/kerrighed
  2. Add the following line to /etc/exports (substitute your cluster network subnet if yours is different)
    • /nfsroot/kerrighed 192.168.81.0/255.255.255.0(rw,no_subtree_check,async,no_root_squash)
  3. Export the filesystem
    • exportfs -avr
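
To confirm the export is active (both commands come with the NFS packages installed earlier):

exportfs -v                # lists active exports and their options
showmount -e localhost     # should list /nfsroot/kerrighed for 192.168.81.0/255.255.255.0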

Operating System for Cluster Nodes

  1. Make a Debian installation for the cluster nodes
    • debootstrap --arch i386 lenny /nfsroot/kerrighed http://ftp.us.debian.org/debian
  2. Copy the apt sources list into the new system so we can install new stuff on cluster nodes
    • cp /etc/apt/sources.list /nfsroot/kerrighed/etc/apt/sources.list
  3. chroot into the new system
    • chroot /nfsroot/kerrighed
  4. Set the root password
    • passwd
  5. Get a /proc directory
    • mount -t proc none /proc
  6. Avoid lots of annoying warnings
    • export LC_ALL=C
  7. Get all the stuff we'll need to talk to the storage node and let people use the cluster nodes
    • apt-get update && apt-get install dhcp3-common nfs-common nfsbooted openssh-server
  8. Make the "localhost" hostname work by putting the following into /etc/hosts
    • 127.0.0.1 localhost
  9. Have the filesystem mount automatically for us
    • ln -sf /etc/network/if-up.d/mountnfs /etc/rcS.d/S34mountnfs
  10. Add a user so we can log in to the cluster nodes
    • adduser {pick a username}
  11. Configure the network by making /etc/network/interfaces look like this:
auto lo
iface lo inet loopback
iface eth0 inet dhcp

Compile the Kerrighed Kernel

(Still chroot'ed into the cluster node image)

  1. Get the packages that we'll need to build the Kerrighed kernel
    • apt-get install automake autoconf libtool pkg-config gawk rsync bzip2 libncurses5 libncurses5-dev wget lsb-release xmlto patchutils xutils-dev build-essential
  2. Get in the source directory
    • cd /usr/src
  3. Get the Kerrighed source, decompress it, and make it easier to find later
    • wget http://gforge.inria.fr/frs/download.php/23356/kerrighed-2.4.1.tar.gz
    • gzip -dc kerrighed-2.4.1.tar.gz | tar xf -
    • ln -s kerrighed-2.4.1/ kerrighed
  4. Get the Linux 2.6.20 source (this is the ONLY version Kerrighed 2.4.1 works with), and extract it
    • wget -O /usr/src/linux-2.6.20.tar.bz2 http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.20.tar.bz2
    • tar jxf linux-2.6.20.tar.bz2
  5. Get into the Kerrighed source
    • cd kerrighed
  6. Run the configure script to configure things correctly
    • ./configure
  7. Get into the kernel directory, and run defconfig
    • cd kernel
    • make defconfig
  8. Run menuconfig to set all the options that we need
    • make menuconfig
  9. While in menuconfig, do the following:
    • Make sure drivers for any network cards that you need to support are compiled into the kernel - NOT as modules. You can find this under Device drivers --> Network device support
    • Make sure NFS filesystem support is compiled into the kernel - NOT as a module. You can find this under File Systems --> Network File Systems
    • Make sure Loadable module support --> Automatic kernel module loading is activated
    • Exit menuconfig
  10. Go to the Kerrighed directory, and make the kernel
    • cd ..
    • make kernel
  11. Assuming there were no errors, it's time for the big one. You'll have time for a slice of pie...
    • make
  12. Once that's done, do the wrapup work to install everything in the right places
    • make kernel-install
    • make install
    • ldconfig
  13. Exit the chroot
    • exit
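
Back outside the chroot, it's worth a quick check that the build put everything where the next section expects it (paths as used in this guide):

ls -lh /nfsroot/kerrighed/boot/vmlinuz-2.6.20-krg    # the Kerrighed kernel image
ls /nfsroot/kerrighed/lib/modules/                   # should contain a 2.6.20-krg (or similar) directory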

Configure Kerrighed

  1. Make the Kerrighed kernel available via TFTP
    • cp /nfsroot/kerrighed/boot/vmlinuz-2.6.20-krg /var/lib/tftpboot/
  2. Make a place that we can mount configfs
    • mkdir /config
  3. Mount the configfs by adding the following line to /etc/fstab (configfs is used by the Kerrighed scheduler)
    • configfs /config configfs defaults 0 0
  4. Make sure the following line appears in /nfsroot/kerrighed/etc/default/kerrighed
    • ENABLE=true
  5. Make /nfsroot/kerrighed/etc/kerrighed_nodes look like this:
session=1 #Value can be 1-254
nbmin=2 #2 nodes starting up with the Kerrighed kernel.
192.168.81.101:1:eth0
192.168.81.102:2:eth0
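
Each line in kerrighed_nodes is IP:node_id:interface, and the node IDs have to line up with real addresses, so it is easiest to pin each node's address using the commented-out host block in dhcpd.conf shown earlier. An illustrative example for adding a third node (the MAC address is a placeholder - use the node's real one):

# /etc/dhcp3/dhcpd.conf on the storage node
host node3 {
        fixed-address 192.168.81.103;
        hardware ethernet 00:11:22:33:44:55;    # placeholder MAC
}

# /nfsroot/kerrighed/etc/kerrighed_nodes
session=1
nbmin=3 #3 nodes starting up with the Kerrighed kernel
192.168.81.101:1:eth0
192.168.81.102:2:eth0
192.168.81.103:3:eth0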

Power Up

  1. If you just finished the installation procedure, either reboot, or set up the network and start all the servers - like this:
    • ifconfig eth1 192.168.81.1
    • /etc/network/if-up.d/gateway
    • /etc/init.d/tftpd-hpa start
    • /etc/init.d/dhcp3-server start
    • /etc/init.d/portmap start
    • /etc/init.d/nfs-kernel-server start
  2. Enable PXE boot on each cluster node (usually a BIOS setting)
  3. Make sure the nodes are connected to the correct switch
  4. Turn on the nodes - they should boot
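
If a node doesn't come up, the storage node's log usually shows how far it got - a DHCP lease, then a TFTP request for pxelinux.0 and the kernel, then the NFS mount:

tail -f /var/log/syslog    # watch for DHCPOFFER/DHCPACK and in.tftpd lines as each node boots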

Initialize Cluster

  1. Get a shell connection to a node
    • ssh {user you made earlier}@192.168.81.101
  2. Do all this to start up the cluster, and allow processes launched in this session to migrate around
krgadm cluster start
krgcapset -d +CAN_MIGRATE
krgcapset -k $$ -d +CAN_MIGRATE
krgcapset -d +USE_REMOTE_MEMORY
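
To confirm the nodes actually joined, use the same standard tools mentioned earlier - on an SSI cluster /proc and the process table are cluster-wide, so the core count should be the total across all booted nodes:

grep -c processor /proc/cpuinfo    # total CPUs visible to the cluster
ps aux | wc -l                     # processes from every node show up here
top                                # load from all nodes in one view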

Power Down

It's pretty simple...

krgadm cluster poweroff

That will shut down all the nodes. Shut down the storage node in the normal way (if you want).

Performance Benchmark

Hardinfo

FLOPS

http://linux.maruhn.com/sec/flops.html

Linpack

Used by TOP500.
http://www.top500.org/project/linpack
http://www.netlib.org/lapack/
http://www.netlib.org/benchmark/linpackjava/

HPC Challenge Benchmark

http://icl.cs.utk.edu/hpcc/

dhrystone

whetstone

Kerrighed test directory

http://lxr.kerlabs.com/kerrighed/source/tests/benchmark/?v=devel-kdfs

BogoMIPS
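
Quick smoke test

Until one of the packages above is evaluated properly, a crude check can at least show whether load is being spread across nodes. A rough sketch, assuming bc is installed on the node image (apt-get install bc) and that you run it in a shell that has the CAN_MIGRATE capabilities set as in the Initialize Cluster procedure:

# run in the same shell session used in Initialize Cluster
JOBS=8
time (
    for i in $(seq 1 $JOBS); do
        # each job computes pi to 3000 digits - pure CPU work
        echo "scale=3000; 4*a(1)" | bc -l > /dev/null &
    done
    wait
)

Watch top in another session while this runs; if migration is working, the jobs should not all stay on the node they were started on, and increasing JOBS should not increase the wall-clock time much until the cluster's total core count is saturated.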

Script Installation

Network Gateway

Here's a script to make the storage node act as a network gateway so the cluster nodes can have access to the external network and beyond. The script I suggest here is very restrictive - you may want to allow more incoming connections.

Save this on the storage node in /etc/network/if-up.d/gateway (I didn't write this script[1], I just swapped the interfaces)

#!/bin/sh

PATH=/usr/sbin:/sbin:/bin:/usr/bin

#
# delete all existing rules.
#
iptables -F
iptables -t nat -F
iptables -t mangle -F
iptables -X

# Always accept loopback traffic
iptables -A INPUT -i lo -j ACCEPT


# Allow established connections, and those not coming from the outside
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -m state --state NEW -i ! eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow outgoing connections from the LAN side.
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT

# Masquerade.
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

# Don't forward from the outside to the inside.
iptables -A FORWARD -i eth0 -o eth0 -j REJECT

# Enable routing.
echo 1 > /proc/sys/net/ipv4/ip_forward

And allow execution, like so:

chmod +x /etc/network/if-up.d/gateway

Next time the network comes up, this script will run. You can run it manually to apply the changes immediately.
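
To confirm the rules and forwarding are in place after running it:

iptables -L -n                        # lists the filter rules installed by the script
cat /proc/sys/net/ipv4/ip_forward     # should print 1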

VMWare Installation

  1. Log in to one of the nodes as root
  2. Download VMWare Server
  3. Follow the normal installation procedure
  4. Now a server instance will run on each node - we need to make it run on only one node...
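
One possible approach, not yet tested here: since migration is opt-in (a process only moves if it has the CAN_MIGRATE capability), starting the VMware processes from a session that was never given that capability should keep them on the node where they were started. A sketch - the init script name is assumed from a typical VMware Server install:

# log in to the node that should host the VMs, and do NOT run the
# krgcapset commands from the Initialize Cluster section in this shell
ssh root@192.168.81.101

# start VMware Server only on this node; init script name assumed
/etc/init.d/vmware start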

Journal

November 2009

  • Started this page, outlined the plans
  • Wrote up analysis of cluster platforms
  • First cluster nodes PXE booted! Nov 17
  • Started to add information about benchmarking software packages
  • Wrote installation, configuration, startup, and shutdown guides

References

  1. http://www.debian-administration.org/articles/23