Supercomputer

From JimboWiki
Revision as of 04:34, 13 November 2009 by J (Talk | contribs)


Purpose

I have a bunch of older machines. There's nothing wrong with them, and they're not really all that old. I have a need for a multi-CPU system for some development work and for running Virtual Machines. Cluster computing and parallel processing have always interested me, so why not build a supercomputer in my basement?

Choosing A Cluster Platform

There are quite a few open source cluster platforms. My requirements include:

  • Must be free (as in no monetary cost), preferably Open Source
  • Must not require any special compilers, procedures, or programming languages - I want to run programs and have them execute somewhere in the cloud automatically and transparently.
  • Must support a graphical environment - I want to have a multi-monitor setup (starting with 2, maybe up to 6 at a later date) to fit all the programs I'm running in parallel and to impress the less technically savvy.
  • Preferably Linux based - since I know it, and it costs nothing
  • Preferably supports running virtual machines that can use cores from multiple cluster nodes simultaneously, or at least allows some way to specify that the virtual machine's process should consume all available resources on one cluster node
  • Preferably using a recent kernel, if kernel patches are required
  • Preferably supported by a reasonably active community, if open source

NOTE: The analysis below was undertaken in mid-November 2009. Things change over time. Do not let my findings discourage you from looking into the systems I have not used. Most of these are really neat, but do not meet my specific requirements for one reason or another.

Kerrighed

Kerrighed will be used for this project. Reasons why others were dismissed are discussed in each section.

Kerrighed is a Single System Image (SSI) system, meaning that all the nodes share the same file system and appear as one machine. Kerrighed combines process management data from all nodes, which means we can use normal tools (ps, top) to see everything that is happening on the cluster as if it were all one single machine. Process IDs are unique throughout the cluster, all memory is shared, and all processors are visible from all nodes. Processes can migrate, or move, to other nodes automatically to balance the load.

How it measures up against the requirements:

  • It's free and open source
  • Normal processes are migrated - there is no need for special compilers or toolkits
  • There does not seem to be anything in the way of running a graphical environment, but it may be difficult to set up only one or two nodes with high-powered graphics cards - this requires further investigation
  • It's Linux based - Kerrighed is a set of kernel patches and modules with some supporting tools
  • It seems like it's possible to run virtual machines, assuming the virtual machine server can be compiled on the same kernel version as Kerrighed. I'm not sure if it is possible to span a single VM across multiple physical systems yet - this requires further research and experimentation. Even if I can't, so long as VMs can be run, that is acceptable.
  • Currently using kernel 2.6.20, which is fine
  • Updates as recent as last month

OpenSSI

  • Latest Release: August 2006
  • Dismissed because it does not support recent kernels right now

openMosix

  • Project terminated on March 1, 2008

MOSIX

  • Not free or open source
  • Free only for researchers and students

LinuxPMI

  • Continuation of openMosix
  • Work on updating for newer kernels is ongoing
  • Dismissed because documentation is lacking, and it doesn't seem like the stuff for newer kernels is ready yet

PelicanHPC

  • A Knoppix-based LiveCD for creating clusters quickly
  • Dismissed because I want a full installation, not a LiveCD
  • Seems like an active project, and uses a recent kernel

Chromium

  • Dismissed because it is specific to graphics and rendering, not general purpose computing

Sun Grid Engine

  • Dismissed because it doesn't support transparently moving normal processes around the cluster - "jobs" (batch scripts or MPI programs) must be explicitly submitted using a special tool
  • Jobs can be moved around, and there is load balancing
  • There are features to move jobs off of systems that are actively being used as workstations

SLURM

Simple Linux Utility for Resource Management

  • Similar in design to Sun Grid Engine, so dismissed for the same reason

OpenNebula

  • A virtual machine cluster management system
  • Dismissed because there doesn't seem to be a way to have a single VM use processors from more than one physical system simultaneously. (Meaning I would have a bunch of single-CPU virtual machines with no parallel computation features)

Scyld

  • Not free or open source


Implementation Plan

Stage 1 - Proof of Concept
This stage will use minimal hardware (1 storage server, 2 cluster nodes, 10/100 networking) to demonstrate that the solution can meet the requirements set out above. An installation procedure and power up/down procedures will also be produced during this stage.
Stage 2 - Basic Implementation
This stage will use expanded hardware (maximum number of cluster nodes available - probably about 6, Gigabit networking) to demonstrate the full potential of the cluster. This stage will start with a fresh install, using the procedure created in the Proof-of-concept. The setup created here will be expanded into the final implementation. Procedures for managing processes throughout the cluster will be investigated and developed. A benchmark procedure will be investigated, developed, and demonstrated.
Stage 3 - Expanded Implementation
This stage will build on the implementation from Stage 2 - no new computing hardware will be added. The graphical environment will be installed and tested. Multi-monitor support will be added. Scripts will be created to automatically power up all nodes, to initialize the cluster, to shutdown all nodes, and for other tasks.
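As a starting point for the power-up script, here is a sketch of a Wake-on-LAN sender in plain bash. This assumes the nodes' BIOS and NICs have Wake-on-LAN enabled, and the MAC address shown is a placeholder:

```shell
#!/bin/bash
# Build a Wake-on-LAN "magic packet" for one MAC address and write it to
# stdout: 6 bytes of 0xFF followed by the MAC address repeated 16 times.
wol_packet() {
    # Turn "00:11:22:33:44:55" into printf escapes "\x00\x11\x22\x33\x44\x55"
    escaped=$(echo "$1" | tr -d ':' | sed 's/../\\x&/g')
    printf '\xff\xff\xff\xff\xff\xff'
    for i in $(seq 1 16); do
        printf '%b' "$escaped"
    done
}

# Usage: broadcast the packet as UDP using bash's /dev/udp redirection:
# wol_packet 00:11:22:33:44:55 > /dev/udp/255.255.255.255/9
```

Looping this over each node's MAC address would power up the whole cluster in one command.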
Stage 4 - Performance Evaluation
Benchmark tests will be run to test the system's capability. If problems are found, they will be addressed and the tests will be run again. The affected procedures will be updated accordingly.
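The benchmark procedure could start as simply as timing a fixed amount of CPU-bound work at increasing levels of parallelism: on a single machine, elapsed time should stop improving once the worker count exceeds the local core count, while on the cluster it should keep improving as workers migrate to idle nodes. A minimal sketch, with a busy loop standing in for real work:

```shell
#!/bin/bash
# Time N parallel CPU-bound workers and report wall-clock elapsed time.
run_bench() {
    nworkers=$1
    start=$(date +%s)
    for n in $(seq 1 "$nworkers"); do
        # Placeholder workload: sum integers in a busy loop
        ( i=0; s=0
          while [ "$i" -lt 50000 ]; do s=$((s + i)); i=$((i + 1)); done ) &
    done
    wait
    end=$(date +%s)
    echo "workers=$nworkers elapsed=$((end - start))s"
}

run_bench 2
run_bench 4
```

Plotting elapsed time against worker count before and after adding nodes would make the cluster's scaling visible at a glance.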

Procedures

Installation and Configuration

Power Up

Initialize Cluster

Power Down

Performance Benchmark

Script Installation

Journal

November 2009

  • Started this page, outlined the plans
  • Wrote up analysis of cluster platforms

References