Software RAID on FC6
This document details the steps I had to take to build a software RAID array using mdadm. It was originally written for Fedora Core 6, but has since been updated to cover a wider range of distributions. The process was very tedious, and I could not find any one document that gave me enough information to complete the project. I will cover all parts of the process, from selecting hardware and cabling to building the array and setting it to mount automatically. I will assume the reader has a good knowledge of computer hardware, knows at least a little about SCSI, and is comfortable running Linux. For more information about RAID levels see this. Feel free to e-mail me with specific questions.
NOTE: For this document (and others on this site), Linux commands starting with "$" can be run by any non-root user; those beginning with "#" must be run as root.
Contents
- 1 Creating the Array
- 1.1 Selecting hard drives, controllers, and cabling
- 1.2 Cabling Up and Powering On
- 1.3 Getting the Operating System to Recognize the New Drives
- 1.4 Formatting the Drives Properly
- 1.5 Building the Array
- 1.6 Set Array to be Available On Boot
- 1.7 Create Filesystem on Array
- 1.8 Set up mdadm.conf
- 1.9 Test mount
- 1.10 Add a Line to fstab
- 2 Managing the array
Creating the Array
Selecting hard drives, controllers, and cabling
You can use IDE drives and still follow these instructions perfectly. In fact, it would be much easier if you did use IDE, but SCSI is more reliable, faster, and just plain cooler. If you're interested in going IDE, I'll assume you know how to select, connect, and cable those.
Now, when selecting SCSI drives, go for whatever rotational speed and size spec you feel you need. If you're planning on going for RAID 5, you will need at least (3) drives. I'll discuss RAID types later. Once you select the size and speeds of your drives, it's time to select a connector type. Your three choices are 50-pin, 68-pin, or 80-pin (SCA). I would recommend the 68 or 80-pin configuration for both speed and simplicity. I am using (3) Seagate 181.8GB 7200RPM 80-pin devices.
To run your drives you will need a SCSI controller. Generally you can get a simple PCI controller, unless you have special requirements. I bought a very simple Startech.com PCI controller. It is obviously useless to get a controller with onboard RAID, as we are going to use software for the RAID array - so save your money and get the cheap one. Also, most motherboards that have "onboard RAID" do not actually implement RAID using hardware. Instead, the Windows driver will handle most of the work - effectively making software RAID.
You will also need the proper cable to run your drives. I'd recommend you get a cable with a terminator to make controller configuration easier. Make sure the cable has at least enough connectors for all your devices. For 50-pin devices, buy a 50-pin cable. For 68 and 80-pin devices, buy a 68-pin cable. 80-pin devices are meant to be put into a hot-swap tray, so you will also need to get an SCA adapter for each device. The SCA adapter provides everything a hot-swap rack would, including the device ID. It also gives you the ability to hot swap drives after running a few commands. A great place to get cabling and SCA adapters is STSI.
Cabling Up and Powering On
If you're using IDE drives, this doesn't apply to you - just plug them in and go. If you're using a PCI IDE adapter, you might want to check the adapter configuration screen to make sure all the drives were recognized. This information is usually output to the screen during boot, so you might not even have to go into the config.
For the SCSI users, the fun begins. Plug in all your drives, using the SCA adapters if you have 80-pin devices. You will need to set the device id on each drive. This is usually done with some jumpers or switches on the drive unless you're using SCA drives. In that case, you would normally configure them in the hot-swap rack by their position, but if you're like me and too cheap to get one of those, you will set it with a jumper on each SCA adapter. Remember that the controller usually uses an ID of 7, so don't set any drive to that ID. Don't forget to connect the power to each drive!
Once everything is properly cabled, boot up the system. You should see the SCSI adapter card initialize in the boot. Press whatever shortcut key gets you into the device configuration. Poke around in there to make sure all the drives are properly recognized. If they're not, check the jumper settings.
Getting the Operating System to Recognize the New Drives
If you are doing a new install of your favorite Linux distribution (i.e. you're not adding the array to an existing installation), the installer should take care of the rest of this for you. (I say should, because it didn't for me on my most recent install.) If that's the case, simply follow the on-screen instructions and you'll be all set - stop reading now, bye bye, have fun.
If you already have Linux installed, and the hardware isn't automatically detected on boot, you're in for a treat. The first thing to do is get your SCSI controller driver running. In my case, the driver is "initio". You'll have to look around and figure out what yours is. Usually, you can guess it by looking at the manufacturer of the chip on the adapter, Googling around, or asking for help on a forum or IRC channel. Once you have figured out your driver and installed or compiled it, you should start it by running:
#/sbin/modprobe initio
replacing "initio" with your own driver name. This should automatically load all dependant modules for SCSI operation. Now, check to see if your drives were initialized by running:
#ls /dev/sd*
Hopefully, this will return an item for each of your new drives. If not, something went wrong - go back and try again.
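Before retrying blindly, a couple of commands can help narrow down which controller chip you actually have and what the kernel saw:
# lspci | grep -i scsi
# dmesg | grep -i scsi
lspci shows the controller chip (usually the best clue for picking the right driver module), and dmesg shows whether the driver found your drives.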
Adding the Driver Permanently
Now it's time to add the driver to the kernel. There are different ways to do this depending on your distribution. Pick one of the following sections that fits your needs.
RPM Users
RPM distributions (Red Hat, Fedora Core, etc) do it by running:
#/sbin/new-kernel-pkg --mkinitrd --depmod --install `uname -r`
This command will update the currently running kernel with the new driver. If you use yum or rpm to upgrade your kernel, this change will apply to any newer kernel you install. If you don't believe me and want to see all the neat things that run when you install a new kernel, run:
#rpm -q --scripts kernel
You might want to restart your system at this point to make sure that the proper driver has been installed in the kernel and that your devices are listed without calling modprobe, though it shouldn't make a difference.
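If you do reboot, a couple of quick checks will confirm that the driver came up on its own and the drives are still visible (replace "initio" with your own driver name):
#/sbin/lsmod | grep initio
#ls /dev/sd*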
APT Users
Sorry, I haven't tested this path myself - if the sketch below doesn't pan out, Google for depmod and your distribution name.
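On Debian/Ubuntu-style systems, the usual approach is roughly the following (an untested sketch, so double-check against your distro's docs):
# echo "initio" >> /etc/initramfs-tools/modules
# update-initramfs -u
replacing "initio" with your own driver name. If you only need the module after boot (i.e. your root filesystem is not on the new controller), adding the name to /etc/modules should also get it loaded automatically.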
Formatting the Drives Properly
Now this is where things get interesting, but not too complex. In order to make software RAID work, we need to make partitions on the devices before the array is built. You can use the bare drive block devices, but using partitions has some benefits and avoids some scary warnings. So, use parted if you like command lines or gparted if you like GUIs to create a partition on each device. You do not need to format the partitions to any filesystem. Blank partitions are all you need - and creating them is practically instantaneous.
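For drives of ordinary size, something like this will label the drive and create one empty partition spanning it (an example only - substitute your own drive for "sda"):
# parted -s -a optimal /dev/sda mklabel msdos -- mkpart primary 1 -1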
If you want to create partitions that use the whole drive, and your drives are larger than 2TB, you can do something like this to set your drive up to use GPT and create an empty partition (substituting your own drive for "sda", of course):
# parted -s -a optimal /dev/sda mklabel gpt -- mkpart primary 1 -1
Note that once a drive is configured for GPT, you cannot view information about it with fdisk - you must use parted (or gparted).
Building the Array
At this point, we are ready to build the array. This is fairly simple using mdadm, which is included in most distributions, including FC. The command to build the array looks like:
#/sbin/mdadm --create --level=5 --raid-devices=3 --spare-devices=0 --name=store /dev/md0 /dev/sd[abc]1
You will need to change the --level value to match your desired RAID level. See this if you need a description of RAID levels. If you want spare devices to be automatically controlled by mdadm, set --spare-devices to the number of spares. --name is an optional simple name for the array and can be set to anything you like. /dev/md0 is the array device; stick with md0 unless you have already used it for another array. /dev/sd[abc]1 tells mdadm to use sda1, sdb1, and sdc1 to create the array. Once you run this command, the array will be created. Check the array status by running:
#/sbin/mdadm --detail /dev/md0
The array may be listed as degraded or recovering. This is normal: when first creating the array, mdadm deliberately treats one drive as out of sync and rebuilds onto it, so the array looks unhealthy until the initial sync finishes. Monitor the status of the array and wait until it is done building before moving on to the next step.
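An easy way to keep an eye on the build progress is the kernel's md status file (run it under "watch" if you want it to refresh automatically):
#cat /proc/mdstat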
For more information about mdadm, see the man page.
Set Array to be Available On Boot
For older systems, in order to make the array available on boot, the "fd" (Linux raid autodetect) flag must be set on each partition. The kernel's RAID autodetection would then see the flagged partitions and initialize the array automatically. To set the flag, simply run:
#/sbin/parted /dev/sda set 1 raid on
on each of your devices - substitute /dev/sda with your device.
You should probably not have to do this - follow the mdadm.conf instructions below instead. If that doesn't work, try setting FD as described here.
Create Filesystem on Array
Non-LVM
For partitions under 2TB:
#/sbin/mkfs -t ext4 /dev/md0
Should take care of that.
For partitions over 2TB:
# parted -s -a optimal /dev/md0 mklabel gpt -- mkpart primary ext4 1 -1
In either case, substitute the filesystem you want for "ext4" and your RAID device for "md0". Note that parted's mkpart only creates the partition and records the filesystem type - it does not actually make the filesystem - so for the over-2TB case you still need to run mkfs on the resulting partition (/dev/md0p1) and mount that partition rather than /dev/md0.
LVM
LVM has some pretty serious benefits; you should probably use it. Google around to learn about it and to get install instructions for your distro.
Make a partition on your raid device and set the "lvm" flag on.
# parted -s -a optimal /dev/md0 mklabel gpt -- mkpart primary 1 -1
# parted -s /dev/md0 set 1 lvm on
This creates a new device called /dev/md0p1. This device will become our LVM Physical Volume (PV) - to do that, run:
# pvcreate /dev/md0p1
You can now check out the work you did with:
# pvdisplay
Now we create a Volume Group (VG). This one will be a bit boring, since it only has one device right now:
# vgcreate vgsomename /dev/md0p1
Substitute whatever name you want for "vgsomename". It's nice to start all the names with vg so you remember they are VGs.
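Just like with the PV, you can review what you created:
# vgdisplay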
Now you can create as many Logical Volumes (LV) as you want. Why would you want more than one? You can grow and shrink them, and also move them around to other physical devices. Having more LVs gives you more flexibility to do that sort of thing.
# lvcreate -L 100G -n lvsomename vgsomename
-L is the size. If you leave off the "G", the unit is megabytes. Also, read about the "-l"/"--extents" argument, which allows you to specify sizes in percentages of free space. -n is the name, customize it to what you like. Putting "lv" at the front of the name makes it obvious that we're talking about an LV. Substitute your VG name for vgsomename.
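For example, if you just want a single LV that fills the entire group, the extents form saves you the size math (same placeholder names as above):
# lvcreate -l 100%FREE -n lvsomename vgsomename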
Now make file systems on each of your LVs:
# mkfs -t ext4 /dev/vgsomename/lvsomename
For the remainder of this page, you will see /dev/md0 referenced. You should use your LV path(s) instead wherever we are accessing data (mounting drives). Use md0 only when we're managing the raid array (adding / removing disks, checking rebuild status, etc).
Here is a pretty good LVM reference.
Set up mdadm.conf
We now need to set up the mdadm.conf file. This is not really required, but makes troubleshooting easier down the road and allows us to have the system send alerts via e-mail. The easiest way to do this is to start with the detail output, like this:
#/sbin/mdadm --detail --scan >> /etc/mdadm.conf
Edit the conf file with your favorite editor (in this case I will use vim):
#vim /etc/mdadm.conf
Review the syntax on the man page and correct the file. You can use "#" for comments. I also recommend that you define a "MAILADDR" for mdadm to send mail to in the case of a failure. My mdadm.conf looks something like:
DEVICE /dev/sd[abc]1
# /dev/md0 is known by its UUID.
ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
MAILADDR root@somedomain.com
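Once the file is in place, you can have mdadm send a test alert to confirm that MAILADDR actually works - this assumes outgoing mail is already configured on the machine:
#/sbin/mdadm --monitor --scan --oneshot --test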
Test mount
Now the array is completely configured and should be set up on boot without issue. Time to mount it:
#mount -t ext4 /dev/md0 /store
Substitute whatever filesystem you chose for "ext4", your mdadm device for "/dev/md0" (if this is your only mdadm array, it will be "/dev/md0"), and wherever you want to mount to for "/store". Assuming that goes well, you can start using your array for storage.
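A quick sanity check that the mount is live and the size looks right:
#df -h /store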
Add a Line to fstab
Now we should make the array mount on boot so you don't have to run mount every time you want to use the array.
I recommend that you reboot and re-test the mount before adding an entry to fstab - I had a problem where the array didn't initialize, so when fstab was processed at boot I was dropped into a filesystem recovery mode, which was unpleasant at best. Once you can prove that the array initializes properly on boot, it is safe to add an entry to fstab.
Once you take care of that, time to add to fstab:
#vim /etc/fstab
The line that needs to be added looks like:
/dev/md0 /store ext4 defaults 0 0
Substitute your mdadm device for "/dev/md0", where you want it mounted for "/store", and the filesystem you chose for "ext4". The rest should stay the same unless you know what you're doing.
If you want a bit more certainty that things will be mounted properly, you can use the UUID instead of the block device name. This will make the mount happen even if the array comes up as a different name, which can happen if mdadm.conf is not loaded, or you didn't create one. This is also helpful for LVM volumes that you might rename.
To get the UUID of your device:
# blkid /dev/md0
Replace the fstab entry described above with something more like this:
UUID=00000000-0000-0000-0000-000000000000 /store ext4 defaults 0 0
Replace that UUID with the one you found from blkid; the rest is as described above.
That's it! Should work.
Managing the array
This section deals with things you do after the array is set up.
Dealing with Failed Devices
Oh no! A drive is dead! That's ok, we've got RAID!
For this section, replace "/dev/md0" with your array device, "/dev/sda1" with your failed device, and "/dev/sdd1" with your replacement device. The syntax of your version of mdadm may be slightly different - run "man mdadm" to get a detailed description of mdadm usage if any of this complains about bad arguments.
Check the array to see what's going on:
#/sbin/mdadm --detail /dev/md0
Mark the device as failed (if mdadm hasn't already) and remove it from the array:
#/sbin/mdadm --manage /dev/md0 --fail /dev/sda1
#/sbin/mdadm --manage /dev/md0 --remove /dev/sda1
Plug in the new drive and get it ready. You'll need to partition it - see the setup directions for more information. Make sure the replacement partition is at least as big as the failed partition or the rebuild will fail. Once that's done, add the new device to the array:
#/sbin/mdadm --manage /dev/md0 --add /dev/sdd1
Migrating the Array to a New System or Installation
Look into the "--assemble" option. You may have to use "--add" if only some drives come up. It's pretty simple to use - just tell it the right md device and what partitions to assemble into it. For example:
#/sbin/mdadm --assemble /dev/md0 /dev/sd[abc]1
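If the new system already has an mdadm.conf describing the array, you can also just let mdadm find everything itself:
#/sbin/mdadm --assemble --scan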
Adding Spares
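I haven't written this section up properly yet, but the short version is that adding a partition to an array that is already healthy turns it into a hot spare, using the same --add syntax shown above (substitute your own devices):
#/sbin/mdadm --manage /dev/md0 --add /dev/sdd1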
SCSI Hot-Swap
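Also unfinished. For reference, the kernel's sysfs interface lets you tell Linux about drives you pull or insert without rebooting - the exact paths vary by system, so treat these as examples only:
# echo 1 > /sys/block/sda/device/delete
# echo "- - -" > /sys/class/scsi_host/host0/scan
The first line tells the kernel to forget a drive before you pull it; the second asks the controller (host0 here) to rescan its bus for newly inserted devices.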
This page is incomplete
More work needs to be done on this page, so if something is missing, don't be surprised
1 Managing Array Section is basic
2 Directions for APT users are an untested sketch