
This is a detailed, production-oriented guide to software RAID setup, management, monitoring, and drive-failure recovery.
It applies to most Linux server environments, including cPanel/WHM, AlmaLinux/Rocky, and Ubuntu/Debian, on both VPS and dedicated hardware, and uses mdadm, the standard Linux software RAID manager.
⭐ Software RAID Setup, Management & Administration
(with Full Failure-Recovery Procedures)
-
🔍 Introduction to Software RAID
Software RAID uses the OS kernel (via mdadm) to create and manage arrays of multiple disks for redundancy, performance, or both.
Common RAID Levels
| Level | Min Disks | Purpose | Fault Tolerance | Notes |
| --- | --- | --- | --- | --- |
| RAID 0 | 2 | Stripe | None | Performance only; not recommended for production |
| RAID 1 | 2 | Mirror | 1 disk | Most common for servers (OS partitions) |
| RAID 5 | 3 | Stripe + parity | 1 disk | Good compromise; slow rebuild; not recommended on large disks |
| RAID 6 | 4 | Stripe + dual parity | 2 disks | Better for large disks |
| RAID 10 | 4 | Striped mirrors | 1+ disks | Best performance + redundancy |
mdadm software RAID is extremely common on WHM/cPanel dedicated servers and Linux VPS.
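A quick sizing illustration, assuming four 4 TB disks (16 TB raw): RAID 0 yields 16 TB usable, RAID 6 and RAID 10 each yield 8 TB, and RAID 1 across two of those disks yields 4 TB; RAID 5 over three of the disks yields 8 TB. Parity levels always give up the capacity of one disk (RAID 5) or two disks (RAID 6), while mirroring halves raw capacity.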
-
🧰 Install Required Tools
-
RHEL / AlmaLinux / Rocky
sudo dnf install mdadm smartmontools -y
-
Debian / Ubuntu
sudo apt install mdadm smartmontools -y
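A quick sanity check that both tools installed correctly:
mdadm --version
smartctl --version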
-
🛠️ Create a New RAID Array
-
Identify the Disks
lsblk
fdisk -l
Assume the disks are /dev/sdb and /dev/sdc.
-
Prepare Disks (create partitions)
Use GPT for modern layouts:
parted /dev/sdb mklabel gpt
parted /dev/sdb mkpart primary 0% 100%
parted /dev/sdc mklabel gpt
parted /dev/sdc mkpart primary 0% 100%
Mark partitions as RAID:
sudo parted /dev/sdb set 1 raid on
sudo parted /dev/sdc set 1 raid on
Partitions become:
/dev/sdb1 /dev/sdc1
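Optionally, confirm the layout and check for stale signatures before the partitions join an array (wipefs -n only reports; it does not erase anything):
lsblk -o NAME,SIZE,TYPE /dev/sdb /dev/sdc
wipefs -n /dev/sdb1 /dev/sdc1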
-
🧱 Create RAID Arrays
-
RAID 1 (Mirror)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
-
RAID 5 Example
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sd[bcd]1
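RAID 6 and RAID 10 follow the same pattern with four member partitions; the extra device names below (/dev/sdd1, /dev/sde1) are placeholders:
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[bcde]1
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[bcde]1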
-
📦 Add Filesystem & Mount
mkfs.ext4 /dev/md0
mkdir /mnt/raid
mount /dev/md0 /mnt/raid
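To mount the array automatically at boot, one common approach is an /etc/fstab entry keyed on the filesystem UUID; replace the placeholder with the value blkid prints:
blkid /dev/md0
echo 'UUID=<uuid-from-blkid> /mnt/raid ext4 defaults,nofail 0 2' >> /etc/fstab
The nofail option keeps the boot from hanging if the array is unavailable.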
-
🔖 Persist RAID Assembly
Write array definition:
mdadm --detail --scan >> /etc/mdadm.conf
Or on Debian-based:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u
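On RHEL-family systems the equivalent step after updating /etc/mdadm.conf is regenerating the initramfs with dracut so the array assembles early in boot:
sudo dracut -f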
-
📊 Monitoring RAID Health
-
Check RAID Status
cat /proc/mdstat
Sample output:
md0 : active raid1 sdb1[0] sdc1[1]
      976630336 blocks [2/2] [UU]
[UU] = both disks healthy
[_U] = left disk failed
[U_] = right disk failed
-
Detailed View
mdadm --detail /dev/md0
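For scripted monitoring, the key fields can be filtered out of the same output (a minimal sketch; adjust the pattern if your mdadm version words these lines differently):
mdadm --detail /dev/md0 | grep -E 'State :|Failed Devices'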
-
🚨 Replacing a Failed Drive (RAID1/5/6/10)
This is the most important part for production systems.
Symptoms of a Failed Disk
- cat /proc/mdstat shows [_U] or [U_]
- Server logs show I/O errors
- SMART failures (a couple of focused checks follow below):
smartctl -a /dev/sdX
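Two focused smartctl checks surface most failing-drive signatures: the overall health verdict and the attributes that count bad sectors:
smartctl -H /dev/sdX
smartctl -A /dev/sdX | grep -Ei 'Reallocated|Pending|Uncorrectable'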
-
🧹 Step-by-Step Drive Failure Recovery
Assume:
- Array: /dev/md0
- Bad disk: /dev/sdb1
- Replacement disk: /dev/sdd
-
Identify the Faulty Drive
mdadm --detail /dev/md0
You’ll see something like:
Number   Major   Minor   RaidDevice   State
   0       8      17         0        faulty   /dev/sdb1
   1       8      33         1        active   /dev/sdc1
-
Mark Drive as Failed
mdadm --fail /dev/md0 /dev/sdb1
-
Remove the Failed Drive
mdadm --remove /dev/md0 /dev/sdb1
-
Prepare the New Drive
If whole disk:
parted /dev/sdd mklabel gpt
parted /dev/sdd mkpart primary 0% 100%
parted /dev/sdd set 1 raid on
-
Add New Drive to the Array
mdadm --add /dev/md0 /dev/sdd1
Rebuild begins automatically.
Monitor progress:
watch cat /proc/mdstat
-
🔄 Rebuilding the Array
Expected output during rebuild:
[>....................]  recovery = 5.3% (103424/1953512448) finish=120.5min speed=26000K/sec
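If the rebuild crawls, the kernel's global resync speed limits can be raised temporarily (values are in KB/s; the defaults are deliberately conservative):
sysctl -w dev.raid.speed_limit_min=50000
sysctl -w dev.raid.speed_limit_max=500000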
-
🧾 Clone Partition Table Automatically (Optional Best Practice)
If your drives must match exactly:
sfdisk -d /dev/sdc | sfdisk /dev/sdd
Then add the partition:
mdadm --add /dev/md0 /dev/sdd1
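On GPT disks, sgdisk from the gdisk package is an alternative for cloning the table; the copy then needs fresh partition GUIDs (with sgdisk, -R names the destination and the source disk comes last):
sgdisk -R /dev/sdd /dev/sdc
sgdisk -G /dev/sdd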
-
⚡ Hot Spare Setup (Automatic Recovery)
Add a spare disk:
mdadm --add /dev/md0 /dev/sde1
Verify with mdadm --detail /dev/md0; the output should include:
Spare Devices : 1
If a disk fails, mdadm automatically pulls in the spare.
-
🛡️ SMART Monitoring
Schedule SMART tests:
Create /etc/cron.weekly/smartcheck:
#!/bin/bash
smartctl -t short /dev/sda
smartctl -t short /dev/sdb
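The script must be executable, and completed test results can be reviewed from each drive's self-test log:
chmod +x /etc/cron.weekly/smartcheck
smartctl -l selftest /dev/sda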
-
🔐 Email Alerts for RAID Failure
Configure mdadm mail alerts by editing /etc/mdadm.conf (or /etc/mdadm/mdadm.conf on Debian-based systems) and adding:
MAILADDR admin@yourdomain.com
Restart:
systemctl restart mdmonitor
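To confirm that alert mail is actually delivered, mdadm can send a one-off test message for every array it finds:
mdadm --monitor --scan --oneshot --test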
-
🩺 Advanced Diagnostics
-
Check current RAID bitmap (helps fast rebuild)
mdadm --detail /dev/md0 | grep -i bitmap
-
Verify stripes (RAID5/6)
echo check > /sys/block/md0/md/sync_action
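The check runs as a background pass visible in /proc/mdstat; when it finishes, the mismatch counter should read 0 on a healthy array:
cat /proc/mdstat
cat /sys/block/md0/md/mismatch_cnt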
-
💣 Troubleshooting Scenarios
-
Scenario A: RAID shows “degraded” even after rebuild
Remove the disk and add it back to trigger a fresh rebuild:
mdadm --remove /dev/md0 /dev/sdd1
mdadm --add /dev/md0 /dev/sdd1
-
Scenario B: md0 will not assemble on boot
mdadm --assemble --scan
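If the scan still cannot assemble the array, examine the RAID superblocks on the candidate partitions to see which array and slot each one claims:
mdadm --examine /dev/sdb1 /dev/sdc1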
-
Scenario C: Accidentally removed the wrong disk
Re-add it (with a write-intent bitmap, --re-add resyncs only the blocks that changed while the disk was out):
mdadm --re-add /dev/md0 /dev/sdb1
-
Scenario D: Superblock errors
Zero superblock before reuse:
mdadm --zero-superblock /dev/sdd1
-
📦 Backup mdadm metadata (critical!)
Save RAID definition:
mdadm --detail --scan > /root/mdadm.backup
Save disk partition tables:
sfdisk -d /dev/sda > /root/sda.part
sfdisk -d /dev/sdb > /root/sdb.part
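Restoring is the reverse operation, shown here for a replacement disk that takes over the /dev/sdb name:
sfdisk /dev/sdb < /root/sdb.part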
-
🧹 Full Cleanup Commands (Destroy RAID)
umount /mnt/raid
mdadm --stop /dev/md0
mdadm --remove /dev/md0
mdadm --zero-superblock /dev/sd[bcd]1
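Optionally, wipe any leftover filesystem or RAID signatures so the disks show up as blank to future tooling (wipefs -a is destructive; double-check device names first):
wipefs -a /dev/sd[bcd]1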
📘 Summary: Best Practices for Software RAID Administration
✔ Always use RAID 1 or RAID 10 for critical servers
✔ Keep at least one hot spare on RAID 5/6/10
✔ Enable email alerts
✔ Monitor smartctl logs weekly
✔ Run periodic RAID checks
✔ Save /etc/mdadm.conf after any modification
✔ Use identical disks whenever possible
✔ Keep a replacement drive on hand
Conclusion
With these procedures you can build mdadm arrays, keep them monitored, and recover cleanly when a drive fails on a production server.









