
This is a detailed, production-oriented guide to software RAID setup, management, monitoring, and drive-failure recovery.
It applies to most Linux server environments, including cPanel/WHM, AlmaLinux/Rocky, and Ubuntu/Debian, on both VPS and dedicated hardware, and uses mdadm, the standard Linux software RAID manager.
⭐ Software RAID Setup, Management & Administration
(with Full Failure-Recovery Procedures)
-
🔍 Introduction to Software RAID
Software RAID uses the OS kernel (via mdadm) to create and manage arrays of multiple disks for redundancy, performance, or both.
Common RAID Levels
| Level | Min Disks | Purpose | Fault Tolerance | Notes |
| --- | --- | --- | --- | --- |
| RAID 0 | 2 | Stripe | None | Performance only; not recommended for production |
| RAID 1 | 2 | Mirror | 1 disk | Most common for servers (OS partitions) |
| RAID 5 | 3 | Stripe + parity | 1 disk | Good compromise; slow rebuild; not recommended on large disks |
| RAID 6 | 4 | Stripe + dual parity | 2 disks | Better for large disks |
| RAID 10 | 4 | Striped mirrors | 1+ disks | Best performance + redundancy |
mdadm software RAID is extremely common on WHM/cPanel dedicated servers and Linux VPS.
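A quick sizing illustration, assuming four 4 TB disks (16 TB raw): RAID 0 yields 16 TB usable, RAID 6 and RAID 10 each yield 8 TB, and RAID 1 across two of those disks yields 4 TB; RAID 5 over three of the disks yields 8 TB. Parity levels always give up the capacity of one disk (RAID 5) or two disks (RAID 6), while mirroring halves raw capacity.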
-
🧰 Install Required Tools
-
RHEL / AlmaLinux / Rocky
sudo dnf install mdadm smartmontools -y
-
Debian / Ubuntu
sudo apt install mdadm smartmontools -y
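A quick sanity check that both tools installed correctly:
mdadm --version
smartctl --version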
-
🛠️ Create a New RAID Array
-
Identify the Disks
lsblk
fdisk -l
Assume the disks are /dev/sdb and /dev/sdc.
-
Prepare Disks (create partitions)
Use GPT for modern layouts:
parted /dev/sdb mklabel gpt
parted /dev/sdb mkpart primary 0% 100%
parted /dev/sdc mklabel gpt
parted /dev/sdc mkpart primary 0% 100%
Mark partitions as RAID:
sudo parted /dev/sdb set 1 raid on
sudo parted /dev/sdc set 1 raid on
Partitions become:
/dev/sdb1 /dev/sdc1
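Optionally, confirm the layout and check for stale signatures before the partitions join an array (wipefs -n only reports; it does not erase anything):
lsblk -o NAME,SIZE,TYPE /dev/sdb /dev/sdc
wipefs -n /dev/sdb1 /dev/sdc1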
-
🧱 Create RAID Arrays
-
RAID 1 (Mirror)
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
-
RAID 5 Example
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sd[bcd]1
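RAID 6 and RAID 10 follow the same pattern with four member partitions; the extra device names below (/dev/sdd1, /dev/sde1) are placeholders:
mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sd[bcde]1
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[bcde]1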
-
📦 Add Filesystem & Mount
mkfs.ext4 /dev/md0
mkdir /mnt/raid
mount /dev/md0 /mnt/raid
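To mount the array automatically at boot, one common approach is an /etc/fstab entry keyed on the filesystem UUID; replace the placeholder with the value blkid prints:
blkid /dev/md0
echo 'UUID=<uuid-from-blkid> /mnt/raid ext4 defaults,nofail 0 2' >> /etc/fstab
The nofail option keeps the boot from hanging if the array is unavailable.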
-
🔖 Persist RAID Assembly
Write array definition:
mdadm --detail --scan >> /etc/mdadm.conf
Or on Debian-based:
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
update-initramfs -u
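On RHEL-family systems the equivalent step after updating /etc/mdadm.conf is regenerating the initramfs with dracut so the array assembles early in boot:
sudo dracut -f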
-
📊 Monitoring RAID Health
-
Check RAID Status
cat /proc/mdstat
Sample output:
md0 : active raid1 sdb1[0] sdc1[1]
      976630336 blocks [2/2] [UU]
[UU] = both disks healthy
[_U] = left disk failed
[U_] = right disk failed
-
Detailed View
mdadm --detail /dev/md0
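For scripted monitoring, the key fields can be filtered out of the same output (a minimal sketch; adjust the pattern if your mdadm version words these lines differently):
mdadm --detail /dev/md0 | grep -E 'State :|Failed Devices'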
-
🚨 Replacing a Failed Drive (RAID1/5/6/10)
This is the most important part for production systems.
Symptoms of a Failed Disk
- cat /proc/mdstat shows [_U] or [U_]
- Server logs show I/O errors
- SMART failures (a couple of focused checks follow below):
smartctl -a /dev/sdX
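Two focused smartctl checks surface most failing-drive signatures: the overall health verdict and the attributes that count bad sectors:
smartctl -H /dev/sdX
smartctl -A /dev/sdX | grep -Ei 'Reallocated|Pending|Uncorrectable'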
-
🧹 Step-by-Step Drive Failure Recovery
Assume:
- Array: /dev/md0
- Bad disk: /dev/sdb1
- Replacement disk: /dev/sdd
-
Identify the Faulty Drive
mdadm --detail /dev/md0
You’ll see something like:
Number   Major   Minor   RaidDevice   State
   0       8      17         0        faulty   /dev/sdb1
   1       8      33         1        active   /dev/sdc1
-
Mark Drive as Failed
mdadm --fail /dev/md0 /dev/sdb1
-
Remove the Failed Drive
mdadm --remove /dev/md0 /dev/sdb1
-
Prepare the New Drive
If whole disk:
parted /dev/sdd mklabel gpt
parted /dev/sdd mkpart primary 0% 100%
parted /dev/sdd set 1 raid on
-
Add New Drive to the Array
mdadm --add /dev/md0 /dev/sdd1
Rebuild begins automatically.
Monitor progress:
watch cat /proc/mdstat
-
🔄 Rebuilding the Array
Expected output during rebuild:
[>....................]  recovery = 5.3% (103424/1953512448) finish=120.5min speed=26000K/sec
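If the rebuild crawls, the kernel's global resync speed limits can be raised temporarily (values are in KB/s; the defaults are deliberately conservative):
sysctl -w dev.raid.speed_limit_min=50000
sysctl -w dev.raid.speed_limit_max=500000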
-
🧾 Clone Partition Table Automatically (Optional Best Practice)
If your drives must match exactly:
sfdisk -d /dev/sdc | sfdisk /dev/sdd
Then add the partition:
mdadm --add /dev/md0 /dev/sdd1
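On GPT disks, sgdisk from the gdisk package is an alternative for cloning the table; the copy then needs fresh partition GUIDs (with sgdisk, -R names the destination and the source disk comes last):
sgdisk -R /dev/sdd /dev/sdc
sgdisk -G /dev/sdd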
-
⚡ Hot Spare Setup (Automatic Recovery)
Add a spare disk:
mdadm --add /dev/md0 /dev/sde1
Verify with mdadm --detail /dev/md0; the output should include:
Spare Devices : 1
If a disk fails, mdadm automatically pulls in the spare.
-
🛡️ SMART Monitoring
Schedule SMART tests:
Create /etc/cron.weekly/smartcheck:
#!/bin/bash
smartctl -t short /dev/sda
smartctl -t short /dev/sdb
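The script must be executable, and completed test results can be reviewed from each drive's self-test log:
chmod +x /etc/cron.weekly/smartcheck
smartctl -l selftest /dev/sda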
-
🔐 Email Alerts for RAID Failure
Configure mdadm mail alerts by editing /etc/mdadm.conf (or /etc/mdadm/mdadm.conf on Debian-based systems) and adding:
MAILADDR admin@yourdomain.com
Restart:
systemctl restart mdmonitor
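To confirm that alert mail is actually delivered, mdadm can send a one-off test message for every array it finds:
mdadm --monitor --scan --oneshot --test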
-
🩺 Advanced Diagnostics
-
Check current RAID bitmap (helps fast rebuild)
mdadm --detail /dev/md0 | grep -i bitmap
-
Verify stripes (RAID5/6)
echo check > /sys/block/md0/md/sync_action
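The check runs as a background pass visible in /proc/mdstat; when it finishes, the mismatch counter should read 0 on a healthy array:
cat /proc/mdstat
cat /sys/block/md0/md/mismatch_cnt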
-
💣 Troubleshooting Scenarios
-
Scenario A: RAID shows “degraded” even after rebuild
Remove the disk and add it back to trigger a fresh rebuild:
mdadm --remove /dev/md0 /dev/sdd1
mdadm --add /dev/md0 /dev/sdd1
-
Scenario B: md0 will not assemble on boot
mdadm --assemble --scan
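If the scan still cannot assemble the array, examine the RAID superblocks on the candidate partitions to see which array and slot each one claims:
mdadm --examine /dev/sdb1 /dev/sdc1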
-
Scenario C: Accidentally removed the wrong disk
Re-add it (with a write-intent bitmap, --re-add resyncs only the blocks that changed while the disk was out):
mdadm --re-add /dev/md0 /dev/sdb1
-
Scenario D: Superblock errors
Zero superblock before reuse:
mdadm --zero-superblock /dev/sdd1
-
📦 Backup mdadm metadata (critical!)
Save RAID definition:
mdadm --detail --scan > /root/mdadm.backup
Save disk partition tables:
sfdisk -d /dev/sda > /root/sda.part
sfdisk -d /dev/sdb > /root/sdb.part
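Restoring is the reverse operation, shown here for a replacement disk that takes over the /dev/sdb name:
sfdisk /dev/sdb < /root/sdb.part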
-
🧹 Full Cleanup Commands (Destroy RAID)
umount /mnt/raid
mdadm --stop /dev/md0
mdadm --remove /dev/md0
mdadm --zero-superblock /dev/sd[bcd]1
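Optionally, wipe any leftover filesystem or RAID signatures so the disks show up as blank to future tooling (wipefs -a is destructive; double-check device names first):
wipefs -a /dev/sd[bcd]1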
📘 Summary: Best Practices for Software RAID Administration
✔ Always use RAID 1 or RAID 10 for critical servers
✔ Keep at least one hot spare on RAID 5/6/10
✔ Enable email alerts
✔ Monitor smartctl logs weekly
✔ Run periodic RAID checks
✔ Save /etc/mdadm.conf after any modification
✔ Use identical disks whenever possible
✔ Keep a replacement drive on hand
Conclusion
With these procedures you can build mdadm arrays, keep them monitored, and recover cleanly when a drive fails on a production server.









