Thursday, June 23, 2016

Replacing a failed disk in a mdadm RAID

Replacing a failed disk in a mdadm RAID | tjansson.dk


Short version
Here are the steps to replace a disk.
First, fail the disk:
$ sudo mdadm --manage /dev/md0 --fail /dev/sdb1
Then remove it from the array:
$ sudo mdadm --manage /dev/md0 --remove /dev/sdb1
Then replace it with a new one:
$ sudo mdadm --manage /dev/md0 --add /dev/sdb1
If the old disk is no longer available, you can simply partition the new drive as Linux RAID with fdisk and add it to the array with the last command above. mdadm will add it and start resyncing the array.
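
The three steps above can be sketched as a dry run that only prints the commands, so they can be reviewed before anything touches the array. This is a minimal sketch: the function name replace_member_cmds is hypothetical, and the device names are the ones used in the example above.

```shell
# Print (do not run) the three mdadm commands for swapping one array member.
# $1 = array (e.g. /dev/md0), $2 = member to fail and remove,
# $3 = member to add (defaults to $2, i.e. the new disk reuses the old name).
replace_member_cmds() {
    array=$1 old=$2 new=${3:-$2}
    printf 'mdadm --manage %s --fail %s\n'   "$array" "$old"
    printf 'mdadm --manage %s --remove %s\n' "$array" "$old"
    printf 'mdadm --manage %s --add %s\n'    "$array" "$new"
}

replace_member_cmds /dev/md0 /dev/sdb1
```

Once the printed commands look right, they can be run one at a time as root.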

command list:

Faulty disks recovery
  • umount /dev/md0
  • mdadm --stop /dev/md0
  • mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
    • if the command above does not start the array with all your drives, you have a problem and should try --force:
  • mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
    • remove the faulty disk if necessary, and add new ones
  • mdadm --manage /dev/md0 --add /dev/sdc1
  • mdadm --manage /dev/md0 --add /dev/sdd1
    • Example error when the drive is already in the array: "mdadm: Cannot open /dev/sdb1: Device or resource busy"
  • cat /proc/mdstat (or watch -n1 cat /proc/mdstat)

$ mdadm --stop /dev/md0
mdadm: stopped /dev/md0
$ mdadm --assemble  /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/md0 has been started with 4 drives.

cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda1[0] sdd1[5] sdc1[4] sdb1[1]
      3906764800 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/15 pages [0KB], 65536KB chunk

unused devices: <none>
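
In the output above, the status field [UUUU] means all four members are up. A missing or failed member shows as an underscore, for example [_UUU]. The following is a rough sketch for spotting that automatically; the function name degraded_arrays is hypothetical, and it assumes the mdstat layout shown in this post.

```shell
# List md arrays whose member status field (e.g. [UUUU]) contains "_",
# which marks a missing or failed member. Reads mdstat-format text on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }      # remember the current array name
        match($0, /\[[U_]+\]/) {         # status field such as [UUUU] or [_UUU]
            status = substr($0, RSTART, RLENGTH)
            if (name != "" && index(status, "_") > 0) print name
        }
    '
}

# Usage on a live system (assumes /proc/mdstat is readable):
#   degraded_arrays < /proc/mdstat
```

An empty result means no array in the input is reported as degraded.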

Take my RAID for example:
root@mark21:/tmp/etc/udev# fdisk -l /dev/sda

Disk /dev/sda: 640.1 GB, 640135028736 bytes
255 heads, 63 sectors/track, 77825 cylinders, total 1250263728 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000ffc4

Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048  1240233983   620115968   fd  Linux raid autodetect

root@mark21:/tmp/etc/udev# dumpe2fs /dev/sda1
dumpe2fs 1.41.14 (22-Dec-2010)
dumpe2fs: Bad magic number in super-block while trying to open /dev/sda
Couldn't find valid filesystem superblock.
That you were able to recreate the RAID set at all is extremely lucky, but that doesn't change the fundamental flaws in your deployment. This will happen again.

What I would recommend is:
  1. Back up everything on that raid set
  2. Destroy the array and erase the md superblock from each device (man mdadm)
  3. Zero out those disks: dd if=/dev/zero of=/dev/sdX bs=1M count=100
  4. Create partitions on sda, sdc, sdd, & sdf that span 99% of the disk [0]
  5. Tag those partitions as type fd (Linux raid autodetect); see the linux-raid wiki
  6. Never ever format these partitions with any sort of filesystem
  7. Create a new RAID 5: mdadm --create /dev/md0 -v -f -l 5 -n 4 /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1
  8. Update new UUID in /etc/mdadm.conf
  9. Live happily ever after

mdadm --examine /dev/sdb /dev/sdc /dev/sdd /dev/sde
mdadm --examine /dev/sdb /dev/sdc /dev/sdd /dev/sda
cat /proc/mdstat
mdadm --stop /dev/md0
mdadm --assemble --scan
mdadm --examine /dev/sdb /dev/sdc /dev/sdd /dev/sda
mdadm --examine /dev/sda1
mdadm --examine /dev/sdb1
mdadm --examine /dev/sdc1
mdadm --examine /dev/sdb1
mdadm --assemble --scan -v
mdadm --examine /dev/sda1 |more
mdadm --examine /dev/sdc1 |more
mdadm --examine /dev/sda1 |more
mdadm --examine /dev/sdc1 |more
mdadm --examine /dev/sda1 |more
mdadm --examine /dev/sdb1 |more
mdadm --examine /dev/sdd1 |more
mdadm --assemble --scan -v
mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm --detail /dev/md0
cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --examine --scan
mdadm --detail /dev/md0
vim /etc/mdadm.conf
mdadm -E --scan
dmesg |grep md
cat /proc/mdstat
mdadm -E /dev/sd[a-d]1
mdadm -E /dev/sd[a]1
mdadm -E /dev/sd[b]1
mdadm -E /dev/sdc1
mdadm -E /dev/sdd1
mdadm -E /dev/sda1
mdadm --detail /dev/md0
mdadm --manage /dev/md0 --add /dev/sdc1
mdadm --manage /dev/md0 --add /dev/sdd1
mdadm --detail /dev/md0


Replacing a failed disk in a mdadm RAID


I have a RAID5 with 4 disks (see "Rebuilding and updating my Linux NAS and HTPC server"), and from the daily digest emails of the system I discovered that one of my disks had issues. I found the following in dmesg:
[ 8347.726688] ata6.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x0
[ 8347.726694] ata6.00: irq_stat 0x40000008
[ 8347.726698] ata6.00: failed command: READ FPDMA QUEUED
[ 8347.726705] ata6.00: cmd 60/08:38:78:10:00/00:00:17:00:00/40 tag 7 ncq 4096 in
[ 8347.726705]          res 41/40:00:78:10:00/00:00:17:00:00/40 Emask 0x409 (media error) <F>
[ 8347.726709] ata6.00: status: { DRDY ERR }
[ 8347.726711] ata6.00: error: { UNC }
[ 8347.731152] ata6.00: configured for UDMA/133
[ 8347.731180] sd 5:0:0:0: [sde] Unhandled sense code
[ 8347.731183] sd 5:0:0:0: [sde]  
[ 8347.731185] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 8347.731188] sd 5:0:0:0: [sde]  
[ 8347.731190] Sense Key : Medium Error [current] [descriptor]
[ 8347.731194] Descriptor sense data with sense descriptors (in hex):
[ 8347.731195]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[ 8347.731204]         17 00 10 78 
[ 8347.731208] sd 5:0:0:0: [sde]  
[ 8347.731211] Add. Sense: Unrecovered read error - auto reallocate failed
[ 8347.731214] sd 5:0:0:0: [sde] CDB: 
[ 8347.731216] Read(10): 28 00 17 00 10 78 00 00 08 00
[ 8347.731224] end_request: I/O error, dev sde, sector 385880184
[ 8347.731227] end_request: I/O error, dev sde, sector 385880184
[ 8347.731241] ata6: EH complete
[ 8348.531767] raid5_end_read_request: 2 callbacks suppressed
[ 8348.531779] md/raid:md0: read error corrected (8 sectors at 385878128 on sde1)
[ 8348.531785] md/raid:md0: read error corrected (8 sectors at 385878136 on sde1)
[ 8348.534558] md/raid:md0: read error corrected (8 sectors at 385878080 on sde1)
[ 8348.534560] md/raid:md0: read error corrected (8 sectors at 385878088 on sde1)
[ 8348.534562] md/raid:md0: read error corrected (8 sectors at 385878096 on sde1)
[ 8348.534563] md/raid:md0: read error corrected (8 sectors at 385878104 on sde1)
[ 8348.534564] md/raid:md0: read error corrected (8 sectors at 385878112 on sde1)
[20132.633534] md: md0: data-check done.

Investigating the bad drive

To further investigate the disk in question (/dev/sde) I looked into the S.M.A.R.T (Self-Monitoring, Analysis and Reporting Technology) status of the sick drive:
# smartctl -i /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10-3-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD10EARS-003BB1
Serial Number:    WD-WCAV5K430328
LU WWN Device Id: 5 0014ee 2afe6f748
Firmware Version: 80.00A80
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Mon Dec  2 22:09:37 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
This didn’t really tell me anything, so I started a “long” self-test with the following command. The long self-test takes about 2 hours – alternatively there is a short, but less thorough self-test that takes around 2 minutes:
smartctl -t long /dev/sde
The output of a self-test can be found with the following command. In my case it was clear that the drive was indeed in trouble.
# smartctl -l selftest /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10-3-amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     23574         267040872
I ordered a 3TB WD RED disk (made especially for NAS operation) to replace it. It is much larger, and initially I will not be able to utilize the full 3TB, but once all the old 1TB disks eventually fail and I have replaced them all with 3TB disks, I can grow the raid.

Removing the faulty disk

An important part of a RAID setup is the ability to cope with the failure of a disk. The enclosure I have does not support hot-swap and has no separate lights for each disk, so I need a way to find out which of the disks to replace. Finding the serial number of the disk is fairly easy:
# hdparm -i /dev/sde | grep SerialNo
 Model=WDC WD10EARS-003BB1, FwRev=80.00A80, SerialNo=WD-WCAV5K430328
and luckily the Western Digital disks I have came with a small sticker that shows the serial on the disk. Now that I knew the serial number of the faulty disk, I marked it as failed in mdadm and removed it from the raid before shutting down and replacing the disk:
mdadm --manage /dev/md0 --fail /dev/sde1
mdadm --manage /dev/md0 --remove /dev/sde1
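
To compare against the sticker, it can help to print only the serial string. This is a small sketch that pulls the SerialNo value out of hdparm -i style output like the line shown above; the function name serial_of is hypothetical, and it assumes the "SerialNo=..." field format seen in this post.

```shell
# Extract the SerialNo value from "hdparm -i" style output read on stdin.
serial_of() {
    sed -n 's/.*SerialNo=\([^ ,]*\).*/\1/p'
}

# Example with the line from this post; on a real system (as root) use:
#   hdparm -i /dev/sde | serial_of
printf '%s\n' ' Model=WDC WD10EARS-003BB1, FwRev=80.00A80, SerialNo=WD-WCAV5K430328' | serial_of
```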

Adding the new drive

Having replaced the faulty disk and inserted the new disk I found the serial on the back and compared it to the serial of /dev/sde to make sure I was about to format the right disk:
# hdparm -i /dev/sde | grep SerialNo
Model=WDC WD30EFRX-68EUZN0, FwRev=80.00A80, SerialNo=WD-WMC4N1096166
Partitioning a disk larger than 2TB does not work with an MSDOS partition table, so I needed to use parted (instead of fdisk) to partition the disk correctly with a GPT label. The “-a optimal” option makes parted use the optimum alignment as given by the disk topology information. This aligns partitions to a multiple of the physical block size, which avoids the performance penalty of misaligned writes.
# parted -a optimal /dev/sde 
(parted) mklabel gpt
(parted) mkpart primary 1 -1
(parted) set 1 raid on                                                    
(parted) print                                                                
Model: ATA WDC WD30EFRX-68E (scsi)
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Number  Start   End     Size    File system  Name     Flags
 1      1049kB  3001GB  3001GB               primary  raid
(parted) quit                                                             
Information: You may need to update /etc/fstab.
Now the disk was ready for inclusion in the raid:
mdadm --manage /dev/md0 --add /dev/sde1
Over the next 3 hours I could monitor the rebuild using the following command:
[root@kelvin ~][20:43]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[5] sdc1[1] sdb1[3] sdd1[4]
      2930280960 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  0.5% (4893636/976760320) finish=176.9min speed=91536K/sec
      bitmap: 4/8 pages [16KB], 65536KB chunk
unused devices: <none>
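
The finish estimate in the recovery line can be roughly reproduced by hand: remaining blocks divided by the current speed. The sketch below assumes the two block counts are in 1 KiB units and the speed is in K/sec, as the output suggests. The function name rebuild_eta_min is hypothetical, and the kernel's own estimate may round slightly differently.

```shell
# Estimate remaining rebuild time in whole minutes.
# $1 = blocks done, $2 = total blocks, $3 = current speed in K/sec.
rebuild_eta_min() {
    awk -v d="$1" -v t="$2" -v s="$3" 'BEGIN { printf "%.0f\n", (t - d) / s / 60 }'
}

# Figures taken from the recovery line above.
rebuild_eta_min 4893636 976760320 91536
```

With the figures from the recovery line this comes out at about 177 minutes, in line with the finish=176.9min reported by the kernel.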

Monitoring health of the raid

I have several systems in place to monitor the health of my raid (among other things):
  • logwatch – monitors my /var/log/messages for anything out of the ordinary and mails me the output on a daily basis.
  • mdadm – mdadm will mail me if a disk has completely failed or the raid for some other reason fails. A complete resync is done every week.
  • smartd – I have smartd running “short” tests every night and long tests every second week. Reports are mailed to me.
  • munin – graphical and historical monitoring of performance and all stats of the server.
