RAID5 gone after swapping defective drive

    • RAID5 gone after swapping defective drive

      Hey everyone,

      I've already seen that I'm not the only one with this kind of problem, but I don't want to hijack other threads, so I'm making my own.
      I recently started getting emails with an increasing error count on one of my drives, which resulted in an "OfflineUncorrectableSector", so I turned the system off, waited for the new drive to arrive, and plugged it in today.
      After starting up again I got the following email:

      monit alert -- Status failed mountpoint_media_... wrote:

      Status failed Service mountpoint_media_0ccf9178-985e-4e03-a859-e717a89a20dd


      Date: Fri, 09 Aug 2019 17:50:43
      Action: alert
      Host: NAS
      Description: status failed (1) -- /media/0ccf9178-985e-4e03-a859-e717a89a20dd is not a mountpoint


      Your faithful employee,
      Monit

      And when I wanted to rebuild the RAID, it wasn't showing in the RAID section, though all drives are visible in the Drives section.

      Below are some outputs that another thread suggested providing; please treat me as a Linux noob, though, I'm still learning :)
      I have 3x 2TB WD Red drives.

      cat /proc/mdstat

      Source Code

      Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
      md127 : inactive sdc[2](S) sdb[1](S)
            5860271024 blocks super 1.2
      unused devices: <none>

      blkid

      Source Code

      /dev/sdb: UUID="25f41e2c-c766-3a75-462d-a5746b47a522" UUID_SUB="cb760801-4799-3a1d-5a12-60d9d7e07abf" LABEL="NAS:Raid5x1" TYPE="linux_raid_member"
      /dev/sdc: UUID="25f41e2c-c766-3a75-462d-a5746b47a522" UUID_SUB="c2b22e85-6da0-f2d1-806a-b3b6c54cc381" LABEL="NAS:Raid5x1" TYPE="linux_raid_member"
      /dev/sdd1: UUID="e9cc3846-3bd3-4099-8f55-ff16e09e4c32" TYPE="ext4" PARTUUID="000df838-01"
      /dev/sdd5: UUID="de8db28c-13c5-408f-9ce1-9c3ddc625c4a" TYPE="swap" PARTUUID="000df838-05"

      fdisk -l | grep Disk

      Source Code

      The primary GPT table is corrupt, but the backup appears OK, so that will be used.
      Disk /dev/sda: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
      Disk /dev/sdb: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
      Partition 1 does not start on physical sector boundary.
      Disk /dev/sdc: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
      Disk identifier: D305B4BA-D562-4DEE-9B34-8EA95FBC8337
      Disk /dev/sdd: 28 GiB, 30016659456 bytes, 58626288 sectors
      Disk identifier: 0x000df838

      cat /etc/mdadm/mdadm.conf


      Source Code

      # mdadm.conf
      #
      # Please refer to mdadm.conf(5) for information about this file.
      #
      # by default, scan all partitions (/proc/partitions) for MD superblocks.
      # alternatively, specify devices to scan, using wildcards if desired.
      # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
      # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
      # used if no RAID devices are configured.
      DEVICE partitions
      # auto-create devices with Debian standard permissions
      CREATE owner=root group=disk mode=0660 auto=yes
      # automatically tag new arrays as belonging to the local system
      HOMEHOST <system>
      # definitions of existing MD arrays
      ARRAY /dev/md/Raid5x1 metadata=1.2 spares=0 name=NAS:Raid5x1 UUID=25f41e2c:c7663a75:462da574:6b47a522
      # instruct the monitoring daemon where to send mail alerts
      MAILADDR my.email@addre.ss


      mdadm --detail --scan --verbose

      Source Code

      INACTIVE-ARRAY /dev/md127 num-devices=2 metadata=1.2 name=NAS:Raid5x1 UUID=25f41e2c:c7663a75:462da574:6b47a522
         devices=/dev/sdb,/dev/sdc



      I hope someone can help me :)
    • mdadm software RAID is not hot-swap aware; you have to tell it what to do.

      So currently your array is inactive, with /dev/sd[bc] being 2 of the drives from your 3-drive RAID 5.

      If the new drive shows under Storage -> Disks, select it, then wipe it from the menu (a short wipe will be sufficient).
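
      If you would rather do that step from the shell, my understanding is that a short wipe roughly corresponds to clearing the old signatures with wipefs. A minimal sketch, assuming sda really is the new drive (double-check the serial number first):

      Source Code

      # Destructive: run this only against the NEW, empty drive
      wipefs -a /dev/sda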

      Then run mdadm --stop /dev/md127, then mdadm --add /dev/md127 /dev/sda (I will assume that sda is the new drive).

      If that returns no errors, run mdadm --assemble --verbose --force /dev/md127 /dev/sd[abc]

      Check cat /proc/mdstat for the progress; when it has finished, run omv-mkconf mdadm.
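
      To put those commands in one place, here is a rough sketch of one way the sequence can go, assuming (as above) that sda is the new drive and sdb/sdc are the surviving members. Note that --add needs the array device to exist, so if it complains that /dev/md127 is missing, do the assemble step first:

      Source Code

      # Stop the inactive array
      mdadm --stop /dev/md127
      # Bring the two surviving members back up as a degraded array
      mdadm --assemble --verbose --force /dev/md127 /dev/sdb /dev/sdc
      # Add the new, wiped drive; the rebuild starts automatically
      mdadm --add /dev/md127 /dev/sda
      # Watch the rebuild progress
      cat /proc/mdstat
      # When the resync has finished, regenerate the OMV mdadm config
      omv-mkconf mdadm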
      Raid is not a backup! Would you go skydiving without a parachute?
    • Hi,
      thanks for your quick reply; sadly I only got to the second step.
      Wiping and stopping the RAID worked, but I couldn't run the --add command:

      Source Code

      root@NAS:~# mdadm --stop /dev/md127
      mdadm: stopped /dev/md127
      root@NAS:~# mdadm --add /dev/md127 /dev/sda
      mdadm: error opening /dev/md127: No such file or directory
    • Yup, sda should be the new drive, at least according to the S/N in the web interface. sdd is the boot SSD.

      Source Code

      Disk /dev/sdb: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 4096 bytes
      I/O size (minimum/optimal): 4096 bytes / 4096 bytes

      The primary GPT table is corrupt, but the backup appears OK, so that will be used.

      Disk /dev/sdc: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 4096 bytes
      I/O size (minimum/optimal): 4096 bytes / 4096 bytes
      Disklabel type: gpt
      Disk identifier: D305B4BA-D562-4DEE-9B34-8EA95FBC8337

      Device Start End Sectors Size Type
      /dev/sdc1 34 262177 262144 128M Microsoft reserved

      Partition 1 does not start on physical sector boundary.

      Disk /dev/sdd: 28 GiB, 30016659456 bytes, 58626288 sectors
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 512 bytes
      I/O size (minimum/optimal): 512 bytes / 512 bytes
      Disklabel type: dos
      Disk identifier: 0x000df838

      Device Boot Start End Sectors Size Id Type
      /dev/sdd1 * 2048 56145919 56143872 26,8G 83 Linux
      /dev/sdd2 56147966 58626047 2478082 1,2G 5 Extended
      /dev/sdd5 56147968 58626047 2478080 1,2G 82 Linux swap / Solaris

      Disk /dev/sda: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 4096 bytes
      I/O size (minimum/optimal): 4096 bytes / 4096 bytes
      Screenshot for easier reading (the colours help): https://i.imgur.com/hQSc7sO.png
    • None that's up to date, but I could pull a new one if I plugged the old HDD back in; with it connected, the RAID is still recognised and everything can be accessed. So weird that it's doing this now. I had a faulty drive before and rebuilding with the new one worked just as intended; the new drive from back then is the one that's now dying. So after I do that, I just wipe everything and rebuild from scratch?
    • You could put the old drive back; the array should/may come up clean/degraded or clean. Either way, get a backup done, at least of everything important.

      Then remove the failing drive from the array using Delete on the menu in the GUI. It's important to make a note of each drive reference, i.e. sda, sdb, sdc, and to match that against the information in Storage -> Disks, which gives each drive's Model and Serial Number; that way you don't pull the wrong drive (a command-line way to double-check is sketched below).

      Then come back; it should be possible to sort this out, but it may require the array to re-sync a few times.
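
      For reference, a rough sketch of how to cross-check which physical disk is which before pulling anything, and roughly what the GUI delete does in mdadm terms as far as I understand it (sdX is just a placeholder for whichever letter the failing drive turns out to be):

      Source Code

      # Map the kernel names (sda, sdb, ...) to model and serial number
      lsblk -d -o NAME,MODEL,SERIAL,SIZE
      # Mark the failing member as faulty and remove it from the array
      # (replace sdX with the failing drive's actual letter)
      mdadm /dev/md127 --fail /dev/sdX --remove /dev/sdX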
      Raid is not a backup! Would you go skydiving without a parachute?