RAID5 - Reassemble an inactive array after a disk failure

    • RAID5 - Reassemble an inactive array after a disk failure


      Hello,

      I had a disk failure on my RAID5 recently while I was on holiday, with no spare time to deal with the issue. The array had been running in degraded mode with 3 disks since then and was still working fine until a second event occurred yesterday:

      Source Code

      This is an automatically generated mail message from mdadm
      running on OMV
      A Fail event had been detected on md device /dev/md/XPENOLOGY:2.
      It could be related to component device /dev/sdb5.
      Faithfully yours, etc.
      P.S. The /proc/mdstat file currently contains the following:
      Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
      md2 : active raid5 sdb5[0](F) sdc5[1] sdd[3]
      17566977984 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/2] [_U_U]
      unused devices: <none>



      I don't really understand why I received this message, as this device looks OK: the HP Gen 8 RAID controller does not detect any error on it and the SMART tests are good. At that point I turned off the server, physically unplugged the first drive (the one that was faulty) and restarted it, but now my RAID5 is marked as inactive and is no longer visible in OMV, although the three remaining physical disks are present in the GUI.

      Source Code

      cat /proc/mdstat
      Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
      md2 : inactive sdc5[1](S) sdd[3](S) sdb[4](S)
      17576636912 blocks super 1.2
      unused devices: <none>

      Source Code

      blkid
      /dev/sda1: UUID="98a223a7-01be-4910-871e-42449f27132e" TYPE="ext4" PARTUUID="0009b7e4-01"
      /dev/sda5: UUID="0a6e6abe-bd60-4323-b742-c8ae0d8ef1c2" TYPE="swap" PARTUUID="0009b7e4-05"
      /dev/sdb: UUID="394b8163-356b-6262-52f6-72d24c3bc33f" UUID_SUB="57584ed1-3398-6977-2e38-530ef2b968e2" LABEL="XPENOLOGY:2" TYPE="linux_raid_member"
      /dev/sdc1: UUID="e2ec8542-ea1a-e93a-3017-a5a8c86610be" TYPE="linux_raid_member" PARTUUID="3daedda2-ab78-4a06-b185-c980acdfe091"
      /dev/sdc2: UUID="d5d5bcfe-89b5-e8c4-2cf3-e5bf6a1edd70" TYPE="linux_raid_member" PARTUUID="42e1c209-98ba-48ee-9afc-247152902ac1"
      /dev/sdc5: UUID="394b8163-356b-6262-52f6-72d24c3bc33f" UUID_SUB="fe533fdf-afea-192c-5fd5-11c6102a15ca" LABEL="XPENOLOGY:2" TYPE="linux_raid_member" PARTUUID="dec4d9c2-d2ec-4513-b951-1bb8771fc52f"
      /dev/sr0: UUID="2015-06-29-06-52-36-00" LABEL="OpenMediaVault" TYPE="iso9660" PTUUID="78a03a04" PTTYPE="dos"
      /dev/sdd: UUID="394b8163-356b-6262-52f6-72d24c3bc33f" UUID_SUB="3db782f1-473c-081f-d8dc-6213c974b7db" LABEL="XPENOLOGY:2" TYPE="linux_raid_member"

      Source Code

      fdisk -l | grep "Disk "
      The primary GPT table is corrupt, but the backup appears OK, so that will be used.
      Partition 2 does not start on physical sector boundary.
      Partition 5 does not start on physical sector boundary.
      Disk /dev/sda: 16 GiB, 17179869184 bytes, 33554432 sectors
      Disk identifier: 0x0009b7e4
      Disk /dev/sdb: 5,5 TiB, 6001141571584 bytes, 11720979632 sectors
      Disk identifier: ABE4B6E0-89EC-489A-9BEA-6B1D38D42C8B
      Partition 2 does not start on physical sector boundary.
      Partition 5 does not start on physical sector boundary.
      Disk /dev/sdc: 5,5 TiB, 6001141571584 bytes, 11720979632 sectors
      Disk identifier: E26F72F4-EF94-48E8-9022-404A6317EB4D
      Disk /dev/sdd: 5,5 TiB, 6001141571584 bytes, 11720979632 sectors

      Source Code

      cat /etc/mdadm/mdadm.conf
      # mdadm.conf
      #
      # Please refer to mdadm.conf(5) for information about this file.
      #
      # by default, scan all partitions (/proc/partitions) for MD superblocks.
      # alternatively, specify devices to scan, using wildcards if desired.
      # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
      # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
      # used if no RAID devices are configured.
      DEVICE partitions
      # auto-create devices with Debian standard permissions
      CREATE owner=root group=disk mode=0660 auto=yes
      # automatically tag new arrays as belonging to the local system
      HOMEHOST <system>
      # definitions of existing MD arrays
      ARRAY /dev/md/XPENOLOGY:2 metadata=1.2 spares=0 name=XPENOLOGY:2 UUID=394b8163:356b6262:52f672d2:4c3bc33f
      # instruct the monitoring daemon where to send mail alerts
      MAILADDR someaddress@somedomain.com
      MAILFROM root

      Source Code

      mdadm --examine /dev/sdb
      /dev/sdb:
      Magic : a92b4efc
      Version : 1.2
      Feature Map : 0xa
      Array UUID : 394b8163:356b6262:52f672d2:4c3bc33f
      Name : XPENOLOGY:2
      Creation Time : Sun Jun 5 13:29:00 2016
      Raid Level : raid5
      Raid Devices : 4
      Avail Dev Size : 11720977584 (5589.00 GiB 6001.14 GB)
      Array Size : 17566977984 (16753.18 GiB 17988.59 GB)
      Used Dev Size : 11711318656 (5584.39 GiB 5996.20 GB)
      Data Offset : 2048 sectors
      Super Offset : 8 sectors
      Recovery Offset : 163875928 sectors
      Unused Space : before=1960 sectors, after=9658928 sectors
      State : clean
      Device UUID : 57584ed1:33986977:2e38530e:f2b968e2
      Update Time : Wed Aug 7 23:20:21 2019
      Bad Block Log : 512 entries available at offset 72 sectors - bad blocks present.
      Checksum : b1a144c8 - correct
      Events : 26486
      Layout : left-symmetric
      Chunk Size : 64K
      Device Role : Active device 2
      Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

      mdadm --examine /dev/sdc5
      /dev/sdc5:
      Magic : a92b4efc
      Version : 1.2
      Feature Map : 0x0
      Array UUID : 394b8163:356b6262:52f672d2:4c3bc33f
      Name : XPENOLOGY:2
      Creation Time : Sun Jun 5 13:29:00 2016
      Raid Level : raid5
      Raid Devices : 4
      Avail Dev Size : 11711318656 (5584.39 GiB 5996.20 GB)
      Array Size : 17566977984 (16753.18 GiB 17988.59 GB)
      Data Offset : 2048 sectors
      Super Offset : 8 sectors
      Unused Space : before=1968 sectors, after=0 sectors
      State : clean
      Device UUID : fe533fdf:afea192c:5fd511c6:102a15ca
      Update Time : Mon Aug 12 18:57:28 2019
      Checksum : 9560b95 - correct
      Events : 26567
      Layout : left-symmetric
      Chunk Size : 64K
      Device Role : Active device 1
      Array State : .A.A ('A' == active, '.' == missing, 'R' == replacing)

      mdadm --examine /dev/sdd
      /dev/sdd:
      Magic : a92b4efc
      Version : 1.2
      Feature Map : 0x0
      Array UUID : 394b8163:356b6262:52f672d2:4c3bc33f
      Name : XPENOLOGY:2
      Creation Time : Sun Jun 5 13:29:00 2016
      Raid Level : raid5
      Raid Devices : 4
      Avail Dev Size : 11720977584 (5589.00 GiB 6001.14 GB)
      Array Size : 17566977984 (16753.18 GiB 17988.59 GB)
      Used Dev Size : 11711318656 (5584.39 GiB 5996.20 GB)
      Data Offset : 2048 sectors
      Super Offset : 8 sectors
      Unused Space : before=1960 sectors, after=9658928 sectors
      State : clean
      Device UUID : 3db782f1:473c081f:d8dc6213:c974b7db
      Update Time : Mon Aug 12 18:57:28 2019
      Bad Block Log : 512 entries available at offset 72 sectors
      Checksum : 6e16750f - correct
      Events : 26567
      Layout : left-symmetric
      Chunk Size : 64K
      Device Role : Active device 3
      Array State : .A.A ('A' == active, '.' == missing, 'R' == replacing)
      My configuration:

      My RAID5 contains four 6 TB drives; one of them is faulty and has been physically removed (it was formerly /dev/sde).

      What I've tried so far:

      Source Code

      mdadm --stop /dev/md2
      mdadm: stopped /dev/md2
      mdadm --assemble /dev/md2 /dev/sd[bd] /dev/sdc5 --verbose --force --run
      mdadm: looking for devices for /dev/md2
      mdadm: /dev/sdb is identified as a member of /dev/md2, slot 2.
      mdadm: /dev/sdd is identified as a member of /dev/md2, slot 3.
      mdadm: /dev/sdc5 is identified as a member of /dev/md2, slot 1.
      mdadm: no uptodate device for slot 0 of /dev/md2
      mdadm: added /dev/sdb to /dev/md2 as 2 (possibly out of date)
      mdadm: added /dev/sdd to /dev/md2 as 3
      mdadm: added /dev/sdc5 to /dev/md2 as 1
      mdadm: failed to RUN_ARRAY /dev/md2: Input/output error
      mdadm: Not enough devices to start the array.
      I thought the --force option would do the trick, especially given the small difference in the events counters (26486 vs 26567), but no luck with that.
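
      For reference, this is roughly how I put the Events counters side by side (just a quick check, same device names as above):

      Source Code

      # compare the Events counters of the remaining members
      mdadm --examine /dev/sdb /dev/sdc5 /dev/sdd | grep -E '^/dev/|Events'
      # sdb reports 26486 while sdc5 and sdd both report 26567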


      What's the next step then? I've read about the --assume-clean switch, which might work, but I'm not sure about it. Would it be a good idea to do a mdadm --zero-superblock /dev/sdb?
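
      From what I've read, recreating the array with --assume-clean would look roughly like the sketch below. This is just my understanding: the level, chunk, layout, metadata version and device order are copied from the --examine output above, and I would not run it without confirmation (the Data Offset of 2048 sectors would also have to match what --examine reports).

      Source Code

      # last-resort recreate over the existing members without resyncing them;
      # slot 0 is the removed disk, hence "missing" (assuming /dev/sdb5 is the right member for slot 2)
      mdadm --create /dev/md2 --assume-clean --level=5 --raid-devices=4 \
            --metadata=1.2 --chunk=64 --layout=left-symmetric \
            missing /dev/sdc5 /dev/sdb5 /dev/sdd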

      Any help greatly appreciated :)

      Mazz
    •

      That's not good indeed.

      There is definitely something weird with my /dev/sdb disk, because before the second crash it was mapped to the array via the partition /dev/sdb5, and now OMV tries to use it as /dev/sdb.

      Source Code

      fdisk -l /dev/sdb
      The primary GPT table is corrupt, but the backup appears OK, so that will be used.
      Disk /dev/sdb: 5,5 TiB, 6001141571584 bytes, 11720979632 sectors
      Units: sectors of 1 * 512 = 512 bytes
      Sector size (logical/physical): 512 bytes / 4096 bytes
      I/O size (minimum/optimal): 262144 bytes / 262144 bytes
      Disklabel type: gpt
      Disk identifier: ABE4B6E0-89EC-489A-9BEA-6B1D38D42C8B
      Device        Start          End      Sectors  Size  Type
      /dev/sdb1      2048      4982527      4980480  2,4G  Linux RAID
      /dev/sdb2   4982528      9176831      4194304    2G  Linux RAID
      /dev/sdb5   9453280  11720773983  11711320704  5,5T  Linux RAID
      Partition 2 does not start on physical sector boundary.
      Partition 5 does not start on physical sector boundary.


      The partition still exists, but mdadm doesn't know about it anymore:

      Source Code

      mdadm --examine /dev/sdb5
      mdadm: cannot open /dev/sdb5: No such file or directory

      Is there something I can do to force mdadm to use /dev/sdb5 instead of /dev/sdb?
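
      Would re-reading the partition table, or repairing the corrupt primary GPT from its intact backup, be a sane way to get /dev/sdb5 back? Something along these lines is what I have in mind (not run yet):

      Source Code

      # ask the kernel to re-read sdb's partition table so /dev/sdb5 reappears
      partprobe /dev/sdb
      # or: blockdev --rereadpt /dev/sdb
      # gdisk /dev/sdb should also be able to rebuild the damaged primary GPT from the
      # backup table (it loads the backup automatically and rewrites both tables on 'w')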
    •

      Mazz wrote:

      mdadm: added /dev/sdb to /dev/md2 as 2 (possibly out of date)
      That is the problem, and as you have suggested, a --zero-superblock on that member followed by adding it back to the array would fix it (see the sketch below). However, I don't believe these are repairable unless you delete the partitions:
      Partition 2 does not start on physical sector boundary.
      Partition 5 does not start on physical sector boundary.
      My understanding is that OMV uses the whole disk for RAID configuration, so judging by mdadm --assemble /dev/md2 /dev/sd[bd] /dev/sdc5 --verbose --force --run I can only assume the original array was created via the CLI.
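
      For what it's worth, the usual sequence for that would be something along these lines (a rough sketch only, using /dev/sdb5 on the assumption the partition can be made visible again, and only once the array is running with the other members):

      Source Code

      # wipe the stale md superblock on the old member, then add it back so it rebuilds
      mdadm --zero-superblock /dev/sdb5
      mdadm --add /dev/md2 /dev/sdb5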

      The other problem is that blkid does not list /dev/md2.

      Looking at the above, you had a failed drive; physically removing it and then rebooting left the array inactive, which is the way mdadm behaves. A drive needs to be failed and then removed from the array using mdadm before it is pulled (see the sketch below). There has obviously been some corruption on /dev/sdb as well, which is why mdadm no longer sees /dev/sdb5.
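
      For reference, the normal way to retire a failing member before pulling it is something like this (sketch only, with /dev/sdX5 standing in for the failed device):

      Source Code

      # mark the member as failed in the array, then remove it; only then is it safe to pull the disk
      mdadm /dev/md2 --fail /dev/sdX5
      mdadm /dev/md2 --remove /dev/sdX5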

      You've lost the array; RAID 5 only tolerates a single drive failure.
      RAID is not a backup! Would you go skydiving without a parachute?