6x 3TB RAID6 - one disk dying, what to do?

  • Hello forum,


    I'm running OMV 5.6.21 with 6x 3TB drives in a RAID 6.

    I have three backups (1 full backup, 2 backups of important data).


    One of the RAID6 drives is reporting:

    • 197 Current_Pending_Sector -O--CK 200 200 000 - 27
    • 198 Offline_Uncorrectable ----CK 200 200 000 - 1

    The drive was stable at 13 pending sectors for a long time, but today I got a message that the number of pending sectors had doubled.
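To keep an eye on these two counters, their raw values can be pulled out of `smartctl -A` output. This is a minimal sketch that assumes smartmontools' usual attribute table, where the attribute ID is the first column and the raw value is the last. The function names are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch) that read
# `smartctl -A /dev/sdX` output on stdin. They assume the usual attribute
# table layout: attribute ID in the first column, raw value in the last.

# Print NAME=RAW for 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable).
smart_bad_sectors() {
    awk '$1 == 197 || $1 == 198 { print $2 "=" $NF }'
}

# Exit status 0 if either attribute has a non-zero raw value, 1 otherwise.
smart_has_bad_sectors() {
    awk '($1 == 197 || $1 == 198) && ($NF + 0) > 0 { found = 1 }
         END { exit !found }'
}
```

Typical use would be `smartctl -A /dev/sda | smart_bad_sectors`. Running it from cron and comparing against the previous value would show a trend like 13 going to 27.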


    Can I shut down the NAS and remove the drive, so that the array then runs as a RAID 5?

    Or what would you do?


    Thanks in advance!

  • Yes, you can shut down your NAS, remove the dying HDD and restart the NAS; your data should still be accessible.

  • If the drive hasn't failed, i.e. the RAID still shows as clean, then all of this can be done in the GUI. Do not just pull the drive: mdadm (software RAID) is not hot-swap.
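For reference, the command-line equivalent of removing a member cleanly is to mark it as failed and then remove it from the array before powering down. This is a minimal sketch. It assumes the array is /dev/md0 and the member is /dev/sda, which are placeholders to check against /proc/mdstat. The helper names are hypothetical, and the sketch only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helpers: print (DRY_RUN=1, the default) or execute (DRY_RUN=0)
# the mdadm commands that fail and then remove one member of an md array.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

remove_member() {
    array=$1   # e.g. /dev/md0
    member=$2  # e.g. /dev/sda; device letters can change after a reboot
    run mdadm --manage "$array" --fail "$member" &&
        run mdadm --manage "$array" --remove "$member"
}
```

`remove_member /dev/md0 /dev/sda` only prints the two commands. `DRY_RUN=0 remove_member /dev/md0 /dev/sda` (as root) would run them, after which the NAS can be shut down and the disk pulled.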

    Raid is not a backup! Would you go skydiving without a parachute?


    OMV 5x amd64 running on an HP N54L Microserver

  • Thanks for your answer and the hint; I found the option in the GUI to remove drives from the RAID array. Does it make sense to remove the suspicious drive now, or should I wait until the RAID section of the GUI shows something other than "clean"? In the SMART section the suspicious drive already has a red light.

  • Does it make sense to remove the suspicious drive yet or should I wait until the GUI shows me in the RAID section anything different to "clean"

    Personal choice, I suppose. If mdadm fails the drive, the RAID will show as clean/degraded. TBH the drive needs replacing, so you might as well do it now rather than later.


  • OK, today I tried to remove the drive via the web GUI. After clicking Remove, I got the following error message:


    Code
    devices: The value {"1":"\/dev\/sdb","2":"\/dev\/sdc","3":"\/dev\/sdd","4":"\/dev\/sdf","5":"\/dev\/sdg"} is not an array.

    I also checked the error details, and OMV sent me an email.


    I turned off the NAS and removed the faulty drive.

    Is there any way for OMV to rebuild the array as a RAID6 with 5 drives? With 6 drives I had plenty of free space, so I assumed this would somehow be possible.
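On the space question: RAID6 spends two members' worth of capacity on parity, so usable space is (members - 2) x per-member size. The mdstat output later in the thread reports 11720536064 1 KiB blocks for six members, which works out to 4 x 2930134016 KiB. Below is a sketch of that arithmetic. The function name is hypothetical, and the sketch ignores filesystem overhead and whether the existing data would actually fit.

```shell
#!/bin/sh
# Hypothetical helper: usable RAID6 capacity in the same unit as the
# per-member size (e.g. KiB). Two members' worth goes to P and Q parity.
raid6_usable() {
    members=$1
    per_member=$2
    if [ "$members" -lt 4 ]; then
        echo "RAID6 needs at least 4 members" >&2
        return 1
    fi
    echo $(( (members - 2) * per_member ))
}
```

`raid6_usable 6 2930134016` gives 11720536064, which matches the mdstat figure. `raid6_usable 5 2930134016` gives 8790402048 KiB (roughly 8.2 TiB), so a 5-disk RAID6 only works if the data fits in about three disks' worth of space.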

  • After clicking on remove, I got the following error message:

    Never seen that before; the error message shows it is not finding /dev/sda. What's the output of the following:


    cat /proc/mdstat

    blkid

    mdadm --detail /dev/md0


  • Here's the output:

    cat /proc/mdstat

    Code
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md0 : active (auto-read-only) raid6 sdb[1] sdc[2] sdg[5] sdd[3] sdf[4]
    11720536064 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [_UUUUU]
    bitmap: 1/22 pages [4KB], 65536KB chunk
    unused devices: <none>


    blkid

    Code
    /dev/sdb: UUID="51ec32cf-f4a4-abdf-75b7-87de3239679f" UUID_SUB="e4c70246-20e7-f375-1127-a24618676fea" LABEL="nas-omv-netgear:6x3TBWDREDRAID6" TYPE="linux_raid_member"
    /dev/sdc: UUID="51ec32cf-f4a4-abdf-75b7-87de3239679f" UUID_SUB="2166f5e2-05b8-62b0-c4b6-d6567b98197e" LABEL="nas-omv-netgear:6x3TBWDREDRAID6" TYPE="linux_raid_member"
    /dev/sdd: UUID="51ec32cf-f4a4-abdf-75b7-87de3239679f" UUID_SUB="ad00b629-3f9f-a24d-404d-4ed9dcbe10fe" LABEL="nas-omv-netgear:6x3TBWDREDRAID6" TYPE="linux_raid_member"
    /dev/sde1: SEC_TYPE="msdos" UUID="4F9B-6731" TYPE="vfat"
    /dev/sdf: UUID="51ec32cf-f4a4-abdf-75b7-87de3239679f" UUID_SUB="cedd2161-b2ab-6a87-3600-8d19b7fcb708" LABEL="nas-omv-netgear:6x3TBWDREDRAID6" TYPE="linux_raid_member"
    /dev/sdg: UUID="51ec32cf-f4a4-abdf-75b7-87de3239679f" UUID_SUB="ba49dc21-6550-7d59-83b9-43b1f6d183fb" LABEL="nas-omv-netgear:6x3TBWDREDRAID6" TYPE="linux_raid_member"
    /dev/sda2: UUID="d9a339ff-32d3-4af5-ac78-d051f898aacd" UUID_SUB="ab689f8b-4ad6-49d6-990a-25d9e1e5bd47" TYPE="btrfs" PARTUUID="2c6ef4dc-990d-4370-bc29-933a43994e80"
    /dev/sda3: UUID="2cdf4bd5-ce54-4023-8e84-2190040f56a6" TYPE="swap" PARTUUID="b8ab6da8-b910-4ae9-a610-70167339c057"
    /dev/md0: LABEL="NASRAID6" UUID="f2a2f310-a710-4716-80b2-490e1a20a232" UUID_SUB="2bda88d3-0a3d-467e-aafd-4a0fc50ab08a" TYPE="btrfs"
    /dev/sda1: PARTUUID="08dfaaf8-3078-498a-86f5-06520089ca74"


    mdadm --detail /dev/md0

  • active (auto-read-only)

    The RAID is in an auto-read-only state; running mdadm --readwrite /dev/md0 should correct that. The output from mdstat and mdadm --detail shows /dev/sda as removed, while the email you received shows it as (F) failed, so it's somewhat confusing.
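The `[6/5] [_UUUUU]` field in the mdstat output is what marks the array as degraded. It means 6 member slots, 5 of them in use, with an underscore in the first position of the map. Below is a minimal sketch (the function name is hypothetical) that flags degraded arrays from /proc/mdstat. It assumes the usual mdstat layout, where the status field sits on the line after `mdN : ...`.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mdstat-style text on stdin and print one
# line per degraded array, with the positions of the "_" entries in the
# [UUU_] map (counted from 0).
mdstat_degraded() {
    awk '
        $1 ~ /^md[0-9]+$/ && $2 == ":" { name = $1 }
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            map = ""
            if (match($0, /\[[U_]+\]/))
                map = substr($0, RSTART + 1, RLENGTH - 2)
            if (n[2] + 0 < n[1] + 0) {
                missing = ""
                sep = ""
                for (i = 1; i <= length(map); i++)
                    if (substr(map, i, 1) == "_") {
                        missing = missing sep (i - 1)
                        sep = ","
                    }
                print name ": " n[2] "/" n[1] " members up, empty slot(s): " missing
            }
        }'
}
```

`mdstat_degraded < /proc/mdstat` prints nothing for healthy arrays. For the output in this thread it would report md0 with 5 of 6 members up and slot 0 empty.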


    In RAID Management, select the RAID and click Remove on the menu; this should display a dialog listing the drives within the array. If mdadm had already removed the drive (presumably /dev/sda), it would not be in the list, hence the error. OMV uses the complete block device (drive) to create an array.


    On top of that, the blkid output shows /dev/sda with at least 3 partitions: nothing associated with /dev/sda1, a btrfs file system on /dev/sda2 (the same as the RAID), and swap on /dev/sda3. This might suggest that a partition was added to the array at some point, which is possible from the CLI but not from within OMV's GUI.
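The member check described here can be done mechanically from blkid. Every array member carries TYPE="linux_raid_member" and the array's UUID, so whole disks and partitions in an array can be told apart. This is a minimal sketch with a hypothetical function name. The whole-disk/partition guess assumes sdX-style names, where a trailing digit means a partition, and would be wrong for names like nvme0n1.

```shell
#!/bin/sh
# Hypothetical helper: read `blkid` output on stdin and list every device
# tagged TYPE="linux_raid_member" with the array UUID it belongs to.
# "partition" vs "whole-disk" is guessed from a trailing digit, which only
# holds for sdX-style names.
raid_members() {
    awk '/TYPE="linux_raid_member"/ {
        dev = $1
        sub(/:$/, "", dev)
        kind = (dev ~ /[0-9]$/) ? "partition" : "whole-disk"
        uuid = ""
        if (match($0, / UUID="[^"]*"/))
            uuid = substr($0, RSTART + 7, RLENGTH - 8)
        print dev, kind, uuid
    }'
}
```

On the blkid output above, `blkid | raid_members` would list /dev/sdb, sdc, sdd, sdf and sdg as whole-disk members with UUID 51ec32cf-...; /dev/sda2 does not appear because it is tagged btrfs, not linux_raid_member.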


    Either way, the array is in a clean/degraded state and requires a new drive to be added.


    What's the output of fdisk -l | grep "Disk "


    The output information would also suggest the machine has been rebooted.


  • Hi geaves,


    thanks for your answer!

    In Raid Management, select the raid, on the menu click remove, this should display a dialog of the listed drives within the array

    There I selected "sda" and clicked Remove. After that I got the error from #9. But the web GUI showed me that sda had been removed, so I shut down the NAS, removed sda and rebooted.


    What's the output of fdisk -l | grep "Disk "


    mdadm --readwrite /dev/md0 should correct that

    Code
    mdadm: failed to set writable for /dev/md0: Device or resource busy
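Whether md0 is still in auto-read-only mode can also be read from sysfs. The kernel exposes an `array_state` attribute, which reads `read-auto` for the "(auto-read-only)" state shown in mdstat. This is a minimal sketch with a hypothetical function name. It assumes the array is md0 and takes the path as a parameter. It only reports the state and does not explain the "busy" error.

```shell
#!/bin/sh
# Hypothetical helper: report the md array_state attribute in plain words.
# Defaults to md0; pass another array_state path as the first argument.
# "read-auto" behaves like read-only until the first write request.
md_array_state() {
    state_file=${1:-/sys/block/md0/md/array_state}
    [ -r "$state_file" ] || { echo "cannot read $state_file" >&2; return 1; }
    state=$(cat "$state_file")
    case $state in
        read-auto) echo "auto-read-only" ;;
        readonly) echo "read-only" ;;
        clean|active|active-idle|write-pending) echo "read-write ($state)" ;;
        *) echo "other ($state)" ;;
    esac
}
```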


    Either way the array is in a clean/degraded state and requires a new drive to be added.

    If I don't want to replace the drive, what is the best way? Rebuild the RAID?

  • After that I got the error for #9. But webGUI showed me that sda was removed, so I shut down the NAS and removed sda and rebooted.

    That would explain the change in the drive identifiers (/dev/sd?).

    If I don't want to replace the drive, what is the best way? Rebuild the RAID?

    You will have to, there's no choice, so there are two options:


    1) Keep the current system and rebuild; to do that you will have to:


    a) Remove SMB shares related to the array

    b) Remove shared folders related to the array

    c) Remove the array


    2) Reinstall


    This is the lesser of the two, and you could install OMV6, but check out the thread "omv-extras plugins - porting progress to OMV 6.x" first.


    If you go down this route, you will have to wipe your drives before doing anything else with them.
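If you'd rather wipe the drives from the shell than from the GUI, the usual sequence is to stop the array, clear the md superblock on each former member, and then remove any remaining signatures. This is a minimal sketch. The array and disk names are placeholders, the helpers are hypothetical, and the filesystem on the array has to be unmounted first. Because the commands destroy data, the sketch only prints them unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical helpers. DRY_RUN=1 (default) prints the commands;
# DRY_RUN=0 runs them. Everything on the listed disks is lost.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: wipe_array /dev/md0 /dev/sdb /dev/sdc ...
wipe_array() {
    array=$1
    shift
    run mdadm --stop "$array" || return 1
    for disk in "$@"; do
        run mdadm --zero-superblock "$disk" || return 1
        run wipefs --all "$disk" || return 1
    done
}
```

Check the printed list against `blkid` and `fdisk -l` before running it for real, since device letters shift when disks are added or removed.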


  • Hi geaves,

    2) Reinstall

    I reinstalled with OMV6. So far so good. I have some questions I hope you can help me with:


    1. Is there any way to display temperature and fan speed on the dashboard?
    2. You mentioned the omv-extras plugins. Is the installation still done with:
    Code
    wget -O - https://github.com/OpenMediaVault-Plugin-Developers/packages/raw/master/install | bash

    Thank you!

  • Happy new year to everyone!

    2) Reinstall

    I have upgraded to OMV6 but need your help with an issue. Today I got two emails from the NAS.

    But when I check the GUI, it tells me that the array is clean, but in the details I see 4 working devices and no spare devices.


    But I added /dev/sda in the GUI via Storage --> Raid Management --> Recover "Add hot spares"; nevertheless, this does not seem to work.

    I already used the Remove button in the GUI to remove /dev/sda, did a quick erase of /dev/sda and added it again as a hot spare. During the rebuild, the details showed 1 device as a spare device, but after it finished, all devices were active devices.
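In `mdadm --detail` output, Working Devices counts every non-failed member, both active and spare. As far as I know, a disk that is still being rebuilt is listed as "spare rebuilding" and counted under Spare Devices until recovery completes. "4 working, 0 spare" after the rebuild would then mean the former spare is now an active member. Below is a minimal sketch (the function name is hypothetical) that pulls out these counters. It assumes the usual `Key : value` layout of `mdadm --detail`.

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm --detail /dev/mdX` output on stdin and
# print the device counters as KEY=VALUE (spaces removed from the key).
md_detail_counts() {
    awk -F':' '
        {
            key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key)
            val = $2; gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        key ~ /^(Raid|Active|Working|Failed|Spare) Devices$/ {
            gsub(/ /, "", key)
            print key "=" val
        }'
}
```

`mdadm --detail /dev/md0 | md_detail_counts` prints RaidDevices, ActiveDevices, WorkingDevices, FailedDevices and SpareDevices in the order mdadm lists them.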


    What am I doing wrong?


    Thanks in advance for your support.

  • What am I doing wrong

    TBH I'm not sure. This -> Recover "Add hot spares" must be something new or a change; previously, Recover would just add a new/replacement drive to the array. I always used hot spares when running hardware RAID: if a drive failed, the controller would automatically bring the spare into the array and fire off an email with the information.

    During rebuild process the details showed 1 device as spare device but after finishing all devices are active devices.

    This is where I'm not sure what's going on. If you erased/wiped the drive and then used Recover (add hot spares), mdadm knows there should be 4 drives in your initial array and that only 3 are active. So when the drive is added as a spare, mdadm picks it up as an available drive and adds it to the array to replace the missing one.
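The behaviour described here comes down to simple bookkeeping. mdadm fills empty slots (Raid Devices minus active members) from the available spares, and whatever is left over stays a spare. Below is a sketch of that arithmetic using this thread's numbers: 4 slots, 3 active, 1 spare added. The function is hypothetical and models only the counts, not mdadm itself.

```shell
#!/bin/sh
# Hypothetical model of the counts after recovery finishes:
# empty slots are filled from spares; leftover spares stay spares.
md_after_recovery() {
    raid_devices=$1
    active=$2
    spares=$3
    empty=$(( raid_devices - active ))
    if [ "$empty" -lt 0 ]; then empty=0; fi
    used=$(( spares < empty ? spares : empty ))
    echo "active=$(( active + used )) spares=$(( spares - used ))"
}
```

`md_after_recovery 4 3 1` yields `active=4 spares=0`, which is what the GUI showed. A spare only survives recovery when there are no empty slots left, as in `md_after_recovery 4 4 1`.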


    For now I would monitor the emails. This may go away, as basically it's telling you that the spare it had is missing, which is because it has been added to the array. But it may also barf at the fact that there was a spare and now there isn't.


  • Thanks for your answer.

    TBH I'm not sure, this -> Recover "Add hot spares" must be something new or a change

    Okay, maybe I misunderstood something, I don't know.

    I used the "+" button, and in the next dialog, where I selected /dev/sda, the text says "add hot spares / recover raid device".

    Just for info: each disk has 2.73 TiB capacity.


    This info email "spare is missing" seems to be sent every time I boot the NAS.


    Does it make sense to report this as a bug anywhere?

  • I used the "+" and in the next dialog where I selected /dev/sda there the text says "add hot spares / recover raid device"

    The GUI has changed since OMV5, which had a Recover button, but I'm guessing it's one and the same.

    Does it make sense to report this as a bug anywhere?

    GitHub. There's probably also a way of suppressing the email.


    This info email "spare is missing" seems to be sent every time I boot the NAS

    The only time I reboot is when there's a kernel update; otherwise mine is on 24/7.
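One possible explanation for the boot-time email, not confirmed in this thread: mdadm's monitor mode reports a SparesMissing event when mdadm.conf declares a number of spares for an array (a `spares=N` entry on its ARRAY line) and fewer are present. If the ARRAY line still has `spares=1` from when the disk was added as a spare, the email would repeat at every boot. Below is a sketch (the function name is hypothetical) that shows what the config declares. On OMV the file may be generated by OMV itself, so check how it is regenerated before editing it by hand.

```shell
#!/bin/sh
# Hypothetical helper: read mdadm.conf on stdin and print each ARRAY's
# device and its declared spare count (0 when no spares= entry is present).
conf_expected_spares() {
    awk '$1 == "ARRAY" {
        spares = 0
        for (i = 3; i <= NF; i++)
            if ($i ~ /^spares=/)
                spares = substr($i, 8)
        print $2, spares
    }'
}
```

Compare the result of `conf_expected_spares < /etc/mdadm/mdadm.conf` with the Spare Devices count from `mdadm --detail`. If the config expects a spare that no longer exists, that mismatch would fit the email.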

