A disk with a "warning" message is missing from RAID5

  • Hello,

    about 4 years ago I set up the NAS with 6TB disks, which I configured in RAID5.

    For a few days I've been getting a warning message from the daily checks run by OMV, so I investigated and noticed that the RAID 5 array is running on 5 disks (instead of all 6 installed); the missing disk also shows a "Warning" for "Bad sectors".


    I guess I need to replace the disk, but what would be the correct procedure?

    Could you point me to a guide?


    Below you can see - from the OMV GUI - the RAID configuration (labeled as "degraded"):


    Here you can see the detail of the missing disk, namely "sdb":


    Finally, here is some data taken from CLI to help understand the overall situation:

    Code
    root@diynas-omv:/srv/dev-disk-by-label-RAID5/# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10] 
    md0 : active raid5 sdc[4] sdd[0] sdf[1] sde[5] sda[2]
          29301969920 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [UUU_UU]
          bitmap: 40/44 pages [160KB], 65536KB chunk
    unused devices: <none>


    Code
    root@diynas-omv:/srv/dev-disk-by-label-RAID5/# blkid
    /dev/sdf: UUID="add77410-4af9-84b4-4a1b-f4c9a53296ae" UUID_SUB="540cd822-1281-aead-7f6f-ca4f0842dc80" LABEL="DIYNAS-OMV:RAID5" TYPE="linux_raid_member"
    /dev/nvme0n1p5: UUID="21dbc86c-cfc1-4b4a-a9ed-566388d70285" TYPE="swap" PARTUUID="5b2a273a-05"
    /dev/nvme0n1p1: UUID="975fe369-9535-4459-80f8-f86795bdcb26" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="5b2a273a-01"
    /dev/sdd: UUID="add77410-4af9-84b4-4a1b-f4c9a53296ae" UUID_SUB="9e5d7091-039a-77b5-77b8-c3450cc7cac8" LABEL="DIYNAS-OMV:RAID5" TYPE="linux_raid_member"
    /dev/sdb: UUID="add77410-4af9-84b4-4a1b-f4c9a53296ae" UUID_SUB="f6d205c8-bc1a-5cd1-6a8a-7458a4b0c817" LABEL="DIYNAS-OMV:RAID5" TYPE="linux_raid_member"
    /dev/md0: LABEL="RAID5" UUID="b143f942-1f22-42e0-b92d-4e4c557ff36c" BLOCK_SIZE="4096" TYPE="ext4"
    /dev/sde: UUID="add77410-4af9-84b4-4a1b-f4c9a53296ae" UUID_SUB="f5ddc7d4-d721-6796-e75e-733251246271" LABEL="DIYNAS-OMV:RAID5" TYPE="linux_raid_member"
    /dev/sdc: UUID="add77410-4af9-84b4-4a1b-f4c9a53296ae" UUID_SUB="1e112d20-45bd-32d0-9ebf-69bea2a73f9b" LABEL="DIYNAS-OMV:RAID5" TYPE="linux_raid_member"
    /dev/sda: UUID="add77410-4af9-84b4-4a1b-f4c9a53296ae" UUID_SUB="93903863-bbf5-8435-c86f-c1dc338c5bb7" LABEL="DIYNAS-OMV:RAID5" TYPE="linux_raid_member"




    Code
    root@diynas-omv:/srv/dev-disk-by-label-RAID5/# mdadm --detail --scan --verbose
    ARRAY /dev/md0 level=raid5 num-devices=6 metadata=1.2 name=DIYNAS-OMV:RAID5 UUID=add77410:4af984b4:4a1bf4c9:a53296ae
       devices=/dev/sda,/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf


    Thank you very much!

    • Official Post

    You can double-check the drive removal by running mdadm --detail /dev/md0 from the CLI


    If you see 'removed' next to /dev/sdb then mdadm has removed the disk from the array; all you need to do is replace the drive:
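    A sketch of what to look for (output abridged and illustrative, not captured from this system):

```shell
# Show the full array status, including any failed/removed members
mdadm --detail /dev/md0

# In the device table at the bottom, a removed member looks like:
#    Number   Major   Minor   RaidDevice State
#       -       0        0        3      removed
```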


    Shut down and remove the failed drive

    Install the new drive -> start up

    Storage -> Disks -> select the drive and do a quick/short wipe; this will prepare the drive for OMV to use

    Raid Management/Md plugin -> there should be an option 'Recover'; click on that and your new drive should be displayed. Select it, click OK, and the drive will then be added to the array
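    For reference, the same recovery can be done from the CLI (a sketch; /dev/sdg is an assumed name for the new drive - verify yours first, since drive letters can shift between boots):

```shell
# Identify the new drive first; serial numbers help avoid wiping the wrong disk
lsblk -d -o NAME,SIZE,MODEL,SERIAL

# Clear any old signatures from the new drive (DESTROYS any data on it)
wipefs --all /dev/sdg        # /dev/sdg is assumed -- substitute your new drive

# Add the new drive to the degraded array; mdadm starts the rebuild automatically
mdadm --manage /dev/md0 --add /dev/sdg

# Watch the rebuild progress
cat /proc/mdstat
```

    Either way, the array stays online during the rebuild; it just runs degraded until the resync finishes.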

    Raid is not a backup! Would you go skydiving without a parachute?


    OMV 8x amd64 running on an HP N54L Microserver



  • Confirmed, the drive is already removed, as you can see below.



    Since this situation began, my NAS has been stopping and turning itself off.

    When I turn it back on (manually), I receive emails like this one:


    I suppose this situation is caused by the damaged hard drive, am I right?

    Then... before proceeding to remove it, I'm wondering how I can recognize the right drive to remove! Any ideas here as well?

    • Official Post

    Since this situation began, my NAS has been stopping and turning itself off.

    :/ It shouldn't, so there could be another issue, e.g. hardware or intermittent power


    When I turn it back on (manually), I receive emails like this one:

    That's normal if you have notifications enabled

    I suppose this situation is caused by the damaged hard drive, am I right?

    Not necessarily; based on your first quote about your machine stopping, mdadm could see that as a consistent fault and remove the drive


    One option would be to run a SMART test on that drive; you should also check out this section of the user guide
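    The SMART test can also be started from the CLI (a sketch; /dev/sdb is assumed, and the letter may differ after a reboot):

```shell
# Start a short self-test; it runs in the background on the drive itself
smartctl -t short /dev/sdb

# Once it finishes, review the results and attributes; pay particular
# attention to Reallocated_Sector_Ct and Current_Pending_Sector
smartctl -a /dev/sdb
```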


    Then... before proceeding to remove it, I'm wondering how I can recognize the right drive to remove!

    Using OMV, make a note of each drive's serial number: Storage -> Disks
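    The serial-to-device mapping can also be listed in one go from the CLI (a sketch; works on any Linux box with util-linux installed):

```shell
# Map kernel device names to model and serial number
lsblk -d -o NAME,SIZE,MODEL,SERIAL

# Alternatively, the persistent links under /dev/disk/by-id
# encode model and serial directly in the link name
ls -l /dev/disk/by-id/
```

    The serial number printed here matches the label on the physical drive, so you can pull the right one with confidence.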

  • Thank you.


    I've set SMART tasks to monitor my drives:

    • short self-test > weekly
    • long self-test > monthly


    However, I am a bit concerned because now the drive reported as "corrupted" is /dev/sda, whereas until a few days ago it was /dev/sdb.

    I'm afraid both will have to be replaced...


    Currently, my RAID is made of 6TB drives (3x Seagate "ST6000NM0115-1YZ110", 3x Seagate "ST6000NM0095"): considering the cost of HDDs, should I gradually replace them with more modern and larger but (hopefully) cheaper ones?

    It's not uncommon for multiple disks in a RAID to fail one after the other. They were usually purchased at the same time from the same batch and are subjected to the same load in the RAID, so why shouldn't they fail around the same time? With six drives, you usually use a spare drive, and you usually have at least one spare disk in stock. RAID 5 doesn't mean the array will last forever; it means it is easy to maintain without downtime - if you take proper care of it.
    And by the way, 54 °C is much too warm for a hard disk, and running constantly at that temperature may well be a reason for dying drives.

    NAS 7.7.23 | D2550 @ 1.86GHz | Ram/Disk: 4 GB / 5,5TB + 8TB | Raid5 + btfrs + Bcache + k3s
    & 3x k3s-worker [ J4105 @ 1.50GHz | Ram/Disk: 8 GB / 256 GB SSD ] & 4x k3s-worker [pi 4 8gb / 64GB SSD]


    • Official Post

    However, I am a bit concerned because now the drive reported as "corrupted" is /dev/sda, whereas until a few days ago it was /dev/sdb

    If you are restarting your server (shutdown/reboot), the drive references can change. This makes it harder to identify which drive is the problem unless you have a written record of each drive in your system
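    One way around the shifting sdX letters: the links under /dev/disk/by-id include the drive model and serial and always point at the current kernel name (a sketch; the example line is illustrative, not from this system):

```shell
# by-id names are stable across reboots; the arrow shows today's sdX letter
ls -l /dev/disk/by-id/ata-*

# Illustrative output (hypothetical serial):
# ata-ST6000NM0115-1YZ110_ZAD12345 -> ../../sda
```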

    considering the cost of HDDs, should I gradually replace them with more modern and larger but (hopefully) less expensive HDDs

    Personally, I would not run Raid5 with more than 4 drives; I would use Raid6, but this is a personal choice


    You'll only need larger drives if you require more space, and replacing them 'gradually' would mean the array cannot 'grow' to its new size until the last drive has been replaced.
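    For completeness, once every member has been replaced with a larger drive, the grow happens in two steps (a sketch; ext4 is assumed, which matches the blkid output earlier in the thread):

```shell
# Tell mdadm to use the full capacity of the (now larger) member drives;
# this triggers a resync of the newly available space
mdadm --grow /dev/md0 --size=max

# Then grow the ext4 filesystem to fill the enlarged array (online resize)
resize2fs /dev/md0
```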


    Yes, replace them with conventional drives


    When you buy new drives, make sure they are CMR, not SMR; whilst SMR drives are cheaper per unit of storage, they can be an issue when used in RAID
