6x 3TB RAID6 - one disk dying, what to do?

4k8uiMg3pYTJVtFQ5QsF · 14. Dezember 2021

Hello forum,

I'm running OMV 5.6.21 with 6x 3TB drives in a RAID 6.

I have three backups (1 full backup, 2 backups from important data).

One of the RAID6 drives is reporting:

197 Current_Pending_Sector -O--CK 200 200 000 - 27
198 Offline_Uncorrectable ----CK 200 200 000 - 1

The drive was a long time stable at 13 pending sectors but today I get the message that the amount of sectors doubled.

Can I shutdown the NAS and remove the drive and the array runs then as RAID 5?

Or what would you do?

Thanks in advance!

raulfg3 · 14. Dezember 2021

yes, you can stop your NAS , remove dying HD and restart your NAS, and your data must be accesible.

geaves · 14. Dezember 2021

Zitat von 4k8uiMg3pYTJVtFQ5QsF

Can I shutdown the NAS and remove the drive and the array runs then as RAID 5

Are you replacing the drive?

4k8uiMg3pYTJVtFQ5QsF · 14. Dezember 2021

Zitat von geaves

Are you replacing the drive?

When possible I would not replace the drive.

geaves · 14. Dezember 2021

If the drive hasn't failed i.e. the raid still shows as clean then all this can be done in the GUI, do not just remove the drive mdadm (software raid) is not hot swap

4k8uiMg3pYTJVtFQ5QsF · 14. Dezember 2021

Thanks for your answer and the hint I found the option in the GUI to remove drives from the RAID array. Does it make sense to remove the suspicious drive yet or should I wait until die GUI shows me in the RAID section anything different to "clean"? In the SMART section the suspicious drive has already a red light.

geaves · 14. Dezember 2021

Zitat von 4k8uiMg3pYTJVtFQ5QsF

Does it make sense to remove the suspicious drive yet or should I wait until die GUI shows me in the RAID section anything different to "clean"

Personal choice I suppose, if mdadm fails the drive the raid will show as clean/degraded, TBH the drive needs replacing, might as well do it now rather than later

macom · 14. Dezember 2021

Zitat von 4k8uiMg3pYTJVtFQ5QsF

198 Offline_Uncorrectable ----CK 200 200 000 - 1

I would replace the drive as fast as possible.

4k8uiMg3pYTJVtFQ5QsF · 17. Dezember 2021

Ok I tried today to remove the drive via WebGUI. After clicking on remove, I got the following error message:

Code

devices: The value {"1":"\/dev\/sdb","2":"\/dev\/sdc","3":"\/dev\/sdd","4":"\/dev\/sdf","5":"\/dev\/sdg"} is not an array.

Details shows:

Code

Fehler #0:
OMV\Json\SchemaValidationException: devices: The value {"1":"\/dev\/sdb","2":"\/dev\/sdc","3":"\/dev\/sdd","4":"\/dev\/sdf","5":"\/dev\/sdg"} is not an array. in /usr/share/php/openmediavault/json/schema.inc:384
Stack trace:
#0 /usr/share/php/openmediavault/json/schema.inc(281): OMV\Json\Schema->validateArray(Object(stdClass), Array, 'devices')
#1 /usr/share/php/openmediavault/json/schema.inc(261): OMV\Json\Schema->validateType(Object(stdClass), Array, 'devices')
#2 /usr/share/php/openmediavault/config/datamodel.inc(155): OMV\Json\Schema->validate(Object(stdClass), 'devices')
#3 /usr/share/php/openmediavault/config/configobject.inc(190): OMV\Config\DataModel->validateProperty('devices', Array)
#4 /usr/share/openmediavault/engined/rpc/raidmgmt.inc(492): OMV\Config\ConfigObject->set('devices', Array)
#5 [internal function]: Engined\Rpc\RaidMgmt->remove(Array, Array)
#6 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
#7 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('remove', Array, Array)
#8 /usr/sbin/omv-engined(537): OMV\Rpc\Rpc::call('RaidMgmt', 'remove', Array, Array, 1)
#9 {main}

Alles anzeigen

As email OMV send me:

Code

This is an automatically generated mail message from mdadm
running on nas-omv-netgear

A Fail event had been detected on md device /dev/md0.

It could be related to component device /dev/sda.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md0 : active raid6 sda[0](F) sdc[2] sdg[5] sdb[1] sdf[4] sdd[3]
      11720536064 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [_UUUUU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

unused devices: <none>

Alles anzeigen

I have turned off the NAS and removed the NOK drive.

Is there anyway that OMV rebuild the array as a RAID6 with 5 drives? With the 6 drives I had enough free space, so I assumed that this is somehow possible.

geaves · 17. Dezember 2021

Zitat von 4k8uiMg3pYTJVtFQ5QsF

After clicking on remove, I got the following error message:

Never seen that before, the error message is not finding /dev/sda, what's the output of the following;

cat /proc/mdstat

blkid

mdadm --detail /dev/md0

4k8uiMg3pYTJVtFQ5QsF · 17. Dezember 2021

Here the output:

Zitat von geaves

cat /proc/mdstat

Code

Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md0 : active (auto-read-only) raid6 sdb[1] sdc[2] sdg[5] sdd[3] sdf[4]
      11720536064 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [_UUUUU]
      bitmap: 1/22 pages [4KB], 65536KB chunk

unused devices: <none>

Zitat von geaves

blkid

Code

/dev/sdb: UUID="51ec32cf-f4a4-abdf-75b7-87de3239679f" UUID_SUB="e4c70246-20e7-f375-1127-a24618676fea" LABEL="nas-omv-netgear:6x3TBWDREDRAID6" TYPE="linux_raid_member"
/dev/sdc: UUID="51ec32cf-f4a4-abdf-75b7-87de3239679f" UUID_SUB="2166f5e2-05b8-62b0-c4b6-d6567b98197e" LABEL="nas-omv-netgear:6x3TBWDREDRAID6" TYPE="linux_raid_member"
/dev/sdd: UUID="51ec32cf-f4a4-abdf-75b7-87de3239679f" UUID_SUB="ad00b629-3f9f-a24d-404d-4ed9dcbe10fe" LABEL="nas-omv-netgear:6x3TBWDREDRAID6" TYPE="linux_raid_member"
/dev/sde1: SEC_TYPE="msdos" UUID="4F9B-6731" TYPE="vfat"
/dev/sdf: UUID="51ec32cf-f4a4-abdf-75b7-87de3239679f" UUID_SUB="cedd2161-b2ab-6a87-3600-8d19b7fcb708" LABEL="nas-omv-netgear:6x3TBWDREDRAID6" TYPE="linux_raid_member"
/dev/sdg: UUID="51ec32cf-f4a4-abdf-75b7-87de3239679f" UUID_SUB="ba49dc21-6550-7d59-83b9-43b1f6d183fb" LABEL="nas-omv-netgear:6x3TBWDREDRAID6" TYPE="linux_raid_member"
/dev/sda2: UUID="d9a339ff-32d3-4af5-ac78-d051f898aacd" UUID_SUB="ab689f8b-4ad6-49d6-990a-25d9e1e5bd47" TYPE="btrfs" PARTUUID="2c6ef4dc-990d-4370-bc29-933a43994e80"
/dev/sda3: UUID="2cdf4bd5-ce54-4023-8e84-2190040f56a6" TYPE="swap" PARTUUID="b8ab6da8-b910-4ae9-a610-70167339c057"
/dev/md0: LABEL="NASRAID6" UUID="f2a2f310-a710-4716-80b2-490e1a20a232" UUID_SUB="2bda88d3-0a3d-467e-aafd-4a0fc50ab08a" TYPE="btrfs"
/dev/sda1: PARTUUID="08dfaaf8-3078-498a-86f5-06520089ca74"

Zitat von geaves

mdadm --detail /dev/md0

Code

/dev/md0:
           Version : 1.2
     Creation Time : Wed Dec 30 18:04:11 2020
        Raid Level : raid6
        Array Size : 11720536064 (11177.57 GiB 12001.83 GB)
     Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
      Raid Devices : 6
     Total Devices : 5
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Fri Dec 17 17:59:58 2021
             State : clean, degraded 
    Active Devices : 5
   Working Devices : 5
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : nas-omv-netgear:6x3TBWDREDRAID6  (local to host nas-omv-netgear)
              UUID : 51ec32cf:f4a4abdf:75b787de:3239679f
            Events : 38338

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       3       8       48        3      active sync   /dev/sdd
       4       8       80        4      active sync   /dev/sdf
       5       8       96        5      active sync   /dev/sdg

Alles anzeigen

geaves · 17. Dezember 2021

Zitat von 4k8uiMg3pYTJVtFQ5QsF

active (auto-read-only)

The raid is in an auto read only state, running mdadm --readwrite /dev/md0 should correct that, the output from mdstat and mdadm --detail shows /dev/sda as removed, which shows in the email you received (F) failed, so it's somewhat confusing.

In Raid Management, select the raid, on the menu click remove, this should display a dialog of the listed drives within the array, if mdadm had removed the drive, assuming /dev/sda it would not be in the list, therefore the error, OMV uses the complete block device (drive) to create an array.

Added to that the output of blkid shows /dev/sda with at least 3 partitions with no information associated to dev/sda1, but dev/sda2 shows a file system of btrfs the same as the raid, along with a swap file on /dev/sda3. This might suggest that a partition was added to the array at some point, which is possible from the cli but not from within OMV's GUI.

Either way the array is in a clean/degraded state and requires a new drive to be added.

What's the output of fdisk -l | grep "Disk "

The output information would also suggest the machine has been rebooted.

4k8uiMg3pYTJVtFQ5QsF · 18. Dezember 2021

Hi geaves,

thanks for your answer!

Zitat von geaves

In Raid Management, select the raid, on the menu click remove, this should display a dialog of the listed drives within the array

There I have selected "sda" an clicked on remove. After that I got the error for #9. But webGUI showed me that sda was removed, so I shut down the NAS and removed sda and rebooted.

Zitat von geaves

What's the output of fdisk -l | grep "Disk "

Code

Disk /dev/sdb: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: WDC WD30EFRX-68A
Disk /dev/sdc: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: WDC WD30EFRX-68A
Disk /dev/sdd: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: WDC WD30EFRX-68E
Disk /dev/sde: 122,5 MiB, 128450560 bytes, 250880 sectors
Disk model: USB DISK        
Disk identifier: 0x00000000
Disk /dev/sdf: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: WDC WD30EFRX-68E
Disk /dev/sdg: 2,7 TiB, 3000592982016 bytes, 5860533168 sectors
Disk model: WDC WD30EFRX-68E
Disk /dev/sda: 298,1 GiB, 320072932864 bytes, 625142447 sectors
Disk model: Expansion       
Disk identifier: A1EA4DC8-22AD-4344-9881-578CBB79E73A
Disk /dev/md0: 10,9 TiB, 12001828929536 bytes, 23441072128 sectors

Alles anzeigen

Zitat von geaves

mdadm --readwrite /dev/md0[/tt] should correct that

Code

mdadm: failed to set writable for /dev/md0: Device or resource busy

Zitat von geaves

Either way the array is in a clean/degraded state and requires a new drive to be added.

When I don't want to replace the drive, what is the best way? Rebuild the RAID?

geaves · 18. Dezember 2021

Zitat von 4k8uiMg3pYTJVtFQ5QsF

After that I got the error for #9. But webGUI showed me that sda was removed, so I shut down the NAS and removed sda and rebooted.

That would explain the change in the drive identifier's /dev/sd?

Zitat von 4k8uiMg3pYTJVtFQ5QsF

When I don't want to replace the drive, what is the best way? Rebuild the RAID?

You will have too, no choice, so there are two options;

1) Maintain the current system and rebuild, to do that you will have to;

a) Remove SMB shares related to the array

b) Remove shared folders related to the array

c) Remove the array

2) Reinstall

This is the lesser of the two and you could install OMV6, but check out this thread omv-extras plugins - porting progress to OMV 6.x first

If you went down this route you would have to wipe your drives before doing anything with them

4k8uiMg3pYTJVtFQ5QsF · 21. Dezember 2021

Hi geaves,

Zitat von geaves

2) Reinstall

I reinstalled OMV6. So far so good. Some questions where I hope that you can help me:

Is there anyway to display temperature und fanspeed in the dashboard?
You mentioned the omv-extras plugins. Is the installation still done with?!?:

Code

wget -O - https://github.com/OpenMediaVault-Plugin-Developers/packages/raw/master/install | bash

Thank you!

Zoki · 21. Dezember 2021

Install of omv-extras is still the same.

4k8uiMg3pYTJVtFQ5QsF · 3. Januar 2022

Happy new year to everyone!

Zitat von geaves

2) Reinstall

I have upgraded to OMV6 but need your help with this issue. I got today two mails from the NAS:

Code

This is an automatically generated mail message from mdadm
running on nas

A SparesMissing event had been detected on md device /dev/md1.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10] 
md1 : active raid5 sdb[0] sdd[1] sde[2] sda[4]
      8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 2/22 pages [8KB], 65536KB chunk

unused devices: <none>

Alles anzeigen

But when I check the GUI it tells me that the array is clean, but in the details I see that 4 working device no spare device.

Code

   Version : 1.2
     Creation Time : Mon Dec 20 17:14:21 2021
        Raid Level : raid5
        Array Size : 8790402048 (8383.18 GiB 9001.37 GB)
     Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sun Jan  2 13:41:09 2022
             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : nas:1  (local to host nas)
              UUID : 587b4360:79e7be24:3e7f1cc8:6841b435
            Events : 29508

    Number   Major   Minor   RaidDevice State
       0       8       16        0      active sync   /dev/sdb
       1       8       48        1      active sync   /dev/sdd
       2       8       64        2      active sync   /dev/sde
       4       8        0        3      active sync   /dev/sda

Alles anzeigen

But I added /dev/sda with the GUI Storage --> Raid Management --> Recover "Add hot spares", nevertheless this seems not to work.

I already used the GUI button remove to remove /dev/sda, mada a quick erase of /dev/sda added it again a s hot spare. During rebuild process the details showed 1 device as spare device but after finishing all devices are active devices.

What I'm doing wrong?

Thanks in advance for your support.

geaves · 3. Januar 2022

Zitat von 4k8uiMg3pYTJVtFQ5QsF

What I'm doing wrong

TBH I'm not sure, this -> Recover "Add hot spares" must be something new or a change, previously Recover would just add a new/replacement drive to the array. I always used hot spares when running hardware raid, if a drive failed the controller would automatically initiate the spare into the array and fire off an email with the information.

Zitat von 4k8uiMg3pYTJVtFQ5QsF

During rebuild process the details showed 1 device as spare device but after finishing all devices are active devices.

This is where I'm not sure what's going, if you erased/wiped the drive then used Recover (add hot spares) mdadm knows there should be 4 drives in your initial array. It knows you only have 3 active drives, so adding that as a spare mdadm will pick up that there is a drive available to add to the array to replace the missing drive.

At this moment I would monitor emails, this may go away as basically it's telling you that the spare it had is missing, that's due to the fact it's been added. But it may also barf at the fact there was a spare and now there isn't.

4k8uiMg3pYTJVtFQ5QsF · 3. Januar 2022

Thanks for your answer.

Zitat von geaves

TBH I'm not sure, this -> Recover "Add hot spares" must be something new or a change

Okay probably I have a misunderstanding, I don't know.

I used the "+" and in the next dialog where I selected /dev/sda there the text says "add hot spares / recover raid device"

Just for info: Each disc has 2.73 TiB capacity

This info email "spare is missing" seems to be sent everytime I boot the NAS.

Does it make sense to report this as "bug" anywhere?

geaves · 3. Januar 2022

Zitat von 4k8uiMg3pYTJVtFQ5QsF

I used the "+" and in the next dialog where I selected /dev/sda there the text says "add hot spares / recover raid device

The GUI has changed since OMV5, that had a Recover button, but I'm guessing it's one of the same

Zitat von 4k8uiMg3pYTJVtFQ5QsF

Does it make sense to report this as "bug" anywhere?

Github there's probably a way of suppressing the email

Zitat von 4k8uiMg3pYTJVtFQ5QsF

This info email "spare is missing" seems to be sent everytime I boot the NAS

The only time I reboot is when there is a kernel update, otherwise mine is on 24/7

Jetzt mitmachen!