Missing RAID5 array

  • Hi all,


    my RAID5 array is missing. It happened yesterday evening while the OMV was operating as a file share, without any updates or config changes having been done in the hours before.

    Suddenly the clients' mounted file shares were gone, and on the OMV there is now a red NFS warning in the dashboard. In the file system view, where /dev/md0 should be, there are only placeholders ("-"), flagged red as "Missing".


    Here comes the required output of what I should provide with a missing-RAID question. (And yes, I have a full backup from last week on a 6 TB USB drive.)


    I tried this but do not really understand what it does... (I hope it didn't make it worse):

    heinso


    Currently 8.1.1-1 (Synchrony), 64bit, on my ASRock Celeron J4105 NAS Build with a conventional 400W PSU, Raid 5 Array: 4x WD Red 2TB, 256 GB NVMe SSD on PCIe V1 as the system drive, 2x to SATA III-Adapter. 2x4GB RAM SO-DIMM 2400 and a 3,5", 6TB BU-HDD on the same SATA controller via PCIe as a full BU for the Raid Array . No Backup -> No Mercy


  • heinso

    Added the Label OMV 7.x
  • I'm glad to hear you have a full backup, as it's never a good idea to execute an mdadm command you do not fully understand. Has it done any harm?


    Please post the output of these commands:


    cat /proc/mdstat


    mdadm -D /dev/md0


    mdadm -E /dev/sd[a-f]
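To make posting easier, the three outputs can be captured into a single file in one go. A minimal sketch (the sd[a-f] device range is taken from the command above and may need adjusting to the actual system):

```shell
# Collect all RAID diagnostics into one file for posting (run as root).
{
  echo "=== cat /proc/mdstat ==="
  cat /proc/mdstat
  echo "=== mdadm -D /dev/md0 ==="
  mdadm -D /dev/md0
  echo "=== mdadm -E /dev/sd[a-f] ==="
  mdadm -E /dev/sd[a-f]
} > /root/raid-diagnostics.txt 2>&1
```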

  • OK, but I have a RAID5 array with 4 devices, not 5!

    Code
    # cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md0 : inactive sdb[2](S)
          1953382488 blocks super 1.2

    heinso



  • Whether it was harmed further by my risky mdadm command I cannot say, because I have not been able to access or see the missing array since yesterday evening.

    heinso



  • What have you done to investigate the missing fifth disk? Have you assumed it has totally failed? Was it a fifth 2TB WD Red?


    Have you been regularly checking the disk's S.M.A.R.T data?


    The output of mdadm examine (mdadm -E) shows that at some stage two out of five disks dropped out of the array. In this situation I'd delete the array, wipe the disks, re-build the array, create a fresh filesystem on it, and restore the data from your backup. The question is whether there would be room for all your data on a 4-disk RAID5.
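As a rough sketch of that sequence on the command line (the member names /dev/sd[abdf] are taken from later in the thread and must be verified with lsblk first; on OMV the create step is normally done via the WebUI instead):

```shell
# DANGER: this destroys the array and all data on the member disks.
mdadm --stop /dev/md0                          # stop the dead array
mdadm --zero-superblock /dev/sd[abdf]          # remove RAID metadata from the members
wipefs -a /dev/sda /dev/sdb /dev/sdd /dev/sdf  # clear any remaining signatures
mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[abdf]  # re-build
mkfs.ext4 /dev/md0                             # fresh filesystem, then restore the backup
```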

  • SMART is OK for all 4 WD drives in the array. It is a 4-drive RAID5 array, not 5 drives. A short SMART self-test is scheduled every Friday.

    The array is /dev/sda, sdb, sdd and sdf. (/dev/sde is the external USB backup drive.)


    heinso



  • I got two warning emails yesterday when it happened:

    heinso



  • Maybe I made a mistake by typing 5 instead of 4 at "--raid-devices=", and now there is wrong device-count information in the system...

    Code
    root@omv:~# mdadm --create /dev/md0 --assume-clean --level=5 --verbose --raid-devices=5 missing /dev/sdd /dev/sdb /dev/sda /dev/sdf
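For syntax comparison only (not a suggestion to run it): a create for a 4-disk RAID5 would use --raid-devices=4 and list exactly four members. The command above, with --raid-devices=5 plus "missing", builds a degraded 5-disk array and writes 5-disk metadata onto the drives:

```shell
# Syntax comparison only -- re-creating over existing members can destroy data.
mdadm --create /dev/md0 --assume-clean --level=5 --verbose \
      --raid-devices=4 /dev/sda /dev/sdb /dev/sdd /dev/sdf
```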

    heinso



  • Whatever number of disks was in your array, your notifications in #7 above show you lost two disks out of a RAID5 array. This, as I said above, is shown in the mdadm examine output. Hence your array is dead.

  • heinso

    Added the Label resolved
  • What a ....! I have one HDD as cold standby. But as you say, if two die at the same time, everything is gone with the wind.

    So I have to buy at least one new HDD.


    THX a lot for your express support

    heinso



  • What a ....! I have one HDD as cold standby. But as you say, if two die at the same time, everything is gone with the wind.

    So I have to buy at least one new HDD.


    THX a lot for your express support


    Disks can get kicked out of arrays for a variety of reasons: hardware that is failing or has failed, power glitches, etc. So the question is what state those two dropped disks are in, and whether they can be re-used. It's worth reviewing the disks' "power-on hours" before deciding whether they should all be retired and what to do next.

  • 53811 power-on hours (they are all the same age and have had the same use in the array). That means about 6 years.

    In my NAS documentation I found that I first used them from Oct. 2018 in a QNAP... so they are old donkeys.

    heinso



  • Well, you might just get another 10K hours out of those drives, but beyond that it's marginal. Since you've got to delete the RAID and wipe the individual drives anyway, I'd run a badblocks test on each one before re-using it. It takes time, but you can start the tests in parallel if you make use of tmux. If the disks pass the badblocks test, I'd consider using them, but plan to replace them and give thought to how and what size of disks you need. You may not need/want to continue using MD RAID. I'd also give thought to whether it's worth increasing your full/incremental backup frequency.
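A possible way to run the destructive badblocks pass on all four members in parallel, one tmux window per disk (device names are assumed from earlier in the thread; this erases everything on the disks):

```shell
# Start a detached tmux session and one badblocks window per member disk.
tmux new-session -d -s diskcheck
for d in sda sdb sdd sdf; do
  tmux new-window -t diskcheck -n "$d" \
    "badblocks -wsv /dev/$d > /root/badblocks-$d.log 2>&1"
done
tmux attach -t diskcheck   # watch progress; detach again with Ctrl-b d
```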

  • OK, I killed all shares and links to the array and finally was able to remove the missing FS.

    After that I (quick) wiped the disks and created a new RAID5 array out of them. Does it really need about 4 to 5 hours to create this array out of four 2TB drives, or is OMV trying to recover the old contents?


    Code
    clean, resyncing (31.7% (620666996/1953382400) finish=174.6min speed=127202K/sec)

    When it's completely created, what is the best way to check the entire array with SMART before working with it again?

    heinso



  • Yes to the first question; the old filesystem is gone. But my advice, as in #13, is to run a badblocks test on each individual drive before re-using them in an array; otherwise you're simply relying on the S.M.A.R.T. tests on each individual physical drive, and S.M.A.R.T. does not operate on the RAID device itself. A RAID internal consistency check, which is something else, should auto-run on the first Sunday of every month unless you schedule it yourself. Once the resync ends successfully, just proceed to creating a new filesystem on it.
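If you don't want to wait for the scheduled run, the MD consistency check can also be triggered by hand via sysfs (a standard md kernel interface; md0 assumed as the array device):

```shell
# Kick off an internal consistency check on md0 and watch it.
echo check > /sys/block/md0/md/sync_action
cat /proc/mdstat                        # shows check progress
cat /sys/block/md0/md/mismatch_cnt      # non-zero after the check = inconsistencies found
```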

  • The new array has the same UUID as the old one and seems to be in trouble right from the beginning.

    This is the error notification popping up when I try to mount the array as an active file system:

    Code
    ID: mount_filesystem_mountpoint_914b82fe-da2f-4072-a88a-e32cc58dafd5
        Function: mount.mounted
            Name: /srv/dev-disk-by-uuid-6c116a79-c72e-455e-afff-e5ee658ea46d
          Result: False
         Comment: mount: /srv/dev-disk-by-uuid-6c116a79-c72e-455e-afff-e5ee658ea46d: mount(2) system call failed: Structure needs cleaning.
                         dmesg(1) may have more information after failed mount system call.
         Started: 19:55:33.192892
        Duration: 69.977 ms
         Changes:
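"Structure needs cleaning" is the ext4 error for corrupted on-disk metadata. A read-only check (no repairs) would show the extent of the damage, assuming the filesystem really is on /dev/md0:

```shell
# Inspect only -- the -n flag answers "no" to every repair prompt.
fsck.ext4 -n /dev/md0
dmesg | tail -n 20   # the ext4 driver usually logs the exact error after a failed mount
```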

    heinso



  • It looks like the ext4 FS is now on the array (building it took about 5 hours).


    But the pending config changes run into an error.

  • OK, I killed all shares and links to the array and finally was able to remove the missing FS.

    After that I (quick) wiped the disks and created a new RAID5 array out of them. Does it really need about 4 to 5 hours to create this array out of four 2TB drives, or is OMV trying to recover the old contents?


    Code
    clean, resyncing (31.7% (620666996/1953382400) finish=174.6min speed=127202K/sec)

    When it's completely created, what is the best way to check the entire array with SMART before working with it again?

    Your problem started here. You needed to remove the "missing" FS AND properly "delete" the old RAID from the OMV system before creating a new one, not simply wipe the drives, which need at least 25% of a full wipe, not a quick wipe. This is where using the CLI can leave your OMV system in an inconsistent state. Was the old /dev/md0 still visible in the WebUI at this stage or not? If not, then I might have expected you to ask how to proceed if "cat /proc/mdstat" showed the presence of an inactive array on the system that was not shown in the WebUI.
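On the wiping point: a quick wipe only clears the first few sectors, so old RAID/filesystem signatures further into the disk can survive. One way to overwrite roughly the first quarter of a 2 TB drive, per the advice above (sdX is a placeholder; this destroys all data on the disk):

```shell
# Overwrite ~500 GB at the start of the disk (about 25% of a 2 TB drive).
dd if=/dev/zero of=/dev/sdX bs=1M count=500000 status=progress
```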


    When you say "It looks like the ext4 FS is now on the array (building it took about 5 hours)": there can be no filesystem on the array until the re-build is complete and the change is applied via the WebUI without error. Only then, as a separate action, can you create and mount a new filesystem, selecting the MD device.


    Your screenshots all focus on the MD RAID WebUI page but appear to relate to file-system actions. So I can only ask again: after the MD RAID re-build you should need to apply the change. Did you apply the change, and was it successful? Did you then progress to trying to create a filesystem on this newly built array?

  • Sorry for not showing up since Dec 20th, but I was at my mother's house helping a lot... with no access to my NAS.


    In the meantime I received a heap of HDD error mails...

    Anyhow, now it looks like my 4 WD Reds are all gone. It looks as if they were powered off; they all completely disappeared from the system.

    I'll try to restart them with another power supply.

    heinso


