raid5 disk failure.

  • Greetings all,
    I built an OMV4 box strictly for test purposes. No important data is on this machine.
    I threw in 5 hard drives of various sizes that I had lying around.
    I installed the OS on one and made a raid5 array with the other 4. Created a file system, some folders, and threw some test files on there.
    I then powered down the machine and unplugged one drive from the raid array to simulate a failure.
    Boot back up and the entire raid array is gone. Plug drive back in and the array comes back.
    My question is - Is this the expected behavior?
    Second question - Would I or should I need to recover the array somehow from the command line?

  • Is this the expected behavior?

    Sorry, I can't answer this since I haven't used RAID-5 for a decade or even longer (single redundancy and large disks --> useless)


    IMO you should ask yourself why you're playing with RAID at all. If the answer is data protection, stop immediately and start thinking about backup (that's data protection) rather than availability or 'business continuity' (which is all that RAID is about)

    • Official post

    My question is - Is this the expected behavior?

    Yes, software raid is not hot-swappable/pluggable.


    Second question - Would I or should I need to recover the array somehow from the command line?

    Yes. Software raid uses mdadm: the drive has to be marked as failed first, then removed; a new drive is then added and the raid will resync. All of this is done from the CLI.
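
    As a rough sketch of that cycle from the CLI (assuming the array is /dev/md0 and the replaced member shows up as /dev/sde; substitute your own device names):

    Code
    # mark the dying member as failed, then remove it from the array
    mdadm --manage /dev/md0 --fail /dev/sde
    mdadm --manage /dev/md0 --remove /dev/sde
    # add the replacement disk; the resync starts automatically
    mdadm --manage /dev/md0 --add /dev/sde
    # watch the rebuild progress
    cat /proc/mdstat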

  • Sorry, I can't answer this since of course I do NOT use RAID-5 any more for a decade or even longer (single redundancy and large disks --> useless)
    IMO you should ask yourself why you're playing RAID and if the answer is data protection immediately stop and start to think about backup (that's data protection) and not availability or 'business continuity' (that's all what RAID is about)

    I don't think we need to jump on him too hard since he's just testing it and hasn't put any real data on it. It sounds like he just wants to get familiar with the raid behavior on OMV.
    That said, I wonder if we need a permanent banner on the front page: "Raid is not a backup."

  • Can you give us the output of cat /proc/mdstat?

    Code
    root@openmediavault4:~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md0 : inactive sdb[0](S) sdd[3](S) sdc[2](S)
          556247409 blocks super 1.2
    
    
    unused devices: <none>
  • I don't think we need to jump on him too hard since he's just testing it and hasn't put any real data on it. It sounds like he just wants to get familiar with the raid behavior on OMV. That said, I wonder if we need a permanent banner on the front page: "Raid is not a backup."

    Thank you jollyrogr.

  • Sorry, I can't answer this since I haven't used RAID-5 for a decade or even longer (single redundancy and large disks --> useless)
    IMO you should ask yourself why you're playing with RAID at all. If the answer is data protection, stop immediately and start thinking about backup (that's data protection) rather than availability or 'business continuity' (which is all that RAID is about)

    The goal here was simply to see if I could create a RAID-5 array and recover from a failed disk.
    I was hoping that in the event of a disk failure, the data would remain available, while giving me an opportunity to replace the failed disk.
    What I'm seeing so far is that the entire array went down when one disk was lost.
    This has nothing to do with a backup plan which I understand should be implemented separately.

  • Code
    root@openmediavault4:~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md0 : inactive sdb[0](S) sdd[3](S) sdc[2](S)
          556247409 blocks super 1.2
    
    
    unused devices: <none>

    mdadm --assemble --force /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd
    try that

    • Official post

    I was hoping that in the event of a disk failure, the data would remain available

    It would, because in OMV's GUI the raid would show as clean, degraded, but with Raid 5 the data would still be accessible. In software raid, simply pulling a drive from your system whilst it's shut down, then switching it back on, will result in the raid being inaccessible. Which is exactly what you are seeing.


    The output from your cat /proc/mdstat shows the raid /dev/md0 as there, but inactive, and all the drives that the raid has found have been marked as spares, denoted by the (S).


    I hope that makes sense; it's probably as clear as mud. If you want to simulate a drive failure then you have to mark a drive as failed, but to do that the raid has to be stopped and, I think, unmounted.
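
    If you want to see that for yourself, the member metadata can be read directly; a quick sketch, assuming the members are /dev/sdb, /dev/sdc and /dev/sdd as in your output above:

    Code
    # what the kernel currently thinks of the (inactive) array
    mdadm --detail /dev/md0
    # read the superblock on each remaining member; the device role and
    # array state lines show which slot is missing
    mdadm --examine /dev/sdb /dev/sdc /dev/sdd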

  • In software raid simply pulling a drive from your system whilst shutdown, then switching it back on, will result in the raid being inaccessible

    Seriously? Is this expected behavior with mdraid's RAID-5 mode? Honest question since I don't know (and I would not have expected behavior that renders the R in RAID somewhat useless).

    • Official post

    Seriously? Is this expected behavior with mdraid's RAID-5 mode?

    With one drive missing, it should start degraded.


    • Official post

    With one drive missing, it should start degraded.

    I would agree, but I tried this when I first started using software raid, as a test when I set up an iSCSI backup at school, and the output was the same: inactive. The idea behind it was to utilise the previous server, which had been replaced.

    • Official post

    I would agree, but I tried this when I first started using software raid, as a test when I set up an iSCSI backup at school, and the output was the same: inactive. The idea behind it was to utilise the previous server, which had been replaced.

    I just tested on a VM. Wiping the drive to simulate it being missing allowed the array to be assembled. But removing the drive caused the array to be inactive. Stopping the inactive array and assembling it allowed it to be started degraded.
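
    Roughly what that sequence looks like, as a sketch (member names are taken from the mdstat output earlier in the thread; yours may differ):

    Code
    # stop the inactive array, then reassemble it from the remaining members
    mdadm --stop /dev/md0
    mdadm --assemble --run /dev/md0 /dev/sdb /dev/sdc /dev/sdd
    # the array should now show as active but degraded
    cat /proc/mdstat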


    While it might be nice if it assembled degraded on boot (it should on subsequent reboots after manually assembling it once), I think a lot of people would never notice (especially if they have notifications turned off) leaving them in a bad position if another drive failed.


    When I removed the drive while the system was running, the array stayed active but dropped to degraded. This is what mdadm is meant to protect against.


    In my opinion, if you are turning a system with raid off (or sleep) all the time (or even rebooting a lot), then you don't need redundancy enough to need raid.


    • Official post

    Stopping the inactive array and assembling it allowed it to be started degraded.

    Exactly, but the average home user believes that raid is the option to have, and when something like that happens (which can also happen from a power outage) they think they have lost everything and that the issue is with the OS.

    • Official post

    Exactly, but the average home user believes that raid is the option to have, and when something like that happens (which can also happen from a power outage) they think they have lost everything and that the issue is with the OS.

    But the home user is exactly who I wouldn't want it to auto-assemble for. They are typically using junk hardware and don't have notifications working. If they lost a second drive, they would be really mad that their data was gone.


    And if someone is using raid without a UPS, or without the UPS configured to shut the system down when the battery is low, then they should never use raid.


  • Here is what I have found so far which is basically confirming what others have posted above.
    4 drives in RAID-5 array = all is good.
    Power down box and unplug one drive = raid is gone from the GUI and 3 drives listed as spares from the command line.
    Power down box, plug 4th drive back in = array comes back, all is good.
    Unplug SATA cable from a drive while powered up = array goes into a degraded state in the GUI.


    At this point, it refused to let me recover without adding another drive, even though I still had 3 in the array.
    I had to wipe the 4th drive which had been unplugged earlier and then the GUI happily let me add it back during the recovery.
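
    For reference, the command line equivalent of that wipe-and-re-add appears to be something like the following (assuming the unplugged drive came back as /dev/sde; I haven't verified this on my box yet):

    Code
    # clear the old raid superblock so mdadm treats the disk as new
    mdadm --zero-superblock /dev/sde
    # add it back into the degraded array and let it resync
    mdadm --manage /dev/md0 --add /dev/sde
    cat /proc/mdstat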


    Next up is to try to use the command that ness1602 listed above to try and rebuild the array with 3 drives.
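
    Something like this, if I adapt that command to the three drives that are still present (assuming they are still sdb, sdc and sdd):

    Code
    mdadm --stop /dev/md0
    mdadm --assemble --force --run /dev/md0 /dev/sdb /dev/sdc /dev/sdd
    cat /proc/mdstat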

  • One other interesting thing I forgot to mention:
    Unplug a drive while powered down and the array just goes away in the GUI with no indication of what happened.
    Navigate to OMV from a Windows box looking for samba shares and it shows all the shares as being there.
    Actually click on a folder to open it and it throws an error message that says "The share is inaccessible because a device has been removed"


    For some reason, I just found that amusing.
