Cannot recover 3-disk RAID5 array with two working disks

  • Hello,


    I had a 3-disk RAID5 array. A few weeks ago one of the disks failed, and the array kept working happily in a degraded state with the two remaining disks.


    Today I received a replacement disk and powered off the system. On turning the system back on, my RAID array was missing from the UI. From the command line I can see the following...


    Bash
    $ cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md0 : inactive sdc[1](S) sdb[0](S)
    7813772976 blocks super 1.2
    
    
    unused devices: <none>


    The RAID level should be 5, not 0.
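
    For reference, a quick way to see what level is actually recorded is to query both the inactive array device and the members' own superblocks - a minimal sketch, assuming the array is /dev/md0 and the members are /dev/sdb and /dev/sdc as above:

    Bash
    # What the (inactive) array device currently reports
    $ sudo mdadm --detail /dev/md0
    # What each member's superblock says the array should be
    $ sudo mdadm --examine /dev/sdb /dev/sdc | grep -E 'Raid Level|Device Role|Array State'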


    Both disks seem okay, but for some reason sdc is marked as a spare.



    I have tried to reassemble the array, but no joy:

    Bash
    $ sudo /sbin/mdadm /dev/md0 --assemble /dev/sd[bc]
    mdadm: /dev/sdb is busy - skipping
    mdadm: /dev/sdc is busy - skipping
    $ sudo /sbin/mdadm --stop /dev/md0
    mdadm: stopped /dev/md0
    $ sudo /sbin/mdadm /dev/md0 --assemble --force /dev/sd[bc]
    mdadm: /dev/md0 assembled from 1 drive and 1 spare - not enough to start the array.


    I am assuming the issue is either the array being incorrectly reported as RAID 0, or sdc being marked as a spare instead of active - and I cannot find any way to resolve either.


    I'm not sure how best to proceed. I'm at the point of making a fresh array and recovering from backups - but since mdadm --examine /dev/sd[cb] seems to report that both disks are fine, I feel I should be able to recover the array in a degraded state and then add the replacement drive (sdd).
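
    For anyone comparing notes, the --examine fields worth checking before forcing anything are the event counts and device roles - a minimal sketch:

    Bash
    # Members with matching Events and an "Active device" role can usually be force-assembled;
    # a member whose Device Role reads "spare" is the odd one out here
    $ sudo mdadm --examine /dev/sd[bc] | grep -E 'Update Time|Events|Device Role'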


    Any guidance would be much appreciated, thank you!

  • chente

    Approved the thread.
  • I was able to bring back the array with the following commands:

    Bash
    $ sudo /sbin/mdadm --create --level=5 --layout=ls --chunk=512 --raid-devices=3 --assume-clean /dev/md0 /dev/sdb /dev/sdc missing
    mdadm: /dev/sdb appears to be part of a raid array:
           level=raid5 devices=3 ctime=Thu Sep 30 21:10:09 2021
    mdadm: /dev/sdc appears to be part of a raid array:
           level=raid5 devices=3 ctime=Thu Sep 30 21:10:09 2021
    Continue creating array? y
    mdadm: Defaulting to version 1.2 metadata
    mdadm: array /dev/md0 started.

    Data was also recovered. I got some direction from the following post; while the first answer did not work for me, the second answer gave details on how to create a new array while trying to keep the data, and the output from mdadm --examine /dev/sd[cb] gave me the details I needed.
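
    For reference, the parameters used in the --create above (level, layout, chunk size and device order) can be read from the existing superblocks before re-creating anything - a minimal sketch, assuming the members are /dev/sdb and /dev/sdc:

    Bash
    # Dump the superblock fields needed to reconstruct the original --create definition;
    # "left-symmetric" corresponds to --layout=ls, and "Device Role" gives the device order
    $ sudo mdadm --examine /dev/sdb /dev/sdc | grep -E 'Raid Level|Raid Devices|Layout|Chunk Size|Data Offset|Device Role'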


    I then re-added the new drive sdd via the UI and started recovering the array from its degraded state.
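
    For anyone doing this from the command line rather than the UI, a rough equivalent would be something like the following sketch (assuming the replacement is /dev/sdd and carries no old metadata):

    Bash
    # Add the replacement disk to the degraded array and watch the rebuild progress
    $ sudo mdadm /dev/md0 --add /dev/sdd
    $ watch cat /proc/mdstat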


    I think the issue was due to sdc being marked as a spare. I don't think my actions were the best, and there must be a less risky option, so I will leave this unresolved to see if anyone has better suggestions for next time.

    • Official post

    I think the issue was due to sdc being marked as a spare

    Yes it was. This has come up on here before with no clear reason why, and I've had no real-world experience of it.


    I don't think my actions were the best

    You were very, very lucky; the --assume-clean switch is an absolute last resort - it either works or destroys your data.
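
    One way to reduce that risk is to check the result read-only before anything is allowed to write to it - a sketch, assuming an ext4 filesystem sits directly on /dev/md0:

    Bash
    # After a --create --assume-clean, verify the data before any writes
    $ sudo fsck.ext4 -n /dev/md0        # -n: report problems only, never modify
    $ sudo mount -o ro /dev/md0 /mnt    # mount read-only and spot-check some files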

  • geaves Yes it was; the next step was to restore from backup, so I was happy to try.


    Are you aware of any way to change the drive from spare to active?


    I believe the trigger for the issue was a power disruption to this drive. When I added the new drive it was not detected, and I found that some loose power cables to both of these drives were causing the issue.

    • Official post

    Are you aware of any way to change the drive from spare to active

    No, none that I have found.

    I believe the trigger for the issue was a power disruption to this drive

    A lot of mdadm RAID issues appear to stem from power problems, either an actual power loss or a hardware power fault; something must trigger mdadm into marking a drive as a spare when in actual fact the drive was active before the power failure/disruption.


    What you didn't post, and what I forgot to ask for, was the mdadm conf file (cat /etc/mdadm/mdadm.conf); that file should contain information about the array's setup. Did the ARRAY line in that file, under the definitions of existing arrays, contain spares=1? If it did, simply editing that file (which is not recommended) may have corrected the error.
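
    For reference, a sketch of what that section of /etc/mdadm/mdadm.conf typically looks like, and how to regenerate the line from the running array rather than editing it by hand (the name and UUID below are placeholders):

    Bash
    # definitions of existing MD arrays - an unexpected spare shows up as spares=N on the ARRAY line
    # ARRAY /dev/md0 metadata=1.2 spares=1 name=myhost:0 UUID=00000000:00000000:00000000:00000000

    # Regenerate the ARRAY line from the assembled array
    $ sudo mdadm --detail --scan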


    I've never experienced this in a real-world scenario, because in commercial use we had a UPS + backups; at home I would run backups/rsync every night, so in a worst-case situation I would restore from backup or even redeploy, as OMV can be set up and running in a day - it's the restore process that takes the time.
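
    A minimal sketch of that kind of nightly rsync job, with placeholder paths and schedule:

    Bash
    # /etc/cron.d/nightly-backup - mirror the data share at 02:30 every night (paths are examples)
    # 30 2 * * * root rsync -a --delete /srv/data/ /srv/backup/data/

    # The same command run by hand, with a dry run first to see what would change
    $ sudo rsync -a --delete --dry-run /srv/data/ /srv/backup/data/
    $ sudo rsync -a --delete /srv/data/ /srv/backup/data/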


    BTW you don't have to use /sbin to run mdadm commands :)

  • Quote

    Did the ARRAY line in that file, under the definitions of existing arrays, contain spares=1? If it did, simply editing that file (which is not recommended) may have corrected the error.

    I did check this file and remember it all seemed fine; I cannot remember whether it noted there was a spare or not. Thanks for this pointer.


    Quote

    because in commercial use we had a UPS + backups; at home I would run backups/rsync every night, so in a worst-case situation I would restore from backup or even redeploy

    My only trouble is that, to save on backup costs, I did not back up my media (TV, film) collection, as it was not important. Backups are nightly and I run on a UPS; my mistake was poking around inside the case while the server was powered on! :whistling:


    Quote

    BTW you don't have to use /sbin to run mdadm commands :)

    For some reason I have to reference the executable directly on my machine, no idea why; it is easier to do that than to find out why I can't call the program directly in my session ;)
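
    Most likely /sbin and /usr/sbin are simply not on the non-root user's PATH; a quick way to check, and one possible fix (a sketch - adjust the profile file to whatever your shell actually reads):

    Bash
    # See whether the shell can find mdadm and what the PATH contains
    $ echo "$PATH"
    $ command -v mdadm || echo "mdadm not found in PATH"

    # If it is missing, append the sbin directories for future sessions (here via ~/.profile)
    $ echo 'export PATH="$PATH:/usr/sbin:/sbin"' >> ~/.profile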


    Thanks for the feedback; it's a shame mdadm has no way to update the device role.
