File System Missing after fault

  • A fault in my OMV NAS with 2 x RAID 6 file systems caused one file system to be missing. Replacing the HBA and installing 5.5.11-1 does not recover the missing file system.

    The outputs below show that the RAID array should still exist. I would appreciate guidance on how to proceed from here to recover and mount it.

    The 6 drives (sda to sdf) are present as RAID members, per the following:

    Code
    sudo blkid
    /dev/sdg1: UUID="ebf605c7-8097-42be-8ceb-25d8d4108324" TYPE="ext4" PARTUUID="3c16e0a9-01"
    /dev/sdg5: UUID="980c955e-7a33-403b-83ff-e00cb853a637" TYPE="swap" PARTUUID="3c16e0a9-05"
    /dev/sdb: UUID="89becd69-b5a4-761c-30e7-7a30c3b6a22f" UUID_SUB="701a016a-0adf-8e41-df8e-2c697e035299" LABEL="OMV136.local:OMV136R629" TYPE="linux_raid_member"
    /dev/sdd: UUID="89becd69-b5a4-761c-30e7-7a30c3b6a22f" UUID_SUB="ebfb2860-5018-8314-56bb-becfbc323cbf" LABEL="OMV136.local:OMV136R629" TYPE="linux_raid_member"
    /dev/sde: UUID="89becd69-b5a4-761c-30e7-7a30c3b6a22f" UUID_SUB="5a652965-c823-284c-6a17-2b0611f927a8" LABEL="OMV136.local:OMV136R629" TYPE="linux_raid_member"
    /dev/sda: UUID="89becd69-b5a4-761c-30e7-7a30c3b6a22f" UUID_SUB="e9a610a9-7be9-4f82-0273-9527f3da4876" LABEL="OMV136.local:OMV136R629" TYPE="linux_raid_member"
    /dev/sdc: UUID="89becd69-b5a4-761c-30e7-7a30c3b6a22f" UUID_SUB="ce93d5d3-5dfc-a03f-583f-8f3429c8b676" LABEL="OMV136.local:OMV136R629" TYPE="linux_raid_member"
    /dev/sdf: UUID="89becd69-b5a4-761c-30e7-7a30c3b6a22f" UUID_SUB="bd7b4a88-2e2c-3e4f-90e3-bae51d5392c4" LABEL="OMV136.local:OMV136R629" TYPE="linux_raid_member"

    The array appears to exist, per the following:

    Code
    sudo mdadm --examine --scan
    ARRAY /dev/md/OMV136R629 metadata=1.2 UUID=89becd69:b5a4761c:30e77a30:c3b6a22f name=OMV136.local:OMV136R629

    Attempting to reassemble the array provides the following

    Examining each drive shows that every device records the RAID level and the number of devices.

    But the array state information shows AAAAAA for sda, .AAAAA for sdc, and .A.AAA for sdb, sdd, sde, and sdf, and I do not know what the missing flag (the dot) means.
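    For reference: each character in the "Array State" line from mdadm --examine is one slot in the array, 'A' meaning that slot is active and present from that member's point of view, and '.' meaning that member's superblock records the slot as missing or failed. A minimal sketch, using one of the state strings above as sample input:

```shell
# Each character of "Array State" is one array slot:
# 'A' = active, '.' = missing/failed as recorded by this member's superblock.
state=".A.AAA"   # the state reported by sdb/sdd/sde/sdf above
missing=$(printf '%s' "$state" | tr -cd '.' | wc -c | tr -d ' ')
echo "missing slots: $missing"   # prints: missing slots: 2
```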

    Help would be much appreciated.

  • Thank you for the reply. OMV136R629 was the name/label that I had given to the file system


    Output of cat /proc/mdstat follows

    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]

    md127 : inactive sdb[1](S) sde[4](S) sdd[3](S) sdc[2](S) sdf[5](S) sda[0](S)

    46883366928 blocks super 1.2

    You cannot (re)assemble the array while the drives are busy (still part of the inactive array).

    You have to stop the array first and then (re)assemble it:


    mdadm --stop /dev/md127

    mdadm --assemble --verbose /dev/md127 /dev/sd[abcdef]


    After that, please have a look at the output of "cat /proc/mdstat" and post it here.


    Cheers,

    Thomas

  • Thank you. I have some results but no success as yet


    Stopping md127 succeeds

    Code
    ~# sudo mdadm --stop /dev/md127
    mdadm: stopped /dev/md127


    Assembling the array shows the following


    Running the mdadm examine scan command again shows the identity of the array as /dev/md/OMV136R629

    Code
    ~# sudo mdadm --examine --scan
    ARRAY /dev/md/OMV136R629 metadata=1.2 UUID=89becd69:b5a4761c:30e77a30:c3b6a22f name=OMV136.local:OMV136R629


    Trying an automatic assemble using mdadm assemble identified the 6 drives belonging to the array, with 3 of the 6 being added.


    Running the mdadm examine scan command again shows

    Code
    /dev# mdadm --examine --scan -v
    ARRAY /dev/md/OMV136R629 level=raid6 metadata=1.2 num-devices=6 UUID=89becd69:b5a4761c:30e77a30:c3b6a22f name=OMV136.local:OMV136R629
    devices=/dev/sde,/dev/sdc,/dev/sdd,/dev/sda,/dev/sdf,/dev/sdb


    In assembling the array, 3 drives are identified as "possibly out of date". What does this mean? Examining the results of the command

    sudo mdadm --examine /dev/sd* -v

    and extracting selected output lines gives the following. Devices sdb, sdd, and sde each have the same event count, whereas sda, sdc, and sdf have different event counts. What is the event count, and does it affect automatic array assembly?
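    A quick way to reduce the --examine output to one "device: events" line per drive is to filter it with awk; a minimal sketch (the sample fragment below is hypothetical, shaped like real mdadm output, standing in for the live command `sudo mdadm --examine /dev/sd[a-f]`):

```shell
# Filter `mdadm --examine` output down to the per-device event counters.
# Sample fragment (hypothetical) standing in for the real output:
examine_output='/dev/sda:
         Events : 15568
/dev/sdb:
         Events : 15598'
printf '%s\n' "$examine_output" | awk '/^\/dev\//{dev=$1} /Events/{print dev, $3}'
# prints:
# /dev/sda: 15568
# /dev/sdb: 15598
```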


    Your further assistance would be much appreciated. Thank you

  • Taken from: https://raid.wiki.kernel.org/index.php/RAID_Recovery


    If your array won't assemble automatically, the first thing to do is to check the reason for this

    (look into the logs using "dmesg" or check the log files).

    It's a frequent failure scenario that the event count of the devices do not match, which means mdadm won't assemble the array automatically.

    The event count is increased when writes are done to an array, so if the event count differs by less than 50,

    then the information on the drive is probably still ok.


    The higher the difference, the more writes have been done to the filesystem and

    the greater the risk that the filesystem will have changed a lot since the

    differing event count drive was last in the array, and the higher the risk that your data is in jeopardy.


    In your case:

    /dev/sdb: 15598

    /dev/sdd: 15598

    /dev/sde: 15598

    /dev/sda: 15568

    /dev/sdf: 15572

    /dev/sdc: 15569
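    The largest gap here is between the three current members at 15598 and sda at 15568:

```shell
# Largest event-count difference among the members listed above:
echo $((15598 - 15568))   # prints 30, well under the ~50 rule of thumb
```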


    So you might try adding the option "--force" to the mdadm command:


    mdadm --assemble --verbose --force /dev/md127 /dev/sd[abcdef]


    Good luck,

    Thomas

    Thank you. I have proceeded partway: I assembled a degraded array that started with 5 of the 6 drives and then promptly failed one of them, leaving 4 of the 6 drives in the RAID 6 array. The results are included below.

    As I have limited openmediavault and Linux experience and am nervous about losing my data in the next steps, I would appreciate your advice on the correct way to complete the array, initially with 5 drives and finally with 6.


    Forcing assembly, which started with 5 of the 6 drives:

    The log showing failure of one of the 5 devices, leaving 4 of the 6 devices:


    I interpret this as device sdc having an uncorrectable read error, so the drive may be faulty and at the very least needs to be tested to confirm this. The SMART devices page in the openmediavault web control panel does not show any SMART error status for sdc (green dot).


    The output of cat /proc/mdstat shows the array with devices sdb, sdd, sde, and sdf active and device sdc marked as failed. Device sda is excluded.

    Code
    ~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : active raid6 sdb[1] sde[5] sdf[4] sdd[3] sdc[2](F)
    31255576576 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/4] [_U_UUU]
    bitmap: 3/59 pages [12KB], 65536KB chunk
    unused devices: <none>
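    For anyone reading along: the bracketed fields in that mdstat line can be decoded slot by slot. "[6/4]" means 6 slots with 4 active, and in "[_U_UUU]" each character is one slot, 'U' for up and '_' for down. A small sketch:

```shell
# Decode the mdstat slot map: 'U' = slot up, '_' = slot down.
slots="_U_UUU"   # from the mdstat output above
up=$(printf '%s' "$slots" | tr -cd 'U' | wc -c | tr -d ' ')
down=$(printf '%s' "$slots" | tr -cd '_' | wc -c | tr -d ' ')
echo "up=$up down=$down"   # prints: up=4 down=2
```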


    The openmediavault web control panel shows /dev/md127 as a RAID 6 array that is clean but degraded

    After mounting the array, the storage, file systems page shows data. As the operating system is a fresh install, I will need to set up users and shares to access the data and determine whether it is intact.

    Device sda has not been included in the array.


    QUESTION

    So that I do not corrupt the now-assembled array, what is the correct way to add device sda to achieve a 5-drive array?

    Do I need to wipe the drive before attaching it, and how do I add it to the array?


    Thank you for getting me this far.

    It appeared that I may have achieved some success, as the array was rebuilding with 5 of the 6 drives.

    But now this has failed.


    The array was operational with 4 of the 6 devices sdb, sdd, sde, and sdf.


    I used sudo smartctl -i /dev/sda and sudo smartctl -i /dev/sdc to confirm, against the serial numbers shown in the browser control panel, that these devices were not in the array, and then ran smartctl short tests on each of sda and sdc, with both showing zero errors.


    I used wipefs -a /dev/sda and wipefs -a /dev/sdc to clear the drives.


    I used mdadm /dev/md127 --add /dev/sda (and the same for sdc) to add the devices, and the file system is again showing as missing.

    I suspect that I should have used --re-add instead of --add.
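    For what it's worth, here is my understanding of the distinction, as a sketch rather than a definitive recipe (device names as used in this thread; run as root):

```shell
# --re-add puts a recently removed member back using its existing superblock
# and the write-intent bitmap, so only blocks changed since removal resync:
mdadm /dev/md127 --re-add /dev/sda

# --add treats the disk as a fresh spare and triggers a full rebuild. After
# wipefs -a the old superblock is gone, so --re-add can no longer work and
# --add (with a full rebuild) is the only option:
mdadm /dev/md127 --add /dev/sda
```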


    Code
    ~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : inactive sdc[6](S) sde[4](S) sdf[5](S) sda[7](S) sdd[3](S) sdb[1](S)
    46883366928 blocks super 1.2
    ~# mdadm --examine --scan -v
    ARRAY /dev/md/OMV136R629 level=raid6 metadata=1.2 num-devices=6 UUID=89becd69:b5a4761c:30e77a30:c3b6a22f name=OMV136.local:OMV136R629
    devices=/dev/sda,/dev/sdf,/dev/sdd,/dev/sde,/dev/sdc,/dev/sdb


    I stopped the array and forced reassembly. It has now started with 4 drives, with 1 drive rebuilding.

    Querying the array shows device sda as removed, although the outputs above show that it was added.


    The browser control panel shows the file system as being online and the RAID management page shows that it is clean, degraded, rebuilding and with 5 of the 6 drives.


    PROBLEM

    Now, after 6.8%, rebuilding appears to have stalled. The RAID management page in the browser control panel shows only 4 devices (sdb, sdc, sdd, and sdf), and the estimated finish time has extended from 610 minutes to about 15,000 minutes. The device that was rebuilding and device sde appear to be missing.

    The monitor connected to the NAS shows scrolling output with the text "md: super_written gets error =10".

    The RAID and the file system are missing


    I now have the following error


    ~# mdadm --examine -v /dev/md127

    mdadm: No md superblock detected on /dev/md127.



    QUESTION

    What is occurring and what action do I now need to take to fix this?


    Thank you

  • To be honest, I don't know what's going on.

    There seems to be no logic behind the different error messages.

    From the kernel log, I would guess that /dev/sdc is damaged or has another issue.

    In case you have a backup, I would shut down, check/replace the data cables,

    run wipefs -a on the drives, and rebuild the array.


    In case you don't have a backup, I would try to (re)assemble the array,

    mount it, and take a backup while that is still possible.
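    One common way to take that backup is rsync; a sketch only, with example paths (openmediavault typically mounts file systems under /srv/dev-disk-by-label-<name>, so adjust both paths to your system):

```shell
# Copy everything off the mounted, degraded array before doing anything else.
# -a preserves permissions/ownership/times, -H preserves hard links,
# -x stays on one file system; --info=progress2 shows overall progress
# (available in rsync >= 3.1).
rsync -aHx --info=progress2 /srv/dev-disk-by-label-OMV136R629/ /path/to/backup/
```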


    Good luck,

    Thomas
