RAID 10 Filesystem Missing

  • Hi, I had some problems with my RAID 10 array and I can't see the filesystem anymore. I shut down the server, fitted new SATA cables and rebooted. Is there any hope of getting the RAID working again?


    Any help from you is appreciated.


    Thank you very much.



    Code
    ~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md0 : inactive sdc[4](S) sda[0](S) sdd[1](S) sdb[2](S)
          46875017216 blocks super 1.2
           
    unused devices: <none>
    Code
    ~# blkid
    /dev/sdb: UUID="8b767a7d-c52c-068d-c04f-1a3cfd8d4c5f" UUID_SUB="6c9c5433-6838-c39f-abfa-7807205a3238" LABEL="pandora:Raid4x12TBWdRed" TYPE="linux_raid_member"
    /dev/sdc: UUID="8b767a7d-c52c-068d-c04f-1a3cfd8d4c5f" UUID_SUB="28139ffa-50e0-60b6-54ee-d6af74613e6e" LABEL="pandora:Raid4x12TBWdRed" TYPE="linux_raid_member"
    /dev/sdd: UUID="8b767a7d-c52c-068d-c04f-1a3cfd8d4c5f" UUID_SUB="a6bb8aa8-4e9b-7f90-b105-45a9301acbce" LABEL="pandora:Raid4x12TBWdRed" TYPE="linux_raid_member"
    /dev/sda: UUID="8b767a7d-c52c-068d-c04f-1a3cfd8d4c5f" UUID_SUB="3904f2f1-fe1f-bde3-a965-d9dbe0074f66" LABEL="pandora:Raid4x12TBWdRed" TYPE="linux_raid_member"
    /dev/sde1: UUID="2218-DC43" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="09f69470-ba7b-4b6b-9456-c09f4c6ad2ee"
    /dev/sde2: UUID="87bfca96-9bee-4725-ae79-d8d7893d5a49" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="3c45a8f0-3106-4ba8-89bc-b15d22e81144"
    /dev/sde3: PARTUUID="fda4b444-cf82-4ae8-b916-01b8244acee3"
    Code
    ~# mdadm --detail --scan --verbose
    INACTIVE-ARRAY /dev/md0 num-devices=4 metadata=1.2 name=pandora:Raid4x12TBWdRed UUID=8b767a7d:c52c068d:c04f1a3c:fd8d4c5f
       devices=/dev/sda,/dev/sdb,/dev/sdc,/dev/sdd

    OMV 6.9.16-1 (Shaitan) - Debian 11 (Bullseye) - Linux 6.1.0-0.deb11.21-amd64

    OMV Plugins: backup 6.1.1 | compose 6.11.3 | cputemp 6.1.3 | flashmemory 6.2 | ftp 6.0.7-1 | kernel 6.4.10 | nut 6.0.7-1 | omvextrasorg 6.3.6 | resetperms 6.0.3 | sharerootfs 6.0.3-1

    ASRock J5005-ITX - 16GB DDR4 - 4x WD RED 12TB (Raid10), 2x WD RED 4TB (Raid1) [OFF], Boot from SanDisk Ultra Fit Flash Drive 32GB - Fractal Design Node 304

  • Hello there.


    Code
    ~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md0 : inactive sdc[4](S) sda[0](S) sdd[1](S) sdb[2](S)
          46875017216 blocks super 1.2

    Somehow all your disks are marked as "spare". That is why you see, for example, sdc[4](S); the (S) means spare.

    You can read up on that. There is a solution. If you can't figure it out, come back again :)
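
    To see why they were demoted, you can compare the metadata mdadm recorded on each member. A sketch (device names taken from your blkid output; the grep filter is just a convenience):

    Code
    ~# mdadm --examine /dev/sd[abcd] | grep -E 'Update Time|Events|Array State|Device Role'

    Members whose Events count or Update Time lags behind the others are the ones mdadm refused to bring in.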


  • Thanks @bermuda. As you can see, the Update Time and Events are the same for only 2 of the 4 disks, and the Array State is AAAA for only one disk; for the others it is A.AA. I don't know how to deal with a situation like this, so I'm relying on those who know more. Thanks.


  • I forgot the command...


    • Official Post

    Any help from you is appreciated.

    Your array is inactive. You will need to SSH into OMV as root and, from the CLI, run:


    mdadm --stop /dev/md0 and wait for confirmation, then


    mdadm --assemble --force --verbose /dev/md0 /dev/sd[abcd] and check the output by running


    cat /proc/mdstat
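
    Put together, the sequence looks like this (a sketch; it assumes the four members are still enumerated sda to sdd, as in the blkid output above):

    Code
    ~# mdadm --stop /dev/md0
    ~# mdadm --assemble --force --verbose /dev/md0 /dev/sd[abcd]
    ~# cat /proc/mdstat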

    Raid is not a backup! Would you go skydiving without a parachute?


    OMV 7x amd64 running on an HP N54L Microserver

  • Thank you @geaves, this is the output:


  • Why was the /dev/sdd disk not added?


  • Thank you for your support.


    • Official Post

    Why was the /dev/sdd disk not added

    TBH I've never really understood the error (possibly out of date); I've always assumed that some sort of error occurred but mdadm doesn't actually remove the drive from the array.


    Do one thing at a time. The output shows the array as active (auto-read-only) but with only 3 drives, so from the CLI run:


    mdadm --readwrite /dev/md0 and hopefully that will correct the (auto-read-only); cat /proc/mdstat should confirm that.


  • Thanks geaves, this is the output:

    Code
    ~# mdadm --readwrite /dev/md0
    ~# 
    ~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md0 : active raid10 sda[0] sdc[4] sdb[2]
          23437508608 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
          bitmap: 0/175 pages [0KB], 65536KB chunk
    
    unused devices: <none>


  • If you have a spare HDD, add it to the array to replace the failed drive sdd. The errors could indicate a faulty cable.

    Usually a spare is used to rebuild the array. The "faulty" disk can then be "low-level" formatted. It's tricky to get that to actually work because of the "frozen" status of drives, but once it runs, the drive will correct any problems during the format procedure and can be reused as a new spare, provided the next SMART check passes without any hiccup.

    Using RAID always carries the risk of losing access to the array when drives fail, but it can also prevent data loss when used with redundancy, so there are pros and cons. In your case the loss of one drive is OK as long as no other drive fails during the rebuild. Good luck :)
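
    A destructive write test is one way to do that kind of reformat: it overwrites every sector and gives the drive a chance to remap any bad ones. A sketch using badblocks (this is only an illustration, not something from this thread; /dev/sdX is a placeholder for the pulled drive, and the test erases everything on it):

    Code
    ~# badblocks -wsv /dev/sdX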

    • Official Post

    this is the output:

    Excellent, that's that problem solved. Now for sdd:


    mdadm --zero-superblock /dev/sdd (not sure if there will be any output from this), then


    mdadm --add /dev/md0 /dev/sdd and check the output with cat /proc/mdstat; the array should be rebuilding and adding sdd back to the array.
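
    As a single sequence, with a way to keep an eye on the rebuild (the watch line is just a convenience, not required):

    Code
    ~# mdadm --zero-superblock /dev/sdd
    ~# mdadm --add /dev/md0 /dev/sdd
    ~# watch cat /proc/mdstat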


    BTW DO NOT REBOOT, SHUT DOWN OR PASS GO :)

  • Thank you geaves, this is the output:

    Now I think I understand that we can only wait and cross our fingers. :)

    Thank you


  • Yep, hopefully it should be OK :) that's why one should have a backup.

    Thank you geaves, backing up 24TB is a lot of work but I will have to do it. In the meantime the procedure is complete. :) What do you recommend doing now?

    Code
    ~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
    md0 : active raid10 sdd[5] sda[0] sdc[4] sdb[2]
          23437508608 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
          bitmap: 0/175 pages [0KB], 65536KB chunk

    I will probably have to replace /dev/sdb ASAP, as it has some problems.


    • Official Post

    What do you recommend to do now

    I would suggest running a short SMART test on each of the drives and checking attributes 5, 187, 188, 197 and 198. You're looking for any non-zero raw values on those attributes; if there are any, I would replace the drive. Likewise, if SMART in OMV's GUI shows anything about bad sectors, replace the drive.


    At the moment you're back up and running; as for backup, start with what you don't want to lose.
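
    A sketch of how that could be done from the CLI (smartctl is part of smartmontools; the grep is only a convenience, and the exact attribute layout can vary between drives):

    Code
    ~# smartctl -t short /dev/sdb       # start a short self-test, repeat per drive
    ~# smartctl -l selftest /dev/sdb    # result once the test has finished
    ~# smartctl -A /dev/sdb | grep -E '^ *(5|187|188|197|198) '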


  • Thank you geaves, this is the situation of /dev/sdb (the others are OK):

    Code
      5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       10
    197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       24
    198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       157

    Is it better to proceed immediately with the disk replacement? The file system is still unmounted. What would be the correct procedure for the replacement? Thanks


    • Official Post

    Is it better to proceed immediately with the disk replacement

    Yes, the standout there is 198 -> Offline uncorrectable


    The file system is still unmounted.

    ?? I would have expected that to be mounted once the array was reassembled. You could try mount -a from the CLI, or reboot now that the array has rebuilt (reboot is what I would do).
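
    If you try the mount route first, a quick way to check whether the filesystem actually came up (a sketch; the mount point depends on how the filesystem is configured in OMV):

    Code
    ~# mount -a
    ~# findmnt /dev/md0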

    What would be the correct procedure for the replacement?

    I knew you would ask me that :) and TBH I don't know without spinning up a VM, as the layout of the RAID page has changed slightly since I last used it. AFAIK you should be able to do all of this from the GUI: from Raid Management there should be an option/button/icon to remove a drive, which runs a script that fails the drive and removes it from the array. The array will then show as clean/degraded.


    You then shut down and remove the failed drive (this is the squeaky bum moment; double/triple check that it's the right drive, yes, users have removed the wrong one :)), insert the new drive, then from Storage -> Disks select it and do a short/quick wipe. Then go to Raid Management, where there should be a recover option; click on that, select the new drive and click OK, and the array should rebuild.


    There are a lot of "should"s in there; that's just me being pessimistic.
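
    For reference, the CLI equivalent of what that GUI script does is roughly the following (a sketch only, not taken from this thread; /dev/sdX stands for the replacement disk's device name after it has been installed and wiped):

    Code
    ~# mdadm --manage /dev/md0 --fail /dev/sdb      # mark the failing member as faulty
    ~# mdadm --manage /dev/md0 --remove /dev/sdb    # remove it from the array
    # shut down, swap the physical drive, boot back up
    ~# mdadm --manage /dev/md0 --add /dev/sdX       # add the replacement; the rebuild starts
    ~# cat /proc/mdstat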
