OpenMediaVault5 NAS [RAID Management] "State: clean, degraded", [File Systems] "Status: Missing"

  • Hello lovely people,


    I have a NAS running OpenMediaVault5 (version 5.5.13-1 (Usul)) with 11 HDDs in RAID5, which had 12TB of available space until today. This morning, all of a sudden, it started showing only 65GB available, which I suspect is actually the drive that OpenMediaVault5 is installed on. To avoid wasting space, I installed OpenMediaVault5 on an old 80GB (74.53GB real space) SATA drive.


    I suspect that something went wrong with the RAID5 management; here's why:

    - Storage -> Disks: all the disks are listed, so they are seen by the system.

    - Storage -> S.M.A.R.T. -> Devices: all the HDDs are shown and their status is "OK" (the green dot).

    Screenshot: https://i.ibb.co/6J7Z7kd/01.png


    - Storage -> Logical Volume Management -> Physical Volumes: only one device is shown (Available: 1.82TiB, Used: 1.82TiB), although there should be more devices than this one.

    Screenshot: https://i.ibb.co/C6h6pRT/02.png


    - Storage -> Logical Volume Management -> Volume Groups: there is one volume group (Available: 9.10TiB, Free: 0.00B, Physical volumes: [unknown] /dev/md126).


    Screenshot: https://i.ibb.co/JFs5J2n/03.png


    - Storage -> Logical Volume Management -> Logical Volumes: only one, with Capacity: 9.10TiB and Active: No. In my opinion there should be an additional logical volume, and for this existing one, Active should be "Yes" instead of "No".

    Screenshot: https://i.ibb.co/ydCYbZC/04.png



    - Storage -> RAID Management: there is only one RAID device, although in fact there should be a couple of them. For the existing one, the "State" is "clean, degraded" and the Capacity is 1.82TiB.

    Screenshot: https://i.ibb.co/YfDZ445/05.png


    - Storage -> File Systems: one out of the 6 file system devices has Status: Missing. In the "Mounted" column it looks like only one out of 4 (the 80GB HDD) is mounted (the other 3 are not, showing "No").

    Screenshot: https://i.ibb.co/5BhRnNF/06.png



    I don't know what to do at the moment in order to:

    a. diagnose and fully understand the problem and what caused it.

    b. keep the data on these 11 HDDs safe while I'm trying to sort this out.

    c. repair and remount the missing RAID devices without losing any data, to get back access to all HDDs.


    For the past month, I have to confess, I've been turning OFF my OMV5 NAS only by long-pressing the power button, and my fear is that, in the medium to long term, this can cause some internal or software errors that lead to the RAID management problems.

    What do you think could be the cause of the problem, and how can it be solved?

    Thank you in advance for your help.

  • ryecoaaron

    Approved the thread.
  • geaves

    Added the label OMV 5.x.
    • Official post

    with 11 HDDs in RAID5

    there is only one RAID device, although in fact there should be a couple of them

    The second comment contradicts the first: to have a single RAID5, all drives must be part of that single array. However, the RAID Management image shows an array using /dev/sdf2 and /dev/sdj2. That suggests partitions were used to create that array, so the array was not created using OMV's GUI, as OMV uses the whole drive when creating RAID arrays.
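    A quick way to confirm this from SSH (standard Linux tools, nothing OMV-specific) is to list the block devices; lsblk shows each drive, its partitions and, nested under them, the md array they belong to:

    Code
    lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
    cat /proc/mdstat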

    For the past month, I have to confess, I've been turning OFF my OMV5 NAS only by long-pressing the power button, and my fear is that, in the medium to long term, this can cause some internal or software errors that lead to the RAID management problems.

    What do you think,

    Well, at least you've noted your own error, and yes, it can cause corruption both to the file system and to OMV


    Follow this sticky and post each output in a code box (the </> symbol on the forum menu bar); this will make it easier to read
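    For reference, the outputs that sticky asks for are roughly the following, all run as root:

    Code
    cat /proc/mdstat
    blkid
    fdisk -l | grep "Disk "
    cat /etc/mdadm/mdadm.conf
    mdadm --detail --scan --verbose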


    Is this recoverable? Unknown. 11 drives in a RAID5 is not a good idea: RAID5 allows for one drive failure, and if 2 drives fail the array is toast. As you have used partitions, the missing partition in the image could be related to a drive in the other array.


    BTW OMV5 is EOL

  • Quote

    The second comment contradicts the first: to have a single RAID5, all drives must be part of that single array...

    Geaves, first of all, a BIG thank you for your time and effort to help me; I really appreciate it.

    Sorry for any confusion; I should've been clearer about the HDD configuration and the RAID.


    I will do my best to remember how they were connected almost 3 years ago, when this NAS was created.

    1. 1 x 80GB HDD (ST3802110AS, 74.53GB) used only for the OS (OMV5), standalone.

    2. 10 of the 11 HDDs were part of the same LVM (Logical Volume Manager) setup, distributed as follows:

    a). 2 x 6TB HDD in RAID1 (ST6000VX0023-2EF110 , ST6000VN001-2BB186 with 5.46TB, each)

    b). 3 x 3TB HDD in RAID5 (HUS724030ALA640, ST3000DM008-2DM166, WDC WD30EFRX-68EUZN0, with 2.73TB, each).

    c). 2 x 2 TB HDD in RAID 1 (2 x ST2000DM001-1CH164)

    d). 3 x 500 GB HDD in RAID 5 (ST3500312CS, ST3500418AS, WDC WD5000AADS-00S9B0, 465.76 GB, each)


    Quote


    [...] the RAID Management image shows an array using /dev/sdf2 and /dev/sdj2. That suggests partitions were used to create that array, so the array was not created using OMV's GUI, as OMV uses the whole drive when creating RAID arrays.

    You are right. A friend of mine, who's a DevOps Team Lead and a Linux expert, discovered that OMV is based on Debian and helped me put the HDDs in RAID, create the LVM and everything else needed, using PuTTY. He tried to configure the HDDs in such a way as to get the most from my disks and keep my data as safe as possible. Because I was a total novice in OMV and didn't know how to use it properly, I accepted his help.


    Quote

    Well, at least you've noted your own error, and yes, it can cause corruption both to the file system and to OMV

    I've learned my lesson now.

    Quote

    Follow this sticky and post each output in a code box (the </> symbol on the forum menu bar); this will make it easier to read


    I did follow it, and here's the output:


    cat /proc/mdstat


    blkid


    fdisk -l | grep "Disk "


    cat /etc/mdadm/mdadm.conf


    mdadm --detail --scan --verbose

    Code
    root@BlackT:/# mdadm --detail --scan --verbose
    INACTIVE-ARRAY /dev/md127 num-devices=5 metadata=1.2 name=BlackT.local:r5-2tb UUID=89f0f506:35179a63:ad97870b:92e66241
       devices=/dev/sdb1,/dev/sde1,/dev/sdf1,/dev/sdi1,/dev/sdj1
    ARRAY /dev/md/BlackT.local:r5-1tb level=raid5 num-devices=3 metadata=1.2 spares=1 name=BlackT.local:r5-1tb UUID=137bec62:c8faf58a:f8793c53:72c5b132
       devices=/dev/sdb2,/dev/sdf2,/dev/sdj2
    INACTIVE-ARRAY /dev/md2 num-devices=1 metadata=1.2 name=localhost.localdomain:2 UUID=c3912ec0:9fd283df:d5ff048b:9effd01b
       devices=/dev/sdc1
    root@BlackT:/#

    I do hope that you will be able to confirm that this mess I'm in can be sorted out without losing data.


    Quote

    BTW OMV5 is EOL

    I've seen now that there's an OMV6, but to be honest, I've always had this fear that at some point I will need to update my OMV to a new version and my NAS will fall apart, probably because I will end up doing something wrong. Hopefully, if I'm able to restore my NAS, I will consider updating, after educating myself about how to do it.


    Thank you so much, geaves, once again (and everyone else who will go through my post) for your help.

    • Official post

    You are right. A friend of mine, who's a DevOps Team Lead and a Linux expert, discovered that OMV is based on Debian and helped me put the HDDs in RAID, create the LVM and everything else needed, using PuTTY. He tried to configure the HDDs in such a way as to get the most from my disks and keep my data as safe as possible. Because I was a total novice in OMV and didn't know how to use it properly, I accepted his help.

    Whilst your friend is obviously knowledgeable about Linux, OMV is based upon the KIS principle (Keep It Simple); everything he has done could have been completed from OMV's GUI


    Doing what he has done may make sense to him, but you have come to the forum to resolve an issue that is technically not OMV-related due to the way it was set up.


    Look at the output from cat /proc/mdstat: the array md126 contains drive references /dev/sdf2 and /dev/sdj2, and the array md127 contains drive references /dev/sdf1 and /dev/sdj1

    So the drives /dev/sdf and /dev/sdj are being used across 2 arrays due to the partitioning, something I would never do, because if there is an issue with the physical drive then it's going to be reflected across both arrays


    Going back to cat /proc/mdstat, md2 is referencing 1 partition. I have no idea if this is a RAID1 or RAID5; if it's a RAID5 it could be toast
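    If you want to check what md2 was, mdadm can usually read it from the member's metadata; something like this (sdc1 is the only member your --detail --scan output shows for md2):

    Code
    mdadm --detail /dev/md2
    mdadm --examine /dev/sdc1    # the "Raid Level" and "Raid Devices" lines show what it was created as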


    md126 might be fixable as it's in an (auto-read-only) state; as root from SSH, try mdadm --readwrite /dev/md126


    md127 is inactive, probably due to the powering off. Try mdadm --assemble --force --verbose /dev/md127 /dev/sdb1 /dev/sdf1  dev/sdi1 /dev/sdj1 /dev/sde1 which might reassemble the array, but at this moment in time I have no idea
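    Whatever the result, check it afterwards with:

    Code
    cat /proc/mdstat
    mdadm --detail /dev/md127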

  • Quote

    Look at the output from cat /proc/mdstat: the array md126 contains drive references /dev/sdf2 and /dev/sdj2, and the array md127 contains drive references /dev/sdf1 and /dev/sdj1


    So the drives /dev/sdf and /dev/sdj are being used across 2 arrays due to the partitioning, something I would never do, because if there is an issue with the physical drive then it's going to be reflected across both arrays


    Yes, you are right about the disks. I remember that we did that (my friend and I) in order to maximize the available space, with the intention of changing it once new, larger drives were bought in the future, which happened after a while, but I never got the chance to find time and change it.


    Quote

    Going back to cat /proc/mdstat, md2 is referencing 1 partition. I have no idea if this is a RAID1 or RAID5; if it's a RAID5 it could be toast


    So I think there are 1 or 2 HDDs in there (I can't remember, because they were plugged in a long time ago, in the hope that I would make time to add them) that are not part of any array. Thank you for pointing that out.


    Quote

    md126 might be fixable as it's in an (auto-read-only) state; as root from SSH, try mdadm --readwrite /dev/md126

    I've tried that and:

    Code
    root@BlackT:/# mdadm --readwrite /dev/md126
    mdadm: failed to set writable for /dev/md126: Device or resource busy
    Quote

    md127 is inactive, probably due to the powering off. Try mdadm --assemble --force --verbose /dev/md127 /dev/sdb1 /dev/sdf1  dev/sdi1 /dev/sdj1 /dev/sde1 which might reassemble the array, but at this moment in time I have no idea

    Code
    root@BlackT:/# mdadm --assemble --force --verbose /dev/md127 /dev/sdb1 /dev/sdf1  dev/sdi1 /dev/sdj1 /dev/sde1
    mdadm: looking for devices for /dev/md127
    mdadm: /dev/sdb1 is busy - skipping
    mdadm: /dev/sdf1 is busy - skipping
    mdadm: dev/sdi1 is busy - skipping
    mdadm: /dev/sdj1 is busy - skipping
    mdadm: /dev/sde1 is busy - skipping

    It looks like I didn't have too much luck this time. :(

    Do you have any other suggestions for me, please?

    • Official post

    It looks like I didn't have too much luck this time

    At least you're learning what not to do :)


    OK, the problem is the partitions across different arrays.


    Try this one first:


    mdadm --stop /dev/md127 and you should get a message that the array has stopped, then:


    mdadm --assemble --force --verbose /dev/md127 /dev/sdb1 /dev/sdf1  dev/sdi1 /dev/sdj1 /dev/sde1


    If that works, then try stopping the array for md126 and then run the --readwrite option from post #4


    I'm not hopeful about any of this at this moment in time; normally the --readwrite option can be run on an active array without it erroring

  • Quote

    At least you're learning what not to do :)

    Yes, true. Thank you for your help, very much appreciated.


    Quote

    mdadm --stop /dev/md127 and you should get a message that the array has stopped

    It looks like it really did. I got this in return:

    Code
    root@BlackT:/# mdadm --stop /dev/md127
    mdadm: stopped /dev/md127
    Quote
    Code
    mdadm --assemble --force --verbose /dev/md127 /dev/sdb1 /dev/sdf1  dev/sdi1 /dev/sdj1 /dev/sde1

    After running it, I've got:


    Quote

    If that works, then try stopping the array for md126 and then run the --readwrite option from post #4

    I think you meant to run the command:

    Code
    mdadm --stop /dev/md126

    didn't you? But I'm not sure what "the --readwrite option from post #4" means.

    Code
    root@BlackT:/# mdadm --stop /dev/md126
    mdadm: Cannot get exclusive access to /dev/md126:Perhaps a running process, mounted filesystem or active volume group?

    Is there anything else that I need to do?

  • After yesterday's post, I turned off my NAS; now it's back on and I've run the commands you advised (I'm mentioning this because I don't know if it could affect the answers I've got now, or how yesterday's commands relate to today's).


    Most of the time, when I don't need my NAS, I turn it off to save energy, because I only use it to store data and I don't need it daily.


    I know OMV has an option to spin down/turn off HDDs that are not in use, but I don't know how to use it, and fear, I think, made me stay away from things I don't know yet.

    Quote


    OK try stopping md127, then run the readwrite command for md126, so mdadm --readwrite /dev/md126

    Code
    root@BlackT:/# mdadm --stop /dev/md127
    mdadm: Cannot get exclusive access to /dev/md127:Perhaps a running process, mounted filesystem or active volume group?
    root@BlackT:/# mdadm --readwrite /dev/md126
    mdadm: failed to set writable for /dev/md126: Device or resource busy
    • Official post

    TBH there is only one option left for you to try: each version of OMV has the capability to install systemrescuecd, I think in V5 it's in omv-extras. This installs and boots once into effectively a command-line live CD.


    This works without the knowledge (best way to describe it) of OMV, but the mdadm commands will work. The "Device or resource busy" is related to the cross usage of partitions and each array pointing to shares. On a normally configured OMV system this is not usually a problem, but the way your system is configured, it is.


    I would suggest you try that; if that doesn't work, I honestly don't know what to suggest other than contacting the person who set this up
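    If you want to see what is holding the arrays before going the systemrescuecd route, the "Cannot get exclusive access" / "Device or resource busy" messages usually mean a mounted filesystem or an active LVM volume group is still sitting on top of them; a rough check (the volume group name and mount path below are only examples, use whatever vgs and lsblk report on your system) would be:

    Code
    lsblk /dev/md126 /dev/md127    # shows what is stacked on top of each array
    vgs                            # list volume groups
    lvs                            # list logical volumes
    umount /srv/<your-mount>       # unmount anything that comes from that volume group first
    vgchange -an <vgname>          # deactivate the volume group so mdadm can get exclusive access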

  • Geaves, I just noticed that my NAS is now showing 9.09TB, instead of the 2TB from a couple of days ago, and most of the data is there. I can't yet figure out what's missing, but I still think a couple of TB are missing. I think the whole space on it should've been 12TB, if I remember right.


    Going back a bit through the first posts and your first advice, and comparing the commands and replies I've got in PuTTY, it seems that both md126 and md127 are now working (active), compared with the first time, when only md126 was active.


    Quote

    THEN

    cat /proc/mdstat



    NOW

    md2 still stubborn, still inactive. :(


    Quote


    TBH there is only one option left for you to try: each version of OMV has the capability to install systemrescuecd, I think in V5 it's in omv-extras. This installs and boots once into effectively a command-line live CD.

    I will consider this option as well. If I face any challenges with systemrescuecd, I will come back with questions.



    Geaves, thank you so much for your support, time and dedication to solving my problem.

  • I got the chance to spend some time with my friend who helped me configure my NAS in the beginning, and these are our findings about all the hard drives in my system:




    So, based on these, our understanding is that there's a problem with all the md2 HDDs, or at least a part of them, like 'sdh1' and 'sdd1', for which we get "has no superblock - assembly aborted". But it looks like for 'sda' we don't get that, and we were wondering if there is something we can do with this one in order to sort things out with the other 2? Any suggestion on how we can bring back the last part of my NAS, 'md2'?


    At this moment it's even hard to determine whether there was anything important on them or not (because at first look, without testing whether all the files open OK, everything looks fine on my NAS, nothing missing). Is there a way to find out?


    Thank you a million times for all your help and time, geaves!


    It seems all the other HDDs are working, and at the moment, because the 2 x 6TB drives were empty, I'm using them to back up all my data from all the other NAS drives that have data on them.


    Any suggestions (based on the hard drives I have available) on how to reconfigure all the HDDs in my NAS, to make things more reliable and avoid similar situations in the future?

    • Official post

    Any suggestion on how we can bring back the last part of my NAS, 'md2'

    No, not possible... a RAID5 allows for one drive (partition) failure within an array; with 2 the array is toast. Drive failure could be the drive going dead, intermittent I/O errors, or superblock errors.

    Normally there is a backup superblock on the drive/partition which one can attempt to restore to recover the superblock, but I have found that attempting this with other users on here has never worked. In your case this line:

    mdadm: No super block found on /dev/sdd1 (Expected magic a92b4efc, got 00000000)

    gives zeros, which is something I've never seen before on here or from real-world experience.

    You could try searching for "mdadm no super block found" and try the various suggestions you will find, but IMHO md2 is toast and any data on it has gone.
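    If you or your friend want to double-check before writing md2 off, mdadm --examine prints whatever RAID metadata is still present on each member; the device names below are the ones from your findings (they may differ after a reboot):

    Code
    mdadm --examine /dev/sdd1
    mdadm --examine /dev/sdh1
    mdadm --examine /dev/sda
    # if two members of a Raid5 have no readable superblock, the array cannot be assembled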

    ---------------------------------------------------------------------------------------------------------------------------------------------------

    If you have backed up your data from the other two arrays to one of your 6TB drives, at least that's a step in the right direction.


    The problem you have is your drive size mismatch; from the information you have supplied:


    3 x 500GB

    2 x 2TB

    2 x 3TB

    2 x 6TB


    I can see now why your friend did what he did and used partitions to maximise the space available, but as you've discovered, when things go wrong it spreads across all the arrays.


    OMV uses the full block device (drive) to create an array using the GUI (OMV uses the KIS principle)


    A RAID5 is made up of a minimum of 3 drives. Let's say you create a RAID5 from 2 x 2TB and 2 x 3TB: the array will be created based upon the smallest drive(s) within the array, so 2 x 2TB and 2 x 3TB gives (4 - 1) x 2TB = 6TB of data capacity. But what you could do in the future is replace each of the 2TB drives with 3TB drives and grow the array, giving you (4 - 1) x 3TB = 9TB of data capacity.
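    As a rough outline, that replace-and-grow would look something like this from the command line (the array name and drive letters are only examples, and each rebuild must finish before you touch the next drive):

    Code
    # replace one 2TB drive at a time: fail it, remove it, add the new 3TB drive
    mdadm /dev/md0 --fail /dev/sdX --remove /dev/sdX
    mdadm /dev/md0 --add /dev/sdY      # new 3TB drive, wait for the rebuild to complete
    cat /proc/mdstat                   # watch the rebuild progress
    # once all members have been replaced, grow the array and then the file system (ext4 example)
    mdadm --grow /dev/md0 --size=max
    resize2fs /dev/md0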


    Your 6TB drives would then be used for data backup; this is a common failing amongst home users, who believe that because they are using a RAID, a backup is not necessary.


    As for the 500GB drives, they are really not worth using. I use a 300GB laptop drive in my system, but that is purely for docker, docker compose and container configs, as I use ZFS.


    Another option is to use mergerfs; this would pool the 2 x 2TB and 2 x 3TB drives, giving you 10TB of space. Most use mergerfs with snapraid; however, snapraid was written primarily for use with large media files, where the data is not being changed on a regular basis. This is another learning curve and should not be adopted just because 'that sounds like a good idea'.


    I don't know enough about mergerfs, but I'll tag a couple of users who may be able to help: crashtest chente. One caveat with mergerfs: you will need to use one of those 500GB drives for docker, docker compose and docker configs, rather than pointing the docker configs to a single drive within the pool. The 6TB drives would still be your backup drives, and this can easily be set up using rsync.

    • Official post
    • 2 pools mergerfs + rsync


    A simple setup could be to create two pools of the same size and make copies from one to the other with rsync, like this:


    pool_1 formed by:

    Data: 6TB + 6TB + 500GB + 500GB = 13TB

    pool_2 formed by:

    Data: 3TB + 3TB + 3TB + 2TB + 2TB = 13TB


    Use pool_1 for your data and use pool_2 to make regular rsync copies of existing data in pool_1


    You would have 13TB of data capacity and there is still another 500GB disk available that can be used for docker.
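    Those regular copies from pool_1 to pool_2 can be a scheduled rsync job; under the hood it comes down to something like this (the mount paths are only examples, yours will differ):

    Code
    rsync -a --delete /srv/pool_1/ /srv/pool_2/    # --delete makes pool_2 an exact mirror of pool_1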


    • 2 pools mergerfs + SnapRaid + rsync


    If you want to complicate it a little more you can add SnapRaid to this configuration. You could do something like this:


    pool_1 formed by:

    Data: 3TB + 3TB + 2TB + 2TB + 500GB + 500GB = 11TB

    Parity: 3TB

    pool_2 formed by:

    Data: 6TB + 6TB = 12TB


    In this case you would have parity in pool_1 and a capacity of 11TB for data. You could set up regular syncs with rsync to pool_2. The last 500GB drive could be used for docker. If you are not going to use docker you can add this drive to pool_1 and you would have 11.5TB of capacity.
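    For orientation only, that SnapRaid layout boils down to a configuration roughly like the one below; the openmediavault-snapraid plugin writes this for you, and the paths and disk names here are made up for the example:

    Code
    # /etc/snapraid.conf (sketch)
    parity /srv/parity1/snapraid.parity    # the 3TB parity drive
    content /var/snapraid.content          # keep at least two copies of the content file
    content /srv/data1/snapraid.content
    data d1 /srv/data1    # 3TB
    data d2 /srv/data2    # 3TB
    data d3 /srv/data3    # 2TB
    data d4 /srv/data4    # 2TB
    data d5 /srv/data5    # 500GB
    data d6 /srv/data6    # 500GB
    # after the data is in place: "snapraid sync" builds the parity, "snapraid scrub" checks it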


    Regarding regular copies with rsync: it is a simple way to make backups. It can be optimized through specialized backup applications such as openmediavault-borgbackup, which make versioned backups, and other options in the same space.


    Regarding SnapRaid (or any type of RAID used to add parity), as time goes by I become more convinced that it is a waste of time. I don't think I've ever read anyone say they have problems with corrupt files. That is a decision you must make, but I can tell you that SnapRaid has quite a serious learning curve for novice users. mergerfs, on the other hand, has no secrets; it is very easy to configure and use.
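    To give an idea of how little there is to it: outside the plugin, a pool is just a single mergerfs mount over the individual data filesystems. A minimal sketch with example paths (the plugin generates the equivalent for you):

    Code
    mergerfs -o allow_other,category.create=mfs,minfreespace=10G /srv/disk1:/srv/disk2:/srv/disk3 /srv/pool_1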


    Here you have the documentation you might need.

    omv6:omv6_plugins:mergerfs [omv-extras.org]

    omv6:omv6_plugins:snapraid [omv-extras.org]

    • Official post

    pool_1 formed by:

    Data: 3TB + 3TB + 2TB + 2TB + 500GB + 500GB = 11TB

    Parity: 3TB

    Rereading this, I see that it should be optimized a little more. A single parity drive would probably not be enough for 6 data drives; it would be advisable to have at least two parity drives.

    • Official post

    A single parity drive would probably not be enough for 6 data drives; it would be advisable to have at least two parity drives.

    I'd agree with this. At a minimum, I'd use the newest (healthiest) 3TB drive for Parity. There's no restore without it.

    One caveat with mergerfs: you will need to use one of those 500GB drives for docker, docker compose and docker configs, rather than pointing the docker configs to a single drive within the pool.

    Pointing Dockers to the mount point of a single physical drive is a workaround for Dockers and SQL DB files. However, if the "Balance Tool" is used once (even accidentally), it might scatter the Docker folder over all the drives in the array, ruining the Docker install. It's far better to dedicate a small drive, outside of the mergerfs pool, to Dockers and other utility purposes. If SATA ports are lacking, these days 256GB USB3 thumb drives are reasonably priced.

    • Official post

    However, if the "Balance Tool" is used once (even accidentally), it might scatter the Docker folder over all the drives in the array, ruining the Docker install.

    Ah, I didn't know that. I just assumed one could use a single drive within the pool if a separate one was not available; I won't mention that one again :)
