Help - OMV crashed, RAID no longer visible

  • Hello everyone. I need help: after a strange crash whose cause I still haven't found (repeated, random restarts of the server), the system disk (SSD) died and the server could no longer find a boot partition. There was nothing left to do so I reinstalled OMV.


    OMV starts and my storage drives appear, but I can't create a new RAID, nor recover the old one.


    What can I do to recover my data?

  • root@openmediavault:~# cat /proc/mdstat

    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]

    md127 : inactive sdf[5](S) sdd[2](S) sdh[6](S) sde[3](S) sdc[1](S) sdk[9](S) sdi[7](S) sdg[4](S) sdj[8](S) sdb[10](S)

    27347840880 blocks super 1.2


    unused devices: <none>

  • root@openmediavault:~# mdadm --stop /dev/md127

    mdadm: stopped /dev/md127

    root@openmediavault:~# mdadm --assemble --force --verbose /dev/md127 /dev/sd[fdheckigjb]

    mdadm: looking for devices for /dev/md127

    mdadm: /dev/sdb is identified as a member of /dev/md127, slot -1.

    mdadm: /dev/sdc is identified as a member of /dev/md127, slot 1.

    mdadm: /dev/sdd is identified as a member of /dev/md127, slot 2.

    mdadm: /dev/sde is identified as a member of /dev/md127, slot 3.

    mdadm: /dev/sdf is identified as a member of /dev/md127, slot -1.

    mdadm: /dev/sdg is identified as a member of /dev/md127, slot 4.

    mdadm: /dev/sdh is identified as a member of /dev/md127, slot 6.

    mdadm: /dev/sdi is identified as a member of /dev/md127, slot 7.

    mdadm: /dev/sdj is identified as a member of /dev/md127, slot 8.

    mdadm: /dev/sdk is identified as a member of /dev/md127, slot 9.

    mdadm: no uptodate device for slot 0 of /dev/md127

    mdadm: added /dev/sdc to /dev/md127 as 1 (possibly out of date)

    mdadm: added /dev/sde to /dev/md127 as 3

    mdadm: added /dev/sdg to /dev/md127 as 4

    mdadm: no uptodate device for slot 5 of /dev/md127

    mdadm: added /dev/sdh to /dev/md127 as 6

    mdadm: added /dev/sdi to /dev/md127 as 7

    mdadm: added /dev/sdj to /dev/md127 as 8

    mdadm: added /dev/sdk to /dev/md127 as 9

    mdadm: added /dev/sdb to /dev/md127 as -1

    mdadm: added /dev/sdf to /dev/md127 as -1

    mdadm: added /dev/sdd to /dev/md127 as 2

    mdadm: /dev/md127 assembled from 7 drives and 2 spares - not enough to start the array.


    However, there are 11 drives connected: the system SSD and 10 HDDs, and all of them are visible in OMV.

    • Official Post

    10 HDDs, and all of them are visible in OMV

    The number of drives that OMV sees is irrelevant; what matters is what mdadm can locate/detect to assemble the array. From what I can see there are possibly 3 errors;

    mdadm: no uptodate device for slot 0 of /dev/md127

    mdadm: added /dev/sdc to /dev/md127 as 1 (possibly out of date)

    mdadm: no uptodate device for slot 5 of /dev/md127


    If that's the case, there may be no way that array will assemble and start


    Post the output of the following in 3 code boxes, using this symbol </> on the forum menu bar


    cat /proc/mdstat

    blkid

    mdadm --detail /dev/md127

  • Code
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : inactive sdf[5](S) sdh[6](S) sdd[2](S) sdg[4](S) sde[3](S) sdb[10](S) sdk[9](S) sdc[1](S) sdj[8](S) sdi[7](S)
          27347840880 blocks super 1.2
    
    unused devices: <none>
    • Official Post

    like this?

    Yes


    Looking at the output in relation to the three errors;


    1) mdadm: added /dev/sdc to /dev/md127 as 1 (possibly out of date): this one is fixable and is the drive that is missing from the --assemble command


    2) mdadm: no uptodate device for slot 0 of /dev/md127, mdadm: no uptodate device for slot 5 of /dev/md127, mdadm: added /dev/sdb to /dev/md127 as -1, and mdadm: added /dev/sdf to /dev/md127 as -1: these four relate to each other and would also explain the last line of the mdadm --detail output

    mdadm --detail should display which slot each drive occupies; your output does not, so there could be a hardware problem, which could be connectivity, a cable, anything


    How is the array connected? Unless one of these drives, /dev/sdb or /dev/sdf, can be corrected, the array cannot start
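
    As a cross-check, the metadata on each member can be read directly; a minimal sketch (the drive range and the grep filter are assumptions, adjust them to your setup):

    Code
    # Show the role, array state and event counter each member records
    # in its RAID superblock (assumes the data drives are /dev/sdb..sdk)
    for d in /dev/sd[b-k]; do
        echo "== $d =="
        mdadm --examine "$d" | grep -E 'Device Role|Array State|Events'
    done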

  • The initial problem was random, wild restarts of the server after a change of PC case. When the server crashed to the point of not starting again, I took the opportunity to change the PC case again, and I may not have reconnected the SATA ports in the same order. Could this be the cause of these 3 errors? Because currently, everything is properly connected.


    But disk B (/dev/sdb) had been a problem for me for some time; the person I bought it from clearly ripped me off. The disk was defective, and I was often forced to format it and put it back in as a spare so the RAID could rebuild.

    • Official Post

    I may not have reconnected the SATA ports in the same order

    That doesn't matter. /dev/sdb and /dev/sdf are not being presented to mdadm with a slot/raid number; those two drives relate to slot 0 and slot 5, which is clear from the --assemble output


    At present this is not recoverable. If you look at the last line of the mdadm --assemble output in #5, mdadm finds 7 drives and 2 spares; the missing tenth drive is the one with the error (possibly out of date)


    You could try running mdadm --examine /dev/sdb and the same on /dev/sdf and post the output, but I'm not hopeful
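
    i.e. in a code box, something like this (the full output of both commands is what matters):

    Code
    # Dump the full RAID superblock of the two problem drives
    mdadm --examine /dev/sdb
    mdadm --examine /dev/sdf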

  • However, I still have this problem of untimely rebooting; it will start crashing again even if we manage to recover the storage

    • Official Post

    There is nothing I can do to attempt recovery, but I started to look back; in your first post you stated:

    There was nothing left to do so I reinstalled OMV.

    OK, not a problem, but in #14 where you posted the mdadm conf I expected it to be empty, but.....

    the output is an mdadm conf from OMV5, which is not possible with a clean install


    I created a Raid5 on an OMV6 VM; this is the mdadm conf before:

    this is the mdadm conf after;

    I also went back over the forum Raid section: you had an issue in February this year; then you were running a Raid5 with 8 drives, now you're running a Raid6 with 10 drives :/

    The clean install, I assume, is OMV5, which is EOL and no longer supported


    So to reiterate;


    The drive 'possibly out of date' is fixable and could be added to a working array


    The two 'spares' are due to the fact that mdadm does not know which slot they are in; therefore the two drives are being marked as spares


    That's 3 drives out of 10 that mdadm cannot work with, hence the array cannot assemble. Added to that, /dev/sdb is a 4TB drive with bad blocks, a sign that the drive is failing


    I asked in #9 how the array was connected; what I should have asked is how the drives are connected, as most motherboards have 4/6 SATA ports and the rest must be connected to a PCIe card, which I would guess is a Chinese one


    The way I would proceed is to note each drive, the port it's connected to, and its drive reference in OMV, as at this moment you have no idea whether this is hardware related.


    Making a copy of the various outputs from here, plus fdisk -l | grep "Disk ", will give you information regarding each drive and where it is connected; it would also be useful to have each drive's manufacturer name and serial number
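
    One way to gather all of that in one place (a sketch; lsblk and smartctl are additions of mine, smartctl ships with the smartmontools package):

    Code
    # Size and device node of every disk
    fdisk -l | grep "Disk "
    # Model, serial number and size of each physical drive
    lsblk -d -o NAME,MODEL,SERIAL,SIZE
    # Full identity details for a single drive (repeat for each one)
    smartctl -i /dev/sdb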


    Then try systemrescuecd (set to boot once) in omv-extras and see if the array will assemble without the overhead of OMV. If this doesn't work, then ->
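
    From the rescue shell the attempt would look roughly like this (a sketch only; it may refuse to start for the same reasons as before):

    Code
    # Stop the half-assembled array, then let mdadm scan all superblocks
    # and try to put the array back together on its own
    mdadm --stop /dev/md127
    mdadm --assemble --scan --verbose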


    Armed with all the drive information, try the drives in different ports. This is going to take meticulous record keeping, you cannot do it randomly: if one of the other drives gets tagged as a spare, but /dev/sdb or /dev/sdf is assigned a slot by mdadm, then this is hardware related.

    This procedure would be slow and may not even give an indication of a possible hardware issue, but it's the only option I can suggest

  • Indeed, you are right. At the beginning of the year I had problems with an HDD that was taken back by the manufacturer and replaced. At that time I realized that I was starting to run out of available space, and I tested different RAID setups like ZFS, but ZFS could not accept hard drives of different capacities (at that time, I had time to make a backup of my NAS on Mega.nz)


    You are also right about the connections: 6 SATA ports on the motherboard and 6 on a 4x PCI-Express SATA card. I reinstalled my RAID6 NAS with 10 hard drives on OMV5 and it worked well until a recent PC case change (a taller case to fit the stack of 10 disks). Progressively, when a 2TB disk becomes defective, I replace it with a 4TB disk; it's not ideal, but I do it with the money I have...


    During the last crash, since the system disk was no longer booting, I also reinstalled with OMV6. Maybe I shouldn't have, and should have stayed on version 5. I didn't think it would cause any problems; I thought version 6 would be more capable and wouldn't have any trouble finding the array. I will reinstall OMV5 as soon as I can borrow my daughter's PC screen.

  • Quote

    Added to that, /dev/sdb is a 4TB drive with bad blocks, a sign that the drive is failing

    I know, this disk has been a problem for me since the start, but the person assured me that it was in perfect condition, so I was looking for the cause elsewhere, especially since on the SMART test side, everything was fine


    Quote

    The way I would proceed is to note each drive, the port it's connected to, and its drive reference in OMV, as at this moment you have no idea whether this is hardware related.

    The disks are plugged in following the stacking order: sata0 sda, sata1 sdb, etc...


    But I may have swapped two SATA cables at the beginning without meaning to. I will try this fdisk -l | grep "Disk ", thanks


    • Official Post

    You are also right about the connections: 6 SATA ports on the motherboard and 6 on a 4x PCI-Express SATA card

    I am guessing here that the PCIe card is probably the issue; there has been more than one instance of this on the forum


    During the last crash, since the system disk was no longer booting, I also reinstalled with OMV6. Maybe I shouldn't have, and should have stayed on version 5.

    The output of your mdadm conf file in #14 is OMV5, it is not OMV6; look at the one I posted from a VM.

    especially since on the SMART test side, everything was fine

    Obviously not; check out this from the New User Guide and read that section


    As I have said, unless you can work out why mdadm cannot find the 'slots' where /dev/sdb and /dev/sdf are connected, I cannot help any further. Raid6 allows for 2 drive failures, and that doesn't only mean physical failure, it includes anything mdadm cannot locate or interpret; you have 3!!

  • Well, we can close the subject; I have mourned my 10 TB of lost data. I started all over again from scratch. Only one disk was really dead. I disassembled everything and put it back together properly in a new case, and there is no more untimely rebooting. My only remaining concern: when I ask the server to shut down, either via the web interface or the power button, it systematically turns back on. I don't know where this comes from, and it's not great for energy savings since it's programmed to shut down every morning at 7:00 am
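
    For anyone hitting the same auto-power-on behaviour, things worth checking are Wake-on-LAN and the ACPI/RTC wake sources (a sketch; eno1 is a placeholder for your network interface):

    Code
    ethtool eno1 | grep Wake-on          # "g" means Wake-on-LAN is enabled
    cat /proc/acpi/wakeup                # devices allowed to wake the system
    cat /sys/class/rtc/rtc0/wakealarm    # pending RTC wake alarm, if any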
