Help - OMV crashed, RAID no longer visible

  • Hello everyone. I need help: after a strange crash whose cause I still haven't found (repeated, random restarts of the server), the system disk (SSD) died and the server could no longer find a boot partition. There was nothing left to do so I reinstalled OMV.


    OMV starts and my storage drives appear, but I can't create a new RAID, nor recover the old one.


    What can I do to recover my data?

  • root@openmediavault:~# cat /proc/mdstat

    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]

    md127 : inactive sdf[5](S) sdd[2](S) sdh[6](S) sde[3](S) sdc[1](S) sdk[9](S) sdi[7](S) sdg[4](S) sdj[8](S) sdb[10](S)

    27347840880 blocks super 1.2


    unused devices: <none>

  • root@openmediavault:~# mdadm --stop /dev/md127

    mdadm: stopped /dev/md127

    root@openmediavault:~# mdadm --assemble --force --verbose /dev/md127 /dev/sd[fdheckigjb]

    mdadm: looking for devices for /dev/md127

    mdadm: /dev/sdb is identified as a member of /dev/md127, slot -1.

    mdadm: /dev/sdc is identified as a member of /dev/md127, slot 1.

    mdadm: /dev/sdd is identified as a member of /dev/md127, slot 2.

    mdadm: /dev/sde is identified as a member of /dev/md127, slot 3.

    mdadm: /dev/sdf is identified as a member of /dev/md127, slot -1.

    mdadm: /dev/sdg is identified as a member of /dev/md127, slot 4.

    mdadm: /dev/sdh is identified as a member of /dev/md127, slot 6.

    mdadm: /dev/sdi is identified as a member of /dev/md127, slot 7.

    mdadm: /dev/sdj is identified as a member of /dev/md127, slot 8.

    mdadm: /dev/sdk is identified as a member of /dev/md127, slot 9.

    mdadm: no uptodate device for slot 0 of /dev/md127

    mdadm: added /dev/sdc to /dev/md127 as 1 (possibly out of date)

    mdadm: added /dev/sde to /dev/md127 as 3

    mdadm: added /dev/sdg to /dev/md127 as 4

    mdadm: no uptodate device for slot 5 of /dev/md127

    mdadm: added /dev/sdh to /dev/md127 as 6

    mdadm: added /dev/sdi to /dev/md127 as 7

    mdadm: added /dev/sdj to /dev/md127 as 8

    mdadm: added /dev/sdk to /dev/md127 as 9

    mdadm: added /dev/sdb to /dev/md127 as -1

    mdadm: added /dev/sdf to /dev/md127 as -1

    mdadm: added /dev/sdd to /dev/md127 as 2

    mdadm: /dev/md127 assembled from 7 drives and 2 spares - not enough to start the array.


    However, there are 11 drives connected: the system SSD and 10 HDDs, and all of them are visible in OMV.

    • Official Post

    10 HDDs, and all of them are visible in OMV

    The number of drives that OMV sees is irrelevant; what matters is what mdadm can locate/detect to assemble the array. From what I can see there are possibly 3 errors;

    mdadm: no uptodate device for slot 0 of /dev/md127

    mdadm: added /dev/sdc to /dev/md127 as 1 (possibly out of date)

    mdadm: no uptodate device for slot 5 of /dev/md127


    If that's the case, there may be no way that array will assemble and start


    Post the output of the following in 3 code boxes, using this symbol </> on the forum menu bar


    cat /proc/mdstat

    blkid

    mdadm --detail /dev/md127

  • Code
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : inactive sdf[5](S) sdh[6](S) sdd[2](S) sdg[4](S) sde[3](S) sdb[10](S) sdk[9](S) sdc[1](S) sdj[8](S) sdi[7](S)
          27347840880 blocks super 1.2
    
    unused devices: <none>
    • Official Post

    like this?

    Yes


    Looking at the output in relation to the three errors;


    1) mdadm: added /dev/sdc to /dev/md127 as 1 (possibly out of date): this one is fixable and is the drive that is missing from the --assemble command


    2) mdadm: no uptodate device for slot 0 of /dev/md127, mdadm: no uptodate device for slot 5 of /dev/md127, mdadm: added /dev/sdb to /dev/md127 as -1, and mdadm: added /dev/sdf to /dev/md127 as -1: these four relate to each other and would also explain the last line of the mdadm --detail output

    mdadm --detail should display which slot each drive occupies; your output does not, so there could be a hardware problem, which could be connectivity, a cable, anything


    How is the array connected? Unless one of these drives, /dev/sdb or /dev/sdf, can be corrected, the array cannot start
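
    As a cross-check, the metadata on each member can be read directly; a minimal sketch (the drive range and the grep filter are assumptions, adjust them to your setup):

    Code
    # Show the role, array state and event counter each member records
    # in its RAID superblock (assumes the data drives are /dev/sdb..sdk)
    for d in /dev/sd[b-k]; do
        echo "== $d =="
        mdadm --examine "$d" | grep -E 'Device Role|Array State|Events'
    done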

  • The initial problem was random, wild restarts of the server after a change of PC case. When the server crashed to the point of not starting again, I took the opportunity to change the PC case again, and I may not have reconnected the SATA ports in the same order. Could this be the cause of these 3 errors? Because currently, everything is properly connected.


    But disk B (/dev/sdb) had been a problem for me for some time; the person I bought it from clearly ripped me off. The disk was defective, and I was often forced to format it and put it back in as a spare so the RAID could rebuild.

    • Official Post

    I may not have reconnected the SATA ports in the same order

    That doesn't matter. /dev/sdb and /dev/sdf are not being presented to mdadm with a slot/raid number; those two drives relate to slot 0 and slot 5, which is clear from the --assemble output


    At present this is not recoverable. If you look at the last line of the mdadm --assemble output in #5, mdadm finds 7 drives and 2 spares; the missing tenth drive is the one with the error (possibly out of date)


    You could try running mdadm --examine /dev/sdb and the same on /dev/sdf and post the output, but I'm not hopeful
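
    i.e. in a code box, something like this (the full output of both commands is what matters):

    Code
    # Dump the full RAID superblock of the two problem drives
    mdadm --examine /dev/sdb
    mdadm --examine /dev/sdf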

  • However, I still have this problem of untimely rebooting; it will start crashing again even if we manage to recover the storage

    • Official Post

    There is nothing I can do to attempt recovery, but I started to look back; in your first post you stated:

    There was nothing left to do so I reinstalled OMV.

    OK, not a problem, but in #14 where you posted the mdadm conf I expected it to be empty, but.....

    the output is an mdadm conf from OMV5, which is not possible with a clean install


    I created a Raid5 on an OMV6 VM; this is the mdadm conf before:

    this is the mdadm conf after;

    I also went back over the forum Raid section: you had an issue in February this year; then you were running a Raid5 with 8 drives, now you're running a Raid6 with 10 drives :/

    The clean install, I assume, is OMV5, which is EOL and no longer supported


    So to reiterate;


    The drive 'possibly out of date' is fixable and could be added to a working array


    The two 'spares' are due to the fact that mdadm does not know which slot they are in; therefore the two drives are being marked as spares


    That's 3 drives out of 10 that mdadm cannot work with, hence the array cannot assemble. Added to that, /dev/sdb is a 4TB drive with bad blocks, a sign that the drive is failing


    I asked in #9 how the array was connected; what I should have asked is how the drives are connected, as most motherboards have 4/6 SATA ports and the rest must be connected to a PCIe card, which I would guess is a Chinese one


    The way I would proceed is to note each drive, the port it's connected to, and its drive reference in OMV, as at this moment you have no idea whether this is hardware related.


    Making a copy of the various outputs from here, plus fdisk -l | grep "Disk ", will give you information regarding each drive and where it is connected; it would also be useful to have each drive's manufacturer name and serial number
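
    One way to gather all of that in one place (a sketch; lsblk and smartctl are additions of mine, smartctl ships with the smartmontools package):

    Code
    # Size and device node of every disk
    fdisk -l | grep "Disk "
    # Model, serial number and size of each physical drive
    lsblk -d -o NAME,MODEL,SERIAL,SIZE
    # Full identity details for a single drive (repeat for each one)
    smartctl -i /dev/sdb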


    Then try systemrescuecd (set to boot once) in omv-extras and see if the array will assemble without the overhead of OMV. If this doesn't work, then ->
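
    From the rescue shell the attempt would look roughly like this (a sketch only; it may refuse to start for the same reasons as before):

    Code
    # Stop the half-assembled array, then let mdadm scan all superblocks
    # and try to put the array back together on its own
    mdadm --stop /dev/md127
    mdadm --assemble --scan --verbose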


    Armed with all the drive information, try the drives in different ports. This is going to take meticulous record keeping, you cannot do it randomly: if one of the other drives gets tagged as a spare, but /dev/sdb or /dev/sdf is assigned a slot by mdadm, then this is hardware related.

    This procedure would be slow and may not even give an indication of a possible hardware issue, but it's the only option I can suggest

  • Indeed, you are right. At the beginning of the year I had problems with an HDD that was taken back by the manufacturer and replaced. At that time I realized that I was starting to run out of available space, and I tested different RAID setups like ZFS, but ZFS could not accept hard drives of different capacities (at that time, I had time to make a backup of my NAS on Mega.nz)


    You are also right about the connections: 6 SATA ports on the motherboard and 6 on a 4x PCI-Express SATA card. I reinstalled my RAID6 NAS with 10 hard drives on OMV5 and it worked well until a recent PC case change (a taller case to fit the stack of 10 disks). Progressively, when a 2TB disk becomes defective, I replace it with a 4TB disk; it's not ideal, but I do it with the money I have...


    During the last crash, since the system disk was no longer booting, I also reinstalled with OMV6. Maybe I shouldn't have, and should have stayed on version 5. I didn't think it would cause any problems; I thought version 6 would be more capable and wouldn't have any trouble finding the array. I will reinstall OMV5 as soon as I can borrow my daughter's PC screen.

  • Quote

    Added to that, /dev/sdb is a 4TB drive with bad blocks, a sign that the drive is failing

    I know, this disk has been a problem for me since the start, but the person assured me that it was in perfect condition, so I was looking for the cause elsewhere, especially since on the SMART test side, everything was fine


    Quote

    The way I would proceed is to note each drive, the port it's connected to, and its drive reference in OMV, as at this moment you have no idea whether this is hardware related.

    The disks are plugged in following the stacking order: sata0 sda, sata1 sdb, etc...


    But I may have swapped two SATA cables at the beginning without meaning to. I will try this fdisk -l | grep "Disk ", thanks


    • Official Post

    You are also right about the connections: 6 SATA ports on the motherboard and 6 on a 4x PCI-Express SATA card

    I am guessing here that the PCIe card is probably the issue; there has been more than one instance of this on the forum


    During the last crash, since the system disk was no longer booting, I also reinstalled with OMV6. Maybe I shouldn't have, and should have stayed on version 5.

    The output of your mdadm conf file in #14 is OMV5, it is not OMV6; look at the one I posted from a VM.

    especially since on the SMART test side, everything was fine

    Obviously not; check out this from the New User Guide and read that section


    As I have said, unless you can work out why mdadm cannot find the 'slots' where /dev/sdb and /dev/sdf are connected, I cannot help any further. Raid6 allows for 2 drive failures, and that doesn't only mean physical failure, it includes anything mdadm cannot locate or interpret; you have 3!!

  • Well, we can close the subject; I have mourned my 10 TB of lost data. I started all over again from scratch. Only one disk was really dead. I disassembled everything and put it back together properly in a new case, and there is no more untimely rebooting. My only remaining concern: when I ask the server to shut down, either via the web interface or the power button, it systematically turns back on. I don't know where this comes from, and it's not great for energy savings since it's programmed to shut down every morning at 7:00 am
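
    For anyone hitting the same auto-power-on behaviour, things worth checking are Wake-on-LAN and the ACPI/RTC wake sources (a sketch; eno1 is a placeholder for your network interface):

    Code
    ethtool eno1 | grep Wake-on          # "g" means Wake-on-LAN is enabled
    cat /proc/acpi/wakeup                # devices allowed to wake the system
    cat /sys/class/rtc/rtc0/wakealarm    # pending RTC wake alarm, if any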
