Lost and Thankfully Recovered RAID Array; Stuck on Old Kernel

  • Hello,


    Thought I'd share something I experienced at the end of January, both to help others and to keep it from happening to me again. Here is my setup: a Raspberry Pi 4 (2GB) with two 5TB WD USB drives attached to a powered USB 3.0 hub. The two drives are used in an mdadm-configured RAID array (mirrored) [and yes, I understand this is not the most secure approach and can be a bit slow]. I'm running OMV 5.5.23-1 (Usul) on kernel 5.4.79-v7l+ and am currently reluctant to move forward to newer versions. The system is used on my home network to back up three Mac computers (Time Machine style); everything is connected via gigabit Ethernet to reduce bottlenecks.

    By the way, I also have an SD card attached to the USB hub, to which I perform weekly backups (using sudo dd if=/dev/mmcblk0 of=/dev/sde as a scheduled job within OMV). This SD card is /dev/sde. Side note: when I look at File Systems in OMV, sometimes I see /dev/sde1 as boot and /dev/sde2 as rootfs, and other times I see /dev/mmcblk0p1 as boot and /dev/mmcblk0p2 as rootfs. Not sure whether this is an issue, but it seems strange that it flip-flops between the two like it does.
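
    In case it helps anyone else chase the same flip-flop, a quick way to see which device the running system is actually using for boot and rootfs is something like the following (just standard tools, nothing OMV-specific):

    Code
    # Show which block device backs the running rootfs and /boot
    findmnt -n -o SOURCE /
    findmnt -n -o SOURCE /boot
    # Or list every block device with its label and mount point
    lsblk -o NAME,LABEL,MOUNTPOINT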


    Anyway, it's a pretty straightforward setup, and I had it up and running nonstop for about a month and a half without issues until the end of January, around the 28th. Earlier that week, I applied all of the updates to the system as I normally do. I do not recall restarting after the updates, though possibly the system did so by itself. On the morning of the 28th, I gracefully shut down the system and relocated it to my new house. After powering it back on, the RAID array was gone. I posted the ordeal to Reddit (see https://www.reddit.com/r/linux…mdadm_raid1_failed_array/) and received the help needed to get it back up and running, which, from what I could tell, was basically to revert kernels. I'm not sure exactly what happened, but I know at this point I'm very reluctant to perform any more updates. If anyone has any thoughts on what happened or why, I'd greatly appreciate your input.


    To save you some time in reviewing the Reddit post, here's what tipped off the glorious expert (ang-p) to my problem:

    Code
    pi@piserver:~ $ sudo modprobe raid1     
    modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/5.4.79-v7l+/modules.dep.bin'     
    modprobe: FATAL: Module raid1 not found in directory /lib/modules/5.4.79-v7l+       

    He said, "Looks like your old modules were wiped, new ones were put in place, but your kernel in [tt]/boot[/tt] was not updated for some reason...."


    And, here is what ultimately fixed it:

    Code
    sudo dpkg --install /var/cache/apt/archives/raspberrypi-kernel_1.20201201-1_armhf.deb
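
    For anyone hitting the same modprobe error, a generic way to confirm that the modules match the running kernel again after reinstalling is something like the following (just standard commands, not steps from the Reddit thread):

    Code
    uname -r                        # version of the kernel actually running
    ls /lib/modules/$(uname -r)     # this directory must exist for that version
    sudo depmod -a                  # rebuild modules.dep / modules.dep.bin
    sudo modprobe raid1             # should now load without errors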


    And here are some parting comments from ang-p regarding the potential causes:

    Quote

    All I can guess is that either sometime after the kernel was updated (from 5.4.79 to 5.4.83) something wrote the old one(s) back to the [tt]boot[/tt] partition without going through the proper procedure (i.e. the complete package...), or the upgrade failed (silently??? :-/ )

    When it was clear that it wasn't just that one module missing, it was a case of find the right kernel to install, but until I saw the output you just posted I wasn't sure that you even had the deb containing the 5.4.79 - I knew you had 2 kernel debs (from [tt]20201201[/tt] and [tt]20210108[/tt]), and knew [tt]20210108[/tt] was 5.4.83, but not sure if the [tt]20201201[/tt] was 5.4.79 or something completely different....

    Bit puzzled why 20210108 didn't take when you tried [tt]dpkg --install[/tt]ing it..... it might be the root of the issue... maybe it goofed its own install?

    Last, here are some of my current config outputs:

    Code
    pi@piserver:~ $ cat /proc/mdstat
    Personalities : [raid1] 
    md127 : active raid1 sdb1[0] sdd1[1]
          4883604416 blocks super 1.2 [2/2] [UU]
          bitmap: 2/37 pages [8KB], 65536KB chunk
    
    unused devices: <none>
    Code
    pi@piserver:~ $ blkid
    /dev/mmcblk0p1: LABEL_FATBOOT="boot" LABEL="boot" UUID="4AD7-B4D5" TYPE="vfat" PARTUUID="884b5d9f-01"
    /dev/mmcblk0p2: LABEL="rootfs" UUID="2887d26c-6ae7-449d-9701-c5a4018755b0" TYPE="ext4" PARTUUID="884b5d9f-02"
    /dev/sdb1: UUID="7590167c-f276-b76c-56a2-c0313952a1f2" UUID_SUB="c813b465-b96f-1a31-41a6-bc89e4d0d4fe" LABEL="piserver:RAIDARRAY" TYPE="linux_raid_member" PARTUUID="917ef790-02d7-43fa-8f67-c0f3bd7530fb"
    /dev/sda1: LABEL="500GB" UUID="0fd99b27-fde7-4242-b3e1-28a92475dc0f" TYPE="ext4" PARTUUID="b433c519-c292-4e33-afff-095a9e34b028"
    /dev/sdd1: UUID="7590167c-f276-b76c-56a2-c0313952a1f2" UUID_SUB="99cb9f96-c5ef-9945-4888-ef4dc0b5c62e" LABEL="piserver:RAIDARRAY" TYPE="linux_raid_member" PARTUUID="fe3a1dc5-9b83-4aaa-96d4-c4249c2f4279"
    /dev/sde1: LABEL_FATBOOT="boot" LABEL="boot" UUID="4AD7-B4D5" TYPE="vfat" PARTUUID="884b5d9f-01"
    /dev/sde2: LABEL="rootfs" UUID="2887d26c-6ae7-449d-9701-c5a4018755b0" TYPE="ext4" PARTUUID="884b5d9f-02"
    /dev/md127: LABEL="RAID1" UUID="197fa14c-b5ad-4ec5-9bc4-552ad6670e16" TYPE="ext4"
    Code
    pi@piserver:~ $ sudo mdadm --detail --scan --verbose
    ARRAY /dev/md/RAIDARRAY level=raid1 num-devices=2 metadata=1.2 name=piserver:RAIDARRAY UUID=7590167c:f276b76c:56a2c031:3952a1f2
       devices=/dev/sdb1,/dev/sdd1

    So to summarize, I'd love to know whether it is possible to safely upgrade to the latest kernel. I'm very much interested in keeping up with the latest patches, bug fixes, security updates, etc. But I'm now afraid of losing my array again.


    And (really the last thing this time), I am very much interested in breaking the array and setting up rsync instead. It would be great if there were a tutorial on that (ideally one that avoids data loss), but I cannot find one at this point. I appreciate any pointers in this regard.


    Thanks and regards,

    Anthony

    • Official post

    So to summarize, I'd love to know whether it is possible to safely upgrade to the latest kernel. I'm very much interested in keeping up with the latest patches, bug fixes, security updates, etc. But I'm now afraid of losing my array again

    :D This is one of the reasons the use of USB drives for RAID config was removed, but having read through the above, it certainly doesn't make sense that a kernel update would delete modules.

    To revert to an earlier kernel, you can change that in omv-extras -> kernel tab.


    And (really the last thing this time), I am very much interested in breaking the array and setting up rsync instead

    Wise choice :) Tutorials, no, but it may be possible; it will be like starting again, though.

  • One of the reasons the use of USB drives for RAID config was removed

    I will take that as you're saying--in general--that USB RAID is buggy, flaky and not worthy of support. I get it. But I still believe it is worth it in certain cases; for example, for those of us who do not have an unlimited budget for hardware RAID and don't want to risk losing data. In the rsync setup--which I will eventually get to, once I figure out how to do it safely--there's a chance that one of my drives dies before its data has been replicated to the other, and then I have data loss. At least with RAID, that chance is slightly lower. Regardless, given the lack of real USB RAID support in OMV, I understand I either need to go there or spend more money on a hardware RAID setup. I respect and appreciate your opinion.
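
    (To be concrete about what I mean by the rsync setup: just a one-way scheduled mirror from one single drive to the other, roughly like the line below. The paths are made up for illustration, and in OMV this would presumably be configured as an Rsync job in the GUI rather than by hand.)

    Code
    # Hypothetical nightly one-way mirror from the first data disk to the second
    rsync -aH --delete /srv/dev-disk-by-label-data1/ /srv/dev-disk-by-label-data2/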

    To revert to an earlier kernel, you can change that in omv-extras -> kernel tab.

    I didn't see a "kernel tab" in my setup, so I researched and found that it's not available for us ARM folks.

    Tutorials, no, but it may be possible; it will be like starting again, though

    Really? I was thinking I would remove one of the RAID drives, format it outside of the server, re-attach it, copy the data from the degraded array, delete the array, format the second drive outside of the server, re-attach it, and set up rsync. It sucks that I'd have to start over again... I'm surprised. How unfortunate, as I now have over 3 months of backups for 3 Macs. I've actually used the backups once to restore a ton of data to my wife's computer that she accidentally deleted. Worked like a charm.


    Thanks for the reply.

    • Official post

    I will take that as you're saying--in general--that USB RAID is buggy, flaky and not worthy of support

    Nope, but it has everything to do with the stability of USB drives on an SBC: you have the USB bus on the SBC, and then you have a SATA-to-USB bridge within each drive enclosure. There's an interesting thread here if you want to read it; the remainder of your paragraph is covered in that thread.

    I didn't see a "kernel tab" in my setup so I researched and found that's not available for us ARM folks

    My apologies, I don't use ARM as a server; my ARM boards perform specific functions on my network.

    Really

    Yes. Why? Because all your shares point to the RAID configuration. Whilst you don't have to remove the data, you have to work backwards: remove the SMB shares, then remove the Shared Folders (but not their content), and stop any Dockers if you use them, as these will need to be reconfigured.

    I was thinking I would remove one of the RAID drives, format it outside of the server, re-attach it, copy the data from the degraded array, delete the array, format the second drive outside of the server, re-attach it, and set up rsync.

    Close :) but there are two ways to do this:

    1) The safe way

    2) The squeaky bum way


    You've suggested option 2, which means: remove a drive from the array, wipe it, format it, then move the data from the degraded array to the single drive. Then unmount the array and delete it from the GUI, wipe the remaining drive, format it, and set up Rsync after setting up your shares.


    Option 1 is make a backup first, then proceed.
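
    For what it's worth, a rough command-line sketch of that option 2 route, assuming the array is /dev/md127 with members /dev/sdb1 and /dev/sdd1 as in the outputs above, that /dev/sdd1 is the drive being pulled first, and that the array is mounted under /srv/dev-disk-by-label-RAID1 (in practice the wipe/format/mount steps would normally be done from the OMV GUI):

    Code
    # Mark one mirror member as failed and remove it (the array keeps running, degraded)
    sudo mdadm /dev/md127 --fail /dev/sdd1 --remove /dev/sdd1
    # Wipe the old RAID signature and put a plain ext4 filesystem on the freed drive
    sudo wipefs --all /dev/sdd1
    sudo mkfs.ext4 -L backup1 /dev/sdd1
    # Mount it and copy everything off the degraded array
    sudo mkdir -p /mnt/backup1
    sudo mount /dev/sdd1 /mnt/backup1
    sudo rsync -aH --info=progress2 /srv/dev-disk-by-label-RAID1/ /mnt/backup1/
    # Only after verifying the copy: unmount, stop and delete the array, then reuse the second drive
    sudo umount /srv/dev-disk-by-label-RAID1
    sudo mdadm --stop /dev/md127
    sudo mdadm --zero-superblock /dev/sdb1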

    for those of us who do not have an unlimited budget for hardware RAID

    I decided to answer this at the end: neither do I. My current server (an HP Microserver) I bought off eBay for less than £100. My previous servers came from my then place of work, where they had been in use for 6 years.

  • By the way, I also have an SD card attached to the USB hub, to which I perform weekly backups (using sudo dd if=/dev/mmcblk0 of=/dev/sde as a scheduled job within OMV). This SD card is /dev/sde. Side note: when I look at File Systems in OMV, sometimes I see /dev/sde1 as boot and /dev/sde2 as rootfs, and other times I see /dev/mmcblk0p1 as boot and /dev/mmcblk0p2 as rootfs. Not sure whether this is an issue, but it seems strange that it flip-flops between the two like it does.

    Here is what gives you the problem: the RPi4 is now able to boot straight from USB, and you have 2 fully operational OSes attached.

    For whatever reason (the SD bus is busy and the USB reads first, or vice versa), the system will boot with whichever one it reads first.

    If you do boot from USB and have rootfs and "/boot" on sde, and you then update the boot ROM and kernel, it will update the CLONE only (the SD card sitting on the USB hub).


    Now let's say you reboot and the RPi4 goes with "mmcblk0" as the OS: you will still be in the previous situation you were in, since there was no update to it.


    Or, even if you do boot from the USB-attached SD card that did get the update, you know that running sudo dd if=/dev/mmcblk0 of=/dev/sde will overwrite it again.
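
    A way to check whether the two copies have drifted apart would be to compare the kernels on the two boot partitions, something like the following (the mount point is just an example, and I'm assuming the 32-bit kernel7l.img image used on the RPi4):

    Code
    # Mount the USB-attached SD card's boot partition somewhere temporary
    sudo mkdir -p /mnt/sde-boot
    sudo mount /dev/sde1 /mnt/sde-boot
    # Compare the kernel images on the two boot partitions (sizes/dates, or checksum them)
    ls -l /boot/kernel7l.img /mnt/sde-boot/kernel7l.img
    md5sum /boot/kernel7l.img /mnt/sde-boot/kernel7l.img
    sudo umount /mnt/sde-boot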


    The best way to run your rig is with only one of them in place (either the SD or the USB), not both at the same time; plug the other in only when you want to do a backup, and unplug it again after the backup.


    To see what order the RPi4 boots in, run the following (the hex is read from right to left):

    rpi-eeprom-config

    The order is shown:

    BOOT_ORDER=0xf41

    The "1" is --> boot from SD card

    The "4" is --> boot from USB

    And this "0xf" is the order to "Restart" if unable to boot from neither.


    All this information is available here:

  • Geaves, Good info, thank you.

    Soma, Wow, yes I was starting to think that could have been part of the problem when I was writing this... Very interesting! I will need to do some more digging to see if I can fix it. Thank you!!

  • Soma, here is my output below. I was expecting to see something else based on your reply. Regardless, I take this to mean that my Pi is only booting from the built-in SD card. I guess what you are saying is that at some point in January, when I updated/patched, it only patched one drive (SD or USB), or maybe part of a drive (boot or rootfs), which then got wiped by the dd backup. Then when I booted again, it reverted to the prior version from the other, unpatched drive. IDK, it sounds strange, but I think it definitely had something to do with it. I will need to do a complete backup, remove the backup card, then upgrade and see if everything works OK. Will report back later. Thanks again for pointing out the likely issue.

  • Here is what gives you the problem: the RPi4 is now able to boot straight from USB, and you have 2 fully operational OSes attached.

    To confirm: yes, that was the problem. The system would sometimes use the USB SD card for boot but remain on the built-in SD card for rootfs. And because one or the other hadn't been updated properly, when it booted from the other device it couldn't find the modules matching its kernel and basically died. I resolved it today by 1) performing a full backup to the USB SD card; 2) removing the USB SD card; 3) restarting; 4) performing a full update with all patches and the latest kernel; and 5) restarting. All worked well, and now I have learned a lesson: don't keep the USB SD backup card in the system all the time, but only plug it in when I want to perform a backup, such as before updating. THANK YOU SOMA! Awesome. 8)
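
    For future reference, the routine I plan to follow before each kernel update looks roughly like this (the device names are specific to my setup, so adjust them; the dd is the same weekly job I mentioned above):

    Code
    # 1) Full backup of the internal SD card to the USB-attached SD card
    sudo dd if=/dev/mmcblk0 of=/dev/sde bs=4M status=progress conv=fsync
    # 2) Remove the USB-attached SD card, then apply updates (here via apt; the OMV update GUI works too) and reboot
    sudo apt update && sudo apt full-upgrade
    sudo reboot
    # 3) After the reboot, confirm the modules match the running kernel
    uname -r
    ls /lib/modules/$(uname -r)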
