Lost and Thankfully Recovered RAID Array; Stuck on Old Kernel

  • Hello,


    Thought I'd share something I experienced at the end of January, both to help others and to keep it from happening to me again. Here is my setup: a Raspberry Pi 4 (2GB) with two 5TB WD USB drives attached to a powered USB 3.0 hub. The two drives are used in an mdadm-configured RAID array (mirrored) [and yes, I understand this is not the most secure approach and can be a bit slow]. I'm running OMV 5.5.23-1 (Usul) on kernel 5.4.79-v7l+ and am currently reluctant to move forward to newer versions. The system is used on my home network to back up three Mac computers (Time Machine style); everything is connected via gigabit Ethernet to reduce bottlenecks.

    By the way, I also have an SD card attached to the USB hub, to which I perform weekly backups (using sudo dd if=/dev/mmcblk0 of=/dev/sde as a scheduled job within OMV). This SD card is /dev/sde. Side note: when I look at File Systems in OMV, sometimes I see /dev/sde1 as boot and /dev/sde2 as rootfs, and other times I see /dev/mmcblk0p1 as boot and /dev/mmcblk0p2 as rootfs. Not sure whether this is an issue, but it seems strange that it flip-flops between the two like it does.
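
    In case it helps anyone else chase the same flip-flop, a quick way to see which device the running system is actually using for boot and rootfs is something like the following (just standard tools, nothing OMV-specific):

    Code
    # Show which block device backs the running rootfs and /boot
    findmnt -n -o SOURCE /
    findmnt -n -o SOURCE /boot
    # Or list every block device with its label and mount point
    lsblk -o NAME,LABEL,MOUNTPOINT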


    Anyway, it's a pretty straightforward setup, and I had it up and running nonstop for about a month and a half without issues until the end of January, around the 28th. Earlier that week, I applied all of the updates to the system as I normally do. I do not recall restarting after the updates, though possibly the system did so by itself. On the morning of the 28th, I gracefully shut down the system and relocated it to my new house. After powering it back on, the RAID array was gone. I posted the ordeal to Reddit (see https://www.reddit.com/r/linux…mdadm_raid1_failed_array/) and received the help needed to get it back up and running, which, from what I could tell, was basically to revert kernels. I'm not sure exactly what happened, but I know at this point I'm very reluctant to perform any more updates. If anyone has any thoughts on what happened or why, I'd greatly appreciate your input.


    To save you some time in reviewing the Reddit post, here's what tipped off the glorious expert (ang-p) to my problem:

    Code
    pi@piserver:~ $ sudo modprobe raid1     
    modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/5.4.79-v7l+/modules.dep.bin'     
    modprobe: FATAL: Module raid1 not found in directory /lib/modules/5.4.79-v7l+       

    He said, "Looks like your old modules were wiped, new ones were put in place, but your kernel in [tt]/boot[/tt] was not updated for some reason...."


    And, here is what ultimately fixed it:

    Code
    sudo dpkg --install /var/cache/apt/archives/raspberrypi-kernel_1.20201201-1_armhf.deb
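
    For anyone hitting the same modprobe error, a generic way to confirm that the modules match the running kernel again after reinstalling is something like the following (just standard commands, not steps from the Reddit thread):

    Code
    uname -r                        # version of the kernel actually running
    ls /lib/modules/$(uname -r)     # this directory must exist for that version
    sudo depmod -a                  # rebuild modules.dep / modules.dep.bin
    sudo modprobe raid1             # should now load without errors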


    And here are some parting comments from ang-p regarding the potential causes:

    Quote

    All I can guess is that either sometime after the kernel was updated (from 5.4.79 to 5.4.83) something wrote the old one(s) back to the [tt]boot[/tt] partition without going through the proper procedure (i.e. the complete package...), or the upgrade failed (silently??? :-/ )

    When it was clear that it wasn't just that one module missing, it was a case of find the right kernel to install, but until I saw the output you just posted I wasn't sure that you even had the deb containing the 5.4.79 - I knew you had 2 kernel debs (from [tt]20201201[/tt] and [tt]20210108[/tt]), and knew [tt]20210108[/tt] was 5.4.83, but not sure if the [tt]20201201[/tt] was 5.4.79 or something completely different....

    Bit puzzled why 20210108 didn't take when you tried [tt]dpkg --install[/tt]ing it..... it might be the root of the issue... maybe it goofed its own install?

    Last, here are some of my current config outputs:

    Code
    pi@piserver:~ $ cat /proc/mdstat
    Personalities : [raid1] 
    md127 : active raid1 sdb1[0] sdd1[1]
          4883604416 blocks super 1.2 [2/2] [UU]
          bitmap: 2/37 pages [8KB], 65536KB chunk
    
    unused devices: <none>
    Code
    pi@piserver:~ $ blkid
    /dev/mmcblk0p1: LABEL_FATBOOT="boot" LABEL="boot" UUID="4AD7-B4D5" TYPE="vfat" PARTUUID="884b5d9f-01"
    /dev/mmcblk0p2: LABEL="rootfs" UUID="2887d26c-6ae7-449d-9701-c5a4018755b0" TYPE="ext4" PARTUUID="884b5d9f-02"
    /dev/sdb1: UUID="7590167c-f276-b76c-56a2-c0313952a1f2" UUID_SUB="c813b465-b96f-1a31-41a6-bc89e4d0d4fe" LABEL="piserver:RAIDARRAY" TYPE="linux_raid_member" PARTUUID="917ef790-02d7-43fa-8f67-c0f3bd7530fb"
    /dev/sda1: LABEL="500GB" UUID="0fd99b27-fde7-4242-b3e1-28a92475dc0f" TYPE="ext4" PARTUUID="b433c519-c292-4e33-afff-095a9e34b028"
    /dev/sdd1: UUID="7590167c-f276-b76c-56a2-c0313952a1f2" UUID_SUB="99cb9f96-c5ef-9945-4888-ef4dc0b5c62e" LABEL="piserver:RAIDARRAY" TYPE="linux_raid_member" PARTUUID="fe3a1dc5-9b83-4aaa-96d4-c4249c2f4279"
    /dev/sde1: LABEL_FATBOOT="boot" LABEL="boot" UUID="4AD7-B4D5" TYPE="vfat" PARTUUID="884b5d9f-01"
    /dev/sde2: LABEL="rootfs" UUID="2887d26c-6ae7-449d-9701-c5a4018755b0" TYPE="ext4" PARTUUID="884b5d9f-02"
    /dev/md127: LABEL="RAID1" UUID="197fa14c-b5ad-4ec5-9bc4-552ad6670e16" TYPE="ext4"
    Code
    pi@piserver:~ $ sudo mdadm --detail --scan --verbose
    ARRAY /dev/md/RAIDARRAY level=raid1 num-devices=2 metadata=1.2 name=piserver:RAIDARRAY UUID=7590167c:f276b76c:56a2c031:3952a1f2
       devices=/dev/sdb1,/dev/sdd1

    So to summarize, I'd love to know whether it is possible to safely upgrade to the latest kernel. I'm very much interested in keeping up with the latest patches, bug fixes, security updates, etc. But I'm now afraid of losing my array again.


    And (really the last thing this time), I am very much interested in breaking the array and setting up rsync instead. It would be great if there were a tutorial on that (ideally one that avoids data loss), but I cannot find one at this point. I appreciate any pointers in this regard.


    Thanks and regards,

    Anthony

    • Official post

    So to summarize, I'd love to know whether it is possible to safely upgrade to the latest kernel. I'm very much interested in keeping up with the latest patches, bug fixes, security updates, etc. But I'm now afraid of losing my array again

    :D This is one of the reasons the use of USB drives for RAID config was removed, but having read through the above, it certainly doesn't make sense that a kernel update would delete modules.

    To revert to an earlier kernel, you can change that in omv-extras -> kernel tab.


    And (really the last thing this time), I am very much interested in breaking the array and setting up rsync instead

    Wise choice :) Tutorials, no, but it may be possible; it will be like starting again, though.

  • One of the reasons the use of USB drives for RAID config was removed

    I will take that as you're saying--in general--that USB RAID is buggy, flaky and not worthy of support. I get it. But I still believe it is worth it in certain cases; for example, for those of us who do not have an unlimited budget for hardware RAID and don't want to risk losing data. In the rsync setup--which I will eventually get to, once I figure out how to do it safely--there's a chance that one of my drives dies before its data has been replicated to the other, and then I have data loss. At least with RAID, that chance is slightly lower. Regardless, given the lack of real USB RAID support in OMV, I understand I either need to go there or spend more money on a hardware RAID setup. I respect and appreciate your opinion.
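
    (To be concrete about what I mean by the rsync setup: just a one-way scheduled mirror from one single drive to the other, roughly like the line below. The paths are made up for illustration, and in OMV this would presumably be configured as an Rsync job in the GUI rather than by hand.)

    Code
    # Hypothetical nightly one-way mirror from the first data disk to the second
    rsync -aH --delete /srv/dev-disk-by-label-data1/ /srv/dev-disk-by-label-data2/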

    To revert to an earlier kernel, you can change that in omv-extras -> kernel tab.

    I didn't see a "kernel tab" in my setup, so I researched and found that it's not available for us ARM folks.

    Tutorials, no, but it may be possible; it will be like starting again, though

    Really? I was thinking I would remove one of the RAID drives, format it outside of the server, re-attach it, copy the data from the degraded array, delete the array, format the second drive outside of the server, re-attach it, and set up rsync. It sucks that I'd have to start over again... I'm surprised. How unfortunate, as I now have over 3 months of backups for 3 Macs. I've actually used the backups once to restore a ton of data to my wife's computer that she accidentally deleted. Worked like a charm.


    Thanks for the reply.

    • Official post

    I will take that as you're saying--in general--that USB RAID is buggy, flaky and not worthy of support

    Nope, but it has everything to do with the stability of USB drives on an SBC: you have the USB bus on the SBC, and then you have a SATA-to-USB bridge within each drive enclosure. There's an interesting thread here if you want to read it; the remainder of your paragraph is covered in that thread.

    I didn't see a "kernel tab" in my setup so I researched and found that's not available for us ARM folks

    My apologies, I don't use ARM as a server; my ARM boards perform specific functions on my network.

    Really

    Yes. Why? Because all your shares point to the RAID configuration. Whilst you don't have to remove the data, you have to work backwards: remove the SMB shares, then remove the Shared Folders (but not their content), and stop any Dockers if you use them, as these will need to be reconfigured.

    I was thinking I would remove one of the RAID drives, format it outside of the server, re-attach it, copy the data from the degraded array, delete the array, format the second drive outside of the server, re-attach it, and set up rsync.

    Close :) but there are two ways to do this:

    1) The safe way

    2) The squeaky bum way


    You've suggested option 2, which means: remove a drive from the array, wipe it, format it, then move the data from the degraded array to the single drive. Then unmount the array and delete it from the GUI, wipe the remaining drive, format it, and set up Rsync after setting up your shares.


    Option 1 is make a backup first, then proceed.
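
    For what it's worth, a rough command-line sketch of that option 2 route, assuming the array is /dev/md127 with members /dev/sdb1 and /dev/sdd1 as in the outputs above, that /dev/sdd1 is the drive being pulled first, and that the array is mounted under /srv/dev-disk-by-label-RAID1 (in practice the wipe/format/mount steps would normally be done from the OMV GUI):

    Code
    # Mark one mirror member as failed and remove it (the array keeps running, degraded)
    sudo mdadm /dev/md127 --fail /dev/sdd1 --remove /dev/sdd1
    # Wipe the old RAID signature and put a plain ext4 filesystem on the freed drive
    sudo wipefs --all /dev/sdd1
    sudo mkfs.ext4 -L backup1 /dev/sdd1
    # Mount it and copy everything off the degraded array
    sudo mkdir -p /mnt/backup1
    sudo mount /dev/sdd1 /mnt/backup1
    sudo rsync -aH --info=progress2 /srv/dev-disk-by-label-RAID1/ /mnt/backup1/
    # Only after verifying the copy: unmount, stop and delete the array, then reuse the second drive
    sudo umount /srv/dev-disk-by-label-RAID1
    sudo mdadm --stop /dev/md127
    sudo mdadm --zero-superblock /dev/sdb1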

    for those of us who do not have an unlimited budget for hardware RAID

    I decided to answer this at the end: neither do I. My current server (an HP Microserver) I bought off eBay for less than £100. My previous servers came from my then place of work, where they had been in use for 6 years.

  • By the way, I also have an SD card attached to the USB hub, to which I perform weekly backups (using sudo dd if=/dev/mmcblk0 of=/dev/sde as a scheduled job within OMV). This SD card is /dev/sde. Side note: when I look at File Systems in OMV, sometimes I see /dev/sde1 as boot and /dev/sde2 as rootfs, and other times I see /dev/mmcblk0p1 as boot and /dev/mmcblk0p2 as rootfs. Not sure whether this is an issue, but it seems strange that it flip-flops between the two like it does.

    Here is what gives you the problem: the RPi4 is now able to boot straight from USB, and you have 2 fully operational OSes attached.

    For whatever reason (the SD bus is busy and the USB reads first, or vice versa), the system will boot with whichever one it reads first.

    If you do boot from USB and have rootfs and "/boot" on sde, and you then update the boot ROM and kernel, it will update the CLONE only (the SD card sitting on the USB hub).


    Now let's say you reboot and the RPi4 goes with "mmcblk0" as the OS: you will still be in the previous situation you were in, since there was no update to it.


    Or, even if you do boot from the USB-attached SD card that did get the update, you know that running sudo dd if=/dev/mmcblk0 of=/dev/sde will overwrite it again.
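
    A way to check whether the two copies have drifted apart would be to compare the kernels on the two boot partitions, something like the following (the mount point is just an example, and I'm assuming the 32-bit kernel7l.img image used on the RPi4):

    Code
    # Mount the USB-attached SD card's boot partition somewhere temporary
    sudo mkdir -p /mnt/sde-boot
    sudo mount /dev/sde1 /mnt/sde-boot
    # Compare the kernel images on the two boot partitions (sizes/dates, or checksum them)
    ls -l /boot/kernel7l.img /mnt/sde-boot/kernel7l.img
    md5sum /boot/kernel7l.img /mnt/sde-boot/kernel7l.img
    sudo umount /mnt/sde-boot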


    The best way to run your rig is with only one of them in place (either the SD or the USB), not both at the same time; plug the other in only when you want to do a backup, and unplug it again after the backup.


    To see what order the RPi4 boots in, run the following (the hex is read from right to left):

    rpi-eeprom-config

    The order is shown:

    BOOT_ORDER=0xf41

    The "1" is --> boot from SD card

    The "4" is --> boot from USB

    And this "0xf" is the order to "Restart" if unable to boot from neither.


    All this information is available here:

  • Geaves, Good info, thank you.

    Soma, Wow, yes I was starting to think that could have been part of the problem when I was writing this... Very interesting! I will need to do some more digging to see if I can fix it. Thank you!!

  • Soma, here is my output below. I was expecting to see something else based on your reply. Regardless, I take this to mean that my Pi is only booting from the built-in SD card. I guess what you are saying is that at some point in January, when I updated/patched, it only patched one drive (SD or USB), or maybe part of a drive (boot or rootfs), which then got wiped by the dd backup. Then when I booted again, it reverted to the prior version from the other, unpatched drive. IDK, it sounds strange, but I think it definitely had something to do with it. I will need to do a complete backup, remove the backup card, then upgrade and see if everything works OK. Will report back later. Thanks again for pointing out the likely issue.

  • Here is what gives you the problem: the RPi4 is now able to boot straight from USB, and you have 2 fully operational OSes attached.

    To confirm: yes, that was the problem. The system would sometimes use the USB SD card for boot but remain on the built-in SD card for rootfs. And because one or the other hadn't been updated properly, when it booted from the other device it couldn't find the modules matching its kernel and basically died. I resolved it today by 1) performing a full backup to the USB SD card; 2) removing the USB SD card; 3) restarting; 4) performing a full update with all patches and the latest kernel; and 5) restarting. All worked well, and now I have learned a lesson: don't keep the USB SD backup card in the system all the time, but only plug it in when I want to perform a backup, such as before updating. THANK YOU SOMA! Awesome. 8)
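
    For future reference, the routine I plan to follow before each kernel update looks roughly like this (the device names are specific to my setup, so adjust them; the dd is the same weekly job I mentioned above):

    Code
    # 1) Full backup of the internal SD card to the USB-attached SD card
    sudo dd if=/dev/mmcblk0 of=/dev/sde bs=4M status=progress conv=fsync
    # 2) Remove the USB-attached SD card, then apply updates (here via apt; the OMV update GUI works too) and reboot
    sudo apt update && sudo apt full-upgrade
    sudo reboot
    # 3) After the reboot, confirm the modules match the running kernel
    uname -r
    ls /lib/modules/$(uname -r)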
