PVE Kernel nuked RAID

  • Lo


    Due to a powercut my system rebooted.

    When it did, I had no Docker containers, and when I checked, my OMV share drive was missing.


    I have rebooted back to 5.4.106-1-pve (#1 SMP PVE 5.4.106-1) and it works. I picked that as a panic default, but I can see in the logs that I was running 5.4.114-1 before.


    However, under 5.4.119-1-pve, when I boot, it fails to mount the OMV data partition.


    mdadm --assemble --scan then reports /dev/md0 is already in use


    Clues? (also to warn others!)

  • I would guess it isn't the kernel version causing this. Probably a bad initramfs. Fix with sudo update-initramfs -k all -u
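
    If you rebuild it, you can sanity-check that the new image actually carries the md bits with something like this (lsinitramfs comes with initramfs-tools; adjust the kernel version to the one you boot):

    Code
    # list the mdadm pieces inside a given initramfs image
    lsinitramfs /boot/initrd.img-5.4.119-1-pve | grep -E 'mdadm|md_mod|raid'
    # the mdadm.conf copy, the mdadm binary and the raid modules should show up here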


    And if you are running a raid array, you really should have battery backup.

    omv 5.6.13 usul | 64 bit | 5.11 proxmox kernel | omvextrasorg 5.6.2 | kvm plugin 5.1.6
    omv-extras.org plugins source code and issue tracker - github



  • It's softraid, and I have a UPS

    But from what I can see in the default nut config, the shutdown command in /etc/nut/upsmon.conf was /sbin/shutdown

    and shutdown ACTUALLY lives in /usr/sbin/shutdown

    If that's the case, that could affect all OMV users? :)


    So it didn't shut down...
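
    For anyone who wants to check their own box, something along these lines should show what upsmon is configured to run and whether that path actually exists (SHUTDOWNCMD is the relevant upsmon.conf directive):

    Code
    # show the configured shutdown command in NUT's upsmon.conf
    grep -i '^SHUTDOWNCMD' /etc/nut/upsmon.conf
    # see whether both paths resolve on this system
    ls -l /sbin/shutdown /usr/sbin/shutdown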


    Should I try the sudo update-initramfs -k all -u under the new (failed) kernel or from the working one (and then reboot)?

  • It's softraid,

    raid is raid and needs battery backup regardless of type whenever data is written to more than one drive. Good that you have a UPS.


    But from what I can see in default nut, the default command to shutdown was /sbin/shutdown

    and shutdown ACTUALLY lives in /usr/sbin/shutdown

    /sbin is a symlink to /usr/sbin on Debian (https://wiki.debian.org/UsrMerge). If your system is not like this, then you need to install usrmerge.
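
    A quick way to check your own box (plain coreutils, nothing OMV specific) is something like:

    Code
    # on a merged system /sbin shows up as a symlink into /usr
    ls -l / | grep -w sbin

    On a merged system you get output along these lines: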


    lrwxrwxrwx 1 root root 8 Jan 13 2020 sbin -> usr/sbin


    If that's the case, that could affect all OMV users?

    If they have an unmerged system, yes. But I thought the nut plugin didn't even install on an unmerged system - I haven't tried lately, though.


    Should I try the sudo update-initramfs -k all -u under the new (failed) kernel or from the working one (and then reboot)?

    Doesn't matter. The -k all parameter means build a new initramfs for all kernels installed on the system.
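
    If you want to see roughly what "-k all" will cover, listing what's in /boot shows the installed kernels and their current initramfs images, e.g.:

    Code
    # kernels and initramfs images currently installed
    ls -1 /boot/vmlinuz-* /boot/initrd.img-*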

    omv 5.6.13 usul | 64 bit | 5.11 proxmox kernel | omvextrasorg 5.6.2 | kvm plugin 5.1.6
    omv-extras.org plugins source code and issue tracker - github


    Please read this before posting a question.
    Please don't PM for support... Too many PMs!

  • I will try the initramfs - although the kernel updates have all been done via the OMV GUI to date.

    As it should, but a power cut can corrupt files.


    Interestingly, I've also noticed that my "many kernels" have been reduced back to 5 (I had another thread about them racking up)

    So something in the OMV-Extras/reboot IS tidying them up....

    That is all apt. omv-extras and reboot won't change that. 5 is still too many unless you have the 5.4 and 5.11 kernels installed.
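
    If you want to see exactly which pve kernels apt still has installed (and let it remove the ones it no longer considers needed), the usual approach is something like:

    Code
    # list installed pve kernel packages
    dpkg -l 'pve-kernel-*' | grep '^ii'
    # let apt clear out kernels it considers obsolete
    sudo apt autoremove --purge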


  • All I have (after initramfs) is:


    Code
    update-initramfs: Generating /boot/initrd.img-5.4.119-1-pve
    update-initramfs: Generating /boot/initrd.img-5.4.114-1-pve
    update-initramfs: Generating /boot/initrd.img-5.4.106-1-pve
    update-initramfs: Generating /boot/initrd.img-5.4.103-1-pve
    update-initramfs: Generating /boot/initrd.img-5.4.101-1-pve

  • Perfect. Does rebooting into the 119 kernel work now?


  • Right, major issue now

    I could previously reboot to an old version. Now that I've done the update-initramfs -k all -u thing (as suggested), NONE of the kernels load my OMV data drive.... :\


    If I try mdadm --assemble --scan I get

    Code
    mdadm: Fail create md0 when using /sys/module/md_mod/parameters/new_array
    mdadm: /dev/md0 is already in use


    If I cat /sys/module/md_mod/parameters/new_array I get permission denied

    it is u+w only (not readable)

    If I set u+r on new_array and cat it, I get "Operation not permitted" (and I then set u-r on it again)
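
    A quick way to see what is already claiming md0 before trying to reassemble is something like:

    Code
    # arrays the kernel has already started (including inactive ones)
    cat /proc/mdstat
    # what mdadm thinks /dev/md0 currently is
    mdadm --detail /dev/md0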


    Checking....

  • Small update - /etc/mdadm/mdadm.conf listed the arrays thus:


    Code
    ARRAY /dev/md/debian:0 metadata=1.2 name=debian:0 UUID=90dece3a:d04b3040:2c4fa91c:9c57ccb2
    ARRAY /dev/md/debian:1 metadata=1.2 name=debian:1 UUID=e4673560:6b1f8d7a:be5a9a98:fcb7ccc9
    ARRAY /dev/md0 metadata=1.2 name=OMVDataRAID UUID=bbfef1bf:cd1f8aab:2421bb14:c4cc3028


    Given /dev/md/debian:0 symlinks to /dev/md0 - the last line is never going to work....
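
    You can see the name clash with something like this - the entries under /dev/md/ are just symlinks to the real md devices:

    Code
    # show which md device each named symlink points at
    ls -l /dev/md/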


    So, I changed the last line to:


    Code
    ARRAY /dev/md127 metadata=1.2 name=OMVDataRAID UUID=bbfef1bf:cd1f8aab:2421bb14:c4cc3028


    and mdadm --assemble --scan worked

    I could then mount /dev/md127 and things reappeared.

    Trying reboot

  • I can now boot. However, I still see an error on boot about /dev/md0 being in use (and /sys/module/md_mod/parameters/new_array), but it now boots AND mounts my data volume. I suspect this is due to my kludge rather than a "proper" fix, so I don't deem this stable (or at least not without an explanation).


    I'm rather sceptical about what has changed where :\

    Code
    cat messages.1 | grep "Jun 11" | grep pve
    Jun 11 07:48:51 openmediavault cron-apt: pve-headers-5.4.119-1-pve pve-kernel-5.4.119-1-pve
    Jun 11 07:48:51 openmediavault cron-apt: libwebp6 libwebpmux3 pve-headers-5.4 pve-kernel-5.4


    So I think apt or the initramfs rebuild (after the pve kernel download - the last was Jun 11th...) did something


    Forcing the initramfs rebuild per the suggestion broke the older installed kernels too, so I couldn't (easily) recover - until I did the /etc/mdadm/mdadm.conf fix, which is manual and a bit worrying, given the file states (at the top):

    Code
    # This file is auto-generated by openmediavault (https://www.openmediavault.org)
    # WARNING: Do not edit this file, your changes will get lost.


    So what broke my /etc/mdadm/mdadm.conf file? How do I get it back? From what I can see in the OMV saltstack files, all OMV is doing is:

    Code
    mdadm_save_config:
      cmd.run:
        - name: "mdadm --detail --scan >> /etc/mdadm/mdadm.conf"
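
    You can see what that command would append by running it without the redirect:

    Code
    # print the ARRAY lines mdadm generates from the currently running arrays
    mdadm --detail --scan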

    So what caused it to default back to /dev/md0 when I clearly already have a /dev/md0 - given that /dev/md/debian:0 is a symlink to /dev/md0?

    Should I change /etc/mdadm/mdadm.conf to be:

    Code
    ARRAY /dev/md/debian:0 metadata=1.2 name=debian:0 UUID=90dece3a:d04b3040:2c4fa91c:9c57ccb2
    ARRAY /dev/md/debian:1 metadata=1.2 name=debian:1 UUID=e4673560:6b1f8d7a:be5a9a98:fcb7ccc9
    ARRAY /dev/md/OMVDataRAID metadata=1.2 name=OMVDataRAID UUID=bbfef1bf:cd1f8aab:2421bb14:c4cc3028

    ...as I can't see what is clashing with /dev/md0... :\


    If I (or OMV) do a --detail --scan now, the same output will end up back in the file, but I can't see why it changed in the first place.


    Clues?

    Do I need to redo the update-initramfs -k all -u again?

  • No need to keep trying to fix the old kernels. I would:

    sudo omv-salt deploy run mdadm

    sudo update-initramfs -u
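
    After that, a quick sanity check before rebooting might look like this - the regenerated config and the running array should both reference your data RAID:

    Code
    # confirm the regenerated config
    cat /etc/mdadm/mdadm.conf
    # confirm the array is assembled and healthy
    cat /proc/mdstat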

