RAID 1 degraded for no apparent reason, unit cannot reboot or shut down, SMART looks fine for both HDDs

  • Hello, I am struggling with an issue that has just shown up after OMV had been running without problems for quite a while.


    All of a sudden, I can no longer access OMV from one Windows 7 computer, while a Windows 10 computer apparently works OK. When checking the system status in the GUI, I found some pending updates, which I tried to install several times to no avail. I was not too concerned about the updates, so I kept looking for other problems, and I found that even though the SMART analysis looks fine for both HDDs, the RAID status shows /dev/md0 as "active, degraded".
    I have no idea what led to this, since the unit is on a UPS to protect it against brownouts and power loss.


    cat /proc/mdstat


    root@OMV:~# cat /proc/mdstat
    Personalities : [raid1]
    md0 : active raid1 sdb[0](F) sdc[1]
    2930135360 blocks super 1.2 [2/1] [_U]
    unused devices: <none>
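A minimal sketch for reading the mdstat output above: it prints members flagged (F) (failed) and any array whose [n/m] [U_] status shows an empty slot. It only assumes the /proc/mdstat layout shown in this thread; the function name md_degraded_report is made up for illustration. Keep in mind that the kernel name recorded in mdstat (sdb here) may be stale: fdisk in this same thread shows the 3 TB disks as sdd and sde, so check serial numbers (for example with ls -l /dev/disk/by-id/) before touching any disk.

```shell
# Hypothetical helper: report failed members and degraded md arrays.
# Reads /proc/mdstat by default; pass a file to test against saved output.
md_degraded_report() {
  awk '
    # Array line, e.g. "md0 : active raid1 sdb[0](F) sdc[1]"
    $2 == ":" && $1 ~ /^md/ {
      arr = $1
      for (i = 3; i <= NF; i++)
        if ($i ~ /\(F\)$/) {
          dev = $i
          sub(/\[.*$/, "", dev)   # strip "[0](F)", keeping the device name
          print arr ": failed member " dev
        }
      next
    }
    # Status line, e.g. "2930135360 blocks super 1.2 [2/1] [_U]"
    arr != "" && /\[[U_]+\]/ {
      if ($NF ~ /_/) print arr ": degraded " $(NF-1) " " $NF
      arr = ""
    }
  ' "${1:-/proc/mdstat}"
}
```

On the output posted above this should print one "failed member sdb" line and one "degraded [2/1] [_U]" line for md0; a healthy [2/2] [UU] array prints nothing.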


    root@OMV:~# fdisk -l | grep "Disk "
    Disk /dev/md0 doesn't contain a valid partition table
    Disk /dev/sdd doesn't contain a valid partition table
    Disk /dev/sde doesn't contain a valid partition table
    Disk /dev/sda: 160.0 GB, 160041885696 bytes
    Disk identifier: 0x00071626
    Disk /dev/md0: 3000.5 GB, 3000458608640 bytes
    Disk identifier: 0x00000000
    Disk /dev/sdd: 3000.6 GB, 3000592982016 bytes
    Disk identifier: 0x00000000
    Disk /dev/sde: 3000.6 GB, 3000592982016 bytes
    Disk identifier: 0x00000000



    root@OMV:~# cat /etc/mdadm/mdadm.conf
    # mdadm.conf
    #
    # Please refer to mdadm.conf(5) for information about this file.
    #
    # by default, scan all partitions (/proc/partitions) for MD superblocks.
    # alternatively, specify devices to scan, using wildcards if desired.
    # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
    # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
    # used if no RAID devices are configured.
    DEVICE partitions
    # auto-create devices with Debian standard permissions
    CREATE owner=root group=disk mode=0660 auto=yes
    # automatically tag new arrays as belonging to the local system
    HOMEHOST <system>
    # definitions of existing MD arrays
    ARRAY /dev/md0 metadata=1.2 name=OMV:VolumeOne UUID=3dc20369:61bff7ce:f7e09590:7aee7194




    root@OMV:~# mdadm --detail --scan --verbose
    ARRAY /dev/md0 level=raid1 num-devices=2 metadata=1.2


    I have two WD Red 3TB drives, plus a small Hitachi 150GB drive for the OS.



    It seems very odd that I cannot reboot or shut down the unit, either from the web GUI or from a terminal. I am not very skilled in Linux, but I believe I know how to reboot or shut down from the command line.



    Any guidance would be appreciated, as I have no idea what is going on here.



    Regards



    Alberto

    • Official Post

    mdadm --stop /dev/md0
    mdadm --assemble --force --verbose /dev/md0 /dev/sd[ed]


    If that starts the rebuild, then run omv-mkconf mdadm.

    omv 7.0.5-1 sandworm | 64 bit | 6.8 proxmox kernel

    plugins :: omvextrasorg 7.0 | kvm 7.0.13 | compose 7.1.4 | k8s 7.1.0-3 | cputemp 7.0.1 | mergerfs 7.0.4


    omv-extras.org plugins source code and issue tracker - github - changelogs


    Please try ctrl-shift-R and read this before posting a question.

    Please put your OMV system details in your signature.
    Please don't PM for support... Too many PMs!

  • Hello, thank you for the feedback.


    I ran mdadm to no avail, as follows:


    root@OMV:~# mdadm --stop /dev/md0
    mdadm: Cannot get exclusive access to /dev/md0:Perhaps a running process, mounted filesystem or active volume group?
    root@OMV:~# sudo mdadm --stop /dev/md0
    mdadm: Cannot get exclusive access to /dev/md0:Perhaps a running process, mounted filesystem or active volume group?


    If I can somehow stop the RAID, how can I identify the supposedly missing (damaged) drive for the next command (the [ed] between brackets)?


    mdadm --assemble --force --verbose /dev/md0 /dev/sd[ed]
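On identifying the "missing" drive: one common approach is to compare the superblocks of both members with mdadm --examine /dev/sdd /dev/sde. The member with the lower Events count is normally the one that dropped out of the array first. The sketch below condenses that output to one line per disk. It assumes the usual labels of 1.2-metadata output (Events, Device Role, Array State), and md_examine_summary is a hypothetical helper name, so compare it against your own output before relying on it.

```shell
# Hypothetical helper: condense `mdadm --examine` output to one line per member.
# Usage (as root): mdadm --examine /dev/sdd /dev/sde | md_examine_summary
# Assumes 1.2-superblock labels "Events", "Device Role" and "Array State".
md_examine_summary() {
  awk -F' : ' '
    # Each device block starts with a line like "/dev/sdd:"
    /^\/dev\/[^ ]*:$/ { dev = substr($0, 1, length($0) - 1); order[++n] = dev; next }
    {
      key = $1; gsub(/^ +| +$/, "", key)   # labels are right-aligned with spaces
      if (key == "Events")      ev[dev] = $2
      if (key == "Device Role") role[dev] = $2
      if (key == "Array State") { st = $2; sub(/ .*/, "", st); state[dev] = st }
    }
    END {
      for (i = 1; i <= n; i++) {
        d = order[i]
        printf "%s events=%s role=%s state=%s\n", d, ev[d], role[d], state[d]
      }
    }
  '
}
```

In the Array State column, "A" marks a member that superblock considers active and "." one it considers missing, so two disks that disagree about each other are a further hint as to which one is stale.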



    Thank you!



    Alberto

    • Official Post

    I forgot that, since the array is still assembled (just degraded), the filesystem on it is still mounted. You need to unmount the filesystem before running the command. Use the umount command.
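Since the degraded array is still assembled and mounted, the order has to be: unmount first, then stop. A minimal sketch that finds where /dev/md0 is mounted by reading /proc/mounts (mount_points_of is a hypothetical name; findmnt from util-linux can do the same job):

```shell
# Hypothetical helper: print the mount point(s) of a block device.
# Reads /proc/mounts by default; a saved copy can be passed as $2 for testing.
# Note: /proc/mounts escapes spaces in paths as \040; this sketch does not decode that.
mount_points_of() {
  awk -v dev="$1" '$1 == dev { print $2 }' "${2:-/proc/mounts}"
}

# Example order of operations (as root), assuming nothing else holds the mount:
#   for mp in $(mount_points_of /dev/md0); do umount "$mp"; done
#   mdadm --stop /dev/md0
```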


    Quote: "In case I can somehow stop RAID, how can I identify the supposedly missing (damaged) drive to run next command? (the [ed] between brackets)."

    You are assuming the drive is damaged. Just because an array didn't assemble doesn't mean the drive is damaged. This is a big reason why I normally tell people not to use RAID. For a mirror, I think rsyncing the two drives on a regular basis is better than using mdadm mirroring.
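A minimal sketch of the rsync idea, assuming each WD Red is formatted and mounted as its own filesystem instead of being an mdadm mirror. The function name and all mount paths are hypothetical placeholders. Note that --delete makes the second disk an exact copy, so files deleted or damaged on the first disk are also removed or overwritten there on the next run: it is a scheduled mirror, not a versioned backup.

```shell
# Hypothetical helper: copy the primary data disk onto the second disk.
# -a keeps permissions, ownership and timestamps; -H preserves hard links;
# --delete removes files from the copy that no longer exist on the source.
mirror_disks() {
  src=$1 dst=$2
  [ -d "$src" ] && [ -d "$dst" ] || { echo "missing directory" >&2; return 1; }
  # Trailing slashes: copy the contents of src, not the directory itself.
  rsync -aH --delete "$src"/ "$dst"/
}

# Hypothetical /etc/crontab line (nightly at 03:00, placeholder paths):
#   0 3 * * * root rsync -aH --delete /srv/disk-a/ /srv/disk-b/
```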


  • Hello,


    I tried umount:


    root@OMV:~# umount /dev/md0
    umount: /media/3156d0bd-44c3-4046-a2d0-ecf6331ef93f: device is busy.
    (In some cases useful info about processes that use
    the device is found by lsof(8) or fuser(1))
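"Device is busy" means something still has a file or directory open on that filesystem; fuser -vm /media/3156d0bd-44c3-4046-a2d0-ecf6331ef93f (or lsof, if installed) will list it. To illustrate what those tools look at, here is a dependency-free sketch that scans /proc for processes whose working directory, root, or open file descriptors point under a given path. procs_using_path is a hypothetical name and this is only an approximation: run it as root to see every process, and kernel-side users such as an NFS export, or files that are only memory-mapped, will not show up.

```shell
# Hypothetical helper: list PIDs whose cwd, root, or an open fd points at DIR
# or anything below it. Links we are not allowed to read are silently skipped.
procs_using_path() {
  dir=$(readlink -f "$1") || return 1
  for p in /proc/[0-9]*; do
    for link in "$p/cwd" "$p/root" "$p"/fd/*; do
      target=$(readlink "$link" 2>/dev/null) || continue
      case $target in
        "$dir" | "$dir"/*) echo "${p#/proc/}"; break ;;
      esac
    done
  done
}
```

Each PID can then be looked up with ps -p <pid> -o comm= to see which service (Samba, a media server, a shell left in that directory) is holding the mount.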


    I guess the disks are OK, as you said in your previous message. Once I get this thing back on track, I will seriously consider whether to keep RAID or switch to mirroring with rsync as you propose. RAID seems to be more complex to fix when things go wrong.


    Regards


    Alberto

    • Official Post

    Quote: "I tried umount:"

    You probably have services using that filesystem. Booting into a rescue distro such as SystemRescueCd is probably the best way to do this.


  • OK, I was trying to avoid an abrupt power-off, but the system does not respond to shutdown or reboot commands, whether they are issued from the CLI or the GUI.


    Should I just force a power-off?


    Regards
