Missing RAID-10 after shutdown/restart

  • Hi,


    I had to shut down my NAS and did this via the power button as usual. The NAS shut down after a while. So far everything went as normal (the power button is configured to "shut down").


    Now I am missing my RAID 10, 4 x ST8000VN0022-2EL112 (8 TB SATA drives). The individual drives are listed properly; they appear as sdc...sdf and are all connected directly to the onboard SATA controller.


    A simple

    Code
    root@omv5:~# mdadm --assemble --scan
    mdadm: Fail to create md0 when using /sys/module/md_mod/parameters/new_array, fallback to creation via node
    mdadm: /dev/md0 is already in use.

    did not work.

    Code
    root@omv5:~# cat /proc/mdstat
    Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
    md0 : active raid1 sdb[1] sda[0]
          19531693056 blocks super 1.2 [2/2] [UU]
          bitmap: 0/146 pages [0KB], 65536KB chunk
    
    unused devices: <none>
    Code
    root@omv5:~# blkid
    /dev/sdf: UUID="7880e29c-7e10-8585-07a7-de53a0202f0c" UUID_SUB="2eddf677-d864-09f8-fb32-28e93a182907" LABEL="omv5:0" TYPE="linux_raid_member"
    /dev/sdd: UUID="7880e29c-7e10-8585-07a7-de53a0202f0c" UUID_SUB="56f2ca3e-b755-4b2b-4a86-e5a0ef8b3132" LABEL="omv5:0" TYPE="linux_raid_member"
    /dev/sdb: UUID="37da4029-f22c-71a8-60e4-374a0a6da7fb" UUID_SUB="5b845c85-a1a5-12dd-0a07-1c5002a3d95d" LABEL="omv3:0" TYPE="linux_raid_member"
    /dev/md0: UUID="5afbf069-9b76-484d-91f2-724406af67b8" BLOCK_SIZE="4096" TYPE="xfs"
    /dev/sdg5: UUID="000e208b-d3cc-44bd-9882-23b227df94e4" TYPE="swap" PARTUUID="17e4cf92-05"
    /dev/sdg1: UUID="13ca13fc-8bad-4071-8a5a-4af9b7ee456d" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="17e4cf92-01"
    /dev/sde: UUID="7880e29c-7e10-8585-07a7-de53a0202f0c" UUID_SUB="9b02534d-1767-a1db-6fd3-63be8a982169" LABEL="omv5:0" TYPE="linux_raid_member"
    /dev/sdc: UUID="7880e29c-7e10-8585-07a7-de53a0202f0c" UUID_SUB="46df13e0-d1f3-af87-92c8-f253c4a624ac" LABEL="omv5:0" TYPE="linux_raid_member"
    /dev/sda: UUID="37da4029-f22c-71a8-60e4-374a0a6da7fb" UUID_SUB="ce47e707-988f-78ea-14f2-367bef1c6a75" LABEL="omv3:0" TYPE="linux_raid_member"
    Code
    root@omv5:~# mdadm --detail --scan --verbose
    ARRAY /dev/md/omv3:0 level=raid1 num-devices=2 metadata=1.2 name=omv3:0 UUID=37da4029:f22c71a8:60e4374a:0a6da7fb
       devices=/dev/sda,/dev/sdb

    I'd really appreciate it if you could help me with this! :*

  • Have you run an examine (mdadm -E) on the 4 drives that were in the RAID10 to check the event counts/update times?

    E.g. mdadm -E /dev/sd[cf] | grep -E "Event|Update" and check which drives are in which mirror pair.


    Next to try is mdadm --assemble /dev/mdXXX --force /dev/sd[cf] which will give some feedback if it fails. The mdXXX number will be in the past boot logs if you don't remember it.


    If one or more drives will not rebuild, just cross your fingers that they are not from the same mirror pair (see the sketch below).
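
    As a rough sketch of that whole sequence (assuming the four members really are sdc to sdf; adjust the letters to whatever blkid reports as linux_raid_member for omv5:0, and use any md number that is free):

    Code
    # Compare event counts and update times across every candidate member
    mdadm -E /dev/sd[cdef] | grep -E "^/dev|Update Time|Events"

    # If the counts match (or are very close), try a forced assembly
    mdadm --assemble /dev/md127 --force /dev/sdc /dev/sdd /dev/sde /dev/sdf

    # Confirm the array came up
    cat /proc/mdstat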

  • Hi, thanks for your response. All drives ran their weekly SMART test a few days ago and there were no problems reported.


    Following your recommendation

    Code
    mdadm -E /dev/sd[cf] | grep -E "Event|Update"
        Update Time : Mon Mar  4 07:36:51 2024
             Events : 25342
        Update Time : Mon Mar  4 07:36:51 2024
             Events : 25342

    Do I need to specify the previously used mdXXX number, or can I use any number except for md0, which is in use?

  • A bit strange, but the boot log mentions a Hitachi disk drive (2 TB) which is definitely not connected to the system:


    Code
    2024-03-05T17:41:44+0100 omv5 run-parts[747]: /dev/disk/by-id/ata-Hitachi_HDS5C3020ALA632_ML0230FA0U3H7D: Unable to detect device type
    2024-03-05T17:41:44+0100 omv5 run-parts[747]: Please specify device type with the -d option.
    2024-03-05T17:41:44+0100 omv5 run-parts[747]: Use smartctl -h to get a usage summary
    2024-03-05T17:41:44+0100 omv5 smartd[744]: smartd 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-18-amd64] (local build)
    2024-03-05T17:41:44+0100 omv5 smartd[744]: Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
    2024-03-05T17:41:44+0100 omv5 smartd[744]: Opened configuration file /etc/smartd.conf
    2024-03-05T17:41:44+0100 omv5 run-parts[742]: run-parts: /etc/smartmontools/hdparm.d//openmediavault-dev-disk-by-id-ata-Hitachi_HDS5C302
  • OK, I've tried that, since it seems to persist only until the next reboot anyway


    Code
    root@omv5:~# mdadm --assemble /dev/md127 --force /dev/sdc /dev/sdd /dev/sdf /dev/sdg
    mdadm: /dev/md127 has been started with 4 drives.


    It looks like the RAID is up and running again, but I can't access the data. I assume that is because the file system is not initialized properly?
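
    A quick way I could check whether the filesystem on the array is at least still recognised might be something like this (the -n flag of xfs_repair only inspects, it does not change anything):

    Code
    # Does the assembled array still carry its XFS signature?
    blkid /dev/md127

    # Read-only consistency check of the XFS filesystem
    xfs_repair -n /dev/md127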

  • Quote

    Code
    mdadm -E /dev/sd[cf] | grep -E "Event|Update"
        Update Time : Mon Mar  4 07:36:51 2024
             Events : 25342
        Update Time : Mon Mar  4 07:36:51 2024
             Events : 25342

    Only two out of four disks? Check the full output of mdadm -E /dev/sd[cf]. Which two disks show up, and are they in the same or a different mirror pair?
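
    To see which member sits in which mirror pair, something like this should do; the "Device Role" line gives the slot number, and with the default near-2 RAID10 layout slots 0+1 and 2+3 form the two mirror pairs (adjust the letter glob to whatever blkid currently reports):

    Code
    # Per drive: its slot in the array plus the event count
    mdadm -E /dev/sd[c-f] | grep -E "^/dev|Device Role|Events"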

  • Hi, thanks for taking the time! I see that we newbies are keeping you busy today :/


    When I look at the shared folders configuration I can see that the expected device (the missing RAID) is referenced as

    Code
    /dev/disk/by-uuid/71c1090c-fc3f-4ecf-b159-b30036a53704

    whereas the shared folders on the other RAID show

    Code
    /dev/md0

    So should I try to repeat the assembly using the device name given above?

    Code
    root@omv5:~# mdadm --assemble /dev/disk/by-uuid/71c1090c-fc3f-4ecf-b159-b30036a53704 --force /dev/sdc /dev/sdd /dev/sdf /dev/sdg


    [Edit]

    Of course, this didn't work. Sorry for the confusion.

    The question is: how can I access the data on the reassembled RAID volume? The configured file system is shown as "online" but does not show any information about available/used space.

  • Quote


    Check the full output of mdadm -E /dev/sd[cf]


    It seems that [cf] is not interpreted the way I expected; it actually shows the information for only two drives. [cefg] looks more complete.

    I can see it here (last line), but how do I continue?

  • The [cf] thing was my error; I was thinking your 4 drives were consecutively lettered c, d, e, f. The question is whether the current fstab contents match the /dev/md127 filesystem UUID. If so, then mount -a should mount the XFS filesystem on the RAID10 (see the sketch below).
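
    A rough way to cross-check that, assuming the UUID from your shared-folder config is the one that belongs on the RAID10:

    Code
    # Filesystem UUID now sitting on the re-assembled array
    blkid /dev/md127

    # Is that UUID referenced in fstab?
    grep 71c1090c /etc/fstab

    # If so, mount whatever is not mounted yet
    mount -a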

  • I've just tried that. The RAID10 is not available after reboot.


    A quick


    Code
    mdadm --assemble
    mdadm --readwrite /dev/md127
    mount -a

    worked of course, even without --force, but I am a bit confused now. One thing I have noticed is that the boot drive has repeatedly changed its device name after reboots. It is now /dev/sdf and was /dev/sde before rebooting.

    Perhaps this is what confuses the system on startup?
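
    If it helps, I can at least identify the drives by their stable IDs instead of the changing sdX letters:

    Code
    # Stable identifiers survive reboots even when the sdX letters move around
    ls -l /dev/disk/by-id/ | grep ST8000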

  • OK, let's re-check the basics. This is OMV5, which by the way is long overdue for an upgrade to OMV6 and then OMV7, so does /etc/fstab need changing? Post the output of cat /etc/fstab, blkid, and cat /proc/mdstat. I'll be back tomorrow.

  • Thanks again! I'll be back around 19:00h Berlin Time.


    The name is misleading; it is actually version 7.0-32 (Sandworm). I built this as OMV v6 and upgraded to v7 a while ago.

    Shutting down / restarting has never been a problem (until yesterday).


    Even when I tried to improve performance by using a PCI host controller, everything appeared in the right place after moving the drives from the onboard controller to the dedicated controller. In the end I moved back to the onboard controller because there was only a small performance improvement, and the host controller drew a lot of extra power and got quite hot as well. But this back and forth was done many days before the incident.


    /dev/sdh1 is only temporarily connected


  • Bit of a late-night post yesterday, which should have read "if this is OMV5". Anyway, looking back at your first post, I'd forgotten that your mdadm.conf file only had the details of one array, so I don't know if you'd overwritten it at some stage. A copy of mdadm.conf is part of the system initrd and should match the current /etc/mdadm/mdadm.conf file.


    Can you please post the output of:


    mdadm --detail --scan

    cat /etc/mdadm/mdadm.conf
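
    If the file turns out to be missing the RAID10 entry, the usual fix is roughly the following (a sketch; double-check the ARRAY line against your own --detail --scan output before appending anything):

    Code
    # Append the missing array definition to mdadm.conf
    mdadm --detail --scan | grep 'name=omv5:0' >> /etc/mdadm/mdadm.conf

    # Rebuild the initramfs so its embedded copy of mdadm.conf matches
    update-initramfs -u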

  • Hey there, sorry for keeping you waiting.


    Code
    root@omv5:~# mdadm --detail --scan
    ARRAY /dev/md/omv3:0 metadata=1.2 name=omv3:0 UUID=37da4029:f22c71a8:60e4374a:0a6da7fb
    ARRAY /dev/md127 metadata=1.2 name=omv5:0 UUID=7880e29c:7e108585:07a7de53:a0202f0c


  • Now it seems to be identical:

    Code
    root@omv5:~# mdadm --detail --scan
    ARRAY /dev/md/omv3:0 metadata=1.2 name=omv3:0 UUID=37da4029:f22c71a8:60e4374a:0a6da7fb
    ARRAY /dev/md127 metadata=1.2 name=omv5:0 UUID=7880e29c:7e108585:07a7de53:a0202f0c
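
    Just to be sure it sticks, I'll verify after the next reboot with something like:

    Code
    # Both arrays should now assemble on their own
    cat /proc/mdstat

    # And the RAID10 filesystem should be mounted again
    findmnt /dev/md127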
