Raid1 Failure

    • OMV 2.x

I've had a RAID1 array failure; it looks like it happened overnight during a resync. Normally I would try to sort it out on my own, but with SMART being disabled on both drives for some reason, I'm less confident. Please let me know what info you need or what steps I should take to get my array functioning again.
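For what it's worth, the SMART state can be confirmed and re-enabled per drive with smartctl; a minimal sketch, with /dev/sdf just as an example device:

      Source Code

      # report drive info, including the "SMART support is: Enabled/Disabled" line
      smartctl -i /dev/sdf
      # re-enable SMART if it reports Disabled
      smartctl -s on /dev/sdf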

As per ryecoaaron and the Degraded or missing raid array questions thread, here is my info:

      Source Code

root@RoachHotel:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sdc[0] sdb[1]
      4883639360 blocks super 1.2 [2/2] [UU]
      [===================>.] check = 96.7% (4723518144/4883639360) finish=25.8min speed=103240K/sec
md1 : active raid1 sdd[0] sde[1]
      1953383360 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sdf[0] sdg[1](F)
      2930135360 blocks super 1.2 [2/1] [U_]
unused devices: <none>
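In md0 above, the (F) marks /dev/sdg as a faulty member, and [2/1] [U_] means the mirror is running degraded on one of two devices. A bit more detail can be pulled per array and per member with mdadm; a read-only sketch:

      Source Code

      # array state, including which member is marked faulty
      mdadm --detail /dev/md0
      # superblock info of the kicked member, if the kernel still sees it
      mdadm --examine /dev/sdg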




      Source Code

root@RoachHotel:~# blkid
/dev/sda1: UUID="5a75f3af-251a-43c2-b1c6-d8bfb62b53ce" TYPE="ext4"
/dev/sde: UUID="e371445a-e72e-2298-0b4d-f961a7f3b799" UUID_SUB="0c838662-4d40-287a-b137-28a93821195e" LABEL="RoachHotel:Volume2" TYPE="linux_raid_member"
/dev/sdd: UUID="e371445a-e72e-2298-0b4d-f961a7f3b799" UUID_SUB="5b137bcd-d7fd-0a2f-1aab-3dd3c712d87b" LABEL="RoachHotel:Volume2" TYPE="linux_raid_member"
/dev/sdb: UUID="81aa4e20-7495-6f6b-39df-90c3236e8a53" UUID_SUB="e3fd4806-9ec2-f1cf-5c67-0539feec47f5" LABEL="RoachHotel:Volume3" TYPE="linux_raid_member"
/dev/md1: LABEL="Volume2" UUID="c806cd00-887e-4f26-a629-250a06182018" TYPE="ext4"
/dev/md2: LABEL="Volume3" UUID="cc3b5002-20d9-44bf-a583-e76fcbeb8bdd" TYPE="ext4"
/dev/sdc: UUID="81aa4e20-7495-6f6b-39df-90c3236e8a53" UUID_SUB="de60ab9e-0ce8-e732-9008-eada17f76806" LABEL="RoachHotel:Volume3" TYPE="linux_raid_member"

      Source Code

root@RoachHotel:~# fdisk -l | grep "Disk "
Disk /dev/sdb doesn't contain a valid partition table
Disk /dev/sdc doesn't contain a valid partition table
Disk /dev/sdd doesn't contain a valid partition table
Disk /dev/sde doesn't contain a valid partition table
Disk /dev/md1 doesn't contain a valid partition table
Disk /dev/md2 doesn't contain a valid partition table
Disk /dev/sda: 32.0 GB, 32017047552 bytes
Disk identifier: 0x000753d0
Disk /dev/sdb: 5001.0 GB, 5000981078016 bytes
Disk identifier: 0x00000000
Disk /dev/sdc: 5001.0 GB, 5000981078016 bytes
Disk identifier: 0x00000000
Disk /dev/sdd: 2000.4 GB, 2000398934016 bytes
Disk identifier: 0x00000000
Disk /dev/sde: 2000.4 GB, 2000398934016 bytes
Disk identifier: 0x00000000
Disk /dev/md1: 2000.3 GB, 2000264560640 bytes
Disk identifier: 0x00000000
Disk /dev/md2: 5000.8 GB, 5000846704640 bytes
Disk identifier: 0x00000000

      Source Code

root@RoachHotel:~# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#
# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
# Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
# To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
# used if no RAID devices are configured.
DEVICE partitions
# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>
# definitions of existing MD arrays
ARRAY /dev/md0 metadata=1.2 name=RoachHotel:Volume1 UUID=734cfdf1:3652f210:cf4a6e9b:f35a4fe0
ARRAY /dev/md1 metadata=1.2 name=RoachHotel:Volume2 UUID=e371445a:e72e2298:0b4df961:a7f3b799
ARRAY /dev/md2 metadata=1.2 name=RoachHotel:Volume3 UUID=81aa4e20:74956f6b:39df90c3:236e8a53
# instruct the monitoring daemon where to send mail alerts
MAILADDR coldshouldermedia@yahoo.com
MAILFROM root

      Source Code

root@RoachHotel:~# mdadm --detail --scan --verbose
ARRAY /dev/md0 level=raid1 num-devices=2 metadata=1.2
   devices=/dev/sdf,/dev/sdg
ARRAY /dev/md1 level=raid1 num-devices=2 metadata=1.2 name=RoachHotel:Volume2 UUID=e371445a:e72e2298:0b4df961:a7f3b799
   devices=/dev/sdd,/dev/sde
ARRAY /dev/md2 level=raid1 num-devices=2 metadata=1.2 name=RoachHotel:Volume3 UUID=81aa4e20:74956f6b:39df90c3:236e8a53
   devices=/dev/sdc,/dev/sdb
Images
• temp.png (11.91 kB, 683×166)
    • ColdShoulderMedia wrote:

      It seems like just a single disk failure

Maybe, maybe not. You did not provide dmesg or /var/log/syslog output, so I find it pretty hard to guess what might have happened. In the fdisk output above, /dev/sdg is missing entirely; this can be the result of a completely dead drive or of $whatever. I've seen drives kicked out of arrays for various reasons already: cable/contact issues, underpowering, vibrations, and sometimes an actual disk problem or even a 'dead disk'.
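A couple of read-only checks can show whether the kernel still sees the disk at all; a sketch, with sdg being the vanished device here:

      Source Code

      # is there still a block device node for sdg?
      ls /sys/block | grep sdg
      # any kernel messages about the drive or its ATA link?
      dmesg | grep -iE 'sdg|ata'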

@ryecoaaron and others: Wouldn't it be a nice feature to let OMV send such 'debug logs' the way Armbian already does (example)? At least dmesg output, SMART health data (and attribute 199!), and HBA info should be collected and then submitted to an online pasteboard service as above.
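Collecting roughly that set by hand could look like the following sketch (attribute 199 is UDMA_CRC_Error_Count, which usually points at cabling trouble; the device list is just an example):

      Source Code

      # dmesg + SMART health + CRC error attribute + controller info, all in one paste
      (dmesg
       for d in /dev/sd[a-g]; do
         echo "== $d =="
         smartctl -H -A "$d" | grep -E 'overall-health|199'
       done
       lspci | grep -iE 'sata|raid') | curl -F 'sprunge=<-' http://sprunge.us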
    • tkaiser wrote:

      Wouldn't it be a nice feature to allow OMV sending such 'debug logs' as Armbian already does
omv-extras used to have a way to submit a report that was stored on the omv-extras server, but not many people used it. People can also download a system information report from the Diagnostics -> System Information -> Report tab. But I admit that armbianmonitor works pretty well. What would you suggest - fork armbianmonitor, just use it as-is on more systems, other ideas?
      omv 4.1.11 arrakis | 64 bit | 4.15 proxmox kernel | omvextrasorg 4.1.11
      omv-extras.org plugins source code and issue tracker - github

      Please read this before posting a question and this and this for docker questions.
      Please don't PM for support... Too many PMs!
    • ryecoaaron wrote:

      I admit that armbianmonitor works pretty well. What would you suggest - fork armbianmonitor, just use it as is more systems, other ideas?
Well, I think having a tool like omv-diag ready that can be used directly via SSH (and maybe later in some way through the web UI) to
• upload a brief report (containing a system overview and at least the last 200 dmesg lines)
• collect a more complete report somewhere, to be manually pasted to an online pasteboard service
would already be nice for supporting issues like this one. Because why did the disk disappear from the bus in the first place? If the reason is a faulty cable or contact issues, then sure, a replacement disk will 'help', since swapping it fixes the real issue ('connection loss') by accident too.

And armbianmonitor is not the right tool to fork (since in Armbian the main logging happens at every startup and goes to /var/log/armhwinfo.log, which armbianmonitor only collects later when it uploads stuff), so maybe start with another fork that already took care of this? Eg. github.com/ayufan-rock64/linux…bin/rock64_diagnostics.sh
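A first cut of such a tool could be tiny; a hypothetical sketch (omv-diag itself does not exist yet, and sprunge.us is just the pasteboard already used in this thread):

      Source Code

      #!/bin/sh
      # omv-diag (hypothetical): upload a brief system overview to a pasteboard
      (uname -a
       cat /proc/mdstat
       dmesg | tail -n 200) | curl -F 'sprunge=<-' http://sprunge.us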


      ColdShoulderMedia wrote:

      Just let me know what else you need

      In case you haven't rebooted yet, the output from the following command would be useful:

      Source Code

dmesg | curl -F 'sprunge=<-' http://sprunge.us
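On success the command prints a short sprunge.us link; posting that link back here is enough.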