Disk dropped out of array

    • Disk dropped out of array

      Over the last few days I noticed that when I copied a file to OMV, one of the drive lights would flicker for a second, then all four would flicker. Last night I saw the opposite: three would flicker, and one never did. I looked at the web interface: the RAID reported "clean but degraded", and sda, sdb, and sdc were in the RAID, but sdd was missing. I came here and saw a post that requested this info:
      root@helios4:/home/larry# cat /proc/mdstat
      Personalities : [raid10]
      md0 : active raid10 sda[0] sdc[3] sdb[2]
      15627790336 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
      bitmap: 26/117 pages [104KB], 65536KB chunk

      unused devices: <none>

      root@helios4:/home/larry# blkid
      /dev/mmcblk0p1: UUID="1f489a8c-b3a3-4218-b92b-9f1999841c52" TYPE="ext4" PARTUUID="7fb57f23-01"
      /dev/sda: UUID="d1e18bf2-0b0e-760b-84be-c773f4dbf945" UUID_SUB="09d3b6c9-f312-e5b8-14c4-dc128ed0abde" LABEL="helios4:Store" TYPE="linux_raid_member"
      /dev/sdb: UUID="d1e18bf2-0b0e-760b-84be-c773f4dbf945" UUID_SUB="b2ad94c4-58fa-554e-0508-fb8cbf6f6eec" LABEL="helios4:Store" TYPE="linux_raid_member"
      /dev/md0: UUID="GmgEll-khiX-a7DB-5HNZ-KGRm-5vGq-1vPV4w" TYPE="LVM2_member"
      /dev/sdc: UUID="d1e18bf2-0b0e-760b-84be-c773f4dbf945" UUID_SUB="4235d123-bcec-3f18-5ec2-6e530400c8b4" LABEL="helios4:Store" TYPE="linux_raid_member"
      /dev/mapper/Store-Store: LABEL="Store" UUID="6c7b4b44-4cae-4169-95fe-d9a14d04e814" TYPE="ext4"
      /dev/zram0: UUID="e94d3e0b-c8fb-4b8c-b780-035797842a7d" TYPE="swap"
      /dev/zram1: UUID="4b9d8a94-1260-49a9-b23f-57fb627229d6" TYPE="swap"
      /dev/sdd: UUID="d1e18bf2-0b0e-760b-84be-c773f4dbf945" UUID_SUB="f69820ac-517a-b430-0a2b-ae6c52d1922f" LABEL="helios4:Store" TYPE="linux_raid_member"
      /dev/mmcblk0: PTUUID="7fb57f23" PTTYPE="dos"
      /dev/mmcblk0p2: PARTUUID="7fb57f23-02"

      root@helios4:/home/larry# cat /etc/mdadm/mdadm.conf
      # mdadm.conf
      # Please refer to mdadm.conf(5) for information about this file.

      # by default, scan all partitions (/proc/partitions) for MD superblocks.
      # alternatively, specify devices to scan, using wildcards if desired.
      # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
      # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
      # used if no RAID devices are configured.
      DEVICE partitions

      # auto-create devices with Debian standard permissions
      CREATE owner=root group=disk mode=0660 auto=yes

      # automatically tag new arrays as belonging to the local system
      HOMEHOST <system>

      # definitions of existing MD arrays
      ARRAY /dev/md0 metadata=1.2 name=helios4:Store UUID=d1e18bf2:0b0e760b:84bec773:f4dbf945

      root@helios4:/home/larry# mdadm --detail --scan --verbose
      ARRAY /dev/md0 level=raid10 num-devices=4 metadata=1.2 name=helios4:Store UUID=d1e18bf2:0b0e760b:84bec773:f4dbf945

      Is there a way to tell what happened to the fourth drive and get it back in?
      thanks, Larry
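      A minimal Python 3.9+ sketch that cross-checks the blkid and /proc/mdstat output above. The helper names are my own, not an mdadm or OMV tool. Any disk that blkid reports as a linux_raid_member with this array's UUID, but that mdstat doesn't list as a working member, has been dropped rather than lost. The mdstat parsing is deliberately simplified: it only looks at sdX names and an optional (F) flag. As far as I know, the number in brackets after each disk is md's descriptor number, not its raid role, so the `_` in [U_UU] is the better guide to which role is empty.

```python
import re

# Excerpts copied from the output posted above.
MDSTAT_LINE = "md0 : active raid10 sda[0] sdc[3] sdb[2]"
MDSTAT_STATUS = "[U_UU]"
BLKID = """\
/dev/sda: UUID="d1e18bf2-0b0e-760b-84be-c773f4dbf945" UUID_SUB="09d3b6c9-f312-e5b8-14c4-dc128ed0abde" LABEL="helios4:Store" TYPE="linux_raid_member"
/dev/sdb: UUID="d1e18bf2-0b0e-760b-84be-c773f4dbf945" UUID_SUB="b2ad94c4-58fa-554e-0508-fb8cbf6f6eec" LABEL="helios4:Store" TYPE="linux_raid_member"
/dev/md0: UUID="GmgEll-khiX-a7DB-5HNZ-KGRm-5vGq-1vPV4w" TYPE="LVM2_member"
/dev/sdc: UUID="d1e18bf2-0b0e-760b-84be-c773f4dbf945" UUID_SUB="4235d123-bcec-3f18-5ec2-6e530400c8b4" LABEL="helios4:Store" TYPE="linux_raid_member"
/dev/sdd: UUID="d1e18bf2-0b0e-760b-84be-c773f4dbf945" UUID_SUB="f69820ac-517a-b430-0a2b-ae6c52d1922f" LABEL="helios4:Store" TYPE="linux_raid_member"
"""
ARRAY_UUID = "d1e18bf2-0b0e-760b-84be-c773f4dbf945"

# "sda[0]" or "sdd[1](F)": disk name, bracketed number, optional one-letter flag.
MEMBER = re.compile(r"\b(sd[a-z]+)\[\d+\](\(\w\))?")


def active_members(mdstat_line):
    """Disks listed on an /proc/mdstat array line, excluding ones flagged (F) failed."""
    return {name for name, flag in MEMBER.findall(mdstat_line) if flag != "(F)"}


def raid_members(blkid_text, array_uuid):
    """Disks whose blkid entry says they carry this array's md superblock."""
    found = set()
    for line in blkid_text.splitlines():
        dev, _, attrs = line.partition(":")  # device name comes before the first colon
        fields = dict(re.findall(r'(\w+)="([^"]*)"', attrs))
        if fields.get("TYPE") == "linux_raid_member" and fields.get("UUID") == array_uuid:
            found.add(dev.removeprefix("/dev/"))
    return found


def empty_roles(status):
    """Positions of '_' in a status like '[U_UU]', i.e. raid roles with no working disk."""
    return [i for i, c in enumerate(status.strip("[]")) if c == "_"]


if __name__ == "__main__":
    dropped = raid_members(BLKID, ARRAY_UUID) - active_members(MDSTAT_LINE)
    print(dropped, empty_roles(MDSTAT_STATUS))  # {'sdd'} [1]
```

Run against the data posted above, it reports sdd and role 1. `mdadm --examine /dev/sdd` then shows what that disk's own superblock says about the array.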

      HELP!! A second disk has dropped out of the array. Now sdb and sdd are missing, yet Physical Disks still shows all four. How do I figure out why the system isn't using them, and how do I convince it to put them back in? I have a spare drive that I can sub in if that will help.

      Can someone at least help me make sure I am interpreting the lsscsi output correctly? Given this:
      lsscsi --verbose
      [0:0:0:0]    disk    ATA      ST8000DM004-2CX1 0001  /dev/sda
        dir: /sys/bus/scsi/devices/0:0:0:0  [/sys/devices/platform/soc/soc:internal-regs/f10a8000.sata/ata1/host0/target0:0:0/0:0:0:0]
      [1:0:0:0]    disk    ATA      ST8000DM004-2CX1 0001  /dev/sdb
        dir: /sys/bus/scsi/devices/1:0:0:0  [/sys/devices/platform/soc/soc:internal-regs/f10a8000.sata/ata2/host1/target1:0:0/1:0:0:0]
      [2:0:0:0]    disk    ATA      ST8000DM004-2CX1 0001  /dev/sdc
        dir: /sys/bus/scsi/devices/2:0:0:0  [/sys/devices/platform/soc/soc:internal-regs/f10e0000.sata/ata3/host2/target2:0:0/2:0:0:0]
      [3:0:0:0]    disk    ATA      ST8000DM004-2CX1 0001  /dev/sdd
        dir: /sys/bus/scsi/devices/3:0:0:0  [/sys/devices/platform/soc/soc:internal-regs/f10e0000.sata/ata4/host3/target3:0:0/3:0:0:0]

      Do I take it that sda is on SATA1, sdb on SATA2, sdc on SATA3 and sdd on SATA4? I have a spare drive that I can plug in, but I'd kinda like to make sure I put it in place of sdb or sdd.
      thanks

    • You might be able to get an idea of what's going on under Storage, RAID Management: click on the array and then the Detail button.

      I can see that you're booting from a flash device, so:

      In most cases, /dev/sda is assigned to the first SATA port (board markings vary; the first port may be labeled sata0 or sata1), /dev/sdb to the next SATA port, and so on.
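      One way to check the mapping is sketched below in Python. It assumes that `lsscsi --verbose` prints a device line followed by a `dir:` line, as in Larry's output. The `ataN` component of each sysfs path is the kernel's libata port number. The `f10a8000.sata` or `f10e0000.sata` node shows which of the two SATA controllers the port hangs off. Whether ata1 to ata4 match the labels printed on the board is an assumption worth checking against the board documentation.

```python
import re

# The lsscsi --verbose output from the question, one device line plus one "dir:" line per disk.
LSSCSI = """\
[0:0:0:0]    disk    ATA      ST8000DM004-2CX1 0001  /dev/sda
  dir: /sys/bus/scsi/devices/0:0:0:0  [/sys/devices/platform/soc/soc:internal-regs/f10a8000.sata/ata1/host0/target0:0:0/0:0:0:0]
[1:0:0:0]    disk    ATA      ST8000DM004-2CX1 0001  /dev/sdb
  dir: /sys/bus/scsi/devices/1:0:0:0  [/sys/devices/platform/soc/soc:internal-regs/f10a8000.sata/ata2/host1/target1:0:0/1:0:0:0]
[2:0:0:0]    disk    ATA      ST8000DM004-2CX1 0001  /dev/sdc
  dir: /sys/bus/scsi/devices/2:0:0:0  [/sys/devices/platform/soc/soc:internal-regs/f10e0000.sata/ata3/host2/target2:0:0/2:0:0:0]
[3:0:0:0]    disk    ATA      ST8000DM004-2CX1 0001  /dev/sdd
  dir: /sys/bus/scsi/devices/3:0:0:0  [/sys/devices/platform/soc/soc:internal-regs/f10e0000.sata/ata4/host3/target3:0:0/3:0:0:0]
"""

# Controller node such as "f10a8000.sata" followed by the libata port "ataN".
PORT = re.compile(r"/([0-9a-f]+\.sata)/ata(\d+)/")


def sata_ports(lsscsi_verbose):
    """Map each /dev/sdX to (controller node, ata port number) from its sysfs path."""
    ports, current = {}, None
    for line in lsscsi_verbose.splitlines():
        dev = re.search(r"(/dev/sd[a-z]+)\s*$", line)
        if dev:  # device line: remember which disk the next "dir:" line belongs to
            current = dev.group(1)
            continue
        m = PORT.search(line)
        if m and current:
            ports[current] = (m.group(1), int(m.group(2)))
            current = None
    return ports


if __name__ == "__main__":
    for dev, (ctrl, port) in sorted(sata_ports(LSSCSI).items()):
        print(f"{dev}: ata{port} on {ctrl}")
```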

      Regarding "getting the drive(s) back in":
      It was booted out of the array for a reason. Have you looked at the SMART data yet?
      You can find it under Storage, SMART, on the Devices tab. Click on a drive and then the Information button. This will give you extended SMART data and drive attributes.

      The most important SMART stats for predicting failure:
      SMART 5 – Reallocated_Sector_Count.
      SMART 187 – Reported_Uncorrectable_Errors.
      SMART 188 – Command_Timeout.
      SMART 197 – Current_Pending_Sector_Count.
      SMART 198 – Offline_Uncorrectable.

      Any of the above could result in a drive being booted. If they're incrementing, failure is just a matter of time.

      The following can also cause a drive to be booted; it's usually related to hardware or cabling:
      SMART 199 - UltraDMA CRC errors
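      A rough Python sketch of the same triage, applied to the text printed by `smartctl -A /dev/sdX` (smartmontools). It flags nonzero values among the failure predictors above. Given an earlier reading, it also reports which counters have grown. The sample table is hypothetical, not data from Larry's drives. Raw values are vendor-specific, so the sketch only takes the first number of RAW_VALUE; Seagate's Command_Timeout, for example, prints three.

```python
# IDs from the lists above: 5/187/188/197/198 predict failure, 199 usually means cabling.
FAILURE_IDS = {5, 187, 188, 197, 198}
CABLING_IDS = {199}

# Hypothetical 'smartctl -A' rows for illustration only, NOT readings from Larry's drives.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       3
"""


def raw_values(smartctl_a):
    """Return {attribute ID: first number of RAW_VALUE} from 'smartctl -A' text."""
    values = {}
    for line in smartctl_a.splitlines():
        parts = line.split()
        # Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw...
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            values[int(parts[0])] = int(parts[9])
    return values


def assess(now, before=None):
    """Nonzero failure predictors, counters that grew since 'before', and the CRC count."""
    nonzero = sorted(i for i in FAILURE_IDS if now.get(i, 0) > 0)
    rising = []
    if before is not None:
        rising = sorted(i for i in FAILURE_IDS | CABLING_IDS if now.get(i, 0) > before.get(i, 0))
    return nonzero, rising, now.get(199, 0)


if __name__ == "__main__":
    now = raw_values(SAMPLE)
    print(assess(now, before={197: 0, 198: 0, 199: 3}))  # ([197, 198], [197, 198], 3)
```

Comparing two readings taken a few days apart is what tells you whether a counter is "incrementing".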

      If you have a spare, I'd try replacing /dev/sdd first, since it was the first to be booted.

      Since you're running mdadm RAID, you might find this link useful at some point. -> Recovering RAID

      FYI: I'm going out of town for 3 days or so, starting tomorrow.
      Good backup takes the "drama" out of computing
      OMV 3.0.99, ThinkServer TS140, 12GB ECC, 32GB USB boot, 4TB+4TB zmirror, 3TB client backup.
      Backup Server:
      OMV, Acer RS-111, 4GB, 32GB USB boot, 3TB+3TB+4TB Rsync'ed disks+SNAPRAID
      2nd Data Backup
      R-PI 2B, 16GB boot, 4TB WD (USB)

    • Thank you very much for the reply. The hardware is a Helios system, and I've also been working with the Kobol people trying to figure this out. Long story a little shorter: after a few hardware tests, what I ended up doing was running "Wipe" on one of the physical drives not in the RAID, then going to the RAID tab and recovering it back into the array. These are 8TB drives, so in 700+ minutes I should be able to repeat this for the other one and be back to full strength. :) Not sure why neither drive showed up there until I wiped it (not even a brand-new drive did), but okay, I'm on my way now.

      I was initially convinced it was a hardware problem. But since the wipe (which I assume means "re-format") now allows the drive to be recovered, I now assume there was some sort of data problem on the drive that the wipe cleared. Right?

      One more question for any RAID expert: I had these four drives set up in RAID 10, meaning striped and mirrored. Does it pair two physical drives and stripe data across them, pair the other two the same way, then mirror the two pairs? Or does it make two mirrored pairs, then stripe across them? The difference is subtle; I'm just curious.

      thanks, Larry
    • (Without knowing the age of the drives - I assumed they were relatively new.)
      I'd still be somewhat concerned about the reason "why". On the other hand, mdadm RAID has been known to throw a drive out of an array without a clear explanation.

      The wipe function (quick wipe) clears the partition table and any non-persistent partition flags. It doesn't actually wipe the drive. LVM, mdadm RAID (on occasion), and GPT partition flags have been known to survive the wipe process. If you want to wipe a drive, removing ALL partition information and flags, the only truly reliable way I know of is to use DBAN. (Since it starts in the boot sector, just a few minutes is all that's needed.)
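      The Python sketch below illustrates why a quick wipe can leave metadata behind by showing where the relevant structures start on a whole-disk device. I don't know exactly how many bytes OMV's quick wipe zeroes, so `zeroed_bytes` is a hypothetical parameter. The sector size and 8 TB capacity are also assumptions. Larry's disks are whole-disk members (blkid shows no partitions on them) with metadata=1.2, so the md superblock 4 KiB from the start is the structure that matters here. The GPT backup header is included because it sits in the last sector of the disk, which a wipe of the first few sectors does not reach.

```python
# Assumed: 512-byte logical sectors and a typical 8 TB capacity (15,628,053,168 sectors).
SECTOR = 512
DISK_BYTES = 15_628_053_168 * SECTOR


def metadata_offsets(disk_bytes, sector=SECTOR):
    """Byte offset where each structure starts on a whole-disk device."""
    last_lba = disk_bytes // sector - 1
    return {
        "GPT primary header": 1 * sector,        # LBA 1, right after the protective MBR
        "md 1.2 superblock": 4096,               # metadata=1.2 sits 4 KiB from the start
        "GPT backup header": last_lba * sector,  # copy kept in the very last LBA
    }


def left_behind(offsets, zeroed_bytes):
    """Structures starting at or beyond the zeroed region survive a front-only wipe."""
    return sorted(name for name, off in offsets.items() if off >= zeroed_bytes)


if __name__ == "__main__":
    offsets = metadata_offsets(DISK_BYTES)
    for zeroed in (2048, 1 << 20):  # hypothetical wipe sizes: 2 KiB and 1 MiB
        print(zeroed, left_behind(offsets, zeroed))
```

A wipe that only clears the first couple of KiB would leave the md superblock in place, which fits the observation that mdadm metadata sometimes survives.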

      On the RAID question, you're talking about the difference between RAID 10 (two mirrors that are striped together) and RAID 01 (two-drive stripes that are mirrored). Here's a decent explanation of the two -> here. The synopsis is at the bottom of the article.
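      For md specifically, /proc/mdstat above reports "2 near-copies". Below is a Python sketch of that near layout, based on my reading of the md(4) description rather than code from mdadm. Each chunk is written to two adjacent raid roles (0+1 or 2+3), and successive chunks are striped across those pairs. With four disks, it therefore behaves like RAID 10 (mirrors striped together). The role numbers are the positions in [U_UU], not sdX letters.

```python
def placement(chunk, devices=4, copies=2):
    """(role, row) of every copy of a chunk under md raid10's 'near' layout."""
    slots = []
    for k in range(copies):
        p = chunk * copies + k  # copies of a chunk occupy consecutive slots
        slots.append((p % devices, p // devices))
    return slots


def survives(failed_roles, devices=4, copies=2):
    """True if every chunk keeps at least one copy on a role not in failed_roles."""
    # The set of roles used repeats within `devices` chunks, so checking that many is enough.
    for chunk in range(devices):
        if all(role in failed_roles for role, _ in placement(chunk, devices, copies)):
            return False
    return True


if __name__ == "__main__":
    for chunk in range(4):
        print(chunk, placement(chunk))
    print(survives({1, 2}), survives({0, 1}))  # True False
```

With this layout, the array survives losing two disks only if they are in different pairs.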
    • Yes, they were brand new drives.

      As to why: the hardware guys blame the current draw of the drives. These are four Seagate Barracudas that I ripped out of USB drives (I got all four 8TB drives for just under $125 each). I thought being 7200 RPM drives would be a good thing. I looked it up: the current draw is 2 A, versus 1.8 A on a NAS drive. I dunno, 200 milliamps doesn't sound like a big difference to me, but that's their thought.
    • A higher spin rate generally provides better access times and higher transfer rates. However, faster rotation also brings higher current draw and significantly more heat.

      This is just an opinion:
      I don't buy their explanation. I'm not saying that it's not true, but if it is, their power supply design criteria seem to be lacking. We're talking about a difference of about 10%. Good power supply design would demand far more than 10% margin. Further, a motor's start-up surge current is far higher than its current when spinning in normal operation.

      I've seen pics of the Helios box. It appears to be designed for 4 drives and for RAID operation. If it were me, I'd design for 15K RPM drives (at their start-up current) with a minimum of 30% margin beyond that, and I'd try to find the worst-case, highest-current drives. Why? As power supplies age, their maximum stable output tends to decrease a bit, and the potential return rate might create a problem for the company.
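      The arithmetic behind the numbers in this thread is shown below as a small Python sketch. It assumes the 2 A and 1.8 A figures Larry quoted are comparable per-drive currents on the same rail (which rail isn't stated). It also assumes all four drives draw at once. Spin-up surge is higher, so for real sizing you would plug in the datasheet start-up current.

```python
# Per-drive figures quoted in the thread; rail and spin-up surge are not stated.
BARRACUDA_A = 2.0
NAS_A = 1.8
DRIVES = 4


def total_current(drives, amps_per_drive):
    """Combined draw if every drive pulls the same current at the same time."""
    return drives * amps_per_drive


def required_rating(drives, amps_per_drive, margin):
    """Supply rating that covers the combined draw plus a fractional margin."""
    return total_current(drives, amps_per_drive) * (1 + margin)


if __name__ == "__main__":
    extra = total_current(DRIVES, BARRACUDA_A) - total_current(DRIVES, NAS_A)
    pct = (BARRACUDA_A - NAS_A) / NAS_A * 100
    print(f"{extra:.1f} A extra for four drives ({pct:.1f}% per drive)")
    print(f"{required_rating(DRIVES, BARRACUDA_A, 0.30):.1f} A with 30% margin")
```

With the quoted figures, four Barracudas draw 0.8 A more than four NAS drives. That is about 11% per drive relative to the NAS figure, or 10% relative to the Barracuda figure. Covering 8 A with a 30% margin needs a supply rated for about 10.4 A.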

      On the other hand, this wouldn't be the first time support pointed a finger when they couldn't explain odd behavior.
    • Oh, the Helios box is a really nice little system: up to four 3.5" disks, two USB 3.0 ports, and gigabit Ethernet. It runs Debian Linux and OMV in a case that's not much bigger than the 4 drives, at a very reasonable price. Other than a slightly beefier power supply, the only change I'd like to see is an HDMI port so I could run conky on it.

      I use mine to hold tons of movies and TV shows for my Kodi devices; it even runs MySQL so they all stay in sync.

      Very nice little box.