RAID5 Failure Test

  • I am a noob to OMV so please excuse my ignorance. First thing I have to say is I am really enjoying playing around with OMV4.


    I created a RAID5 with 3 HDs (a 4th HD is for the OS). Since I will be putting all my valuable family digital photos on this device, I wanted to ensure I could do some simulated RAID5 failure/recovery testing before I transfer my TB of photos. Here was my test:
    I disconnected the SATA cable to #3 HD while it was still running and, as expected, I saw clean, degraded under the RAID5 I set up.
    I was still able to access a couple of files I had put on the storage; good.
    I reconnected the SATA cable and OMV4 recognized #3 HD with the same name, but the RAID5 was still clean, degraded.
    I tried to do "Recovery" on the RAID5 but no HD showed up in the selection box.
    I rebooted and repeated the same process with no success.


    I was able to recover the RAID5 by going to "Disks", selecting #3 HD, and doing a "Wipe".
    Then I went to "RAID Management" and selected the RAID5 to do the "Recovery", at which point #3 HD showed up... all good, back to normal; it rebuilt the RAID5.


    QUESTION: Is there an easier way to re-initiate #3 HD in the RAID5?
    It seems unnecessary to have to rebuild the RAID5.

    • Official Post

    Is there an easier way to re-initiate #3 HD in the RAID5?

    Outside of commands on the command line, I doubt it. This is the nature of RAID. The way you're failing drives and recovering might be useful to let you see what the process is, but it's not the way a drive would fail in the real world. (The wipe was necessary to reset the drive's partition info and flags.) When a drive fails, you're going to be replacing it with another drive, not attempting to reinitialize a failed drive in place. BTW: before attempting to add a new drive to any RAID array type, the wipe operation is required (or at least a very good idea).
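
    For reference, a quick signature wipe from the command line looks something like the sketch below. The device name /dev/sdc is an assumption; double-check the target with lsblk first, since wiping the wrong disk destroys data.

        # Identify the disk first - wiping the wrong device destroys data
        lsblk
        # Remove filesystem and RAID superblock signatures from the disk
        wipefs -a /dev/sdc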


    Are you sure about doing RAID? It's not going to protect your data the way many think it will. There are plenty of horror stories on this forum, and others, about dead or dying arrays and data loss. If you want to preserve "one of a kind" data that can't be restored from another source (those photos you mentioned), you need "backup".

  • Thanks for the quick reply, flmaxey.


    Would you know the command lines to do this quickly?


    As for backup, yes, I have everything on external USB hard drives. Any other suggestions for backups?

    • Official Post

    Would you know the command lines to do this quickly?

    Quickly? Not really. That was a caveat: if it can be done in the GUI, it can be done on the command line. The "dd" command can wipe drives. If you're interested in trying dd, there are a few examples -> here. Wiping with dd as in this example, dd if=/dev/zero of=/dev/sdb (where zeros are written over the entire drive), will not be fast.
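
    As a rough sketch (the device name /dev/sdb is an assumption; run as root and verify the target with lsblk first):

        # Full zero-fill wipe: overwrites the whole disk, slow but thorough
        dd if=/dev/zero of=/dev/sdb bs=1M status=progress
        # Quick wipe: zero only the first 10 MB, enough to clear the partition
        # table and most RAID signatures (may leave a GPT backup at disk end)
        dd if=/dev/zero of=/dev/sdb bs=1M count=10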


    As for backup, yes, I have everything on external USB hard drives. Any other suggestions for backups?

    Without knowing how you're doing backup, your skill level, etc., all I can do is point you to this -> guide. The rsync command line on page 61 (of the current version) can be adapted, and automation with Scheduled Jobs might be useful to you.
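
    As a rough illustration only (the paths are placeholders; adapt the version from the guide), mirroring a photo share onto a mounted USB backup drive might look like:

        # Mirror the photo share onto the USB backup drive;
        # --delete removes files from the backup that no longer exist at the source
        rsync -av --delete /srv/dev-disk-by-label-data/photos/ /srv/dev-disk-by-label-usbbackup/photos/

    A line like that can then be dropped into a Scheduled Job in the GUI to run, say, weekly.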


    And here's a direct link to the Guides section of the forum. You might find other items of miscellaneous interest there.

    • Official Post

    QUESTION: Is there an easier way to re-initiate #3 HD in the RAID5?

    Command line, using mdadm --assemble. If you look through the RAID section you should find the command to do this, but that would only work because of the simulation, as the drive's partition info and flags are still on the drive. That section should also give you some useful information should your system display the RAID as degraded or simply not there.
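
    Before trying anything, it's worth confirming what state the array is actually in. A couple of standard checks (md0 here is an assumption; use whatever your array is named):

        # The kernel's view of all md arrays, including any rebuild progress
        cat /proc/mdstat
        # Detailed state of one array: clean, degraded, which members are missing
        mdadm --detail /dev/md0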

    • Official Post

    Command line using mdadm --assemble

    But that would just drop the drive back into the array, as if it had merely been disconnected. I thought the idea was to speed up the process of a rebuild.
    (Maybe I missed @deeragon's point.)

    • Official Post

    mdadm --assemble etc. would have put the drive back into the array without the need to wipe it, as the question was "Is there an easier way to re-initiate #3 HD in the RAID5?"
    If the OP had rebooted, the RAID would have come up inactive. Removing the cables placed the RAID in a clean/degraded state; connecting the cables back and running mdadm --assemble --verbose --force /dev/md? /dev/sd[???] (with the ?s replaced by the correct information) would have brought the RAID back up to clean.
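
    Filled in with example names (md0 and sdb/sdc/sdd are assumptions; substitute whatever /proc/mdstat and blkid report on your system), that would look something like:

        # Stop the array first if it came up inactive or partially assembled
        mdadm --stop /dev/md0
        # Force-assemble the array from its member disks
        mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcd]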


    So the easier way in the above scenario is the command line.

    • Official Post

    Again, I suppose I missed the point. (I thought it was about failure testing, not physically disconnecting and reconnecting a drive.)
    In any case, I have to applaud the effort to test failure scenarios.

    • Official Post

    (I thought it was about failure testing, not physically disconnecting and reconnecting a drive.)

    It was, but I think (and no disrespect to @deeragon) simply pulling a drive and/or cables to 'test' a drive failure is one thing; plugging the drive/cables back in and expecting the RAID to come back up clean is another. The only way the RAID might have come back after doing the 'test' and reconnecting the cables was to reboot.

  • No disrespect taken. Thank you very much for all the useful information provided. I will spend this weekend playing around some more and reading the reference links and suggestions provided.

    • Official Post

    (In OMV4)
    Just remember: to simulate an actual drive failure and what it takes to rebuild, you'll either need to disconnect a drive (physically) or use the Remove button in the GUI. Then "wipe" the drive, under Storage, Disks, to remove partition data and make the drive appear to be new (or at least unused). Thereafter, adding the same drive back using the Recover button will be much like adding a new drive in a recovery process.
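
    The command-line equivalent of that sequence is roughly the following (the array and device names are assumptions; adjust them to your system):

        # Mark one member as failed, then remove it from the array
        mdadm /dev/md0 --fail /dev/sdd
        mdadm /dev/md0 --remove /dev/sdd
        # Wipe its signatures so it looks like a fresh, unused disk
        wipefs -a /dev/sdd
        # Add it back; the array rebuilds onto it as if it were a new drive
        mdadm /dev/md0 --add /dev/sdd
        # Watch the rebuild progress
        cat /proc/mdstat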
    ____________________________________________


    Here's what I think you should keep in mind, for a few years down the road:


    - The RAID5 recovery process works with relatively new drives. That is to say, if you have one drive that has an unknown defect, and it dies young, the recovery process will integrate a new drive into the array. This is the "ideal" scenario.
    - What is most likely to happen is, one of the drives in the array will fail when it gets "old". Unfortunately, the remaining drives in the array will be "geriatric" as well.
    - The recovery process for drives on the large side (1 TB and up) takes a while: hours to, potentially, days, depending on drive size and other factors. (A rough estimate follows this list.) During the restripe, all of the remaining older drives are being flogged; the process is a drive torture test.
    - If a second older drive fails during the process, and this is a known issue, the array is lost along with its data.
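
    As a back-of-the-envelope estimate (assuming an ideal, uninterrupted rebuild at about 150 MB/s, which is optimistic): a 4 TB member is roughly 4,000,000 MB, so 4,000,000 / 150 ≈ 26,700 seconds, or a bit over 7 hours. An array that's throttled or serving other I/O at the same time can take several times that.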


    This plays into human nature: everything works fine, we get complacent over time, and then there's a failure out of the blue. This is why RAID can actually be dangerous. The focus should be on solid, known-good backup. (And it would be a pity to "think" you have good backup, only to find, when you need it, that it's not there or you can't restore it.)
