How to remove HD from RAID5 for check sectors

    • OMV 4.x

    • How to remove HD from RAID5 for check sectors

      Hi all, I have a configuration with 4x4TB WD disks, and one of them shows some SMART errors. I opened an RMA, and before sending a new disk WD asks me to do a low-level format and verify whether the error persists (you can see the SMART data attached).

      As a first step I powered off my OMV box, removed the SATA cable and started OMV again with 3 disks. I expected to see my RAID configuration working in a degraded state, but in RAID Management I cannot see my RAID configuration, and obviously my SMB share does not work. After reverting the change, everything is good and clean.

      How can I do this check? I would like to do it while my RAID configuration continues to work in a degraded state, and when I replace the disk (with a new one, or mine fixed by the low-level format) restore the 4-disk RAID configuration.

      Thank you
      Images
      • Cattura.JPG (96.52 kB, 1,382×720)
    • vatastala wrote:

      As a first step I powered off my OMV box, removed the SATA cable and started OMV again with 3 disks. I expected to see my RAID configuration working in a degraded state, but in RAID Management I cannot see my RAID configuration, and obviously my SMB share does not work. After reverting the change, everything is good and clean.
      This is standard behaviour for mdadm (software RAID). Put simply, you have to 'tell' mdadm what to do, unlike a hardware RAID, which has a controller to look after it.

      To remove the drive from the GUI -> Raid Management -> select the array -> on the menu select Remove -> in the dialogue box select the drive to remove, click OK. The drive can now be removed from the system and the RAID will appear as clean/degraded.

      Effectively, what you did makes sense, but it would have left the array as inactive after a restart.
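
      For readers who prefer the command line, the GUI steps above roughly correspond to marking the member as failed and then removing it with mdadm. This is only a minimal sketch under assumptions: /dev/md0 is the array name that appears later in this thread, /dev/sdd is a hypothetical member disk, and the run and remove_member helpers are made up for illustration. It is not confirmed to be what the OMV GUI runs internally. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run wrapper: with DRY_RUN=1 (the default here) commands are only printed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mark a member disk as failed, then remove it from the array.
# $1 = array (assumed /dev/md0), $2 = member disk (hypothetical /dev/sdd).
remove_member() {
    array=$1
    member=$2
    run mdadm "$array" --fail "$member" &&
        run mdadm "$array" --remove "$member"
}

# Example (prints the two mdadm commands without running them):
#   remove_member /dev/md0 /dev/sdd
```

      Only after checking the printed device names against your own system would you run it as root with DRY_RUN=0.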
      Raid is not a backup! Would you go skydiving without a parachute?
    • geaves wrote:

      To remove the drive from the GUI -> Raid Management -> select the array -> on the menu select Remove -> in the dialogue box select the drive to remove, click OK. The drive can now be removed from the system and the RAID will appear as clean/degraded.
      In fact, what happened scares me... If a drive breaks suddenly one day, will I never see my RAID again? How will I recover it? I'm also scared to do the operation you describe: I have a lot of data on the disks and no more space for a backup...
    • vatastala wrote:

      In fact, what happened scares me ... If a drive breaks one day suddenly, will I never see my RAID again? How will I recover it?
      OK, RAID5 only tolerates one drive failure. You are about to return a drive, and to do that you need to remove it from the array and your server. This leaves the current array in a clean/degraded state; another drive failure and your data is gone! Believe me, this can happen, and it has happened to me, but only in a work environment.

      You have five choices:
      1. Switch to RAID6 if you insist on using a RAID option. This allows two drives to fail but reduces the space available.
      2. Set up a backup to either a USB or another internal drive, and back up the data you don't want to lose.
      3. Use something like MergerFS and SnapRAID. SnapRAID uses a drive for parity; search the forum, there is plenty of information.
      4. Use ZFS, which is more fault tolerant, but having used it previously it's something I don't want to revisit.
      5. You have four drives, so you could use two for data and two as rsync targets. rsync can run after hours as a scheduled job. This is sort of RAID1, and I use 'sort of' loosely. What rsync will do is sync the data drive to the second drive, giving you your data in two places, the working drive and the backup drive, so if one fails you get another drive and sync it back.
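
      Option 5 could look something like the following when run from a scheduled job. This is a hedged sketch, not a recommended configuration: the paths /srv/data and /srv/backup and the run and mirror_drive helpers are hypothetical. Note that --delete makes the target an exact mirror, so files deleted on the data drive also disappear from the backup on the next run.

```shell
#!/bin/sh
# Dry-run wrapper: with DRY_RUN=1 (the default here) commands are only printed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mirror the contents of a data drive onto a backup drive.
# $1 = source mount point, $2 = target mount point (both hypothetical).
mirror_drive() {
    src=${1%/}   # strip one trailing slash, if present
    dst=${2%/}
    # The trailing slash on the source copies its contents, not the directory itself.
    run rsync -a --delete "$src/" "$dst/"
}

# Example (prints the rsync command without running it):
#   mirror_drive /srv/data /srv/backup
```

      To restore after a data drive failure, the same function could be called with the arguments swapped.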

      Most home users implement a RAID because they believe it's 'the thing to do', as most hardware NAS solutions do this, but there is a big difference with those: they implement a RAID controller with software, but it's specific to that NAS.

      Software RAID (mdadm) does not do that. It is in itself software, and a user needs to understand not just how to set it up but how to recover from any given problem, most of which has to be done from the command line. This again is something some users are not comfortable with.

      My sig tells a story in itself: most users assume their data is safe on a RAID. There are users on here using large RAID arrays, but they back them up because their experience tells them that the RAID could fail. I for one moved from RAID to MergerFS and SnapRAID, but I also have a drive that runs rsync to back up the stuff I don't want to lose, something that some users forget to do.

      You read some users' experiences on here, and someone like @Adoby uses HC2s with a large drive, but one backs up another. It's a different approach: just because you want a NAS does not mean you need a box full of hard drives running a RAID setup.

      Sorry, rant over :D
    • geaves wrote:

      This leaves the current array in a clean/degraded state; another drive failure and your data is gone!
      I understand everything and totally agree, but for the moment I can take the "risk" because my disks are really new. For sure another one can die, but I will not use the remaining 3 disks too much until my new one comes back; I'll keep them powered off, so for the moment it's OK...

      So, regarding my question, do you think I can go ahead with the operation you suggested?
    • By using RAID you combine the capacity of several drives and add a number of redundant drives. This is nice and very easy to do. The redundancy may allow you to continue to access the files on the RAID as long as no more than the number of redundant drives fail. But if more drives fail than there are redundant drives, you lose EVERYTHING! Unless you have good backups.

      And typically when a drive in a RAID fails you replace it and rebuild the array. Or you desperately start to back up the files, to save them. This means a lot of extra work for the remaining drives. And if one or more of the remaining drives are old and also close to failing, this is when they are most likely to fail. And you lose EVERYTHING. This problem is why RAID5 with modern big drives is a bad idea. It is likely that more drives fail while you try to rebuild or back up the broken RAID. RAID6 might help a bit.

      So if you use RAID it is MORE important to have good backups. Not less. The direct opposite of what some seem to believe.

      This is why I don't use RAID. Instead I make sure I have good backups, in several generations and some even at different locations.

      Typically you don't need backups for everything. Just for the files you really don't want to lose.
      OMV 4: 9 x Odroid HC2 + 1 x Odroid HC1 + 1 x Raspberry Pi 4
    • geaves wrote:

      vatastala wrote:

      So, regarding my question, do you think I can go ahead with the operation you suggested?
      To remove the drive using the GUI, yes. The GUI allows you to remove and add a drive, and even grow the array.
      OK, so I'm going to remove it, verify that everything is OK and that my data is accessible over Samba even with the pool in a degraded state, and switch off the box until I have a new disk. At that point, I'll add it to the pool and rebuild the array.

      Thank you very much
    • vatastala wrote:

      OK, so I'm going to remove it, verify that everything is OK and that my data is accessible over Samba even with the pool in a degraded state, and switch off the box until I have a new disk. At that point, I'll add it to the pool and rebuild the array.
      Thank you very much
      Before doing that, I tried to remove another drive and the behavior is the same: everything disappears in the GUI... I really don't understand why. I expected to see my pool degraded... If one drive fails in the future, will I never see my degraded pool? Will I lose my data? Very strange...
    • vatastala wrote:

      Before doing that, I tried to remove another drive and the behavior is the same: everything disappears in the GUI... I really don't understand why,
      Re-read post 2: mdadm (software RAID) does not behave like this. You cannot simply 'pull' a drive; basically it has no intelligence and requires input, unlike a hardware RAID, which has an 'intelligent' controller.
    • geaves wrote:

      Re-read post 2: mdadm (software RAID) does not behave like this. You cannot simply 'pull' a drive; basically it has no intelligence and requires input, unlike a hardware RAID, which has an 'intelligent' controller.
      OK, so you mean that one day, when a drive fails, I will have to input something and then I'll see my pool degraded?
    • geaves wrote:

      vatastala wrote:

      OK, so you mean that one day, when a drive fails, I will have to input something and then I'll see my pool degraded?
      If a drive fails on its own, the array will show clean/degraded. If you 'pull' a drive and reboot, the array will not show in the GUI, because it comes back up as inactive.
      OK, perfect, it worked: on boot I have to press Ctrl+D for the default conf, and then I see it degraded.

      Thank you very much for the help and clarifications :)
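
      The difference between 'inactive' and 'degraded' described above can also be read from /proc/mdstat on the command line. The helper below is a minimal sketch: the md_state function name is made up for illustration, and it assumes the usual mdstat layout (a header line such as "md0 : active raid5 ..." followed by a status line ending in a member map like [UUU_]).

```shell
#!/bin/sh
# Classify one md array from /proc/mdstat-style text on stdin.
# $1 = array name without /dev/ (e.g. md0).
# Prints: active, degraded, inactive, or missing.
md_state() {
    awk -v md="$1" '
        # Stanza header, e.g. "md0 : active raid5 sdc[2] sdb[1] sda[0]"
        $2 == ":" {
            in_stanza = ($1 == md)
            if (in_stanza) {
                found = 1
                state = ($3 == "inactive") ? "inactive" : "active"
            }
            next
        }
        # Status line ends in a map such as [UUU_]; "_" marks a missing member
        in_stanza && $NF ~ /^\[[U_]+\]$/ && $NF ~ /_/ { state = "degraded" }
        END { print (found ? state : "missing") }
    '
}

# Example on a live system:
#   md_state md0 < /proc/mdstat
# If this prints "inactive", running "mdadm --run /dev/md0" as root asks md
# to start the array with the members it has (assumption: the array is md0).
```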
    • Hi all, after some time WD sent me the new disk. I checked it and it seems OK; it is a recertified disk...

      I added it to the RAID pool and the array was rebuilt; it shows as clean...
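
      For reference, adding a replacement disk and watching the rebuild can also be done from the command line. The sketch below is an assumption-laden illustration, not what the poster actually ran: /dev/sdd is a hypothetical disk name, and run, add_member and rebuild_progress are helper names invented here. By default the mdadm command is only printed.

```shell
#!/bin/sh
# Dry-run wrapper: with DRY_RUN=1 (the default here) commands are only printed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Add a replacement disk to a degraded array; md then starts rebuilding onto it.
# $1 = array (assumed /dev/md0), $2 = new disk (hypothetical /dev/sdd).
add_member() {
    run mdadm "$1" --add "$2"
}

# Print the rebuild percentage from /proc/mdstat-style text on stdin.
# Prints nothing when no "recovery = N%" line is present.
rebuild_progress() {
    sed -n 's/.*recovery *= *\([0-9.]*%\).*/\1/p'
}

# Example:
#   add_member /dev/md0 /dev/sdd
#   rebuild_progress < /proc/mdstat
```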

      The problem is... at the next reboot I see the Ctrl+D message again. How is that possible? I have to press it many times, or wait some minutes, before the GUI is ready...

      Also, the filesystem is no longer mounted and I see some errors. Let me explain...

      On boot I see that fsck cannot be done, so maybe there are errors. I tried to launch fsck -f /dev/md0 from the CLI, but I get this output:

      root@openmediavault:~# fsck -f /dev/md0
      fsck from util-linux 2.29.2
      e2fsck 1.44.5 (15-Dec-2018)
      fsck.ext4: Inode checksum does not match inode while checking journal for /dev/md0
      e2fsck: Cannot proceed with file system check

      /dev/md0: ********** WARNING: Filesystem still has errors **********

      What can I try? I can no longer access my data...

      Also, I see a new entry in the GUI, but I remember that there were only 2, /dev/sda1 (the OS) and /dev/md0. Why is there a new entry?
      Images
      • Cattura3.JPG (34.98 kB, 1,009×228)
      • Cattura3.JPG (47.31 kB, 934×161)

    • vatastala wrote:

      on boot I have to press Ctrl+D for the default conf, and then I see it degraded.
      This, from your post 14, should have set alarm bells ringing: that should not be necessary, and it would indicate there is an issue with the filesystem!

      You then see the message again in your latest post. Something is wrong with your system, and my guess is that it is something of your own making. Your next option is to download and create a SystemRescueCD, read the documentation, and use that to hopefully correct any errors.
    • Trying with that CD I wasn't able to recover the data. Anyway, no problem, because in the meantime WD replaced my disk and I had backed up my data...

      But now the question is another one... My filesystem is totally corrupted. I created an ext4 filesystem, and after adding the new disk the RAID reconstruction was CLEAN, but after the first reboot everything was gone... WHY??? I cannot believe this happened. For sure I did something wrong, but if I install OMV again and create a new array with a filesystem like ext4, I don't want this to happen again in the future...

      Looking at post 5, you gave me different options for building my OMV NAS, but I don't really know which to choose...

      Thank you

    • vatastala wrote:

      I cannot believe this happened, for sure I did something wrong
      Yes I would agree with that.

      vatastala wrote:

      but If I want to install again OMV and create a new array with a filesystem like ext4 I don't want this again in the future...
      There is no reason why it should, and there are a number of reasons a filesystem can become corrupt, some of which are beyond the end user's control. The Ctrl-D option means the system has booted to recovery/emergency mode due to a problem with the filesystem; until that is resolved, anything else is a total waste of time.

      So the question is why you are using a RAID option, and to help you answer that, read this post. If you want further clarification on my post 5, I can do that and I'll tag some users who use specific setups in relation to that post, or have a look at the guide here.
    • geaves wrote:

      So the question is why you are using a RAID option, and to help you answer that, read this post. If you want further clarification on my post 5, I can do that and I'll tag some users who use specific setups in relation to that post, or have a look at the guide here.
      Hi @geaves, thank you for your suggestions.

      I would comment that almost everything was done apart from rsync (but I backed up my data during the RMA with WD), so the configuration was correct: I had email alerts enabled, periodic full SMART checks and so on...

      So, what I cannot find now is how to enable periodic checks of the filesystem. There's a task scheduler, and I would like to set up some automation that reports any errors to me by email, so that I can act and fix them.

      What do you think about this?
    • vatastala wrote:

      So, what I cannot find now is how to enable periodic checks of the filesystem,
      What you need to be able to do is run an fsck periodically on the RAID itself, as that is what failed as per your image in post 15, which is why the system went into emergency mode.

      What you could try is to set up a weekly scheduled job after hours and enter the command fsck /dev/md0 (if your RAID is set as md0). Save it, select it, then click Run; this will at least confirm that it works. Then send the output to email.
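
      A scheduled job along those lines could wrap the check in a small script that prints a one-line summary, so that whatever mails the job output delivers something readable. This is a minimal sketch with assumptions: it swaps the plain fsck call for e2fsck -n (ext4 only, opens the filesystem read-only and answers "no" to every repair question), the check_fs helper and the CHECKER variable are invented for illustration, and /dev/md0 is assumed to be the array. On a mounted filesystem even a read-only check can report misleading errors, so treat a warning as a prompt to check again with the filesystem unmounted.

```shell
#!/bin/sh
# Command used for the check; override CHECKER to use a different tool.
# Assumption: the filesystem on the array is ext4, so e2fsck applies.
CHECKER=${CHECKER:-"e2fsck -n"}

# Run a report-only check on a device and print a summary.
# $1 = device (assumed /dev/md0). Returns the checker's exit status.
check_fs() {
    dev=$1
    out=$($CHECKER "$dev" 2>&1)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "OK: $dev reported no filesystem errors"
    else
        echo "WARNING: $dev check exited with status $rc"
        echo "$out"
    fi
    return "$rc"
}

# Example for the scheduled job (run as root):
#   check_fs /dev/md0
```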