RAID 10 fails every time (2 devices fail)

  • Hello,
    I've been trying to set up this OMV NAS for about a week now. I gave up on RAID 10 when this first happened, but then got paranoid about redundancy again, so now I am trying again and need help.


    The issue:
    When setting up a RAID for my four 2 TB hard drives (well, one is actually 3 TB), I wanted to go with RAID 10. Upon creating the raid it never seems to finish resyncing and upon creating a file system the first 2 drives will fail and the raid will become (clean, degraded). This only seems to happen with RAID 10: with a linear array over the last few days, I've been able to read from and write to the server with no problems. Right after creation, the RAID 10 array displays (PENDING). When I switch monitors to check the server terminal, I see that 2 drives have supposedly failed, sda and sdc, and the array attempts to continue on two devices. Restarting the server gives the same error message, and nothing I do seems to fix this. So this is where I stand.


    I'm new here, so if you want any files, settings, or logs, I will try to provide them.





    • Official post

    Upon creating the raid it never seems to finish resyncing and upon creating a file system the first 2 drives will fail and the raid will become (clean, degraded).

    First, the array must finish its initial sync before you put a file system on it. I ran a RAID10 sync on tiny 5 GB virtual drives, and that took about 5 minutes. Scaling this up to 2 TB drives means your sync might take 6 hours. (With all the variables, that's a very rough estimate.) Once the sync is finished, you can put a file system on the array.



    Second, if these drives are not new, have you looked at their SMART stats? If one or more drives have failing stats, mdadm RAID may kick them out of the array. Under Storage, SMART, enable the SMART service, then edit each of your devices and enable SMART on them. In the attributes, look for the following:


    SMART 5 – Reallocated_Sector_Count.
    SMART 187 – Reported_Uncorrectable_Errors.
    SMART 188 – Command_Timeout.
    SMART 197 – Current_Pending_Sector_Count.
    SMART 198 – Offline_Uncorrectable.
    SMART 199 – UltraDMA CRC errors.


    With the exception of 199, errors in the above attributes indicate that a drive failure may be in the making, and they might be the reason the array is kicking member drives out. Attribute 199 usually points to a cable/connector issue, but it still causes I/O errors.
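
If you prefer the command line to the web UI, the same attributes appear in the table printed by `smartctl -A /dev/sdX`. Below is a rough Python sketch that scans such a table for non-zero raw values in the attributes listed above. The names `WATCHED` and `nonzero_watched` are made up for illustration, and the parsing assumes the classic ten-column ATA attribute table with a plain integer in the RAW_VALUE column. Some drives print extra text there, and those rows are skipped.

```python
# Attribute IDs from the list above, with short labels for readability.
WATCHED = {
    5: "reallocated sectors",
    187: "reported uncorrectable errors",
    188: "command timeouts",
    197: "current pending sectors",
    198: "offline uncorrectable",
    199: "UltraDMA CRC errors",
}

def nonzero_watched(smartctl_text):
    """Return {attribute_id: raw_value} for watched attributes with raw value > 0."""
    found = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            found[attr_id] = int(raw)
    return found


if __name__ == "__main__":
    import sys
    # Example: smartctl -A /dev/sda | python3 this_script.py
    for attr_id, raw in nonzero_watched(sys.stdin.read()).items():
        print(f"SMART {attr_id} ({WATCHED[attr_id]}): raw value {raw}")
```

An empty result does not prove a drive is healthy. It only means none of these particular counters has started climbing.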
    ________________________________________________________________________________


    but then got paranoid about redundancy

    Paranoia is a good thing. It can help you keep your data, but using an effective backup method is what's important. You do know that RAID is not about "redundancy", it's about "availability"? I can't think of a good reason to run RAID1 (or a variant, RAID10) at home.


    Provided that they're healthy, you have enough disks to achieve real backup. Solid backup is far better than the false feeling of protection one may think they're getting from RAID1.

  • It could be any number of things. I have been running a NAS at my house for 10 years now.


    What I have found through trial and error and a lot of pulling my hair out:


    1) I made the mistake of buying the cheapest SATA cables and cursed my hard drives for being bad. I swapped the cables for better-made ones and the drives worked flawlessly. Even new cables can be bad, as it appears no one does quality control anymore to check that products work before they go out the door.


    2) Drives, even new ones, can have bad sectors. Usually your manufacturer will have software that can be downloaded from their website that will allow you to check the drives for bad sectors and zero out any data that may be on them. This is a time-consuming process.


    3) Bad memory. I bought more expensive ECC memory that my motherboard manufacturer recommended. I made the mistake of throwing in memory I had lying around. It initially appeared to run fine, but it was causing data rot and corruption. If you do use non-ECC memory or something you have lying around, run it through your BIOS memory checker, if available, and through software made to put memory through its paces.


    Joe
