Posts by geaves

    Do you think putting in a PCI SATA card would be a sufficient test to see if the controller failed

    Possibly. I'll tag chente, as he has been doing some investigation into PCI SATA cards and their chipsets.


    The problem you have is that it's going to be very hit and miss; trial and error requires patience and notes on the process used.

    The continual resyncing would suggest a hardware issue rather than a drive issue; this could be cabling, the controller, or the power supply.


    Basically, the drives are unable to stay in sync with each other.

    would replacing the protected hdd help in the rebuild?

    No

    can the other 2x drive rebuild the data in to the new one ?

    No

    Am I correct in saying that Raid5 minimum disk is 3x so the 2x good one should be able rebuild the 3rd new one

    No


    The minimum requirement to build a Raid5 is 3 drives. Once an array is created the metadata/signature is written to each drive within the array; in your case that information contains a reference to 4 drives. That is confirmed by the output from mdadm --examine.
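
    As an aside, a quick way to compare that metadata across the member drives is to pull just the relevant fields out of --examine. A minimal sketch, assuming the members are sda to sdd (change the letters to your own drive references):

    Code
    # Show which array each drive thinks it belongs to, how many
    # devices that array expects, and each drive's event counter.
    mdadm --examine /dev/sd[abcd] | grep -E '^/dev/|Array UUID|Raid Devices|Events'

    On a healthy array the Array UUID and Raid Devices values should be identical on every member, and the Events counters should be close to each other.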


    Raid5 allows for 1 drive failure within an array; any more and the array is toast/dead, all data is lost.


    Whatever happened, or whatever you unknowingly did, the array cannot be rebuilt from 2 drives.

    why would be a protected mbr on that drive

    Those are usually found on NTFS formatted drives that are sold like that, but NTFS formatted drives turn up in external USB enclosures, never on single drives bought for a NAS or a workstation replacement. At least that's my experience, but I'll stand corrected if not.

    This tells the whole story

    Code
    root@omv6:~# mdadm --assemble --force --verbose /dev/md127 /dev/sd[ac]
    mdadm: looking for devices for /dev/md127
    mdadm: /dev/sda is identified as a member of /dev/md127, slot 3.
    mdadm: /dev/sdc is identified as a member of /dev/md127, slot 1.
    mdadm: no uptodate device for slot 0 of /dev/md127
    mdadm: no uptodate device for slot 2 of /dev/md127
    mdadm: added /dev/sda to /dev/md127 as 3
    mdadm: added /dev/sdc to /dev/md127 as 1
    mdadm: /dev/md127 assembled from 2 drives - not enough to start the array

    mdadm is looking for 4 drives; even if sdb was working as it should, the array would only be in a clean/degraded state.


    This

    Code
    mdadm --examine /dev/sdb
    /dev/sdb:
       MBR Magic : aa55
    Partition[0] :   4294967295 sectors at            1 (type ee)

    tells you there is a protected mbr on that drive, which there shouldn't be
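
    If you want to confirm what is actually on that drive without touching it, a couple of read-only checks will do; a minimal sketch, assuming the drive is still /dev/sdb:

    Code
    # List any filesystem/partition-table signatures present on the drive
    # (read-only when run without -a).
    wipefs /dev/sdb

    # Show the partition table type (gpt/dos) and any partitions.
    fdisk -l /dev/sdb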

    any suggestion in what file system and raid to use that make easier to recover data if something goes wrong again

    It's not the filesystem or the raid, it's the use of SATA controller cards in general. Whilst they are good in general, I would not use them for the likes of Raid or ZFS; for individual drive expansion they are probably fine.


    Looking at the information I can find on that Fujitsu, it is supposed to be a Raid card; for software raid to work the card should be flashed to, or be capable of, IT mode, or set to JBOD.

    To get the complete information on that card you need to run something like hwinfo --short or lshw --short, but you should be able to drill down further to get more precise information.
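
    For example, something along these lines will usually show the controller and the kernel driver it is using; a minimal sketch, none of it is specific to your card:

    Code
    # List PCI devices with their kernel drivers, filtered to
    # SATA/RAID controllers (3 lines of context after each match).
    lspci -k | grep -iEA3 'sata|raid'

    # Short summary of storage hardware as seen by the system.
    lshw -class storage -short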


    One thing you must understand: that array had 4 drives at some point, otherwise this line -> Raid Devices : 4 from mdadm --examine would be different. mdadm is software, so it's only as good as the information it has retained; it can't lie.

    That's the output I expected; there's nothing to be done. The output from --examine wants 4 devices and you only have 2 that mdadm recognises as being part of an array. The array could have been assembled with 3 out of the 4 drives, as Raid5 allows one drive failure.

    OK, I don't know why you ran mdadm --examine again, but let me clarify:


    this line -> Raid Devices : 4 from mdadm --examine is looking for/wanting 4 drives within that array,


    blkid can only find 2


    fdisk and your image can find 3 x 6TB, so unless those drives can be assembled without error in a clean/degraded state, there's nothing to be done.


    post the output of mdadm --detail /dev/md127

    This still makes no sense!! The --examine output says there should be 4 devices in that Raid5, and you've run --examine on /dev/sdb and /dev/sdd twice!!


    Do not shutdown or restart the machine as the drive references can change; post the output again of the following:


    fdisk -l | grep "Disk "


    blkid


    If the system fails to locate at least 3 of those drives and mdadm fails to rebuild the array with 3 of the 4 drives, the array is toast; Raid5 allows for ONE drive failure only.
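
    A quick way to see how many members mdadm can currently find, and in what state, is the following; a minimal sketch, assuming the array is still /dev/md127:

    Code
    # Kernel's view of all md arrays and their member devices.
    cat /proc/mdstat

    # Summary of the array: expected vs working/failed device counts.
    mdadm --detail /dev/md127 | grep -E 'Raid Devices|Total Devices|Active Devices|Working Devices|Failed Devices|State'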

    Nothing, the array appears to be toast. So, assuming the 3 x 6TB drives are/were part of the array, post the output of the following for each drive:


    mdadm --examine /dev/sd? (replace the ? with each drive's reference, e.g. mdadm --examine /dev/sda etc.)
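
    If it's quicker, you can loop over the drives in one go; a minimal sketch, the letters a to d are only an example and need changing to your actual drive references:

    Code
    # Print the mdadm superblock (if any) for each candidate drive,
    # with a header so the outputs don't run into each other.
    for d in a b c d; do
        echo "=== /dev/sd$d ==="
        mdadm --examine /dev/sd$d
    done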

    The array is inactive, so run the following:


    mdadm --stop /dev/md127


    mdadm --assemble --force --verbose /dev/md127 /dev/sd[ac]


    This should rebuild the array in a clean/degraded state. What's odd here is that blkid does not see /dev/sdb, fstab shows no reference to an array, and it appears some of the output is missing.


    Initially, let the array resync; when it's finished, reboot and come back.
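
    You can keep an eye on the resync progress from the shell; a minimal sketch, assuming the array is /dev/md127:

    Code
    # Refresh the resync/recovery progress every few seconds
    # (Ctrl+C to exit).
    watch -n 5 cat /proc/mdstat

    # One-off check of the array state and rebuild progress.
    mdadm --detail /dev/md127 | grep -E 'State|Rebuild Status'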

    after running that it shows online

    It should display as clean/degraded in raid management

    but not mounted or something

    When an mdadm array becomes inactive and has to be reassembled it doesn't always mount automatically. Either reboot, where OMV will pick the array up from fstab, or go to Storage -> File Systems, click the 'play' icon -> Mount an existing file system; the array should be available from the drop down. Select it and click Save.
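
    If you want to confirm from the shell whether the array's filesystem is actually mounted, something like this will do; a minimal sketch, assuming the array is /dev/md127:

    Code
    # Shows the mount point, filesystem type and options if the
    # array is mounted; prints nothing if it isn't.
    findmnt /dev/md127

    # Alternative: look for the array in the list of mounted filesystems.
    df -h | grep md127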

    What's the problem? The output from cat /proc/mdstat says the array is active, so it should be listed under Raid Management.

    I couldn't say, the one who can answer you best is Geaves

    Not sure. My thought was to remove one of the drives that was reading 'no uptodate device', restart the server, then look at mdstat and mdadm --detail to see whether the other drive with 'no uptodate device' was now being read correctly and was part of the array again.

    But if it still registered with the same error and as a spare then the array is toast and there's nothing more to be done.


    The difficulty we have as a forum of users is information about what hardware works and what hardware to avoid, or use at your own risk. The second problem is the cheap Chinese stuff which has no name; if users want to extend their drive capacity, a raid card flashed to IT mode is a safer bet than these PCIe expansion cards.

    Well thanks guys, I'll bow out of this one then. I really should ask what hardware the raid is set up on before I get involved.


    Courtyard, yes you were lucky, sometimes that happens, but when it goes wrong it goes wrong big time. I commend you for having a backup; even if it's not a complete backup, you have something to restore :thumbup:

    Ok, you've restarted the server, as the drive references have changed.


    The drives in the array are currently [cdef]; sdd and sdf are showing as spares, which is one of the issues. The second is the error in #9 -> no uptodate device for slot 1, and the same for slot 3.


    Going back to your initial post where you stated Steam reported disk errors, these could be I/O errors related to hardware issues, where the data being copied is experiencing intermittent write problems to the drive/s.

    This could be either the drive/s themselves or the connectivity of the drive: SATA cable, power cable, or backplane (a backplane is what a drive plugs into in a drive bay, i.e. a 4 port drive bay will typically have/use a backplane).


    Can you give some info on how the drives are connected? This might be a hardware issue.
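
    If it helps, the following read-only commands will show how each drive is attached and whether the kernel has logged any link or I/O errors; a minimal sketch, nothing here is specific to your hardware:

    Code
    # Model, serial and transport (sata/usb/...) for each physical disk.
    lsblk -d -o NAME,SIZE,MODEL,SERIAL,TRAN

    # Recent kernel messages about ATA links or I/O errors.
    dmesg -T | grep -iE 'ata[0-9]+|i/o error' | tail -n 50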

    But I don't get it... Why

    If this is hardware related it would explain why it 'just happened'; if it's drive related one would have expected SMART issues warning of a possible problem.
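
    Checking SMART from the shell is quick if you want to rule the drives in or out; a minimal sketch, assuming smartmontools is installed and using /dev/sda as an example:

    Code
    # Overall health verdict (PASSED/FAILED).
    smartctl -H /dev/sda

    # The attributes that most often flag a failing drive or a bad cable.
    smartctl -A /dev/sda | grep -E 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable|UDMA_CRC'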