RAID Disappeared - need help to rebuild

  • Re,


    this looks bad - do any of the drives report SATA errors too? From the data given, I would replace "sdf" first, then "sdc", then "sdb" ... and maybe use another vendor/type of hard drive ... btw. which drives are you currently using?


    Sc0rp
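
    (Side note for anyone checking the same thing: the SATA/SMART errors asked about above can be read per drive with smartctl from smartmontools - a quick sketch, with /dev/sdf standing in for whichever member you want to inspect:)

    smartctl -H /dev/sdf          # overall health self-assessment
    smartctl -A /dev/sdf          # attributes: watch Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable
    smartctl -l error /dev/sdf    # the drive's own ATA error log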

    Thanks for the guidance. Can't check for SATA errors right now (none that I'm aware of), as the server is off at home (figured it would be best to switch it off until I could start drive replacements). They're all Seagate Barracuda 3TB drives. I've already replaced one of the others previously, and I have a new spare ready and waiting. I was wondering whether I'd be better off switching to a different model for the next swaps. I'd read that Barracudas weren't intended/good for always-on servers, and had seen Seagate IronWolf drives recommended. Any thoughts on that?

  • Re,


    IronWolfs are relatively "young" drives, but it currently seems they are worth a try (note that they run at 7.2k rpm ... more performance, but more heat!) ... WD Reds are the usual/common recommendation for NAS setups - they are really well suited for private/home office use (you can step up to WD Red Pro if you need more performance ...).


    Sc0rp

  • Thanks for that. Any issue with mixing Reds with Barracudas for a short while? And if I were to add 4TB drives, the NAS would (for now) only use 3TB of each, but I could grow the array at a later date (once all drives are upgraded to 4TB) to keep the data and expand capacity?
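
    (Side note on the grow question: md sizes the array to its smallest member, so mixed 3TB/4TB members work but only 3TB of each is used. Once every member has been replaced with a larger disk, growing is typically a two-step affair - a sketch, assuming the array is /dev/md0, whole disks are used as members, and the filesystem is XFS:)

    mdadm --grow /dev/md0 --size=max   # let the array use the full size of the new members
    xfs_growfs /media/raid             # then grow XFS to fill the array (run against the real mount point, while mounted; /media/raid is a placeholder)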

  • Hey guys ... work's been crazy this week, and I've only just got round to trying to progress this. The first thing I did was start backing up some of the files that I knew would be a ball-ache to replace. All was going well for a good hour or so, and then a read error was reported and the NAS disappeared from the network (I assume it was unmounted). I rebooted the NAS, and was struggling to even get OMV to boot. Next, I removed all drives from the NAS, and OMV booted no issue. So, I added the drives back in and swapped out what was sdb (the drive with the increasingly high number of pending/offline uncorrectable sectors). I'm still waiting for another 2 drives to replace sdf and sdc (in that order).
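
    (For reference, the usual member-swap sequence with mdadm looks roughly like the sketch below - /dev/md0 and /dev/sdb are placeholders for the real array and the replacement disk:)

    mdadm --manage /dev/md0 --fail /dev/sdb     # mark the dying member failed (skip if it already dropped out)
    mdadm --manage /dev/md0 --remove /dev/sdb   # remove it from the array before pulling the disk
    mdadm --manage /dev/md0 --add /dev/sdb      # add the replacement; the resync starts automatically
    cat /proc/mdstat                            # watch rebuild progress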


    After swapping out sdb, repairing the degraded RAID seemed to go fine, and it completed with no reported issues. The RAID was mounted, and all folders/files were visible on the network again. However, I went to delete some files and immediately hit a read error (Windows dialogue: "Location is not available. W:\ is not accessible. The request could not be performed because of an I/O error.").


    The OMV GUI is still showing the RAID as active/clean/mounted. mdadm --examine /dev/sd[abcdef] reports:



    Nothing jumps out as being odd, and all details seem to match from one HDD to the next.
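
    (The fields worth comparing across members are the event counter, array state and update time - a one-liner sketch, assuming 1.x metadata:)

    mdadm --examine /dev/sd[abcdef] | grep -E '^/dev/|Events|Array State|Update Time'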


    I've pasted updated logs and messages here:


    http://sprunge.us/UCLE
    http://sprunge.us/PWVK


    I'm really not sure what to do next (aside from swapping out the other 2 HDDs that are showing SMART errors). I did notice something in the logs about xfs_repair - should I be attempting this now? And a basic question: is it run at HDD level, or at RAID/device level?


    As always, any insight and recommendations would be great. Thanks in advance,
    Brian
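
    (On the "which level" question: xfs_repair works on the block device that holds the XFS filesystem - here that is the assembled md array, not the individual member disks. A no-modify dry run, assuming the array is /dev/md0 and has been unmounted first:)

    umount /dev/md0          # xfs_repair refuses to run on a mounted filesystem
    xfs_repair -n /dev/md0   # -n reports problems only and changes nothing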

  • Re,


    I have too little time currently to dig deeper into this, but you can issue a
    touch /forcefsck
    in the terminal and then reboot - this forces filesystem checks at the next boot (even XFS) ... maybe this will help.


    Sc0rp

  • Hi Sc0rp, Appreciate your assistance, especially when you're so busy. I've done as suggested, but the NAS seems to hang on reboot and the array isn't unmounted. I connected a monitor directly to the device so I can see system messages. The following look relevant:


    Turning off quotas ...
    quotaoff: Cannot resolve mountpoint path /media/folder ID: Input/output error
    (the same is repeated for several shared folders)
    ...
    Unmounting local filesystems ...
    [213192.927608] INFO: task umount:3569 blocked for more than 120 seconds.
    [213912.932051] Not tainted 3.16.0-0.bpo.4-amd64 #1
    [213192.936445] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message


    The same 3 messages are now repeating every 120 secs (with different numeric values at the start of the message). What now? Leave it running? Force shutdown?
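
    (One commonly used escape hatch when umount hangs like this, short of pulling the power, is the kernel's magic SysRq interface - a sketch, assuming root access on the console:)

    echo 1 > /proc/sys/kernel/sysrq   # make sure SysRq is enabled
    echo s > /proc/sysrq-trigger      # sync dirty buffers to disk
    echo u > /proc/sysrq-trigger      # remount filesystems read-only
    echo b > /proc/sysrq-trigger      # immediate reboot, no clean shutdown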

  • I've managed to do as you suggested now (not sure how I got there, but fsck ran). Also ran xfs_repair -n, which comes back with loads of issues, and eventually:
    Inode allocation btrees are too corrupted, skipping phases 6 and 7
    No modify flag set, skipping filesystem flush and exiting.


    I guess if the RAID has been rebuilt after disks had dropped out there's going to be inconsistencies all over the place.


    Will swap out the other 2 disks when they arrive, and then maybe run xfs_repair and see what's salvageable, but I'm thinking this might just have to be given up as a bad job, and I'll rebuild my media collection from scratch (and of course, create a backup next time!).

  • Re,

    I guess if the RAID has been rebuilt after disks had dropped out there's going to be inconsistencies all over the place.

    From my experience: the inconsistencies occurred WHILE the drive was dying ... but the result remains the same ...


    Will swap out the other 2 disks when they arrive, and then maybe run xfs_repair and see what's salvageable, but I'm thinking this might just have to be given up as a bad job, and I'll rebuild my media collection from scratch (and of course, create a backup next time!).

    Yeah, maybe you can finally "force" xfs_repair to get to a clean state (at the very least you can zero the journal ...) - just search the net for "man xfs_repair" :D
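
    (What "zeroing the journal" amounts to in practice - a sketch, assuming the filesystem sits on /dev/md0 and is unmounted; -L discards the metadata log, so changes that only lived in the log are lost, which is why it's a last resort:)

    xfs_repair -n /dev/md0   # dry run first, to see what it would touch
    xfs_repair -L /dev/md0   # destructive: zero the log, then repair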


    Btw. I also had many problems using XFS on top of an old Areca HW RAID controller, due to a bug in the driver (kernel module), but I never lost data ...


    If you ever get the chance to build your current array anew, consider using ZFS RAID-Z1 or RAID-Z2 instead - it's more convenient nowadays ... and as a special benefit 4 me, @tkaiser is then in charge :P ... uhm, just kidding ... a bit.


    Sc0rp
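
    (For completeness, the ZFS equivalent of a 6-disk setup like this would be a raidz1 or raidz2 pool - a sketch with a hypothetical pool name "tank" and placeholder device names; /dev/disk/by-id paths are usually preferred over sdX letters:)

    zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
    zpool status tank   # shows pool layout and health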

  • Hey guys. Wanted to give you an update. I swapped out the 2 damaged drives and rebuilt the array successfully, both times. After a lot of deliberation, I then tried xfs_repair -L, and held my breath.


    The good news is that it would appear that only files saved to the NAS in November were corrupted. In my testing, at least, all files I’ve watched that were saved prior to early November are playing with no issues.


    It’s a Christmas miracle!


    I’ve now got all alerts active so I’ll know of any similar issues in future. And when finances allow, I’ll set up a proper backup solution.


    Thank you both @Sc0rp and @tkaiser for all your help and advice. I wouldn’t be sitting here watching our Christmas movies right now without you!


    Wishing you both an awesome holiday season!

  • The good news is that it would appear that only files saved to the NAS in November were corrupted. In my testing, at least, all files I’ve watched that were saved prior to early November are playing with no issues.

    And that's a strong argument for filesystems developed in this century rather than the last. With modern approaches like ZFS or btrfs you would now simply run a scrub overnight, and the scrub would list each and every corrupted file - without you having to waste any time on this, or having to fear that you now have to live with a lot of corrupted files you simply don't know about yet (a very common problem in a lot of installations where I had to help with data recovery).
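
    (The workflow described above, sketched for a hypothetical ZFS pool named "tank"; btrfs offers the same via "btrfs scrub start <mountpoint>":)

    zpool scrub tank        # read and verify every block against its checksum
    zpool status -v tank    # -v lists any files affected by permanent errors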

  • Unfortunately, that's a fear I'm going to have to give in to, and I'll just replace any corrupted files as I come across them. The end result, as it stands, is much more positive than I first expected, so for now I'm a lot happier. And rest assured that when I get the chance to set up a proper backup solution, I'll first upgrade the filesystem and migrate the existing content as per your suggestion. Thanks again for your help, and Merry Christmas!
