Posts by Sc0rp

    Re,

    I'm still getting the same error, even when adding the ending slash?

    Uhm ... just issue an
    ls -la /root to see what's going on in this dir. If there is a file called "20171124-raid-issue" (the line then starts with "-" instead of "d"), delete it with:
    rm /root/20171124-raid-issue and try again ... maybe you have to create the subdir first:
    mkdir /root/20171124-raid-issue


    Sc0rp

    Re,

    I'll do as you suggested with the logs (though the code you suggested is returning an error, currently: `/root/20171124-raid-issue' is not a directory), and give the reassembly a go.

    Sorry, that was my mistake while typing fast ... add a slash at the end:
    cp -v /var/log/messages* /root/20171124-raid-issue/


    After copying the files, you can do the search on this directory ... just change the path from "/var/log/" to "/root/20171124-raid-issue/" ...
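
    For example (just a sketch with "md" as the keyword - use whatever keyword fits):
    cat /root/20171124-raid-issue/syslog | grep md
    zcat /root/20171124-raid-issue/syslog.*.gz | grep md (same thing for the compressed backlogs in the copy)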


    I can't understand about the missing drive ... I'm sure all 6 were in the array, previously.

    But the logs don't lie :P


    Is that it then? No chance of resurrecting or rebuilding?

    You can always try the reassemble with force. The chance is 50:50 ... md itself is safe in this, but you have to expect data loss, since the fs layer (xfs) is damaged too ...
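
    If the forced reassembly works, I'd check the filesystem read-only before mounting it writable - just a sketch, assuming the array comes up as /dev/md127 and carries the xfs directly:
    xfs_repair -n /dev/md127 (-n = no-modify mode, only reports what it would fix)
    mount -o ro /dev/md127 /mnt (look around first before you decide about a real repair)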


    And as always: RAID is not backup ... I hope you have a working backup.
    For the future you should keep in mind that it's worth thinking about changing from RAID5 to ZFS-Z1 or moving to SnapRAID/mergerfs ...


    PSU: ATX standard is good, I was only afraid of yet another PicoPSU setup ...


    HDD: Barracudas are not problematic at all, my "old" 2TB ones are working flawlessly 24/7 ... md-RAID5 @ OMV3 (of course with continuous rsync backup ... and UPS ... and email notifications ... and other scripts)


    Sc0rp

    Re,


    checked the logs ... here are the most recent error-lines:


    /var/log/syslog.1:Nov 23 22:46:35 openmediavault kernel: [23464678.453691] md/raid:md127: Disk failure on sdb, disabling device.
    /var/log/syslog.1:Nov 23 22:46:35 openmediavault kernel: [23464678.453691] md/raid:md127: Operation continuing on 4 devices.


    That means:
    - sdb was disabled due to massive errors on the device (read errors) ... and with that, your array went from "degraded" to "dead"
    - the 2nd line states that there was already one missing drive before that ... and with that, your redundancy was gone
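
    You can double-check what md currently thinks with (the md number is the one from your mdstat output, as long as the array is still listed there):
    cat /proc/mdstat
    mdadm --detail /dev/md127
    ... the --detail output lists every member with its state ("active sync", "faulty", "removed").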


    Don't you have email notifications enabled?
    Which drives (vendor/model) do you use in this setup?
    Which power supply do you use?


    Sc0rp


    EDIT/ps: you should also check the backlogs ...
    ls -la /var/log | grep syslog (shows the backlogs for syslog)
    ls -la /var/log | grep messages (shows the backlogs for messages)
    do both commands and remember the numbers, then do:
    zcat /var/log/syslog.X.gz | grep sdY (the X is a number between 1-7 - look at the lists from the commands above, Y is a, b, c, d, e or f)
    zcat /var/log/messages.X.gz | grep sdY
    for each drive, one by one (start with Y=a), since the drive naming between mdstat and the provided logs looks weird ...
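
    If you want to save some typing, a small loop does the same (just a sketch, adjust the drive letters to your setup):
    for Y in a b c d e f; do echo "== sd$Y =="; zcat /var/log/syslog.*.gz | grep "sd$Y"; done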


    After that, you should check the SMART status of all drives, as @tkaiser already mentioned!
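
    For the SMART check, something like this (assuming smartmontools is installed - OMV uses it for its S.M.A.R.T. page):
    smartctl -H -A /dev/sdX (health summary plus the attribute table - watch "Reallocated_Sector_Ct", "Current_Pending_Sector" and "Offline_Uncorrectable")
    ... again once per drive (X = a, b, c, d, e, f).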


    Sc0rp

    Re,


    according to the output there is good news ... and bad news:
    - good: the "Magic" is the same on all drives
    - bad: the Event counter differs (but it seems they are close enough to reassemble)


    Any conclusion about the root cause? That would be really important ...


    Maybe you can "copy" (read: back up) the log files for later searching ...
    cp -v /var/log/messages* /root/20171124-raid-issue (into the subdir "20171124-raid-issue" under root's home)
    cp -v /var/log/syslog* /root/20171124-raid-issue


    After the files are copied, you can try to reassemble ...


    Sc0rp

    Re,

    Just for my personal understanding: the above /proc/mdstat output talking about two RAIDs with different members can be ignored?

    I hope so ... since the state of both /dev/sdf and /dev/sdc in the md126 array is "spare", I hope the backup superblock is intact for assembling with the remaining disks. Btw. the "/dev/mdX" numbering is more OS-related than array-related ... md works here much like a hardware controller does.


    You can issue the command:
    mdadm --examine /dev/sd[abcdef]
    to get clarity about the status of all drives - I hope there are no greater "event mismatches" on the array ...
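
    To keep the output short you can filter for the interesting lines right away (just a sketch):
    mdadm --examine /dev/sd[abcdef] | egrep '/dev/sd|Events|Array State'
    ... the Events counters should be identical (or at least very close) on all members.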


    Sc0rp

    Re,


    the command will be something like:
    mdadm --assemble /dev/mdX /dev/sd[abcdef] (change the X to 0, 126 or 127 ... whatever you want)
    if that fails, try to force it:
    mdadm --assemble --force /dev/mdX /dev/sd[abcdef]
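
    If mdadm complains that the devices are busy, the kernel probably still holds them in the half-assembled inactive arrays from your mdstat output ... stop those first (use the md numbers you actually see):
    mdadm --stop /dev/md126
    mdadm --stop /dev/md127
    and run the --assemble afterwards.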


    BUT:
    You should find the root-cause for this behavior first - just check the logs for any error message!
    cat /var/log/messages | grep KEYWORD (KEYWORD is something like md, sda, sdb, sd..., raid, ...)
    cat /var/log/syslog | grep KEYWORD


    Sc0rp

    Re,

    Is the same for linear, one drive goes south array dies also, is on the kernel wiki.

    Nope, it may say so on the kernel wiki, but I ran a linear RAID0 setup ages ago ... you can still read the remaining disks. It's not easy, of course, but possible ... anyway, today I would never do that again!


    Sc0rp

    Hi,


    can you please provide a screenshot or similar?


    Sometimes the message is normal, since mdadm works with auto-detection ... but not when it shows up repeatedly.


    Sc0rp

    Re,


    since I'm not a developer, not even a programmer at all - I'll repeat myself:

    Nope, 'cause I'm too stupid


    I can only provide the platform/hardware ... sorry anyway ... therefore I wrote:


    If you need some special tasks, please contact me.


    Since I have the "Test-NAS" build, which can be (mis)used for any testing tasks (currently on OMV 4.x), I'm able to test whatever you want, but I need instructions ...


    Sc0rp

    Re,

    RAID0 is really a bad idea for a NAS. If one of your drives dies, then you lose everything on all the drives. I suggest you get rid of that setup ASAP!

    That's not correct for linear mode! There you'll only lose the data on the failed drive, plus the (at most two) files that straddle the beginning and the end of the failed drive.


    Sc0rp

    Re,

    I could do the whole ZFS thing, but I have mixed feelings about the whole scrubbing feature.

    Hmm, scrubbing an array (whatever the technology) is plain basic "array management". You need it to verify your redundancy and integrity - how will you achieve that without "scrubbing" (or some other ongoing measurement)?


    Scrubbing simply belongs to it, that's a fact. If you want to avoid it, you can't use any array technology ...
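
    And the scrub itself is a one-liner everywhere anyway (the names are only examples):
    echo check > /sys/block/md0/md/sync_action (md-RAID - Debian even ships a monthly cron job for this)
    zpool scrub tank (ZFS, "tank" = your pool name)
    snapraid scrub (SnapRAID)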


    I'm eventually going to have (8) 8TB drives.

    Yeah, if it's more static content (a media archive), then SnapRAID is a good choice; if you want to build a workgroup-grade NAS, you should go with ZFS. But you have to learn and understand both technologies (just as with RAID).


    And with a very good backup strategy you may not need redundancy at all :P


    Sc0rp

    Re,

    I've heard of UFS but have no experience with it. I'm eventually planning on switching to RAID 6. How would I go about growing it then?

    RAID6 is possible but old-fashioned, why not use ZFS instead? ZFS-Z2 is the RAID6 equivalent ... but it depends highly on the use case, because you can also set up a SnapRAID/mergerfs construct (which also handles redundancy, with up to six parity drives).
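
    Just to give an impression, a ZFS-Z2 pool over e.g. six disks is a one-liner (a sketch - "tank" and the device names are only examples, on a real box you'd rather use /dev/disk/by-id paths or the OMV ZFS plugin):
    zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
    Growing later works by adding a whole second raidz2 vdev, not single disks - that's one of the differences to mdadm you have to learn.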


    Sc0rp

    Re,


    the ASUS P5G41T-M LX uses an "Atheros AR8131 Gigabit NIC" ... which is well known for having problems on Linux due to the lack of a reliably working driver (kernel module), since different revisions need different modules. I found this article, but it is very old ... maybe it will help you.



    If you have console access, please issue an
    lspci


    ... and post the output.
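
    If you want to see right away which kernel module (if any) is bound to the NIC, this variant is handy:
    lspci -nnk | grep -iA3 ethernet
    ... the "Kernel driver in use:" line tells us whether atl1c (or something else) got loaded.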


    Sc0rp

    Re,

    but if you have an idea what im doing wrong, I'll be very thankful for a hint or a solution for my dilemma.

    I assume, since you have a (U)EFI partition, that your old drive used a GPT scheme instead of the old MBR scheme ... so the command
    dd if=/mnt/backup/grub_parts.dd of=/dev/sda bs=512 count=1 is wrong, because GPT occupies 34x512 bytes at the start of the disk ...
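
    A sketch of how it would look for GPT (the file and device names are only placeholders - and note that GPT also keeps a backup copy of itself at the end of the disk, which such a dd never touches):
    dd if=/dev/sdOLD of=/mnt/backup/gpt_head.dd bs=512 count=34 (protective MBR + GPT header + partition entries)
    dd if=/mnt/backup/gpt_head.dd of=/dev/sda bs=512 count=34
    or cleaner with sgdisk:
    sgdisk --backup=/mnt/backup/gpt_table /dev/sdOLD
    sgdisk --load-backup=/mnt/backup/gpt_table /dev/sda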


    Sc0rp

    Re,


    maybe the best strategy for you is not RAID0 ... have you heard about UnionFS (in the case of OMV v3.x it's mergerfs)?


    I dunno why OMV cannot "grow" a RAID0 - and I'm missing the screenshots of "Physical disks" and "File systems", but you can grow your array via the console as well (SSH or local).
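
    Roughly like this (only a sketch - assuming the array is /dev/md0, the new disk is /dev/sdd and the filesystem is ext4; the exact syntax can differ a bit between mdadm versions, for RAID0 mdadm does a temporary RAID4 conversion during the reshape):
    mdadm --grow /dev/md0 --add /dev/sdd --raid-devices=3
    resize2fs /dev/md0 (after the reshape has finished, see /proc/mdstat)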


    Sc0rp

    Re,

    isnt it pretty normal to have high read and writes?

    No, that's not normal for a system drive ... at least regarding the writes. Maybe you have too little RAM, or you have misconfigured some "cache system" ...


    If you have lots of writes on the SSD, you should use a pro-grade SSD anyway (like the Samsung Pro series) ... but you should check which configuration causes the high write load, because that will destroy any SSD - and of course it affects ALL members of a RAID1! (RAID1 may double the read speed and read I/Os, but the write speed and write I/Os stay the same!)
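
    To find the culprit, something like iotop helps (may need to be installed first: apt-get install iotop):
    iotop -oa (-o = only show processes that actually do I/O, -a = accumulate the totals while it runs)
    ... let it run for a while and see which process piles up the DISK WRITE column.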


    unless the OS is loaded into ram like ESXi.

    Since ESXi behaves here like any unix/linux core, the behavior is the same for all "OS" these days ... read the files from disk and buffer as much of them in RAM (cache area) as there is space ... the rest is used for file caching (FS related).


    If the OS dies, you reinstall. It's usually faster than messing around with clonezilla.

    That depends on the configuration of the whole NAS ... CloneZilla does bare-metal backups incl. over the network (server/client), so you can use another host as the server, among other possibilities ...


    RAID1 as a boot device only brings more reliability if different drives of the same technology are used (e.g. one HDD from Seagate and one from WD - of course with the same base specs). For SSDs I wouldn't recommend that; a good backup strategy is better there (rsync to a backup dir, an image, the internal backup plugin, ...)
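
    A minimal sketch of the rsync variant (the paths are only examples):
    rsync -aAXH --delete --exclude={"/proc/*","/sys/*","/dev/*","/run/*","/tmp/*","/mnt/*","/media/*","/srv/backup/*"} / /srv/backup/rootfs/
    ... that gives you a browsable copy of the system disk which you can restore file by file (or completely, after a fresh minimal install).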


    Sc0rp