Mobo and other HW suggestions wanted for new OMV box with ECC

  • On the i7-965 system:

    $ dmesg | grep -i edac
    [ 8.791019] EDAC MC: Ver: 3.0.0
    [ 8.801505] EDAC MC0: Giving out device to module i7core_edac.c controller i7 core #0: DEV 0000:3f:03.0 (POLLED)
    [ 8.801561] EDAC PCI0: Giving out device to module i7core_edac controller EDAC PCI controller: DEV 0000:3f:03.0 (POLLED)
    [ 8.801566] EDAC i7core: Driver loaded, 1 memory controller(s) found.

    omv 5.6.13 usul | 64 bit | 5.11 proxmox kernel | omvextrasorg 5.6.2 | kvm plugin 5.1.6
    omv-extras.org plugins source code and issue tracker - github


    Please read this before posting a question.
    Please don't PM for support... Too many PMs!

  • Well, I have no idea why our outputs are so different and why you have 2 devices polled. I was hoping the MC0 line wouldn't be present for you, and that it would indicate the ECC status for those who have it. Not sure why I have a ton of South Bridge stuff either. I was hoping it would correspond to cores, but some appear to be duplicated, and if you count the unique values you get 17.

  • Here is the dual E5620 system:


    # dmesg | grep -i edac
    [ 4.524352] EDAC MC: Ver: 3.0.0
    [ 4.586637] EDAC MC1: Giving out device to module i7core_edac.c controller i7 core #1: DEV 0000:fe:03.0 (POLLED)
    [ 4.586694] EDAC PCI0: Giving out device to module i7core_edac controller EDAC PCI controller: DEV 0000:fe:03.0 (POLLED)
    [ 4.587210] EDAC MC0: Giving out device to module i7core_edac.c controller i7 core #0: DEV 0000:ff:03.0 (POLLED)
    [ 4.587252] EDAC PCI1: Giving out device to module i7core_edac controller EDAC PCI controller: DEV 0000:ff:03.0 (POLLED)
    [ 4.587602] EDAC i7core: Driver loaded, 2 memory controller(s) found.
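Comparing the two outputs, the dual E5620 box lists MC0 and MC1 while the i7-965 lists only MC0. With i7core_edac, each CPU socket's integrated memory controller appears to register as its own MCn, which matches "Driver loaded, 2 memory controller(s) found". As discussed in this thread, a registered controller does not by itself prove that ECC is active. A minimal Python sketch for pulling the controller lines out of `dmesg`, assuming the exact line format shown above (other EDAC drivers or kernel versions may word it differently); `memory_controllers` is a hypothetical helper, not part of any EDAC tool:

```python
import re

# Matches EDAC memory-controller registration lines in the format shown above, e.g.
# "EDAC MC1: Giving out device to module i7core_edac.c controller i7 core #1: DEV 0000:fe:03.0 (POLLED)"
# Assumption: other EDAC drivers or kernel versions may word this line differently.
MC_LINE = re.compile(
    r"EDAC (MC\d+): Giving out device to module (\S+) "
    r"controller (.+?): DEV (\S+) \((\w+)\)"
)


def memory_controllers(dmesg_text):
    """Map each registered controller (MC0, MC1, ...) to (module, controller, PCI device, mode).

    "EDAC PCIn" lines and the version/summary lines do not match and are ignored.
    """
    found = {}
    for line in dmesg_text.splitlines():
        match = MC_LINE.search(line)
        if match:
            name, module, controller, device, mode = match.groups()
            found[name] = (module, controller, device, mode)
    return found
```

On a live box the input could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`; reading the kernel log may need root, depending on the kernel's `dmesg_restrict` setting.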


  • @Markess +1. ECC is not affected by the OS: if it does or doesn't work in FreeBSD, it will be the same in Linux. Make the jump!


    I run Linux on the desktop, so I'm already more comfortable with OMV than I ever was with FreeNAS. ECC isn't holding me back; it's that I've got too many choices, which isn't really a bad thing! With the forums here, it's great to be able to see what other people are trying, what works and what doesn't. With my limited physical space I need to be heat/power averse, but I also like ECC/ZFS. So before I invest in more hardware, I'm still checking to see if I can do both. On the other hand, I've had an OMV test rig (made from spare parts) running for a month now in parallel with the existing FreeNAS box. It's been rock solid and draws 80% less power, so I may just leave it on and use it for a while.

  • Sounds like you're pretty much set. I know that a lot of people swear by BSD-based systems, but I've never got on with them, with the exception of (and please don't shoot me here) Macs (if you can really call a Mac a BSD system). I just don't like paying Apple prices for systems, so my laptop is a Hackintosh. I'm a bit of a multi-OS user, to be honest. I can't avoid Windows because, whether in a VM or not, there always seems to be something I need to use that's Windows-only. But BSD as in FreeBSD/NetBSD, I just don't like. I don't get on with the package management. I know it may have been awesome way back when, but so was Slackware (once), and I just don't think a modern OS should make you put up with so much manual package management. The ArchBSD project makes it a little less annoying, but I would rather use Linux because, when push comes to shove, hardware support sucks on FreeBSD. Modern package managers like APT and Pacman make Slackware and BSD look awful. Also, as you have found, it appears to be more energy efficient. :-)

  • Sigh. So it's back to using a known bad DIMM or good old hardware hacking. :thumbdown:


    I have access to bad DIMMs because the boss here intercepted RMAs and paid for new ECC DIMM replacements, so as to have some "company's official bad DIMMs for testing", but I doubt most people will ever see one. It's already rare for a normal DIMM to die, let alone a server-grade one with a lifetime warranty.


    Sometime in the future, when I finally get a cheap C206 mobo, I'll go fetch the full text of the article I linked above (when we were talking about hardware hacks like tape on the pins of the DIMM connector) and see if I can make a "temporarily bad" ECC DIMM that produces ECC errors, so we can confirm once and for all whether the goddamn ECC hardware is working or not.


    Or, if you feel like doing it first, be my guest. :D


    Quote from Markess

    With my limited physical space I need to be heat/power averse, but I also like ECC/ZFS.


    Have you looked at SnapRAID? It is better suited to home-server environments.
    Btrfs, SnapRAID and ZFS are the only ways to checksum files and correct bit rot; SnapRAID does that only on a schedule, while Btrfs and ZFS do it both live and on a schedule.
    Another feature that might be interesting is that a SnapRAID "array" is not a RAID, so when you are reading data only one drive needs to be running.
    It can also have up to 6 parity drives (only 3 on ARM devices, for performance reasons).
    Since its checksums and parity are a snapshot, it can also roll back accidental deletions or modifications made since the last sync, but you're better off using a true file-versioning system for that.
    See http://snapraid.sourceforge.net/compare.html


    OMV has a SnapRaid plugin.
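To illustrate the parity-plus-checksum idea described above (this is not SnapRAID's actual file format or parity math, which supports several parity levels), here is a minimal Python sketch assuming single XOR parity over equal-sized blocks and a SHA-256 checksum per block; all names are hypothetical. The checksums say which block rotted, and the parity computed at the last sync lets that block be rebuilt from the others.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte (single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def checksum(block):
    return hashlib.sha256(block).hexdigest()


def sync(data_blocks):
    """Scheduled 'sync' step: record one parity block plus a checksum per data block."""
    return xor_blocks(data_blocks), [checksum(b) for b in data_blocks]


def scrub_and_fix(data_blocks, parity, sums):
    """Find blocks whose checksum no longer matches and rebuild them from the parity."""
    bad = [i for i, block in enumerate(data_blocks) if checksum(block) != sums[i]]
    if len(bad) > 1:
        raise ValueError("single parity can rebuild at most one bad block")
    repaired = list(data_blocks)
    for i in bad:
        others = [block for j, block in enumerate(repaired) if j != i]
        repaired[i] = xor_blocks(others + [parity])
    return repaired, bad


if __name__ == "__main__":
    disks = [b"movie", b"photo", b"notes"]   # one equal-sized block per data disk
    parity, sums = sync(disks)
    rotted = [b"movie", b"phoTo", b"notes"]  # one flipped bit on disk 1 ('t' -> 'T')
    repaired, bad = scrub_and_fix(rotted, parity, sums)
    print(bad, repaired == disks)            # [1] True
```

Because the parity only reflects the last sync, anything changed afterwards is unprotected until the next run, which is the "only on a schedule" trade-off mentioned above.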


    I use SnapRAID on my Zyxel NSA325 v2 (a hacked 2-bay commercial NAS box that is rather cheap and can run Debian ARM, with OMV on top of it) with no problems.


    Quote

    Also, as you have found- it appears to be more energy efficient.

    That's more like "BSD hardware support totally sucks even with 3-year-old hardware, while you can install Linux on anything a year old or more and it will probably work, maybe with some graphics issues".


    He is overwhelmed by how much stuff he can run Linux on. :P


    His ZFS rig is built on pre-Sandy Bridge processors and chipsets. It's less power-hungry than a Pentium-based system, but it's still a lot by modern standards.


    Also, its processing power is probably on par with today's lower-end Xeons or i5s (Sandy Bridge onwards), which idle at 5 watts or so (plus another 5-15 W for the board, RAM and so on, depending on the system).


    Quote

    (if you can really call a Mac a BSD system)

    The open-source OS without Apple's proprietary interface is called Darwin. It is also used as the base for iOS (again with a closed-source interface and parts).
    OS X and iOS are BSD with proprietary additions.
    Apple devices, PlayStations and firewalls are the most common consumer devices running a BSD derivative.


  • Don't write off the possibility of checking ECC with an app yet. I heard back from the AIDA64 guys. They say that their software can tell you if ECC is enabled. I have emailed back asking for clarification, but in the meantime, maybe we can try to get some results from boxes with AIDA64 on them? The only downside is that it requires a Windows environment, unless you pay for and use the Linux audit software (but the OS doesn't matter for ECC). Info on AIDA64 is here. Download a trial copy here. Make sure you get the green Engineer edition; it is the most comprehensive. Once it's installed, you should be able to get confirmation that ECC is enabled. To quote the support ticket email:


    Quote

    You can check the ECC in the Motherboard / Chipset point in the side menu and under the North Bridge / [Error Corruption]. If AIDA64 shows Not Supported, then you should try to change the memory mode to ganged mode in the BIOS.


    Hope this helps, but feel free to ask further questions.


    I have emailed back to ask whether this actually confirms that ECC is running, or just that the system is capable of ECC. I am very tempted to buy a copy if it gives us the answer we are after. :)



    It does look pretty good... I've never really looked at it before. I might fire it up in a VM and take a look.



    Well, I've always said I prefer Linux from a hardware support point of view (and many others). I thought it was Christmas when I came across OMV ;)


    Quote

    The open-source OS without Apple's proprietary interface is called Darwin. It is also used as the base for iOS (again with a closed-source interface and parts).
    OS X and iOS are BSD with proprietary additions.
    Apple devices, PlayStations and firewalls are the most common consumer devices running a BSD derivative.


    True, but it's so far from plain BSD now that what I meant was it's like comparing Debian to Ubuntu.

  • Quote from ellnic

    Don't write off the possibility of checking ECC with an app yet. I heard back from the AIDA64 guys. They say that their software can tell you if ECC is enabled.

    The problem is that they probably won't tell us how they detect this, or whether it is detectable at all, so I cannot be sure. Hey, if we aren't trusting the manufacturers, why should we trust random software developers who want to sell us an extremely expensive software suite?


    The only realistic way it could work is if the hardware supports ECC error injection (generating fake errors), and I have no idea which hardware supports that.
    And even then, I have exactly zero understanding of how it works, so they could make something that appears to work when tested but doesn't in practice.


    With a bad DIMM or a "temporarily bad" DIMM, there isn't much they can realistically do to fool me: either the error log fills up, or something isn't working.


    The general plan is this: http://bluesmoke.sourceforge.net/testing.html
    There is also an article, but it is behind a (relatively cheap) paywall: http://ieeexplore.ieee.org/xpl….jsp%3Farnumber%3D6157166
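Whichever way the errors are provoked, the pass/fail check is whether the kernel's EDAC error counters move. A minimal Python sketch, assuming the standard EDAC sysfs layout `/sys/devices/system/edac/mc/mcN/` with its `ce_count` (corrected errors) and `ue_count` (uncorrected errors) files, which should be present once a driver such as i7core_edac has loaded; `read_counters` and `new_errors` are hypothetical helper names:

```python
from pathlib import Path

# Assumed location of the EDAC memory-controller directories (mc0, mc1, ...).
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_counters(root=EDAC_MC_ROOT):
    """Return {"mc0": (ce_count, ue_count), ...} for every EDAC memory controller under root."""
    counters = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        ce = int((mc / "ce_count").read_text().strip())
        ue = int((mc / "ue_count").read_text().strip())
        counters[mc.name] = (ce, ue)
    return counters


def new_errors(before, after):
    """Per-controller increase in (corrected, uncorrected) errors between two snapshots."""
    return {
        name: (after[name][0] - before.get(name, (0, 0))[0],
               after[name][1] - before.get(name, (0, 0))[1])
        for name in after
    }
```

Take one snapshot, run a memory-heavy workload on the deliberately damaged DIMM, then take another. If ECC is detecting errors, the corrected (or uncorrected) count for that controller should rise; if it stays at zero, either nothing reached the masked line or ECC reporting isn't working.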


    It basically means masking a data pin so that data on that line is neither written nor read, which causes an ECC error whenever data has to move through it.
    I will of course have to check the interface schematics, because I need to know what I'm masking: that text is 10 years old, and we now have DDR3, which isn't the same as the DDR1 in the example.
    And if I can't mask the pin with tape or plastic, I can always use nail polish on it.
    It all goes into the "do not try this at home" bin, but hey.
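Why a masked pin should show up as ECC events only some of the time can be illustrated with a toy code. This minimal Python sketch uses an extended Hamming (8,4) SEC-DED code as a small stand-in, by assumption, for the much wider (72,64)-style codes ECC DIMMs use, and models the masked pin as a data line stuck at 0; all names are hypothetical:

```python
from collections import Counter


def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SEC-DED) codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant data bit
    code = [0] * 8                              # index 0 = overall parity, 1..7 = Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (nibble, status), where status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                     # XOR of the positions of all set bits
    overall = sum(code) % 2                     # 1 means an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1                     # single-bit error (syndrome 0 = parity bit itself)
        status = "corrected"
    else:
        status = "uncorrectable"                # two flipped bits: detected, not fixed
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return nibble, status


def through_stuck_line(code, line):
    """Model a masked data pin: the bit on `line` always reads back as 0."""
    damaged = list(code)
    damaged[line] = 0
    return damaged


if __name__ == "__main__":
    results = Counter()
    for value in range(16):
        decoded, status = decode(through_stuck_line(encode(value), 5))
        assert decoded == value
        results[status] += 1
    print(dict(results))  # {'ok': 8, 'corrected': 8}
```

Only words whose bit on the masked line should have been 1 get corrected (8 of the 16 values here); the rest pass through untouched, which is why errors appear only when data actually has to move through that line. Two bad lines in the same word would be reported as uncorrectable rather than fixed.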


    If it works, it will end all this software weaseling.


    Quote

    True, but it's so far from plain BSD now that what I meant was it's like comparing Debian to Ubuntu.

    More like comparing Android to a desktop distro. Apart from the kernel and some basic utilities, it's different.

  • I'm travelling this week with just my mobile phone, so I won't try any clever quoting.


    @bobafetthotmail I'll try the ECC recommendations you gave a few posts back when I get back, but I'm drawing the line at heat-gunning the memory! No destructive testing.


    You and @ellnic both mention BTRFS. Last time I looked into it, there was still a bug where the file system ate disk space for logs and housekeeping but didn't report it, so disks were filling up without users being aware. Do you know if that's fixed now? If so, that may be an option over ZFS etc. at some point too.

  • Quote from Markess

    @bobafetthotmail I'll try the ECC recommendations you gave a few posts back when I get back, but I'm drawing the line at heat-gunning the memory! No destructive testing.

    I was talking about covering electrical contacts to interrupt data transmission over some data lines. While I suspect that at most it will crash the system, I strongly advise against doing it unless you know what you are doing.


    The other methods cited are a bit crude, but consider that the post I cited is 10 years old. Things were sturdier back then, you know (not kidding: bigger traces, larger components, cruder systems).


    Quote

    You and @ellnic both mention BTRFS.

    It's improving.


    It is now usable on smaller scales, as long as you stay within RAID1 and RAID0 and have the latest kernel (and keep an eye open for new bugs).
    RAID5 and RAID6 do NOT work properly, if at all, and are still under development.
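Checksums plus RAID1 are a useful combination because, when one mirror's data fails its checksum, the other copy can be returned and used to repair it. A conceptual Python sketch of that read path, assuming two mirrored copies and a SHA-256 checksum stored per block; this is not btrfs code (btrfs uses its own on-disk checksums, crc32c by default), and the names are hypothetical:

```python
import hashlib


def digest(block):
    return hashlib.sha256(block).digest()


def mirrored_read(copies, expected):
    """Return verified data from a mirror set, rewriting any copy whose checksum fails.

    copies   -- list of byte strings, one per mirror (modified in place)
    expected -- checksum recorded when the block was written
    """
    good = next((copy for copy in copies if digest(copy) == expected), None)
    if good is None:
        # every mirror is damaged: report an error instead of returning bad data
        raise OSError("all copies failed checksum verification")
    repaired = [i for i, copy in enumerate(copies) if digest(copy) != expected]
    for i in repaired:
        copies[i] = good  # "self-heal": overwrite the bad mirror with the verified copy
    return good, repaired
```

A plain mirror without checksums could only tell that the two copies disagree, not which one is right; the stored checksum is what makes the repair safe.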


    For example, Netgear uses btrfs RAID1 in its consumer NAS: http://kb.netgear.com/app/answ…elevant-maintenance-tasks
    (also a good simplified read about some of the maintenance it needs)


    And SUSE uses btrfs as the default filesystem instead of ext4 from openSUSE 13.2 onwards: https://news.opensuse.org/2014…m-btrfs-on-opensuse-13-2/


    You can thank Facebook for that. They employ the main btrfs developers and let them run tests on their web servers (the less important servers; the servers where the data is stored are the database servers, which remain off-limits).

  • @Markess I wouldn't say that BTRFS is suitable for anything but mirroring or a plain root partition etc. As @bobafetthotmail has mentioned, it's under heavy development, so I would stick to ZFS for now. Once BTRFS is usable, these ECC concerns will be greatly reduced, as it's not meant to require ECC.


    @bobafetthotmail I know you don't want to trust AIDA64, but they didn't give me this information because they think they are going to get a sale. As far as they are aware, I've downloaded a trial for one thing, and I could be using it just for that with no intention to purchase. I heard back from them regarding whether AIDA64 reliably tells us if ECC is actually enabled, and this is what they said:


    Quote

    You should only trust the Chipset page. So you just need to read this menu to know more about the ECC support.


    Maybe give it a try and see what it tells you? :-)
