ECC Usage and Recommendations

    • ECC Usage and Recommendations

      Hello.

      I was reading some threads here about ECC hardware and about which hardware supports ECC, but I never found information about how OMV makes use of ECC when it is present.
      Can anyone give me a hint about that, given that:

      - I have capable hardware with ECC support
      - ECC is active/enabled (Hardware/BIOS)
      - ECC is active/enabled (Kernel)

      What happens when an Error occurs, given that ECC is only capable of detecting an Error, but not correcting it?

      Thanks in advance.
    • When I first read this I was trying to understand the statement. OMV, the kernel, and software in general have nothing to do with ECC RAM. Provided your motherboard supports it, it's usually enabled, but not always, within the BIOS. This is a quote from another NAS forum:

      The generic arguments in favor of ECC RAM are still valid, of course. A machine with non-ECC memory can suffer memory corruption, and it’s possible for some of those errors to get to disk. That would happen regardless of the filesystem you’re using.

      and the wiki explains what ECC memory is.
    • gaelic wrote:

      Can anyone give me a hint about that, given that:

      - I have capable hardware with ECC support
      - ECC is active/enabled (Hardware/BIOS)
      - ECC is active/enabled (Kernel)

      What happens when an Error occurs, given that ECC is only capable of detecting an Error, but not correcting it?

      Thanks in advance.
      - ECC operates independently of the OS (and the kernel), much like a hardware RAID controller does.
      - ECC corrects errors.
      - ECC must be activated in BIOS, for ECC memory modules to work. (On most ECC capable motherboards.)

      OMV is aware of ECC but, again, doesn't interact with it.

      You can check ECC modules with the following command line

      Source Code

      dmidecode --type memory

      When you see:
      Total Width: 72 bits
      Data Width: 64 bits
      The extra 8 bits, in Total Width, is ECC.
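
      That comparison can be scripted. A minimal sketch, assuming dmidecode's usual "Total Width: N bits" / "Data Width: N bits" lines; the function name ecc_width_check is hypothetical, and it reads the dmidecode output from stdin so it also works on saved output:

```shell
#!/bin/sh
# Hypothetical helper: reads `dmidecode --type memory` output on stdin and
# reports, per populated memory device, whether Total Width exceeds Data Width.
ecc_width_check() {
    awk '
        /Total Width:/ { total = $3 }
        /Data Width:/  {
            data = $3
            # Skip empty slots, which report "Unknown" instead of a number.
            if (total ~ /^[0-9]+$/ && data ~ /^[0-9]+$/) {
                n++
                if (total + 0 > data + 0)
                    printf "DIMM %d: %d/%d bits, ECC bits present\n", n, total, data
                else
                    printf "DIMM %d: %d/%d bits, no ECC bits reported\n", n, total, data
            }
        }
    '
}
```

      Usage (needs root): dmidecode --type memory | ecc_width_check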

      With this command

      Source Code

      dmidecode --type 15   # SMBIOS type 15 = System Event Log
      You'll see whether any ECC single-bit, multi-bit, or parity memory errors have been registered.

      While not common, single bit errors happen over time. (Cosmic rays they say... :) ) Regardless of the source, if the error is written to disk, data and the OS will slowly become corrupted. It's just a matter of time and whether or not the corruption hits something critical.
      ______________________________

      If you're really interested in the topic, check out the EDAC project. They wrote a Linux driver that works with ECC-capable MOBOs, along with utilities that will show memory errors. (Search for EDAC on sourceforge.) Unfortunately, my MOBO is a proprietary item that's not supported. Maybe it will work with your MOBO.

      EDIT: There's a difference between UDIMM (unbuffered) ECC and RDIMM (Registered) ECC. The two are not compatible.
      Good backup takes the "drama" out of computing
      ____________________________________
      OMV 3.0.99 Erasmus
      ThinkServer TS140, 12GB ECC / 32GB USB3.0
      4TB SG+4TB TS ZFS mirror/ 3TB TS

      OMV 3.0.99 Erasmus - Rsync'ed Backup
      R-PI 2 $29 / 16GB SD Card $8 / Real Time Clock $1.86
      4TB WD My Passport $119

      The post was edited 2 times, last by flmaxey: edit ().

    • My MB supports UDIMM ECC RAM and I have 16GB of UDIMM ECC RAM installed and enabled in the BIOS.

      But I do not see a 72 bit width:

      ~@omv:~$ sudo dmidecode --type memory | grep bits

      Total Width: 64 bits
      Data Width: 64 bits
      Total Width: 64 bits
      Data Width: 64 bits

      Thoughts?
      OMV 4.x - ASRock Rack C2550D4I - 16GB ECC - Silverstone DS380
    • CPU is an Intel Atom C2550, integrated on the C2550D4I board from the factory, not something I built from parts. The board is ECC capable and ECC is enabled in BIOS.

      Specs:

      asrockrack.com/general/product…l=C2550D4I#Specifications

      RAM is Crucial 16GB Kit (2 x 8GB) DDR3L-1600 ECC UDIMM

      crucial.com/usa/en/ct2kit102472bd160b

      The post was edited 2 times, last by gderf ().

    • I have one of the exceptions, a Core i3-4150, a "4th generation" Intel product. Again, my set up is a Lenovo Thinkserver so it's proprietary, but it will accept a Xeon processor.

      @subzero79 is right about CPUs. i3, i5, etc. don't mean much. I'd check the CPU itself (its full designation) against its specs.
      ___________________________________________________

      @gderf - that's an interesting MOBO/CPU combo. A ton of SATA ports, fanless, low power consumption, and with ECC support. It seems to be tailor-made for a NAS server.
    • flmaxey wrote:

      @gderf - that's an interesting MOBO/CPU combo. A ton of SATA ports, fanless, low power consumption, and with ECC support. It seems to be tailor-made for a NAS server.
      Exactly.

      I have been very, very happy with it too. The case is excellent also. It will hold 8x 3.5in drives in removable, hot-swappable, externally accessible trays, plus 4x 2.5in drives internally.

      12 SATA ports on board, 12 drive bays in the case. Runs headless and I keep it in my gun safe :)

      The post was edited 1 time, last by gderf ().

    • That CPU supports ECC (it would be illogical to ship an embedded CPU without ECC support for that type of board, I guess).
      Maybe check the board's changelog for BIOS updates. I don't know if there are other methods for checking ECC RAM besides the one mentioned here.
      New wiki
      chat support at #openmediavault@freenode IRC | Spanish & English | GMT+10
      telegram.me/openmediavault broadcast channel
      openmediavault discord server
    • gaelic wrote:

      I assume 1 Bit are corrected by the Kernel itself. (true/false?)
      But what happens if a 2 Bit Error occurs which is only detectable but not correctable?
      The 'kernel' doesn't even notice when single-bit errors are corrected (that happens one layer below). What happens when uncorrectable errors occur largely depends on where the bit flip happens: it can be anything from minor data corruption through application weirdness to a kernel panic.

      ECC DRAM without monitoring is a bit useless, so you usually do:

      Source Code

      1. apt install edac-utils
      2. edac-util --report=ce # corrected errors
      3. edac-util --report=ue # uncorrected errors
      With ECC DRAM even if bit flips occur at least you can take notice and react accordingly.
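
      For scripted monitoring, a rough sketch that sums the counters the kernel's EDAC driver exposes under /sys/devices/system/edac/mc, assuming an EDAC driver for your memory controller is loaded. The function name edac_counts and its optional base-directory argument are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: sums corrected (ce_count) and uncorrected (ue_count)
# error counters over all memory controllers found under the EDAC sysfs tree.
edac_counts() {
    base="${1:-/sys/devices/system/edac/mc}"
    ce=0
    ue=0
    for mc in "$base"/mc*; do
        # The glob stays unexpanded when no controller is registered,
        # so skip anything that lacks readable counter files.
        [ -r "$mc/ce_count" ] && [ -r "$mc/ue_count" ] || continue
        ce=$((ce + $(cat "$mc/ce_count")))
        ue=$((ue + $(cat "$mc/ue_count")))
    done
    echo "corrected=$ce uncorrected=$ue"
}
```

      Called without arguments it reads the live sysfs tree. If it prints zeros on a machine with ECC RAM, that may only mean no EDAC driver is loaded for that board.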

      The post was edited 1 time, last by tkaiser: Typo ().

    • Yes it is. But it has not failed yet. It's been running 24/7/365 since June 20, 2015. Warranty is set to expire on June 20, 2018, but ASRock says they will make good on it if it fails for that reason even if the warranty has expired.

      Unfortunately, the CPU is not user replaceable, it's integral to the board.
    • tkaiser wrote:

      gaelic wrote:

      I assume 1 Bit are corrected by the Kernel itself. (true/false?)
      But what happens if a 2 Bit Error occurs which is only detectable but not correctable?
      The 'kernel' doesn't even notice when single-bit errors are corrected (that happens one layer below). What happens when uncorrectable errors occur largely depends on where the bit flip happens: it can be anything from minor data corruption through application weirdness to a kernel panic.
      ECC DRAM without monitoring is a bit useless, so you usually do:


      Source Code

      1. apt install edac-util
      2. edac-util --report=ce # corrected errors
      3. edac-util --report=ue # uncorrected errors
      With ECC DRAM even if bit flips occur at least you can take notice and react accordingly.

      Thank you very much. That's the answer I was looking for. I knew that edac-util exists, but I haven't had the opportunity to try it yet. It also seems to be deprecated (last code update 2015).

      But error detection and reporting by OMV is not done as of now. Good to know, maybe I'll start a plugin as soon as my system is up and running :)
    • @gaelic please stop submitting the same post because you get what looks like an error. Read problem #4 here - Solutions to common problems
      omv 4.1.4 arrakis | 64 bit | 4.15 backports kernel | omvextrasorg 4.1.4
      omv-extras.org plugins source code and issue tracker - github.com/OpenMediaVault-Plugin-Developers

      Please read this before posting a question.
      Please don't PM for support... Too many PMs!
    • tkaiser wrote:

      Source Code

      1. apt install edac-util
      2. edac-util --report=ce # corrected errors
      3. edac-util --report=ue # uncorrected errors

      I'm running OMV 3.X
      When I run line 1 (apt install edac-util), the output is;

      Source Code

      Reading package lists... Done
      Building dependency tree
      Reading state information... Done
      E: Unable to locate package edac-util

      Am I missing a repository?
      (I've tried a few times, waiting a day in between, in the event that the repository might be offline.)