Snapraid sync -h or scrub results in thousands of errors going from OMV3 (snapraid 11.1) to OMV4 or 5 (snapraid 11.3), tests confirm OMV / Snapraid software issue.

  • Running snapraid sync -h or snapraid scrub -p new produces thousands of errors after upgrading from OMV 3 to OMV 4 or 5. Through a series of tests, I've eliminated the hardware issues I could test for, and I can reliably replicate the problem just by installing a newer version of the software. If there were a hardware issue, I would expect to see errors on all software versions, but I do not.


    Summary:

    OMV 3 with snapraid 11.1 = NO ERRORS (tested multiple times to make sure)
    Fresh install of OMV 4 or 5 with snapraid plugin (v11.3) = thousands of errors with just 2 test files.
    OMV 4 or 5 with snapraid 11.1 (manual snapraid install, no plugin) = few errors, much better than v11.3 but still broken. snapraid 11.1 only works on OMV3.


    Errors:

    snapraid sync -h

    Data change at file '/srv/dev-disk-by-label-250GB/test/Video-01.mp4' at position '6129'
    WARNING! Unexpected data modification of a file without parity!

    snapraid scrub -p new

    Data change at file '/srv/dev-disk-by-label-2TBblack01/Storage/Backup/8700k_full_b1_s1_v2.tib' at position '286007'
    WARNING! Unexpected data modification of a file without parity!
    Try removing the file from the array and rerun the 'sync' command!
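With thousands of these messages it helps to see how they are spread across files. The following is a minimal sketch, assuming the snapraid output has been saved to a text file and that the lines of interest look exactly like the two "Data change" messages quoted above. The function name `count_data_changes` is hypothetical, and other snapraid message formats are not handled.

```python
import re
from collections import Counter

# Pattern derived only from the "Data change" lines quoted in this post.
DATA_CHANGE = re.compile(
    r"^\s*Data change at file '(?P<path>.+)' at position '(?P<pos>\d+)'"
)


def count_data_changes(log_text):
    """Return a Counter mapping file path -> number of 'Data change' lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DATA_CHANGE.match(line)
        if match:
            counts[match.group("path")] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Data change at file '/srv/dev-disk-by-label-250GB/test/Video-01.mp4' "
        "at position '6129'\n"
        "WARNING! Unexpected data modification of a file without parity!\n"
    )
    for path, total in count_data_changes(sample).most_common():
        print(total, path)
```

If the errors cluster on one drive's files, that points at a different cause than errors spread evenly over every file on every drive.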


    When I encountered the issue after upgrading, the first thing I did was test my hardware. Note that before I upgraded, OMV3 with snapraid 11.1 had been running on this system 24x7 for over 3 years with 0 issues.


    Hardware stability tests:

    On the problem machine I ran these tests:

    1. memtest86+, 8 full passes over 17 hours - no errors
    2. prime95 small FFTs, 4 hours - no errors (max CPU temp 57C)
    3. prime95 blend test, 4 hours - no errors
    4. SMART extended test on all drives - no errors


    Snapraid tests:

    Tested on 2 different machines: the Core i5 4590 works, the Core2quad q9550 does not.

    Each OMV version was fresh installed, not upgraded.

    Used 3 drives for tests, 2 data, 1 parity.

    2 different video files (1 on each data drive)

    Content and parity files deleted between test runs, data untouched between tests.

    1. ran snapraid sync -h

    2. ran snapraid scrub -p new
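The test cycle described above could be scripted so that every run is identical. This is only a sketch under stated assumptions: the function name `run_test_cycle` and the `runner` parameter are hypothetical, the paths of the content and parity files must be supplied by the caller, and the only snapraid commands used are the two named in this post. Deleting content and parity files is destructive, so this belongs on a throwaway test array only.

```python
import subprocess
from pathlib import Path

# The two commands from the steps above, run in the same order.
COMMANDS = (
    ["snapraid", "sync", "-h"],
    ["snapraid", "scrub", "-p", "new"],
)


def run_test_cycle(state_files, runner=subprocess.run):
    """Delete the given content/parity files, then run sync and scrub.

    Returns a dict mapping each command line to its exit code.
    `state_files` are caller-supplied paths (assumed, not discovered).
    """
    for path in state_files:
        # missing_ok requires Python 3.8+; data files must never be listed here.
        Path(path).unlink(missing_ok=True)

    results = {}
    for cmd in COMMANDS:
        completed = runner(cmd, capture_output=True, text=True)
        results[" ".join(cmd)] = completed.returncode
    return results
```

Passing a fake `runner` makes it possible to dry-run the sequence without snapraid installed.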


    Results:

    OMV 3 with snapraid plugin 3.73 (v11.1) = NO ERRORS (tested multiple times to make sure)

    A combination of OMV 4 or 5 with snapraid 11.3 = thousands of errors with just 2 test files. (errors on core2quad machine only)

    OMV 4 or 5 with snapraid 11.1 (manual snapraid install, no plugin) = few errors, much better than v11.3 but still broken. snapraid 11.1 only works on OMV3. (errors on core2quad machine only)


    The issue is specific to my hardware (LGA775 Core2quad q9550, P45 chipset). I tested on newer hardware (HP z230, Core i5 4590, C226 chipset) and there were no errors. To rule out the SATA controllers, I also moved 2 of them from the problem machine to the newer testbed with the test drives connected to them, and I used the same drives throughout all tests on both machines. I would guess the issue could be replicated with the same CPU / chipset as my problem machine.

    I've proven the issue can be predictably replicated just by upgrading the software from OMV 3 w/snapraid 11.1 to OMV 4 or 5 w/snapraid 11.3.

    Again, OMV 3 w/snapraid 11.1 works with no errors. I've put my 12TB of data back on OMV3 for now and had to resync the array.



    Problem machine System specs:

    Openmediavault 5.3.10-1

    Snapraid plugin 5.0.5 (snapraid V11.3)

    Motherboard: Gigabyte GA-EP45-UD3P (LGA775 P45 chipset)

    CPU: Core2quad q9550

    Memory: OCZ2N800SR4GK 4x2GB sticks (DDR2 800 5-4-4-15 2.1v)

    PCIE Sata Cards: 4x startech PEXESAT3221 2 port sata cards with ASM1062 controller

    Power Supply: OCZ750FTY - Fatal1ty Gaming Series 750 Watt 80+ Bronze

    Parity Drives: 2x Fantom GF3B8000EU (8TB drives connected via ESATA)

    Data Drives: 9 drives mixed 1TB, 2TB, 4TB, 6TB... mostly all WD, one Hitachi, can provide full details if needed.

  • omvnj2323

    Added the Label OMV 5.x
  • omvnj2323

    Changed the title of the thread from “Snapraid breaks on LGA775 upgrading from OMV3 (snapraid 11.1) to OMV4 or 5 (snapraid 11.3)” to “Snapraid sync -h or scrub results in thousands of errors going from OMV3 (snapraid 11.1) to OMV4 or 5 (snapraid 11.3), tests confirm OMV / Snapraid software issue.”.
  • Ok, now I'm stumped because I thought I would reproduce the same errors on another, almost identical board and cpu I dug up, but I do not. This would make me jump to a possible hardware issue that was not revealed in any stress tests, but it still doesn't explain why snapraid runs perfectly clean in OMV3 w/snapraid 11.1, and has for years until I upgraded.


    Again this is what I'm seeing on the problem machine.

    OMV 3 with snapraid 11.1 = NO ERRORS (tested multiple times to make sure)
    Fresh install of OMV 4 or 5 with snapraid plugin (v11.3) = thousands of errors with just 2 test files.
    OMV 4 or 5 with snapraid 11.1 (manual snapraid install, no plugin) = few errors, much better than v11.3 but still broken.

    snapraid 11.1 only works if paired with OMV3.



    I mentioned I dug up a nearly identical Gigabyte P45 chipset board and core2quad CPU. The only real difference between the motherboards is that the problem machine has 2x PCIE 2.0 slots and 2x 1Gb realtek r8169 NICs, while the test board I dug up has only 1 PCIE 2.0 slot and 1 onboard NIC. The CPUs are identical. Both boards are running the latest BIOS, and the BIOSes are basically identical and configured the same. The same 3 drives were used for all testing. You would think at this point this sounds like bad hardware on the problem machine, but remember, OMV3 / snapraid 11.1 scrubs without errors. Almost every block is an error when running OMV4 or 5 w/snapraid 11.3.
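One way to separate "the machine returns different bytes on repeated reads" from "snapraid misbehaves" is a check that does not involve snapraid at all. The sketch below reads a file several times and reports which blocks hash differently between passes. It is an illustration, not something that was run as part of these tests: the function names and the 256 KiB block size are arbitrary choices made for this example. Note that repeated reads may be served from the page cache rather than the disk, so it only exercises the full disk path if the file is larger than RAM or the caches are dropped between passes.

```python
import hashlib


def block_hashes(path, block_size=256 * 1024):
    """Hash a file block by block and return the list of hex digests."""
    digests = []
    with open(path, "rb") as handle:
        while True:
            block = handle.read(block_size)
            if not block:
                break
            digests.append(hashlib.sha256(block).hexdigest())
    return digests


def unstable_blocks(path, passes=2, block_size=256 * 1024):
    """Read the file `passes` times; return indexes of blocks that changed.

    An empty list means every pass produced identical block hashes.
    """
    baseline = block_hashes(path, block_size)
    unstable = set()
    for _ in range(passes - 1):
        current = block_hashes(path, block_size)
        if len(current) != len(baseline):
            raise RuntimeError("file length changed between passes")
        for index, (first, again) in enumerate(zip(baseline, current)):
            if first != again:
                unstable.add(index)
    return sorted(unstable)
```

A non-empty result on an untouched file would point toward the hardware or driver stack rather than snapraid itself.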


    Problem machine specs:

    Motherboard - GA-EP45-UD3P rev 1.6 (P45 express chipset, ICH10R)

    CPU - Core2quad q9550


    2nd test machine specs:

    Motherboard - GA-EP45-UD3R rev 1.1 (P45 express chipset, ICH10R)

    CPU - Core2quad q9550

  • The last test was moving the RAM from the problem machine to the test bed. Still no errors with OMV 5 and snapraid 11.3 on the test bed. Remember, the only issue I've ever had with the problem machine is with snapraid, and never on OMV 3 with snapraid 11.1. I just scrubbed the full 12TB of data with no errors on that version. Not sure what the deal is with that board and CPU. Let's just say I'm done with this hardware and have moved to a more consolidated approach without drives hanging out of the system.


    I've since moved my data over to a Dell R320 with an LSI SAS9207-8e HBA and an HP MSA60 storage shelf. All works fine. I will be testing a NetApp storage shelf with a 6Gbps Dell controller once I get the parts.

  • I resolved this by moving to different hardware. If it had been the only hardware I had, I might have persisted. I spent a lot of time and effort trying to figure it out, but I had a much better hardware platform to move to.
