Alright, I wanted to give an update on this as I have had some success in troubleshooting and the pools seem to be holding steady right now.
These are the troubleshooting steps I have taken so far, in case anyone needs to follow the same path:
- Switched location of drives - this caused more drives in random pools to see checksum errors, so no consistency or answers here
- Scrubbed and cleared the pools in question - afterwards more checksum errors showed up randomly, but never an I/O error, which points to a problem somewhere in data or power transfer
- Checked the HBAs - two Dell PERC cards and an internal motherboard HBA. All worked fine with the correct firmware
- Switched mini SAS 8087 cables to backplanes - this was to see if certain cables were causing the problem. Nothing found as the checksum errors afterwards were random across cables
- Mapped out the drives in a grid according to their slot location in the Norco 4224 case to see if errors were occurring in certain bays. Nothing initially conclusive on this, but more info further down.
- Replaced PSU - This was to see if power was the issue. A PSU tester did show the old PSU was not delivering enough power, so this may have been part of the cause
- Finally, looked at grid of drives (17 total) in relation to the backplanes - This I believe was the issue. All drives in question were part of the bottom 3 backplanes of the case.
- The top 3 backplanes in the case connect directly to the motherboard's internal HBA, while the bottom 3 connect to the two HBA cards
- Given the notoriety of Norco and their backplanes, I had some extras on hand and swapped them in
- So far, after two days (including scrubbing and clearing the ZFS errors), there have been no more checksum errors
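For anyone who wants to do the same grid check without drawing it out by hand, here is a rough Python sketch of the idea. It assumes the bays are numbered 1-24 from the top of the case and that each backplane serves one row of 4 bays; adjust the constants if your case is laid out differently. The slot numbers and error counts are made-up sample data, not readings from my pools, and the function names are just illustrative.

```python
from collections import Counter

# Assumptions (check against your own case): 6 backplanes, 4 bays each,
# bays numbered 1..24 starting at the top row.
BAYS_PER_BACKPLANE = 4
NUM_BACKPLANES = 6


def backplane_for_slot(slot: int) -> int:
    """Return the 1-based backplane number (1 = top row) for a 1-based bay number."""
    if not 1 <= slot <= BAYS_PER_BACKPLANE * NUM_BACKPLANES:
        raise ValueError(f"slot {slot} is outside the case")
    return (slot - 1) // BAYS_PER_BACKPLANE + 1


def errors_by_backplane(checksum_errors: dict[int, int]) -> Counter:
    """Sum checksum error counts per backplane, skipping drives with no errors."""
    totals = Counter()
    for slot, count in checksum_errors.items():
        if count > 0:
            totals[backplane_for_slot(slot)] += count
    return totals


# Hypothetical sample: bay number -> checksum errors seen on the drive in that bay.
sample = {2: 0, 7: 0, 13: 4, 14: 1, 18: 9, 21: 2, 24: 3}

for backplane, total in sorted(errors_by_backplane(sample).items()):
    print(f"backplane {backplane}: {total} checksum errors")
```

If every backplane that shows up in the output shares a cable run, HBA, or power feed, that is where I would look first.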
I am going to keep checking on the drives, running a couple of SMART tests and transferring some data onto them to see if anything else pops up, but given the testing it sure seems to point to the backplanes. These issues are really frustrating to solve because there are so many moving parts in these server cases, and this, along with many other threads, makes me want to switch to a Supermicro server. I can't do it right now, but it is definitely on the upgrade path for the future.
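To keep an eye on things while I test, something like the Python sketch below could flag any device whose CKSUM column is not zero. It only assumes the usual `NAME STATE READ WRITE CKSUM` table that `zpool status` prints; the pool and device names in the sample text are invented, and the parsing is deliberately simple, so treat it as a starting point rather than a finished tool.

```python
import subprocess


def read_zpool_status() -> str:
    """Run `zpool status` and return its text output (needs ZFS installed)."""
    result = subprocess.run(
        ["zpool", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout


def devices_with_checksum_errors(status_text: str) -> dict[str, str]:
    """Return {device name: CKSUM value} for every row whose CKSUM is not "0"."""
    flagged = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True  # device rows start after this header
            continue
        if not in_config or not fields:
            continue
        if fields[0] == "errors:":
            in_config = False  # end of this pool's device table
            continue
        # CKSUM is kept as text because large counts may be abbreviated.
        if len(fields) >= 5 and fields[4] != "0":
            flagged[fields[0]] = fields[4]
    return flagged


# Invented sample output, for illustration only.
SAMPLE = """\
  pool: tank
 state: ONLINE
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    sda     ONLINE       0     0     0
	    sdb     ONLINE       0     0    12
	    sdc     ONLINE       0     0     3

errors: No known data errors
"""

if __name__ == "__main__":
    # Swap SAMPLE for read_zpool_status() on a machine with ZFS.
    print(devices_with_checksum_errors(SAMPLE))
```

Running it from cron after each scrub and comparing the flagged devices against the bay grid would show quickly whether errors come back on the same backplanes.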
Thank you for the help on this, and if anything changes I will post updates to this.