I've been running my Dell C2100 rack server, built from the ground up, since November. The drives have run cooler than ever and have shown nothing but green lights on the SMART screen, yet this looks like the second, maybe even the third, drive to start failing out of the blue, so I'm looking for some help.
First of all, I don't run RAID. Stubbornly, I like to get the maximum space out of a file system, and I run desktop drives in here. The server is just for serving media to my home and running some apps to search for media, the usual stuff. It's not getting hammered, and it lives in a cool basement. Drive temps run 26-29 °C.
Anyway, here's what happened this time.
During a routine SSH session I noticed that one of my mount points (a 3TB Toshiba drive) had gone read-only, which is a red flag to me that OMV (OpenMediaVault) found something wrong with its file system. The SMART screen still shows a green dot next to the drive. I'm afraid to reboot OMV because the last time I did that, I was never able to re-mount the drive to get the media off. I tried an rsync command to back up the files from the bad drive to another empty drive, and it froze. I'm now trying a basic cp command in the shell, a few files at a time, and I'm still getting Input/Output errors on every file. I tried a short self-test on the device and got this:
Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
What am I doing wrong here? Why are drives going read-only out of the blue? Am I missing something obvious in the way I have things set up? Is there anything I can do to rescue the data on this drive before I reboot OMV or unmount the drive and probably lose an entire volume's worth of data AGAIN?
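A minimal Python sketch of the "few files at a time" idea: it walks the mount, copies each file individually, and records and skips anything that raises an OSError (such as the Input/Output errors above) instead of stopping at the first one. The function name salvage_copy and the paths are hypothetical placeholders. It only rescues files the kernel can still read; imaging a failing disk block by block is what tools like GNU ddrescue are built for, and on a drive in this state extra reads may make things worse.

```python
import os
import shutil
from pathlib import Path


def salvage_copy(src_root: str, dst_root: str) -> list[tuple[str, str]]:
    """Copy every readable file under src_root into dst_root.

    Files that raise OSError (for example EIO from a failing disk) are
    recorded and skipped instead of aborting the run. Returns a list of
    (path, error message) pairs for everything that could not be copied.
    """
    src = Path(src_root)
    dst = Path(dst_root)
    failures: list[tuple[str, str]] = []

    def record_walk_error(err: OSError) -> None:
        # Directories that cannot even be listed end up here.
        failures.append((str(err.filename), err.strerror or str(err)))

    for dirpath, _dirnames, filenames in os.walk(src, onerror=record_walk_error):
        rel_dir = Path(dirpath).relative_to(src)
        (dst / rel_dir).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            source_file = Path(dirpath) / name
            target_file = dst / rel_dir / name
            try:
                src_size = source_file.stat().st_size
            except OSError as exc:
                # Can't even stat it: log it and leave any existing copy alone.
                failures.append((str(source_file), exc.strerror or str(exc)))
                continue
            if target_file.exists() and target_file.stat().st_size == src_size:
                continue  # copied by an earlier run, so a re-run resumes
            try:
                shutil.copy2(source_file, target_file)
            except OSError as exc:
                failures.append((str(source_file), exc.strerror or str(exc)))
                # Drop any partially written copy so a re-run retries it.
                target_file.unlink(missing_ok=True)
    return failures
```

Usage would be something like `failures = salvage_copy("/srv/bad-disk", "/srv/spare-disk")` (both paths hypothetical), then printing `failures` to see which files were lost. Because files that already exist at the destination with a matching size are skipped, the script can be re-run after a freeze without recopying everything.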
All of my drives are set up with:
Advanced Power Management: 128 - Minimum power usage without standby (no spindown)
Automatic Acoustic Management: Maximum performance, maximum acoustic output
Spindown time: Disabled
Write cache: off
I don't run routine SMART self-tests, and maybe I should start (does anyone have suggestions on which tests they schedule?). All that said, I don't understand how drives are failing within months inside this thing without any warning signs.
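On the "no warning signs" point: a summary indicator like a green dot can stay green while individual SMART attributes are already climbing. A rough Python sketch that scans the attribute table printed by `smartctl -A` and flags a few commonly watched counters with non-zero raw values. The function name and the attribute list are assumptions, a common starting set rather than any official threshold, and it won't help when smartctl can't talk to the drive at all, as in the error above.

```python
# Attributes whose raw values commonly point at media or cabling trouble even
# while the overall health check still says PASSED. This list is an
# assumption, a common starting point rather than an official standard.
WATCHED_ATTRIBUTES = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def flag_smart_attributes(smartctl_output: str) -> dict[str, int]:
    """Return watched attributes whose raw value is non-zero.

    Expects the table printed by `smartctl -A`, whose data rows have the
    columns ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED
    WHEN_FAILED RAW_VALUE.
    """
    flagged: dict[str, int] = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; headers and blank lines don't.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED_ATTRIBUTES and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged
```

The input could come from `subprocess.run(["smartctl", "-A", "/dev/sdX"], capture_output=True, text=True).stdout`, with /dev/sdX as a placeholder for each data drive, run on a schedule so a non-zero count shows up before a mount goes read-only.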
Is there anything else I can do here, or should I basically just expect to lose a drive every few months out of nowhere because I'm not running RAID or using NAS-grade drives?
Happy to provide any additional detail from logs, fstab, etc.
Thanks to anyone who can help with this frustration.