Bad HDD screwing everything up?

    • Bad HDD screwing everything up?

      Hi all,

      I have OMV 2.0 running on my media server with one smaller (500GB) drive for the OS, two 4TB Seagate drives (cheap, ripped from external HDDs) and one Hitachi 3TB for content, and one 4TB Seagate drive for SnapRAID.

      Lately, my system has had trouble booting up. It will hang with a blinking cursor after the BIOS screen. However, it will start up properly if I manually power it off and power it back on. One of my 4TB content drives has also been dropping in and out. It will be fine and SMART tests indicate it is good, but about once a day it will stop showing details in the SMART menu (temp and serial number disappear) and drop out of my listed physical disks. The web GUI also seems to be lagging and losing connection (I often get "Communication failure with details - Communication failure").

      I tried swapping the SATA cable to the drive, to no avail. This morning I woke up to find that the drive in question, /dev/sdc, had dropped, and when it came back online it had registered as /dev/sdf and therefore was not part of my aufs data pool. I could of course re-add it to the pool, but it's becoming quite alarming.

      My question therefore is: how do I confirm this drive is bad if SMART reports no errors (short and long self-tests) but it keeps dropping off? Also, could these other issues I am seeing (hanging forever on boot, unresponsive web GUI) be a function of this drive failure?
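
      One place to look beyond the self-tests is the raw attribute table printed by `smartctl -A`. A rough Python sketch follows, assuming the usual ten-column layout of that table. The sample rows are made up for illustration and are not from this system, and the choice of attributes to watch is a judgment call. A rising UDMA_CRC_Error_Count is usually associated with cable or connector trouble rather than the platters, so it may be worth comparing between two readings a day apart.

```python
# Attribute names as smartmontools commonly prints them.
# Which ones to watch is an assumption made for this sketch.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count")

# Illustrative sample only; real output comes from `smartctl -A /dev/sdX`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12
"""


def raw_values(smartctl_output):
    """Map attribute name -> raw value string from `smartctl -A` text."""
    values = {}
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) >= 10 and parts[0].isdigit():
            values[parts[1]] = parts[9]
    return values


def nonzero_watched(smartctl_output):
    """Return the watched attributes whose raw value is a non-zero integer."""
    found = {}
    for name, raw in raw_values(smartctl_output).items():
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


print(nonzero_watched(SAMPLE))
```

      To try it on a real disk, save the output of `smartctl -A` to a file and pass the file's contents to `nonzero_watched` instead of `SAMPLE`.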

    • Hi, the problems you have are exactly the same problems I had a while back on my Ubuntu server (hanging forever on boot). The only solution for me at the time was booting in safe mode and removing the mount.
      Since you have problems with a drive and the results are the same as the problem I had, I'm pretty sure it's the same thing.
      The easiest solution would obviously be removing the mount and seeing if the problem still occurs.

      For testing your drive there are quite a few options available, but software can't test for intermittent failures in the electronics or cable, so passing a test probably only assures you that the platters are okay. It doesn't tell you if the whole drive is always working fine.
      You could put that drive in one of the enclosures you had your Seagate drives in, just to make sure it's not the cables or the connector on your motherboard and such.
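
      Even though a self-test can't catch an intermittent link problem, the kernel log often records it. Here is a rough Python sketch that counts suspicious libata lines per ATA port. The list of marker strings is an assumption and not exhaustive, and the sample lines are illustrative, not taken from the system in this thread.

```python
import re
from collections import Counter

# Substrings that libata commonly logs when a SATA link misbehaves.
# Assumed for illustration; extend it to match what your own log shows.
LINK_TROUBLE = (
    "hard resetting link",
    "SATA link down",
    "failed command",
    "exception Emask",
)

# Matches "ata3:" as well as "ata3.00:" and captures the port name.
ATA_PORT = re.compile(r"\b(ata\d+)(?:\.\d+)?:")

# Illustrative sample only; real input would be the output of `dmesg`.
SAMPLE = """\
[ 1234.5] ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x4050000 action 0xe frozen
[ 1234.6] ata3.00: failed command: READ FPDMA QUEUED
[ 1234.7] ata3: hard resetting link
[ 1240.1] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2000.0] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
"""


def count_link_events(log_text):
    """Count suspicious libata lines per ATA port (ata1, ata2, ...)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ATA_PORT.search(line)
        if match and any(marker in line for marker in LINK_TROUBLE):
            counts[match.group(1)] += 1
    return counts


for port, n in sorted(count_link_events(SAMPLE).items()):
    print("{}: {} suspicious line(s)".format(port, n))
```

      If one port keeps showing up after you have swapped the cable, moving the drive to a different motherboard port (or an enclosure, as above) should help tell the drive apart from the controller.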

      P.S. Me and a lot of other guys I know use the same ripped external drive as you do. It contains the ST4000LM016, which is actually a very decent and energy-efficient drive.
    • Update:

      I removed the problematic drive, and it seems to have restored normal functionality to the OMV web GUI.

      However, I am still having the problem of the system hanging at bootup. I can restart manually (power switch) and it will boot up normally, but if I try to restart from the terminal or via the web GUI, it hangs at the blinking cursor where the OS usually starts.

      Any thoughts?