Severe Installation Issues

    • Severe Installation Issues

      Hi, I'm totally stumped. I run OMV 3 via USB on a custom-built box. It's been pretty solid for the last 3 years, until it started going offline every now and then. Unfortunately, I was not able to image the USB drive, as it was indicating possible bad sectors. Eventually, it stopped coming back, and since it's headless, I didn't know the exact issue. A reboot just left it in an errored state where it was unreachable.

      Fast forward, I tried reinstalling OMV 4 onto a new 64 GB SanDisk Ultra drive. All went smoothly, until I connected the internal hard drives again, and then strange errors started coming up. Since then, I've tried many different things, including connecting one hard drive at a time to see if it was due to hardware failure, reinstalling OMV, even different versions, using different USB ports, reformatting and trying again, googling and trying to follow suggestions, etc.

      The random errors keep coming up. I've taken screenshots. It keeps dropping to the initramfs prompt??!?!!

      I'm really struggling to comprehend what the heck is happening, can anyone offer a suggestion? Surely it's something stupid that I've done???
      Images (4 attached screenshots): IMG_20190624_123608366_compress3.jpg, IMG_20190625_163540209_compress4.jpg, IMG_20190624_110810894_compress64.jpg, IMG_20190625_170132117_compress50.jpg
    • Thanks for the advice, I'll give it a try tonight. Sometimes it spits errors with or without the drives attached; it's a bit inconsistent, which is odd. I get different errors depending on which version of OMV is installed.

      Also, I installed Linux Mint and was able to reboot fine into it this morning. In fact, connected the drives and all worked fine as well, except one didn't show up AHH! I've backed up the important stuff, but it means I probably have lost some other stuff...which is not great...it was my only Seagate drive, but it was also the oldest one. Maybe I've finished killing the drive with all the restarts?

      Strange that every time I installed OMV though there was a grub corruption? But installing Linux Mint was fine?
    • gabs247 wrote:

      Strange that every time I installed OMV though there was a grub corruption? But installing Linux Mint was fine?
      It's not grub corruption. It's probably a drive device name change. If the installer sets grub to boot from /dev/sdd (we'll say this is your USB drive), and connecting additional drives changes the USB drive's device name to /dev/sdc, the boot process doesn't find a bootable drive at the expected location (/dev/sdd). So, you're sent into initramfs. That's one possibility.
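One way to see whether a system depends on kernel-assigned names is to scan its config files for /dev/sdX references. The following is a minimal Python sketch, not something from this thread: the function name `find_device_name_refs` and the sample fstab text are hypothetical, and it assumes you paste in the contents of a file such as /etc/fstab yourself. Entries that use UUID= are not affected by the order in which drives are detected.

```python
import re

# Matches kernel-assigned disk names such as /dev/sdd or /dev/sdd1.
DEVICE_NAME = re.compile(r"/dev/sd[a-z]+\d*")


def find_device_name_refs(config_text):
    """Return (line_number, device) pairs for non-comment lines using /dev/sdX names."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        for match in DEVICE_NAME.finditer(stripped):
            hits.append((number, match.group(0)))
    return hits


# Hypothetical fstab content, for illustration only.
SAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=0a1b2c3d-1111-2222-3333-444455556666 / ext4 errors=remount-ro 0 1
/dev/sdd5 none swap sw 0 0
"""

if __name__ == "__main__":
    for number, device in find_device_name_refs(SAMPLE_FSTAB):
        print(f"line {number}: {device} may change if drives are added or removed")
```

Any line it reports is a candidate for switching to a UUID= or LABEL= reference.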
      __________________________________________________________________

      Another is that your hardware has a genuine problem. Reconsider the following:

      gabs247 wrote:

      It's been pretty solid for the last 3 years, until it started going offline every now and then. Unfortunately, I was not able to image the USB, as it was indicating maybe bad sectors on the drive. Eventually, it has stopped coming back and since it's headless, I didn't know the exact issue. Reboot just left it in an errored state where it was unreachable.

      Fast forward, I tried reinstalling OMV 4 onto a new 64 GB SanDisk Ultra drive. All went smoothly, until I connected the internal hard drives again, and then strange errors started coming up.

      gabs247 wrote:

      Also, I installed Linux Mint and was able to reboot fine into it this morning. In fact, connected the drives and all worked fine as well, except one didn't show up AHH!

      gabs247 wrote:

      Sometimes it spits errors with or without the drives attached, it's a bit inconsistent which is odd.
      If you have a drive port that shows up and disappears, intermittently, that might explain why Linux Mint works (only needs a single drive) where OMV doesn't work right (with more drives connected). It might also explain why grub is confused. If a drive port is appearing and disappearing, drive device names may be changing as well. Intermittent problems are, in their very nature, "inconsistent".
      ____________________________________________________________________

      You should try the sequence in @macom's post. Maybe it's a grub issue.

      Otherwise, I'd look real close at your hardware.


    • Hi, I was able to do some backing up from the Seagate that keeps disappearing. This is while using Linux Mint.

      I reinstalled OMV 4, and it failed to boot. I changed USB slot, and it booted. Then I plugged in the drives, and it failed. I tried a couple of drives, avoiding the Seagate, and it failed to boot. I tried randomly plugging in certain drives until it would boot, then added one at a time, and it kept booting. Finally, I got all 4 data drives to show up and boot fine; after a few reboots, it was still ok...don't know what this means though.

      I didn't try updating grub, but if it starts being unreliable tonight, I'll try that. My BIOS is up to date. I tried the USB 2 slots rather than the USB 3 slots, but it was still failing sometimes.

      The real test will be plugging it back into position, away from my monitor etc...and seeing if it appears on the network...or fails to boot...

      Stay tuned...
    • I really don't get it, hopefully someone can explain this to me one day.

      It hasn't skipped a beat since the other night. I've been slowly reconfiguring it and it keeps booting fine and hard drives showing.

      Could it have been something like this: the boot drive was installed as sda, and when restarting, the system was seeing the hard drives (which have no boot loader) first and wasn't adjusting the boot device? And is that why randomly adding the drives back one at a time "fixed" it...?

      But I'm keeping my fingers crossed it's fixed :)

      Thanks for the advice, I was really confused (and still am) about this box.
    • gabs247 wrote:

      Thanks, ok, so apart from looking for loose cables or smoking parts inside, are there any particular things I should look into for diagnosing hardware issues?
      If only these problems were that simple, with visually identifiable burnt parts. :)

      - (The Power Supply) - There are a number of slight failures that can result in AC ripple getting onto a supply's DC rails. It doesn't take much to cause weird and inconsistent behavior.
      - (The Mobo) - While CMOS circuitry is extremely power efficient, its weakness is power transients, where as little as 70 volts can blow (or arc through) gates. A simple and unnoticed static discharge, from your ungrounded hand, is far more than enough. Unfortunately, blown gates may still function and it may be years before the effects are noticed. Equally unfortunate is that they may work fine, for months at a time, before issues crop up again.
      - (A hard drive) - A drive whose interface board is experiencing intermittent issues, or is failing, can produce the symptoms you've seen, and it only takes one drive to do it.
      _____________________

      The intermittent problem is the scourge of an Electronics Technician, or it used to be. Unfortunately, since most circuitry in modern times is Large Scale Integrated and surface mounted, there's very little chance of repair. That leaves replacing individual assemblies and observing the result.

      If the problem returns, you could try memtest86 to check your RAM sticks. Unplugging one drive at a time might reveal something. And if you have a compatible PS, in a workstation, that will work with your server, a swap is worth a try. That leaves the Mobo where, with most components disconnected, a live distro can be used for burn-in testing. What it boils down to is a process of elimination that takes time.
      _____________________

      As an example:
      I had a Mobo that was doing weird things and it was getting really slow for no explainable reason. The diagnostics software that came with it revealed nothing and a rebuild, from scratch, did nothing to improve the situation. Since I had two identical boxes, I had a healthy PC as a parts supply and for comparison. I unplugged everything that wasn't essential to boot up, swapped out the PS for a known good one, replaced memory from the 2nd PC and, finally, swapped out the hard drive. Nothing improved performance or eliminated the odd behavior. The answer? I trashed the Mobo.


    • That drive had gone into read only mode, and can't remount as read write.

      I've transferred all data from it and will disconnect it, but I haven't had any issues with the box in general, touch wood. So maybe the failing hard drive was causing the symptoms until it worked itself out?

      Thanks again for spending the time to reply :)
    • And, I didn't touch wood

      After being stable for a week, it's gone totally unstable this morning. It kept losing connection via SSH, and I could barely type a few commands before it'd kick me off.

      I could check the web login, but again, it keeps freezing and kicking me out. I noticed the drive I just synced all the data to is now showing up as missing, and I just rebooted, so there's definitely something wrong. The drive that went read only is now read write...but the supposedly ok drive is now totally offline!

      I don't know, I'm about to throw the whole box out the window...
    • gabs247 wrote:

      And, I didn't touch wood
      What?? That's the first requirement!! (Sorry, an attempt at humor. :) )
      ___________________________________

      This is sounding, more and more, like classic ESD damage.

      What I would do is disconnect as much as possible from the Mobo and boot up on memtest86 (the free version). I think I'd go with a bootable USB thumbdrive. Run long memory tests for a day or more. (Maybe much longer?) This would be a combination burn-in test for the Mobo and a check of your ram sticks. If the problem shows up again, maybe look for a power supply in one of your clients, that will work with the NAS Mobo, and swap it in. ((Inspect the needed PS connections to be sure you're not dealing with something proprietary.))

      If the cut down setup works, keep adding components until the failure shows up. But, to be honest, the failure interval (a week) seems to be pretty long and note that the failure interval can vary significantly either way (shorter or longer).
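The add-one-component-at-a-time procedure can be sketched in Python as below. This is only an illustration under assumptions: `first_failing_component` and `is_stable` are hypothetical names, and `is_stable` stands in for whatever manual check you run (for example, a long burn-in with that set of parts attached). Because the fault is intermittent, each step is repeated several times, and even then a pass does not prove a part is good.

```python
def first_failing_component(components, is_stable, trials=3):
    """Add components one at a time; return the first one whose addition fails a trial.

    components: ordered list of part names to attach, most essential first.
    is_stable:  callable taking the list of attached parts, returning True if the
                system ran without errors (hypothetical stand-in for a manual test).
    trials:     how many times to repeat the check at each step.
    """
    attached = []
    for component in components:
        attached.append(component)
        for _ in range(trials):
            if not is_stable(list(attached)):
                return component
    return None  # no failure observed with everything attached


if __name__ == "__main__":
    # Simulated example: pretend the system only fails once "drive 3" is attached.
    parts = ["RAM", "USB boot stick", "drive 1", "drive 2", "drive 3", "drive 4"]
    suspect = first_failing_component(parts, lambda attached: "drive 3" not in attached)
    print("first failure appeared after adding:", suspect)
```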
      _____________________________________

      BTW: you can build OMV, on a USB thumbdrive, using any workstation you have. (Just make sure that you install OMV onto the USB thumbdrive, NOT the workstation's hard drive. You'd also need to install the flash memory plugin after the install is complete.) With a client booted up on OMV and a drive dock (or connected internally to a SATA port), you may be able to copy data off of your drives.
      _____________________________________

      I hate to mention this but you might have done this to yourself when you built your PC. Handling components outside of their anti-stat bag requires care; only touching the edges of boards (not traces or chip legs) and carefully keeping yourself grounded throughout the process or using something like -> this.

      Bottom line: if you don't have time for tests and want a NAS right now, buying something might be the way to go.


    • good advice above
      power supplies are a common failure on pretty much everything, it's why I only use Corsair for my PC builds
      not forgetting the power supply needs to be powerful enough to power all the components also
      there is a massive peak power draw from the components at start up in comparison to the standby/ idle draw, the power supply needs to be oversized to allow for that
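As a rough worked example of that sizing argument, here is a small Python sketch. Every number in it is an assumed placeholder rather than a measurement from this thread, the function name `startup_power_budget` is made up for illustration, and it assumes all drives spin up at the same moment (no staggered spin-up). Check the datasheets for your own drives and board before relying on any figure.

```python
def startup_power_budget(base_watts, drive_count, spinup_watts, idle_watts, headroom=1.25):
    """Return (idle draw, peak draw at startup, suggested PSU rating) in watts.

    base_watts:   assumed draw of board, CPU, RAM and fans.
    spinup_watts: assumed draw of one drive while spinning up.
    idle_watts:   assumed draw of one drive while idle.
    headroom:     safety multiplier applied to the startup peak.
    """
    idle = base_watts + drive_count * idle_watts
    peak = base_watts + drive_count * spinup_watts
    return idle, peak, peak * headroom


if __name__ == "__main__":
    # Placeholder figures only: 60 W base, 4 drives, 25 W spin-up, 6 W idle each.
    idle, peak, suggested = startup_power_budget(
        base_watts=60, drive_count=4, spinup_watts=25, idle_watts=6
    )
    print(f"idle ~{idle} W, startup peak ~{peak} W, suggested rating >= {suggested:.0f} W")
```

With these placeholder inputs the startup peak is nearly double the idle draw, which is why sizing a supply from idle consumption alone can leave it short at power-on.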

      cables need to be properly connected - check them

      hard drives failing are another common failure

      you can check this by plugging the drive into another computer and using CrystalDiskInfo to see the SMART errors
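On Linux, the smartmontools command `smartctl -A` prints a similar SMART attribute table. The Python sketch below shows one possible way to pick warning signs out of that table. It is an illustration under assumptions: the sample text is made up (not output from any drive in this thread), the function name `warning_attributes` is hypothetical, and the list of watched attributes is a common rule of thumb rather than an official threshold.

```python
# Attributes whose non-zero raw value is commonly treated as a warning sign (assumed list).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def warning_attributes(smart_table_text):
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    warnings = {}
    for line in smart_table_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            warnings[name] = int(raw)
    return warnings


# Made-up sample in the layout of a SMART attribute table, for illustration only.
SAMPLE_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       26280
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    for name, raw in warning_attributes(SAMPLE_OUTPUT).items():
        print(f"{name} has raw value {raw}: back up and consider replacing the drive")
```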

      ALWAYS have an offline copy of your data
      NEVER trust one system with your data, it is at risk always

      The more complicated a system the higher the chance of things going wrong...and the longer it will take to identify and fix
      Fan of OMV, but not a fan of over-complication.
