Severe Installation Issues

  • Hi, I'm totally stumped. I run OMV 3 via USB on a custom built box. It's been pretty solid for the last 3 years, until it started going offline every now and then. Unfortunately, I was not able to image the USB, as it was indicating maybe bad sectors on the drive. Eventually, it has stopped coming back and since it's headless, I didn't know the exact issue. Reboot just left it in an errored state where it was unreachable.


    Fast forward, I tried reinstalling OMV 4 onto a new 64 GB SanDisk Ultra drive. All went smoothly, until I connected the internal hard drives again, and then strange errors started coming up. Since then, I've tried many different things, including connecting one hard drive at a time to see if it was due to hardware failure, reinstalling OMV, even different versions, using different USB ports, reformatting and trying again, googling and trying to follow suggestions, etc.


    The random errors keep coming up. I've taken screenshots. It keeps dropping to the initramfs prompt?!


    I'm really struggling to comprehend what the heck is happening, can anyone offer a suggestion? Surely it's something stupid that I've done???

  • Thanks for replying. I tried OMV 3 as well, but some screenshots were from OMV 4. Similar random errors occurred on some restarts with both OMV 3 and OMV 4. They didn't happen every time I restarted.


    I just tried installing Linux Mint, and it seems to have installed correctly, booting fine. I'm not sure why OMV is not?

  • Thanks for the advice, I'll give it a try tonight. Sometimes it spits errors with or without the drives attached, it's a bit inconsistent which is odd. Different errors depending on which version of OMV was installed.


    Also, I installed Linux Mint and was able to reboot fine into it this morning. In fact, connected the drives and all worked fine as well, except one didn't show up AHH! I've backed up the important stuff, but it means I probably have lost some other stuff...which is not great...it was my only Seagate drive, but it was also the oldest one. Maybe I've finished killing the drive with all the restarts?


    Strange that every time I installed OMV though there was a grub corruption? But installing Linux Mint was fine?

  • Strange that every time I installed OMV though there was a grub corruption? But installing Linux Mint was fine?

    It's not grub corruption. It's probably a drive device name change. If grub is set to boot from /dev/sdd by the installer (we'll say this is your USB drive), and connecting additional drives changes the USB drive's device name to /dev/sdc, grub doesn't find a bootable drive at the expected location (/dev/sdd). So, you're sent into initramfs. That's one possibility.
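
    To see whether device names really do move around, here is a minimal Python sketch. It assumes a Linux system where udev populates /dev/disk/by-uuid (the function name and its default path argument are mine, not something from OMV). It resolves each filesystem UUID to whatever /dev/sdX name currently holds it.

```python
import os

def map_uuids_to_devices(by_uuid_dir="/dev/disk/by-uuid"):
    """Return {filesystem UUID: device path that currently holds it}."""
    mapping = {}
    for name in sorted(os.listdir(by_uuid_dir)):
        link = os.path.join(by_uuid_dir, name)
        # Each entry is a symlink such as ../../sdd1; resolve it to a full path.
        mapping[name] = os.path.realpath(link)
    return mapping
```

    Calling map_uuids_to_devices() once with only the USB stick attached and again with the data drives connected should show whether the stick's UUID ends up on a different /dev/sdX name. The UUID itself stays the same either way, which is why it makes a steadier reference than the device name.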
    __________________________________________________________________


    Another is that your hardware has a genuine problem. Reconsider the following:

    It's been pretty solid for the last 3 years, until it started going offline every now and then. Unfortunately, I was not able to image the USB, as it was indicating maybe bad sectors on the drive. Eventually, it has stopped coming back and since it's headless, I didn't know the exact issue. Reboot just left it in an errored state where it was unreachable.


    Fast forward, I tried reinstalling OMV 4 onto a new 64 GB SanDisk Ultra drive. All went smoothly, until I connected the internal hard drives again, and then strange errors started coming up.

    Also, I installed Linux Mint and was able to reboot fine into it this morning. In fact, connected the drives and all worked fine as well, except one didn't show up AHH!

    Sometimes it spits errors with or without the drives attached, it's a bit inconsistent which is odd.

    If you have a drive port that shows up and disappears intermittently, that might explain why Linux Mint works (it only needs a single drive) where OMV doesn't work right (with more drives connected). It might also explain why grub is confused. If a drive port is appearing and disappearing, drive device names may be changing as well. Intermittent problems are, by their very nature, "inconsistent".
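
    One way to catch a port that comes and goes is to snapshot the block devices the kernel sees and compare snapshots over time. This is only a rough sketch, assuming Linux's /sys/block directory; both function names are made up for illustration.

```python
import os

def list_disks(sys_block="/sys/block"):
    """Return the set of block device names the kernel currently exposes."""
    return set(os.listdir(sys_block))

def diff_snapshots(before, after):
    """Report which device names vanished or appeared between two snapshots."""
    return {
        "vanished": sorted(before - after),
        "appeared": sorted(after - before),
    }
```

    Taking list_disks() after each boot (or every few minutes) and feeding consecutive results to diff_snapshots() would show a drive dropping out or coming back under a new name. Note that /sys/block also lists loop and RAM devices, so expect more than just the sdX entries.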
    ____________________________________________________________________


    You should try the sequence in @macom's post. Maybe it's a grub issue.


    Otherwise, I'd look real close at your hardware.

  • Hi, I was able to do some backing up from the Seagate that keeps disappearing. This is while using Linux Mint.


    I reinstalled OMV 4, and it failed to boot. I changed USB slot, and it booted. Then I plugged in the drives, and it failed. I tried a couple of drives, avoiding the Seagate, and it failed to boot. I tried randomly plugging in certain drives until it would boot, then added one at a time, and it kept booting. Finally, I got all 4 data drives to show up and boot fine, and after a few reboots it was still OK...don't know what this means though.


    I didn't try updating grub, but if it starts being unreliable tonight, I'll try that. My BIOS is up to date. I tried the USB 2 slots rather than the USB 3 slots, but it was still failing sometimes.


    The real test will be plugging it back into position, away from my monitor etc., and seeing if it appears on the network...or fails to boot...


    Stay tuned...

  • I really don't get it, hopefully someone can explain this to me one day.


    It hasn't skipped a beat since the other night. I've been slowly reconfiguring it and it keeps booting fine and hard drives showing.


    Could it have been something like this: the boot drive was installed as sda, and when restarting, the system was seeing the hard drives (which have no boot loader) and wasn't adjusting the boot device? And is that why randomly adding the drives back one at a time "fixed" it...?


    But I'm keeping my fingers crossed it's fixed :)


    Thanks for the advice, I was really confused (and still am) about this box.

  • Thanks, ok, so apart from looking for loose cables or smoking parts inside, are there any particular things I should look into for diagnosing hardware issues?

    If only these problems were that simple, with visually identifiable burnt parts. :)


    - (The Power Supply) - There are a number of slight failures that can result in AC ripple getting onto a supply's DC rails. It doesn't take much to cause weird and inconsistent behavior.
    - (The Mobo) - While CMOS circuitry is extremely power efficient, its weakness is power transients where as little as 70 volts can blow (or arc through) gates. A simple and unnoticed static discharge, from your ungrounded hand, is far more than enough. Unfortunately, blown gates may still function and it may be years before the effects are noticed. Equally unfortunate is that they may work fine, for months at a time, before issues crop up again.
    - (A hard drive) - A drive whose interface board is experiencing intermittent issues, or is failing, can produce the symptoms you've seen, and it only takes one drive to do it.
    _____________________


    The intermittent problem is the scourge of an Electronics Technician, or it used to be. Unfortunately, since most circuitry in modern times is Large Scale Integrated and surface mounted, there's very little chance of repair. That leaves replacing individual assemblies and observing the result.


    If the problem returns, you could try memtest86 to look at your RAM sticks. Unplugging one drive at a time might reveal something. And if you have a compatible PS, in a workstation, that will work with your server, a swap is worth a try. That leaves the Mobo where, with most components disconnected, a live distro can be used for burn-in testing. What it boils down to is a process of elimination that takes time.
    _____________________


    As an example:
    I had a Mobo that was doing weird things and it was getting really slow for no explainable reason. The diagnostics software that came with it revealed nothing and a rebuild, from scratch, did nothing to improve the situation. Since I had two identical boxes, I had a healthy PC as a parts supply and for comparison. I unplugged everything that wasn't essential to boot up, swapped out the PS for a known good one, replaced memory from the 2nd PC and, finally, swapped out the hard drive. Nothing improved performance or eliminated the odd behavior. The answer? I trashed the Mobo.

  • That drive has gone into read-only mode, and I can't remount it as read-write.


    I've transferred all data from it and will disconnect it, but I haven't had any issues with the box in general, touch wood. So maybe the failing hard drive was causing the symptoms until it worked itself out?


    Thanks again for spending the time to reply :)

  • And, I didn't touch wood


    After being stable for a week, it's gone totally unstable this morning. It kept losing connection via SSH, and I could barely type a few commands before it'd kick me off.


    I could check the web login, but again, it keeps freezing and kicking me out. I noticed the drive I just synced all the data to is showing up missing now, and I just rebooted so there's definitely something wrong. The drive that went read-only is now read-write...but the supposedly OK drive is now totally offline!


    I don't know, I'm about to throw the whole box out the window...

  • And, I didn't touch wood

    What?? That's first requirement!! (Sorry, an attempt at humor. :) )
    ___________________________________


    This is sounding, more and more, like classic ESD damage.


    What I would do is disconnect as much as possible from the Mobo and boot up on memtest86 (the free version). I think I'd go with a bootable USB thumbdrive. Run long memory tests for a day or more. (Maybe much longer?) This would be a combination burn-in test for the Mobo and a check of your RAM sticks. If the problem shows up again, maybe look for a power supply in one of your clients, that will work with the NAS Mobo, and swap it in. (Inspect the needed PS connections to be sure you're not dealing with something proprietary.)


    If the cut down setup works, keep adding components until the failure shows up. But, to be honest, the failure interval (a week) seems to be pretty long and note that the failure interval can vary significantly either way (shorter or longer).
    _____________________________________


    BTW: you can build OMV, on a USB thumbdrive, using any workstation you have. (Just make sure that you install OMV onto the USB thumbdrive, NOT the workstation's hard drive. You'd also need to install the flash memory plugin after the install is complete.) With a client booted up on OMV and a drive dock (or connected internally to a SATA port), you may be able to copy data off of your drives.
    _____________________________________


    I hate to mention this but you might have done this to yourself when you built your PC. Handling components outside of their anti-stat bag requires care; only touching the edges of boards (not traces or chip legs) and carefully keeping yourself grounded throughout the process or using something like -> this.


    Bottom line: if you don't have time for tests and want a NAS right now, buying something might be the way to go.

  • good advice above
    power supplies are a common failure on pretty much everything, it's why I only use Corsair for my PC builds
    not forgetting the power supply needs to be powerful enough to power all the components as well
    there is a massive peak power draw from the components at start up in comparison to the standby/idle draw, the power supply needs to be oversized to allow for that
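
    a rough way to put numbers on that is sketched below in Python. everything here is an assumption for illustration: the function names, the 20% headroom default, and the idea of adding each drive's 12 V spin-up current on top of a base load. the real figures have to come from your drive and board datasheets, and the 12 V rail rating matters as well as the total wattage

```python
def startup_power_watts(drive_count, spinup_amps_12v, base_watts):
    """Estimate peak draw at power-on: base load plus every drive spinning up at once."""
    # Spin-up current is drawn from the 12 V rail, so watts = amps * 12.
    return base_watts + drive_count * spinup_amps_12v * 12.0

def psu_is_adequate(psu_watts, required_watts, headroom=0.2):
    """True if the supply covers the estimated peak plus a safety margin."""
    return psu_watts >= required_watts * (1 + headroom)
```

    for example, with placeholder values of 4 drives at 2.0 A each and a 60 W base load, startup_power_watts(4, 2.0, 60.0) gives 156.0, so a supply would need roughly 187 W or more to pass the 20% headroom check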


    cables need to be properly connected - check them


    hard drives failing are another common failure


    you can check this by plugging the drive into another computer and using CrystalDiskInfo to see the SMART errors
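
    CrystalDiskInfo is a Windows tool. if the other computer runs Linux, smartctl from the smartmontools package reports the same SMART data (that swap is my suggestion, not part of the advice above). here is a minimal Python sketch that pulls a few failure-related counters out of the attribute table printed by smartctl -A. the function name and the choice of watched attributes are assumptions, and attribute names can vary by drive vendor

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def parse_smart_attributes(text, watched=WATCHED):
    """Extract raw values of the watched attributes from `smartctl -A` output."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in watched:
            if fields[9].isdigit():
                found[fields[1]] = int(fields[9])
    return found
```

    feed it the text printed by smartctl -A for the drive in question (it usually needs root). counts above zero for these attributes are generally a reason to get the data off that drive sooner rather than later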


    ALWAYS have an offline copy of your data
    NEVER trust one system with your data, it is at risk always


    The more complicated a system the higher the chance of things going wrong...and the longer it will take to identify and fix
