Posts by crashtest

    There's very little point in debating, anecdotally, how drives fail. There are plenty of opinions (volumes) on the net about this topic.
    -> How hard drives fail.


    I think it is ultimately about OMV not being the one that "marked" it as dead, so as far as it's concerned, it never happened.

    Let me re-emphasize that OMV is a NAS "application". For the most part, when it comes to core packages like smartctl, OMV is simply "the messenger". It builds command lines, hands them to the OS, and relays the resulting messages and output back to the GUI. The core app (OMV) and plugins (like md) let users skip the command line drudgery and fat-finger mistakes, while guiding new users toward productive ends. That's what OMV does in a nutshell. OMV helps users to make sense of Debian Linux - that's it.
    ______________________________________________________________________________

    While you've provided your opinions of what's happening, you haven't provided hard information (SMART stats, system logs, etc.). Without sitting in front of your console, I can't even begin to explain which of the myriad (potentially infinite) hard drive failure scenarios you're facing, or how it's being interpreted by OMV. The best thing to do (for your data) is to acknowledge that the drive has issues, remove the slightly crippled or bad hard drive from the array before it causes serious damage, and move on.

    Along those lines, I think you have the information needed to replace the drive.

    Electrically, it is the same. If the drive dies, what distinguishes it from a missing drive? It's the same thing in terms of just disappearing.

    There are two major components of a spinning hard drive - the media (heads and platters) and the interface board. A missing drive would be registered if the drive's interface is not connected to the port. It is possible (but unlikely) that a drive can still be connected and appear to be missing. (In such a case, the interface board would not respond to the motherboard's queries.)

    Drives can fail in a great many ways. Rare is the event where a drive dies in a "lights out" fashion, but it can happen that way. Unfortunately, drives tend to fail slowly, with an increasing potential to corrupt ever greater amounts of data as they fade out.


    Apparently OMV doesn't care about read errors until the SMART count gets to a certain level?

    First, this is not about "OMV". The SMART package (smartctl) and Debian read and handle SMART stats. SMART was implemented to give users a bit more visibility into the health of a drive and/or to help determine if a drive is failing.


    Second, you'd need to post the drive's actual SMART stats before anyone could offer a guess or an interpretation.

    Hard drive read errors happen - there are literally billions (trillions?) of operations as bits and bytes are read and written. To counter that, there's error correcting code in the interface board that corrects read and write errors on the fly. Issues come into play when errors can't be corrected anymore.


    Lastly, drive OEMs (Seagate, WD, HGST, etc.) implement SMART in the manner they want to. The interpretation of SMART stats can differ significantly between brands.


    If you're interested in what's going on with your drive, as in whether there's a good chance it's on the verge of failure, consider the following. Backblaze did a study of the SMART stats that were most likely to indicate a future failure. The following was the result:


    SMART 5 – Reallocated_Sector_Count.

    SMART 187 – Reported_Uncorrectable_Errors.

    SMART 188 – Command_Timeout.

    SMART 197 – Current_Pending_Sector_Count.

    SMART 198 – Offline_Uncorrectable.


    A single count in the raw values of the above is not a real concern. If the counts start to increment upward, it's time to worry a bit. Somewhere around 3 to 5, to be safe, I'd be ordering another drive.
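The rule of thumb above can be sketched in Python. This is a minimal sketch, assuming the plain-text attribute table printed by `smartctl -A /dev/sd?` (RAW_VALUE in the tenth column, as smartctl prints it for ATA drives). The function names and the default threshold of 3 (the low end of "3 to 5") are illustrative choices, not part of OMV or smartmontools. Since OEMs implement SMART differently, some raw values may need vendor-specific interpretation.

```python
import re

# SMART attributes Backblaze found most predictive of failure (per the list above).
BACKBLAZE_IDS = {5, 187, 188, 197, 198}


def parse_smart_raw(text):
    """Map attribute ID -> (name, raw count) from `smartctl -A` text output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # RAW_VALUE is the 10th and may be followed by extra detail, e.g. "36 (Min/Max 20/45)".
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                attrs[int(fields[0])] = (fields[1], int(match.group()))
    return attrs


def assess(attrs, order_at=3):
    """Split the Backblaze attributes into (watch, order) lists by raw count."""
    watch, order = [], []
    for attr_id in sorted(BACKBLAZE_IDS.intersection(attrs)):
        name, raw = attrs[attr_id]
        if raw >= order_at:
            order.append((attr_id, name, raw))  # time to order a replacement
        elif raw > 0:
            watch.append((attr_id, name, raw))  # nonzero: keep an eye on it
    return watch, order
```

Feeding it live data could look like `subprocess.run(["smartctl", "-A", "/dev/sdX"], capture_output=True, text=True).stdout`, with /dev/sdX replaced by the real device.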

    Finally, there's the following:
    SMART 199 - UltraDMA CRC errors


    Usually 199 is hardware and cable related, but it can also point to the drive's or the motherboard's interface.
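Because the concern is counts that increment, it helps to compare two snapshots taken some time apart. A sketch, assuming you've already recorded raw counts as `{attribute_id: raw_value}` dictionaries (for example, from two `smartctl -A` runs a week apart); the hint wording is a hypothetical summary of the rules of thumb above, not a diagnosis.

```python
# Attributes from the lists above whose raw counts are worth trending.
WATCHED = (5, 187, 188, 197, 198, 199)


def increments(before, after, ids=WATCHED):
    """Return {attribute_id: increase} for watched attributes whose raw count went up."""
    return {
        attr_id: after[attr_id] - before[attr_id]
        for attr_id in ids
        if attr_id in before and attr_id in after and after[attr_id] > before[attr_id]
    }


def hint(deltas):
    """Very rough reading of the increments, following the rules of thumb above."""
    if not deltas:
        return "no change"
    if set(deltas) == {199}:
        # CRC errors alone usually point at the cable or the drive/motherboard interface.
        return "check cabling and ports"
    return "counters rising: watch closely"
```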

    For the rest of the spinning-drive SMART stats, their meaning, impact, etc., would have to be looked up on the drive OEMs' websites. They might be serious, informational, or nothing at all.


    First and foremost: Do you have a backup? If not, you might consider getting a big external drive and backing up the array first. Why? Along with the possibility of making "fat finger mistakes", working with a RAID array has its hazards.
    _________________________________________________________________________

    The following GUI process would apply "if" you have the physical room and an extra port connection, for adding a new drive before removing the drive with issues.

    - First, physically install the new drive.
    - Under Storage, Disks, find the new drive and wipe it.
    - Go to Storage, Multiple Device.
    - Highlight your md? array, and click the Recover icon.
    - Add the new drive as a spare.
    - Then you can physically remove the disk with issues, noting the following:

    **Note: you need to know the device name of the drive you want to remove. A reboot can change drive device names, so the name /dev/sd? must be rechecked after every reboot. This can be done under Storage, Multiple Device and cross-verified under Storage, Disks.
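To cross-check which physical drive /dev/sd? refers to after a reboot, the persistent links that udev creates under /dev/disk/by-id are handy: their names typically include the drive's model and serial number, which can be matched against the label on the drive itself. A minimal sketch (the helper name is hypothetical):

```python
import os


def by_id_names(dev, by_id_dir="/dev/disk/by-id"):
    """Return the persistent by-id link names that currently resolve to `dev`."""
    target = os.path.realpath(dev)
    names = []
    for entry in sorted(os.listdir(by_id_dir)):
        # Each entry is a symlink such as ../../sdb; partition links point at sdb1 etc.,
        # so only links to the whole disk match exactly.
        if os.path.realpath(os.path.join(by_id_dir, entry)) == target:
            names.append(entry)
    return names
```

Calling `by_id_names("/dev/sdb")` returns the by-id names for that disk (often an `ata-...` entry carrying the serial number and a `wwn-...` entry).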

    If you want to do this on the command line, the following is a short summary of the command line actions:


    mdadm --add /dev/md? /dev/sd? # add the new drive as a spare

    mdadm --fail /dev/md? /dev/sd? # mark the drive with issues as failed

    mdadm --remove /dev/md? /dev/sd? # remove the failed drive

    A more complete explanation for the command line process can be found -> here.

    I'm thinking now I'll just have to spend a day and reinstall from scratch

    If something is corrupted, in the OS or elsewhere, a rebuild is the logical choice. If you rebuild your OS on a thumb-drive, note that cloning thumb-drives is easy. With a clone you'd have a 100% OS backup that can easily be restored and have you up and running in a matter of minutes -> cloning flash media. (You might take the time to read the -> Operating System Backup section for the rationale.)
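Cloning is normally done with an imaging tool or `dd`; underneath, it's a block-for-block copy plus a check that the copy matches. A minimal Python sketch of that idea, assuming `src` and `dst` are paths you can read and write (an image file or a whole device such as a hypothetical /dev/sdX). Writing to the wrong device destroys its contents, and the target must be at least as large as the source. The re-read in `verify` may be served from the page cache, so treat it as a sanity check, not proof that the media holds the data.

```python
import hashlib
import os

CHUNK = 4 * 1024 * 1024  # read/write 4 MiB at a time


def clone(src, dst, chunk=CHUNK):
    """Copy src to dst block-for-block; return (sha256 hex digest, bytes copied)."""
    digest = hashlib.sha256()
    copied = 0
    # "r+b" writes in place without truncating (dst must already exist, as a device does).
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        while True:
            block = fin.read(chunk)
            if not block:
                break
            fout.write(block)
            digest.update(block)
            copied += len(block)
        fout.flush()
        os.fsync(fout.fileno())  # push the data out to the target before returning
    return digest.hexdigest(), copied


def verify(dst, length, expected, chunk=CHUNK):
    """Re-read the first `length` bytes of dst and compare against the expected digest."""
    digest = hashlib.sha256()
    remaining = length
    with open(dst, "rb") as f:
        while remaining:
            block = f.read(min(chunk, remaining))
            if not block:
                break  # target shorter than the source: the digests won't match
            digest.update(block)
            remaining -= len(block)
    return digest.hexdigest() == expected
```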

    Even if you manage to paint yourself into a corner by misconfiguration, or if something goes south with an update, being able to "punt" back to a known working boot drive could save a lot of time and aggravation.
    ___________________________________________________________________________

    With the above noted, OS backup won't save you from actual hardware failures. (But it would help in making that determination more quickly.) I would be concerned about the ata10 errors. Again, rebuilding to a thumb-drive might give you some insight into whether or not your boot drive is failing.
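To see how often a port such as ata10 is logging trouble, kernel messages (`dmesg` or `journalctl -k`) can be tallied per ATA port. A rough sketch; the keyword list is an assumption chosen to catch common libata messages (exceptions, failed commands, link resets, SError lines), not a complete catalogue, and which drive actually sits on ata10 has to be confirmed separately.

```python
import re
from collections import Counter

# Matches "ata10:" or "ata10.00:" and captures the port plus the rest of the message.
ATA_LINE = re.compile(r"\b(ata\d+)(?:\.\d+)?:\s*(.*)")
# Assumed keywords for trouble; not an exhaustive list of libata messages.
KEYWORDS = ("error", "failed", "exception", "reset")


def ata_problems(log_text):
    """Count suspicious kernel log lines per ATA port."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ATA_LINE.search(line)
        if match and any(word in match.group(2).lower() for word in KEYWORDS):
            counts[match.group(1)] += 1
    return counts
```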

    I also have a backup of the OMV OS drive from two weeks ago, so I'm wondering if the best thing is to unpack that onto a spare hard drive just to see what all the net settings were,


    This is, probably, your best option. (BTW: You might want to stick with your backup until there's a break at the college.)

    If you think about it, without knowing what you've done, you're asking someone to speculate on something you personally set up.

    _____________________________________________________________________

    Generally speaking, making two changes at once to save time is not a good idea. If something odd happens, pinning down the cause is guesswork.

    The last few days I've been working to try to recover my omv after a failed upgrade and relocation of hard drives to new enclosures.

    Bad sectors (increasing) means the drive is sliding toward complete failure. There's no knowing exactly when it will fail outright. And, yes, you should be concerned because the drive may be corrupting your files.

    I'm wondering if I should remove the disk until I get a new one, or if that's even possible?

    If you're running a RAID1 mirror, RAID5, or an equivalent, you can remove the drive, but realize that there is some risk involved in trying to run a RAID array without a member disk.

    Otherwise - if RAID is not involved:
    1. You might think about getting an external (or internal) drive and -> rsync'ing your data to it, as soon as possible. (If the disk is large enough, you could rsync an entire RAID array.)
    2. Shut down the server until you get a replacement.

    Note: Time is not on your side.
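For step 1, a typical rsync invocation can be assembled like this. A sketch with hypothetical paths (`/srv/data`, `/srv/backup`): `-a` (archive) preserves permissions, ownership, timestamps and symlinks, while `-H`, `-A` and `-X` add hard links, ACLs and extended attributes. It defaults to `--dry-run` so you can review what would happen before copying for real; `--delete` (mirror mode) removes files in the destination that no longer exist in the source, so use it with care.

```python
def rsync_command(source, dest, mirror=False, dry_run=True):
    """Build an rsync argument list that copies the contents of source into dest."""
    # -a: archive mode (permissions, owner, group, times, symlinks, recursion)
    # -H/-A/-X: also keep hard links, ACLs and extended attributes
    cmd = ["rsync", "-aHAX"]
    if mirror:
        # Delete files in dest that are no longer present in source.
        cmd.append("--delete")
    if dry_run:
        # Only report what would be transferred; nothing is written.
        cmd.append("--dry-run")
    # A trailing slash on the source copies its contents, not the folder itself.
    cmd += [source.rstrip("/") + "/", dest]
    return cmd
```

Running it for real could look like `subprocess.run(rsync_command("/srv/data", "/srv/backup", dry_run=False), check=True)`.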

    I must say that the verge of an actual data disaster is not the time to start thinking about backup.

    For a review of SMART stats related to SSD aging and failure, this -> Backblaze article is worth a read. While I'm not running SSDs for storage, I believe I'd be more interested in SMART 169 or 202 (depending on the brand) for life remaining.

    You might consider OMV's notifications and filesystem reporting as demo'ed -> here.


    And a ref for setting up SMART monitoring in OMV -> here


    When it comes to SMART Tests, with an SSD, I might consider a SHORT test once or twice a month. A SHORT test does not "write data" so there shouldn't be any impact to an SSD's life.

    On the other hand, a LONG test is geared toward checking the magnetic media of spinning drives. (I suspect a LONG test, which reads the entire drive, would put unnecessary load on an SSD.) I wouldn't schedule a LONG test for an SSD.
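If you manage smartd directly instead of through OMV's GUI, the schedule above maps onto the `-s` directive in /etc/smartd.conf, whose pattern is `T/MM/DD/d/HH` (test type, month, day of month, day of week, hour). A sketch that builds that directive; the hours, the days of month, and the monthly LONG test for spinning drives are assumptions for illustration, not recommendations from the post.

```python
def smartd_schedule(is_ssd, short_days=(1, 15), hour=3, long_day=1):
    """Build a smartd.conf '-s' directive using the T/MM/DD/d/HH pattern."""
    days = "|".join(f"{day:02d}" for day in short_days)
    # SHORT test on the chosen days of every month, at `hour`.
    tests = [f"S/../({days})/./{hour:02d}"]
    if not is_ssd:
        # Spinning drives only: a monthly LONG test one hour later (assumed cadence).
        tests.append(f"L/../{long_day:02d}/./{(hour + 1) % 24:02d}")
    return "-s (" + "|".join(tests) + ")"
```

The result goes after the device on its smartd.conf line, e.g. `/dev/disk/by-id/<your-drive> -a -s (S/../(01|15)/./03)` for an SSD.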
    ______________________________________________

    From the sound of it, you might consider looking at backing up your data. -> Backup

    Note that a flashing interface "light" is a receive function.

    Have you checked the switch or router? (Switch the cable to another port.)
    What kind of boot drive are you using? (Spinning drive, SSD, thumbdrive, etc.)
    What hardware platform are you using? (amd64, a single board computer (SBC), etc.)
    Have you considered that you might have an actual hardware problem?

    Realize that you haven't provided enough information to even take a guess.

    Today, I caught a bee swarm! My wife noticed it in an apple tree, in the garden.

    I got out my homemade bee vacuum and sucked them into their new home - a large brood box with frames and foundation. It was kind of tricky because I did it after dark.

    "Hopefully", they'll adopt the hive and won't try to leave it tomorrow. If they're there, early in the AM, I plan to block them in for a few days. I have a feeder with sugar water inside the hive. It's a socialist attractant that's kind of like food stamps. :- )

    But since I've never dealt with rsync, I'm looking for a comprehensive guide on how to set this up on OMV that also covers topics like permissions, etc.?

    Rsync copies file permissions (when run in archive mode, -a). That's why it's useful for mirroring individual network shares or full hard drives.
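After an rsync run, a quick way to confirm that permissions came across is to compare mode bits and ownership between source and destination. A minimal sketch (the function name is hypothetical; it checks POSIX mode, owner and group only, not ACLs):

```python
import os
import stat


def permission_mismatches(src_root, dst_root):
    """List (relative path, problem) pairs where the destination differs from the source."""
    mismatches = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        for name in dirnames + filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.lexists(dst):
                mismatches.append((rel, "missing"))
                continue
            # lstat: compare links themselves rather than what they point to.
            a, b = os.lstat(src), os.lstat(dst)
            if stat.S_IMODE(a.st_mode) != stat.S_IMODE(b.st_mode):
                mismatches.append((rel, "mode"))
            elif (a.st_uid, a.st_gid) != (b.st_uid, b.st_gid):
                mismatches.append((rel, "owner"))
    return sorted(mismatches)
```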

    Some NAS Permissions topics are covered -> here.

    OK... I put a note in the Imager section saying that the user created by the Imager (whatever name the installer may choose) and the Raspberry Pi OS default user pi will not have access to OMV "stuff" unless they are entered into the user section of the GUI.

    In the hope of preventing other install issues, I also added a "Do not deviate" FOREWORD.

    First off, it seems I'm going to have to put a note in the guide, not to deviate from the guide.

    _______________________________________________________

    With the above noted and with emphasis on #2 below, nowhere in the guide does it say to:

    1. Flash an SD-card with Rasp. PI OS.
    (Then)
    2. Install users of your choice, configure WiFi, add additional non-OMV packages, etc., etc.
    3. Install OMV.

    Where #2 is concerned, it is IMPOSSIBLE to predict or anticipate what users may do, at any point, before, during, or after:
    (1) Flash the card
    (2) Install OMV.

    There are an infinite number of ways to break an install.


    Until I created a new user, and then it worked fine. As it happens, this method of installation adds OMV onto an existing Raspberry Pi OS installation, so the default user is picked up by OMV and can get folders shared without any additional user setup in the UI... BUT technically, it IS created before Samba is running; it's even created before OMV is installed. So that might explain why I had to run `smbpasswd -a username` for Samba to start working for that user.

    Take a look -> here. It's relatively easy to provide samba access to existing Windows users on the LAN.

    In case it's not obvious, the info in this document applies AFTER OMV is installed, AND you'll need to add users in the GUI, as the document illustrates.

    At the end of the doc, under Permission Notes, an externally linked doc explains how to set up a "universal access" user via the Windows Credential Manager.

    If you're using DHCP at the router (which usually means the router is acting as the proxy for your upstream resolver), have you tried rebooting the router?

    Even if the server's address is provided by DHCP, the DNS server can be set manually: under Network, Interfaces, highlight and edit your device, then set your DNS server under Advanced Settings. (Maybe 1.1.1.1 or 8.8.8.8 for a test?)
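To confirm the change took effect, you can look at the resolver configuration and try a lookup. A small sketch; note that on systems using systemd-resolved, /etc/resolv.conf may list a local stub address (such as 127.0.0.53) rather than the server you entered, in which case `resolvectl status` shows the upstream servers.

```python
import socket


def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        # Lines look like "nameserver 1.1.1.1"; comments and other options are skipped.
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def resolves(hostname):
    """True if the system resolver can look up hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False
```

For example, `nameservers(open("/etc/resolv.conf").read())` followed by `resolves("deb.debian.org")` checks both the configuration and an actual lookup.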

    First:
    Did you do the installation on a NEW, good quality SD-card (the preferred approach)?
    Or, at the minimum, did you run a full test on the SD-card you used (as laid out in the install guide)?
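A full card test writes a known pattern across the card's capacity and reads it back; dedicated tools such as H2testw or f3 do this properly and also catch fake-capacity cards. The idea, as a minimal Python sketch (file name and block count are up to you; Python 3.9+ for `randbytes`): write a file of position-dependent blocks to the mounted card, unmount and re-insert the card so the read comes from the card rather than the page cache, then verify.

```python
import os
import random

BLOCK = 1024 * 1024  # 1 MiB per block


def pattern(index):
    """Deterministic pseudo-random block; each position gets different content."""
    return random.Random(index).randbytes(BLOCK)


def write_test_file(path, n_blocks):
    """Fill `path` with n_blocks pattern blocks and flush them to the card."""
    with open(path, "wb") as f:
        for i in range(n_blocks):
            f.write(pattern(i))
        f.flush()
        os.fsync(f.fileno())


def verify_test_file(path, n_blocks):
    """Return the indices of blocks that did not read back as written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(n_blocks):
            if f.read(BLOCK) != pattern(i):
                bad.append(i)
    return bad
```

To cover (nearly) the whole card, `n_blocks` could be taken from `shutil.disk_usage("/media/sdcard").free // BLOCK`, where /media/sdcard is a hypothetical mount point. Because every block differs, a card that silently wraps writes around to its start shows up as bad blocks.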

    Second:
    The scripted install method relies on (1) a good internet connection and (2) software repos that are solidly on-line.
    I'm not sure what area of the world you're in, but FOSS software mirrors are supplied by admins and companies that are donating free server time. With that in mind, try your installation when software repos and network connections are more likely to be on-line, which is, typically, Monday through Friday.

    So if this first line backup would happen to be the 2nd disk in the 2-bay NAS, then I should expect this recovery to go smoothly.

    If you're primarily interested in easy backup, syncing two disks using rsync works well enough. It's easy to set up and, if the primary disk fails, recovery is easy as well. (A matter of minutes, not hours.)