Why is OMV so bad dealing with disk failures?

  • First of all, I love OMV, so please don't be misled by the title. Still, I would like to make a point and ask for some guidance.


    Last week I had a single drive failure so I ordered a replacement and installed this back into the system.

    This is a VM on Proxmox btw.

    The first thing that stands out is: OMV doesn't like it at all when the broken disk is no longer connected; I can't even delete the shares that were pointing to it.

    I can't remove the filesystem as the trash icon (delete) is greyed out.

    I haven't even tried changing samba config as I'm already stuck here.


    So bottom line:

    - Is there any guidance on how to deal with an HD failure? (The new drive will have a different UUID, of course.)

    - Can this scenario be taken into account in general, allowing the user a safe recovery rather than config file editing, etc.?


    Thanks

  • The reason you can't delete the drive is that it's still referenced. You'll see that under Storage > File Systems.

    Is the data gone? Do you have backup? (If you have backup, you could save yourself some configuration work by restoring the backup to the new drive and redirecting your Shared Folders.)

    Assuming the data is gone and there is no backup, you'll need to uninstall the services associated with the failed drive in reverse order of configuration.

    Generally that will be:
    - Delete Samba or NFS shares associated with the drive.
    - Delete Shared Folders associated with the drive.
    - Delete SMART tests that are scheduled for the drive.

    - Delete other misc. services that may be associated with the drive (Rsync jobs, etc.).

    At this point check the File Systems window to see if the drive is still referenced. If you've deleted everything, the "Referenced" check mark will be gone.

    At that point, you'll be able to Unmount the drive and delete it.
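    If you want to double-check the "Referenced" state outside the GUI: the references live in OMV's config database (/etc/openmediavault/config.xml), where each shared folder points at a filesystem's mount entry through a <mntentref> element. Below is a minimal, hedged sketch of what that check looks like; it greps a small sample fragment instead of the real file, and the folder name is made up for illustration (on a real system you would point grep at /etc/openmediavault/config.xml).

```shell
# Sketch only: write a tiny sample config fragment so the check can be
# tried safely anywhere; on a real OMV box the file already exists at
# /etc/openmediavault/config.xml.
cat > /tmp/sample-config.xml <<'EOF'
<config>
  <sharedfolder>
    <name>media</name>
    <mntentref>79684322-3eac-11ea-a974-63a080abab18</mntentref>
  </sharedfolder>
</config>
EOF
uuid="79684322-3eac-11ea-a974-63a080abab18"
# Count how many shared folders still reference this filesystem's mount entry.
refs=$(grep -c "<mntentref>${uuid}</mntentref>" /tmp/sample-config.xml)
echo "references: ${refs}"
```

    A non-zero count means something still points at the drive, which is exactly the state where the GUI greys out the delete (trash) icon.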

  • Thank you, but I think my point was more generic. I do have an offsite backup; that is not my point here.

    Why shouldn't I (as a user) be allowed to remove a referenced disk/share that no longer exists, for whatever reason? At the moment you get the most colorful (disruptive) errors when a referenced object no longer exists.


    This is what my post is about.


    All the steps you suggest will at some point fail if the underlying filesystem disappears. In the past I had luck removing a share, but the filesystem marked as missing wouldn't go away until I edited the config manually (the trash icon was greyed out). I don't know if you have ever experienced a drive disappearing in OMV, but on some pages (e.g. filesystems/shares) you get a loading error. This error is often disruptive, so it's not only linked to the failed element; it prevents working elements from being displayed at all.


    I'm not new here; I've been using OMV for a decade or so. This is an issue that 6.x must address. A funky Ajax interface is really nice to have, but it needs to be functional as well and cover these exceptional cases.


    P.S. About my issue in the OP: as usual, I had to edit the config manually to get the GUI operational again.

    • Official post

    Why shouldn't I (as a user) be allowed to remove a referenced disk/share that no longer exists, for whatever reason? At the moment you get the most colorful (disruptive) errors when a referenced object no longer exists.

    Apparently, you're in a camp where, with 3 clicks of a mouse, users would be allowed to destroy what could be (potentially) hours of work? Let's set that aside for a moment.

    When a drive is physically added to OMV, it's "recognized". This is automatic.


    1. When a drive is formatted and mounted: The drive is assigned entries in fstab and it's given entries in OMV's DB (which makes the following possible).
    2. Shared Folders are now possible, which makes data available to users other than root. Shared Folder entries are logged into OMV's config DB. (which makes the following possible).
    3. SMB or NFS network shares may now be created. (Which, finally, sets up a functioning NAS.)
    4. All of the above configuration details interact with users that may have been added to the system and additional services such as Rsync and SMART testing, among others.

    Since the above is a layered approach, it stands to reason that it should be removed in the reverse order. When a building is constructed, it's not deconstructed by removing the foundation first.

    ___________________________________________________________________________

    As it seems, you're of the belief that users (many of whom do not understand the background intricacies involved in setting a NAS up) should be able to go straight to #1 and simply delete the drive? If that were allowed, OMV's config DB could not be maintained in a consistent state.

    I could go on about the ramifications of allowing something like that from within the GUI, but there's plenty of evidence in this forum of what happens when users force a "shortcut" to #1 using the command line.

    Be aware that the easy-to-use GUI attracts beginners who have little to no idea of what is required to set up and create usable network shares. When given a free hand in configuration, more than one user has said (in basic terms), "why did you allow me to do that?"

    The bottom line is that what you're asking for is not going to happen. It would make OMV unsupportable, as users would stumble onto ever more creative ways to unintentionally destroy their systems.

  • Thank you for the answer. Well, it seems we have different opinions. The fact that missing elements can only be removed by reversing the steps that generated the config should perhaps be rethought. Maybe we need a wizard to guide the user through it?

    I believe there are tons of things OMV could do to ease and simplify the resolution of these situations. Perhaps the most straightforward is to add an option to first-aid that runs a sanity check on the config file. Things like removing unreferenced elements from the config, spotting unclosed markup, first-aid from the web interface, automatic daily config backups, etc. are simple but very effective approaches IMHO. These, by the way, are things I already do manually and visually. Why not automate them?

    I'm not here demanding anything; I'm just giving feedback, an input to contribute to the vision of the next release. If this is not to be considered... well, whatever. I took the time to write here because I truly believe things can be done differently (better, in my view), and some of these controls/automation are relatively easy. I'm involved in other projects and really have no time to pick up something else right now, otherwise I would look into this from a practical point of view.
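    For what it's worth, the "automatic config backup daily" idea is easy to prototype. Here is a hedged sketch of such a script; the /tmp paths are demo stand-ins so it can be tried safely, while on a real system CONFIG would be /etc/openmediavault/config.xml and a copy of the script could live in /etc/cron.daily/.

```shell
# Hypothetical daily backup of OMV's config DB. The /tmp paths below are
# demo stand-ins for /etc/openmediavault/config.xml and /var/backups.
CONFIG="/tmp/omv-demo/config.xml"
BACKUP_DIR="/tmp/omv-demo/backups"
mkdir -p "$(dirname "$CONFIG")" "$BACKUP_DIR"
# Demo stand-in file only; a real system already has its config.xml.
[ -f "$CONFIG" ] || printf '<config></config>\n' > "$CONFIG"
# Keep a dated copy of the config...
cp "$CONFIG" "$BACKUP_DIR/config-$(date +%F).xml"
# ...and prune everything but the 14 most recent copies.
ls -1t "$BACKUP_DIR"/config-*.xml | tail -n +15 | xargs -r rm --
echo "latest backup: $BACKUP_DIR/config-$(date +%F).xml"
```

    With a rotation like this in place, a broken config could be rolled back to yesterday's copy instead of being repaired by hand.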

    • Official post

    - Is there any guidance on how to deal with an HD failure? (The new drive will have a different UUID, of course.)

    What to do when a data disk breaks:


    - Redirect shared folders to another drive:

    • Create a "newfolder" folder on another disk. If you don't have any other disks, install openmediavault-sharerootfs and create the folder on the system disk, e.g. /newfolder.
    • Edit the shared folder, change the drive it points to, choose the folder you just created, and accept the changes.
    • Repeat this on each shared folder that was pointing to the broken drive.

    - Delete monitoring and SMART jobs for this disk.

    - In Storage, unmount the disk.

    - Now format and mount the drive that will replace the broken one, create the folders, and redirect the shared folders to the new drive.
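    The last step matters because the replacement drive's filesystem gets a brand-new UUID, and the old fstab/config entries reference the old one, so they cannot simply be reused. As a small sketch of what that looks like, the blkid line below is canned example output with an invented UUID; on a real system you would run blkid against the new partition (e.g. blkid /dev/sdb1).

```shell
# Canned blkid output for illustration; the UUID below is invented.
line='/dev/sdb1: UUID="2d4c9f7a-1b2e-4c3d-9e8f-0a1b2c3d4e5f" TYPE="ext4"'
# Pull the UUID out of the line; fstab entries then mount the
# filesystem by this UUID rather than by device name.
uuid=$(printf '%s\n' "$line" | sed -n 's/.*UUID="\([^"]*\)".*/\1/p')
echo "mount by: /dev/disk/by-uuid/${uuid}"
```

    Mounting the new drive through the GUI creates the fresh entries for you, which is why formatting and re-mounting is the supported path rather than editing the old UUID references.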



    You didn't ask for this, it's just a gift ;)

    What to do when the system disk breaks:


    If you don't have a backup: reinstall OMV6 :)

    If you have a cloned backup: install the new cloned drive.

    If you have a backup but the UUID does not match: copy this to a script and run it as root:

    Code
    # Load OMV's helper functions (provides omv_config_get_count).
    . /usr/share/openmediavault/scripts/helper-functions
    # Install the sharerootfs plugin if it is not already installed.
    Inst=$(dpkg -l | awk '$2 == "openmediavault-sharerootfs" {print $1}')
    if [ ! "${Inst}" = "ii" ]; then
      apt-get --yes install openmediavault-sharerootfs
    fi
    # If nothing references this mount entry any more, drop it from the config DB.
    uuid="79684322-3eac-11ea-a974-63a080abab18"
    if [ "$(omv_config_get_count "//mntentref[.='${uuid}']")" = "0" ]; then
      omv-confdbadm delete --uuid "${uuid}" "conf.system.filesystem.mountpoint"
    fi
    apt-get install --reinstall openmediavault-sharerootfs


    Can this scenario be taken into account in general, allowing the user a safe recovery rather than config file editing, etc.?

    You don't need to edit the database at all, that's a bug.
