BTRFS RAID1 3 Disks: Error Super Block at xxxxxxx devid 1,2,3 has bad generation 3088 expect 3089 / How to fix???

  • Hi folks


    After I had to hard power off my server because the system was not shutting down, I ran a file system check and a scrub.

    The file system is a BTRFS RAID1 with three SSDs. I got three identical superblock errors, one for each drive!


    Journalctl:





    Does anyone know a way to fix this problem? The filesystem is mounted, and the SMB, NFS, MiniDLNA and SSH services are running!


    Thanks and best regards

    prtigger

    Old (2011) Supermicro 1U Rackserver X7SPA-HF, Intel Atom D510, 4GB RAM (maximum) | System: Supermicro Sata DOM 64GB SSD | Data: 3*Samsung 870 QVO 4TB Sata (Btrfs Raid1) | OMV 8.x services: SSH, SMB, NFS, MiniDLNA | 6.17 bpo kernel


  • prtigger1

    Added the Label resolved
  • Seems to have fixed itself...


  • crashtest

    Hi

    Thank you for your information!

    Do you have any advice on the right steps to take if such a problem is not fixed by a reboot?


    This was my first problem with the btrfs filesystem in a long time. My knowledge of btrfs is limited...


    Best regards

    prtigger


    • Official Post

    I'm running a BTRFS mirror as an extended test, but have not experienced your issue. That's partly because I got burned several years ago thinking BTRFS was truly "Copy on Write", meaning it should be good for a mobile file system and able to gracefully handle a loss of power. Surprise! While that was several BTRFS versions ago, I lost two basic volumes. Now, with that lesson learned, I have BTRFS in RAID1 on a UPS. You might want to think about doing the same.
    _______________________________________________________________________

    From what I could gather regarding your particular superblock error (one count per drive):

    The first thing to do is reboot. That might fix the issue automatically. If it doesn't:


    - Backup any critical data first (Always wise)


    - Unmount the filesystem if you can ( umount /srv/dev-disk-by-uuid-a8a06053-0cc4-491d-adf2-01bb8a02fb44).


    - Run the recovery command on one drive ( /dev/sdc for example. The change will propagate to the RAID1 mirrors)
    btrfs rescue super-recover -y /dev/sdc


    - Remount the filesystem and re-run the scrub.
    btrfs scrub start -B -d /srv/dev-disk-by-uuid-a8a06053-0cc4-491d-adf2-01bb8a02fb44

    Verify the errors are gone.
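    Collected into one place, the sequence above might look like the sketch below. The UUID and device are the examples from this thread (substitute your own), and with DRY_RUN=1 (the default) the script only prints the commands instead of executing them:

```shell
#!/bin/sh
# Sketch of the recovery sequence described above. Mount path and device are
# the examples from this thread -- substitute your own. With DRY_RUN=1 (the
# default) commands are only printed, never executed.
DRY_RUN=${DRY_RUN:-1}
MNT="/srv/dev-disk-by-uuid-a8a06053-0cc4-491d-adf2-01bb8a02fb44"
DEV="/dev/sdc"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run umount "$MNT"                         # 1. unmount the filesystem
run btrfs rescue super-recover -y "$DEV"  # 2. restore superblocks from a good copy
run mount "$MNT"                          # 3. remount (assumes an fstab entry)
run btrfs scrub start -B -d "$MNT"        # 4. re-run the scrub and wait for it
```

    Running it once with DRY_RUN=1 is a cheap way to sanity-check the paths before setting DRY_RUN=0.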

    ____________________________________________________________________


    Avoid btrfs check --repair unless the above doesn't work. It's far more aggressive and there's a real risk of data loss.

  • crashtest


    Hi crashtest

    Thank you very much for your detailed informations!


    I've got a backup, but for the future I need a bigger external USB HDD...


    Unmounting the filesystem is always tough, because I have to stop all services and remove all shares on the filesystem.

    After that, it's possible to stop and unmount the filesystem with the OMV GUI.


    A UPS is a good investment! But in my case, which happened yesterday, it would not have helped!!


    Because of this reported issue coming up randomly:


    Shutdown hang by failed unmounting srv volume (btrfs raid1, 3 disks) with bpo kernel 6.12.12


    I haven't found a solution for the problem so far... It may happen when I initiate a shutdown command at the server console right at the moment the server is starting a scheduled scrub job...


    Yesterday, I disabled the automated scrub job with the OMV variable...


    By the way, my server is only running when I need it for backing up data from my clients, or for doing OMV updates. It's just a data grave. So most of the time the server is powered off! That's why I'm doing a lot of shutdowns... ;)


    Do you think there is a better filesystem for my simple, old fileserver system?


    Best regards and thank you again!

    prtigger


    • Official Post

    Do you think there is a better filesystem for my simple, old fileserver system?

    While it is getting better, I'm not a fan of BTRFS. I have one BTRFS RAID1 array, using an SBC (an Odroid HC4), in a half-hearted attempt at following high-level, user-significant BTRFS developments. (To "keep my toes in the water", so to speak.)

    To be fair, while I'm not closely following BTRFS errata, I don't see its benefits (data scrubs, snapshots, etc.) being equal to its power-related issues and other odd events. Significant issues, triggered by power events, seem to be a continuing theme over several years. "That" is my issue with BTRFS - a true "copy on write" filesystem should be completely impervious to power-related events.


    Search this forum or the Internet in general. Inexplicable things happen with BTRFS that, on occasion, have no solution with the CLI utilities provided. Worse yet, if users choose the wrong rescue commands, where guidance is spotty at best, they can make matters far worse.


    If you want a data grave and, in your case, an off-line backup using USB-connected drives, it makes sense to use a filesystem that's rock solid and can shrug off a dirty shutdown. The reasoning being: if you're forced to go back to an off-line backup to restore, would you want to be dealing with this superblock situation, or something similar, again?


    _______________________________________________________________________________________________


    Setting aside data integrity scrubs, to me, RAID 1 seems like a waste of a second disk. (Adding to that, RAID of any type, on USB connected disks, is risky.)

    Instead of using BTRFS RAID1, on an SBC, if your setup was my only backup and using the hardware you currently have:

    I'd use EXT4, on a single volume, with the same backup interval you currently have scheduled. Using the second disk, I'd set up an -> Rsync disk mirror on a much longer interval, say 2 to 4 times longer than your normal backup interval. (As an example: if you backup once a week, the rsync mirror operation could be done once a month.) That would give you two full and completely independent backups, with one being recent and the second being an older full data archive. This would give you two backups of last resort. If malware is involved, or if you need to retrieve an inadvertently deleted file, the older of the two backups might have something usable to retrieve.
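    A minimal sketch of that rsync disk mirror, assuming hypothetical mount points (substitute your own); the guard keeps it harmless if a disk isn't mounted:

```shell
#!/bin/sh
# Rough sketch of the rsync disk mirror idea above. SRC and DST are
# hypothetical mount points -- substitute your own. Schedule this on a
# longer interval than the primary backup (e.g. monthly vs. weekly).
SRC="${SRC:-/srv/backup-disk/}"   # primary EXT4 backup volume (trailing slash matters)
DST="${DST:-/srv/mirror-disk/}"   # second disk, refreshed less often

if [ -d "$SRC" ] && [ -d "$DST" ]; then
    # -a preserves ownership/permissions/times; --delete makes DST an exact mirror
    rsync -a --delete "$SRC" "$DST"
else
    echo "mirror disks not mounted; nothing to do"
fi
```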


    These are my thoughts. Others may have better suggestions.

  • crashtest


    Hi crashtest

    Thanks for your detailed advice!

    I guess you don't like md-RAID with EXT4 either?


    A friend of mine is using XigmaNAS with the same server and drive hardware.

    He swears by ZFS RAID...

    I really have absolutely no idea about ZFS.

    So I don't know if ZFS RAID is a better option on a Debian-based system like OMV.


    By the way, each of my 3 SSDs has its own SATA port on the Supermicro server board.

    The backup USB HDD is connected to a client, because the client's USB ports are faster than the server's.


    Best regards

    prtigger


    • Official Post

    I guess you don't like md-RAID with EXT4 either?


    No. If I were to setup RAID, I'd want more than being able to replace a disk. When the extras are considered, SnapRAID is superior to mdadm or hardware RAID. With SnapRAID you get data integrity and file recovery along with disk replacements. Other than continuing operations with a failed disk and whole disk replacement, traditional RAID doesn't have any extra features and it can silently corrupt data.
    Lastly, remember that when operating with an SBC and with drives connected with USB, any kind of RAID (other than SnapRAID) is a bad idea.

    A friend of mine is using XigmaNAS with the same server and drive hardware.

    He swears by ZFS RAID...

    I really have absolutely no idea about ZFS.

    So I don't know if ZFS RAID is a better option on a Debian-based system like OMV.

    I use ZFS mirrors (a rough equivalent to RAID1) and RAIDZ1 (a rough equivalent of RAID5). I would wholeheartedly endorse ZFS. My arrays have had power cut a number of times with zero issues. While they're done on the CLI, ZFS disk replacements are relatively easy and completely lacking what I call "computing drama" which seems to haunt BTRFS disk replacements.

    However, I only use ZFS with native SATA or SAS ports, which deliver roughly equal bandwidth to each port. This is the issue with RAID over USB. USB is inherently "serial" (one at a time), so it can't provide roughly equal bandwidth to each array member drive at any given moment in time. This can, and does, cause problems.

    If you're interested in ZFS for your supermicro setup, take a look at -> this document. The doc will walk you through setting up a working ZFS array to get you started. While there is more to learn about ZFS, that can be done over time. For your operational top level server, ZFS is an excellent choice.
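    For illustration only, the two layouts mentioned (a mirror and RAIDZ1) might be created as sketched below. The pool name and disk paths are hypothetical, and with DRY_RUN=1 (the default) the commands are only printed:

```shell
#!/bin/sh
# Hypothetical sketch of the ZFS layouts mentioned above; the pool name and
# disk paths are placeholders, not real devices. DRY_RUN=1 (the default)
# only prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}
POOL="tank"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

# A 2-way mirror (roughly RAID1), with the third disk kept as a hot spare:
run zpool create "$POOL" mirror /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2
run zpool add "$POOL" spare /dev/disk/by-id/ata-SSD3

# Alternative: RAIDZ1 (roughly RAID5) across all three for more usable space:
# run zpool create "$POOL" raidz1 /dev/disk/by-id/ata-SSD1 \
#     /dev/disk/by-id/ata-SSD2 /dev/disk/by-id/ata-SSD3
```

    Using /dev/disk/by-id paths (rather than /dev/sdX) is generally recommended so pool members survive device renumbering.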

    _____________________________________________________

    Assuming you have one backup device and it's your SBC with USB connected drives:
    With 2 drives, I'd go with EXT4 and an rsync mirror. If you have 3 drives, I'd go with SnapRAID & mergerFS. That would provide RAID-like drive aggregation along with data integrity features and the ability to restore files (as they were during the last SnapRAID "sync" operation). There is, however, a learning curve. Docs -> SnapRAID & mergerFS.

    Setting ZFS aside; all of the above, SnapRAID, mergerFS and rsync will work fine over USB.

  • prtigger1 I have a different view on your original post. An ungraceful shutdown (e.g. hard reset, power surge/failure, etc.) has the potential to corrupt any filesystem. As your OS drive is almost certainly formatted as EXT4, it would typically have to refer to its journal to recover the EXT4 filesystem when the fsck takes place on reboot. But you wouldn't think of replacing the EXT4 filesystem because of this; on the contrary, it's what you'd expect from an EXT4 journaling filesystem.


    Yet after happily using BTRFS for some time, a few messages in the systemd journal have raised doubts in your mind about its continued use. Why do you have doubts when the BTRFS RAID1 filesystem mounted normally when you rebooted your system and showed no signs of error?


    If you think the enforced shutdown interrupted a BTRFS scrub, then you could have checked that on reboot using btrfs scrub status /srv/...; the output would indicate whether it was interrupted or not, and you could have followed that with a btrfs scrub resume if necessary. In any case, a btrfs scrub only performs writes when it encounters errors it can repair. You should really also be checking the output of btrfs dev stats /srv/... on a regular (daily) basis, alongside using btrfs scrub, to ensure a healthy filesystem.
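    Both checks are read-only and safe to run on a mounted filesystem; a minimal sketch, using the mount point from this thread as an example:

```shell
#!/bin/sh
# Read-only BTRFS health checks described above. The mount point is the
# example from this thread -- substitute your own.
MNT="${MNT:-/srv/dev-disk-by-uuid-a8a06053-0cc4-491d-adf2-01bb8a02fb44}"

if command -v btrfs >/dev/null 2>&1 && [ -d "$MNT" ]; then
    btrfs scrub status "$MNT"   # finished, running, aborted or interrupted?
    btrfs dev stats "$MNT"      # per-device error counters; all should be 0
else
    echo "btrfs tools or mount point not available; skipping"
fi
```

    Scheduling this daily (e.g. via cron) and mailing the output is one simple way to monitor a machine that's only turned on occasionally.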


    It's interrupted writes during a system crash that can cause problems of lost data and inconsistent filesystems. So what is BTRFS supposed to do in that case? Here's what the docs say about copy on write:


    Quote

    "copy-on-write, COW


    Also known as COW. The method that btrfs uses for modifying data. Instead of directly overwriting data in place, btrfs takes a copy of the data, alters it, and then writes the modified data back to a different (unused) location on the disk. It then updates the metadata to reflect the new location of the data. In order to update the metadata, the affected metadata blocks are also treated in the same way. In COW filesystems, files tend to fragment as they are modified. Copy-on-write is also used in the implementation of snapshots and reflink copies. A copy-on-write filesystem is, in theory, always consistent, provided the underlying hardware supports barriers." https://btrfs.readthedocs.io/en/latest/Glossary.html


    The last sentence is important - barriers in this case refers to "write barriers", otherwise known as FUA, flush to permanent storage. In other words, BTRFS guarantees that unless you have a disk with broken firmware, or something like a USB device that lies about FUA, the filesystem should always be consistent on disk, as writes are atomic, never partially written. So you should never need a manual fsck after restarting from a hard power reset. BTRFS updates the superblocks last, about every 30 seconds by default, so in case anything goes wrong, it will simply use the latest (highest generation/transid) which has a valid checksum, etc.
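    If you want to see the generation numbers that the error message is comparing, each drive's superblock can be dumped read-only; the device names below are just examples for a 3-disk setup:

```shell
#!/bin/sh
# Read-only: print each drive's superblock generation (the transid that the
# "bad generation X expect Y" error compares). Device names are examples.
for dev in /dev/sdb /dev/sdc /dev/sdd; do
    if [ -b "$dev" ] && command -v btrfs >/dev/null 2>&1; then
        echo "== $dev =="
        btrfs inspect-internal dump-super "$dev" | grep '^generation' \
            || echo "  (no btrfs superblock found on $dev)"
    fi
done
```

    On a healthy RAID1, all members should report the same generation.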


    To my mind, your output shows BTRFS working as designed and reverting to the last good generation. What you should be thinking about and checking is what caused your system to require a hard reset in the first place. You may want to go back through your systemd journal to look for any other past btrfs error messages, and also give thought to how best to monitor a btrfs filesystem on a machine that's only turned on occasionally.


    I wouldn't be in a rush to change to another filesystem on the basis of this one event.

  • Hi guys!

    Thank you very much for your great, detailed comments.


    crashtest

    I'm not using any USB disk on this server. The Supermicro server is an old 2011 server, certified for Microsoft Windows Server 2008.

    It was installed with 2GB and the 32-bit version, two 512GB HDDs as a mirror, in a professional business environment.

    My friend and I rescued six servers from going to the recycling yard, put in 4GB, three data SSDs and a SataDOM SSD for system boot.

    That's the 'new' OMV hardware we are talking about...


    Krisbee

    The main problem is that I can't reproduce the shutdown hang... Deactivating the flashplugin wasn't the solution; I still have no flashplugin installed.

    The 6.1.xx kernel is running stable, not tainted. With the 6.12.xx kernel I've got a tainted (512) kernel (cpuid-deps.c:117) problem...

    All newer (Proxmox) kernels have got the same issue with the Intel Atom D510 CPU... No idea if this is a problem...


    Most important is that a system shutdown or reboot doesn't produce a hang... When it hangs during shutdown, I have no access to the system anymore... I have only one disk LED for all disks, and it's steady on! The only thing I can do, after waiting a long time, is press and hold the power button!

    There is a problem, but I can't say what it is... I'm trying 'Disabled btrfs scrub' now...


    By the way: I'm not a Linux specialist...


    Best regards

    prtigger


  • In addition, the console picture of the shutdown hang...

    This is the Btrfs Raid1 with the three Samsung SSDs.



    • Official Post

    I'm not using any USB disk on this server. The Supermicro server is an old 2011 server, certified for Microsoft Windows Server 2008.

    It was installed with 2GB and the 32-bit version, two 512GB HDDs as a mirror, in a professional business environment.

    RE USB and SBC's - I must have confused your post with another.

    In your case, with a Supermicro mobo (and setting aside a potential hardware issue), you can use any version of RAID or other storage technique that you like. Note that if you have a 32-bit CPU / mobo, 32-bit is becoming increasingly restricted on the OS side. (Debian 13 - Trixie, just out, no longer supports 32-bit.)

    As an info note:
    Recently I had an older Intel server motherboard that died for no apparent reason. It was being used in a cold storage role. As motherboards age, particularly if they have older electrolytic capacitors (they look like little cans), they can fail partially, intermittently or completely, as caps dry out. In my instance, while the OS would boot with numerous "failed" messages, ethernet addresses refused to configure and memory tests failed on each of 8 different sticks (highly unlikely).

  • crashtest

    Hi Crashtest

    Thanks, the Supermicro system is a 64-bit system. It has been running stable with OMV since January 2024, starting with OMV 6 through OMV 7. Most of the time it was running with an md-RAID 5 and a Btrfs single on top.


    Because of the limited performance, I changed to the Btrfs RAID1, losing some capacity on data storage...


    All this time, I had no indication of any hardware issue with this system...

    I added a new original Supermicro power supply because of a strange noise, that's all...

    The Supermicro mainboard is 24/7 high-quality hardware...


    I'm familiar with the electronics side, because that's what I learned many, many years ago... ;)

    You are right! With these old components, you have to keep an eye on everything...


    Again, at this point there is no indication of hardware issues!


    Best regards

    prtigger


  • In addition, the console picture of the shutdown hang...

    This is the Btrfs Raid1 with the three Samsung SSDs.


    When was the last time you checked the SMART status of the drives in the BTRFS RAID? The failure to unmount the BTRFS filesystem may indicate a failing drive. You have no choice but to force a reboot in order to check the output of "smartctl -A /dev/sdX", substituting the letter X for each of the SSD device letters. To quote:


    Quote

    There are five reliable attributes to read for fail-status. 5, 187, 188, 197, and 198. These are all raw values that aren't normalized or abstracted by the vendor. While it's possible to have a healthy drive with some low numbers in one of these categories, a high number or multiple attributes being > 0 indicates imminent failure.

    One thing you can also do when the system reboots is to edit the boot command via the GRUB menu screen by pressing "e". Then go to the end of the line, as shown in this example, and backspace over the word "quiet". You'll see all the boot messages after doing this, and more importantly, when you try to shut down, all the messages up to the point where the system hangs.
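    As a small convenience, the five attributes quoted above can be filtered out of the smartctl -A output; the helper function name below is ours, not part of smartmontools:

```shell
#!/bin/sh
# Filter the five failure-predictive SMART attributes quoted above
# (5, 187, 188, 197, 198) out of `smartctl -A` output. A raw value > 0 on
# any of these deserves attention; the device path below is an example.
check_smart() {
    # reads smartctl -A output on stdin; prints attribute id, name, raw value
    awk '$1 ~ /^(5|187|188|197|198)$/ { print $1, $2, $NF }'
}

# Usage on a real drive (as root), e.g.:
#   smartctl -A /dev/sda | check_smart
```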


  • P.S. The tainted kernel message is nothing to worry about.

    The 6.1.xx kernel is running stable, not tainted. With the 6.12.xx kernel I've got a tainted (512) kernel (cpuid.c) problem...

    All newer (Proxmox) kernels have got the same issue with the Intel Atom D510 CPU... No idea, if this is a problem...



    512 means it's just a warning message, which relates to the cpuid command.


    I can't say whether your problem has anything to do with your use of the flashplugin, as I've never used it, but there has been some discussion on the forum recently about this and a possible replacement, due to issues with systemd services. But first, I'd check the SSD drives' SMART data.

  • Krisbee

    All drive hardware (system/data) was new in December 2023. There is no heavy traffic on the system... Just a storage with around 700GB used of nearly 6TB total. SSD temperatures are OK, too... No indication of imminent disk failures...


    (I have to correct the kernel tainted warning: cpuid-deps.c:117)


    Best regards

    prtigger


  • Krisbee


    Here are all drives:


    Boot:



  • Data1:


  • Data2:

