Rebuild BTRFS array


  • When I finally had my home server ready, one of the four disks was marked as faulty (F). I know how to handle this in ZFS, but here I'm using BTRFS. What's the command to replace a faulty disk in a RAID 1 array? I need to copy the data from the working disk to the new disk and bring the array back to a healthy state.


    Thanks!

  • The fact that /dev/sdd has bad sectors does not mean that the RAID is broken.

    Your RAID 1 array is managed by mdadm; BTRFS is just the file system on top of it.
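
    (For completeness: if this mirror were a native BTRFS RAID 1 rather than mdadm, the replacement would be done with btrfs replace. The device names and mount point below are only placeholders, adapt them to your system.)

    Code
    btrfs replace start /dev/sdd /dev/sde /srv/data   # old device, new device, mount point
    btrfs replace status /srv/data                    # monitor the copy

    If the old device can no longer be read, pass its devid (shown by "btrfs filesystem show") instead of the device path.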


    First you need to identify the faulty disc (/dev/sdd) by its serial number. Don't eject the disc yet. You may shut down your NAS, locate the faulty disc physically by its serial number (unless you already have this information), then reboot without removing it. The worst thing that could happen would be to pull the wrong disc.
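
    Assuming smartmontools is installed (it normally is on an OMV box), the serial number of a given device can be read like this:

    Code
    smartctl -i /dev/sdd | grep -i serial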


    I know how to replace a faulty disc on the command line, but not through OMV, so I'll wait for someone else to answer that part. The disc can be replaced very easily; don't panic.

    1 x Dell R720XD PowerEdge, with Debian 13+OpenMediaVault

    2 x 1 TB NVMe for the RAID1 + BTRFS system

    10Gbit network (SFP+)

    11 x 10 TB SAS drives + 1 spare drive (running) in RAID-Z3 + 3 additional spares (not running)

    2 x 800GB SSD drives (unused)

    1 x LSI SAS2308 PCI-Express Fusion-MPT SAS-2 controller card in mini-mono format

    256 GB RAM

    1 x APC RT2000XL UPS with 1 x SNMP APC 9631 card
    3 x Rpi 5-CM with nvme systems


  • I know the serial number of the disk with problems. If you know the command-line procedure to replace the bad disk with the new one, please tell me and I'll test it here.

    In 2025 I did this kind of replacement, but on a server running Proxmox where the file system was ZFS; I guess the process is similar.

  • I warn you: it's better to wait for someone to explain how this can be done in OMV, as I don't know OMV very well (see my profile: beginner). You are not in a hurry; a disc can carry bad sectors for months before failing.


    I got this from Claude AI, verified that it was okay, and made some additions:


    To replace a faulty disk in an mdadm RAID array (adapt it to your case):

    Please make sure that you understand what you are doing and read carefully.


    0. BEFORE EVERYTHING: BACKUP

    Make sure you have a backup of your NAS.

    A RAID array is not a backup, and the rebuild might fail, so you need a backup.


    1. IDENTIFY THE FAULTY DISK

    Run: cat /proc/mdstat

    Or: mdadm --detail /dev/md0


    Note which disk is marked as faulty (e.g., /dev/sdd).

    This is not your case as your disc only has bad sectors.

    Note the serial ID of the faulty disc.


    2. MARK THE DISK AS FAILED (if not already auto-detected)

    mdadm /dev/md0 --fail /dev/sdd


    3. REMOVE THE FAULTY DISK FROM THE ARRAY

    mdadm /dev/md0 --remove /dev/sdd


    4. PHYSICALLY REPLACE THE DISK

    Power down if hot-swap isn't supported, swap the drive, power back on.

    Note the serial ID of your new disc.
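
    If your chassis supports hot-swap, you can detach the old device from the kernel before pulling it and rescan the controller after inserting the new disc. The sdd and host0 names below are placeholders, adapt them:

    Code
    echo 1 > /sys/block/sdd/device/delete           # detach the old disc (already removed from the array) from the kernel
    echo "- - -" > /sys/class/scsi_host/host0/scan  # rescan the HBA so the new disc is detected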


    WARNING! WARNING! WARNING!

    4.1 After powering up, find your new disc.
    It may not appear as /dev/sdd; the device naming might have changed.

    If you don't understand what you are doing, this can lead to a catastrophe.
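
    One way to see which device name the new disc received is to list the persistent links under /dev/disk/by-id/, which contain the model and serial number:

    Code
    ls -l /dev/disk/by-id/   # each symlink (named after model + serial) points to the current /dev/sdX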


    5. PARTITION THE NEW DISK TO MATCH THE OTHERS

    From here on we assume that the disc naming has changed, so adapt this to your situation.

    Copy the partition table from a healthy disk (e.g., /dev/sdc) to the new one (/dev/sdb):

    sfdisk -d /dev/sdc | sfdisk /dev/sdb
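
    If the discs use GPT, an alternative (assuming the gdisk package is installed) is to replicate the table with sgdisk and then randomize the GUIDs so the two discs don't share identifiers:

    Code
    sgdisk -R=/dev/sdb /dev/sdc   # copy the table from /dev/sdc (positional argument) onto /dev/sdb (after -R)
    sgdisk -G /dev/sdb            # give the new disc unique disk and partition GUIDs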


    6. ADD THE NEW DISK TO THE ARRAY

    mdadm /dev/md0 --add /dev/sdb

    mdadm will automatically start rebuilding.


    7. MONITOR THE REBUILD

    watch cat /proc/mdstat
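
    If the rebuild crawls, the kernel's md resync speed limits can be inspected and raised. Values are in KiB/s; the number below is only an example:

    Code
    sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
    sysctl -w dev.raid.speed_limit_min=100000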


    TIPS:

    - Check SMART data on the new disk first: smartctl -a /dev/sdb

    - Don't fail or replace more disks at once than the array's redundancy allows (one on RAID5, two on RAID6)

    - Update /etc/mdadm/mdadm.conf after replacement (see the sketch at the end of this post)

    - Run update-initramfs -u (Debian/Ubuntu) to keep boot config in sync

    - On RAID1 boot drives, reinstall the bootloader on the new disk

    You may run these commands to install GRUB on the new disc (on a legacy BIOS setup):

    grub-install /dev/sdb

    update-grub
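
    About the mdadm.conf tip above, a minimal sketch assuming a Debian-based system (review the file by hand; on Debian the helper /usr/share/mdadm/mkconf can also regenerate it):

    Code
    mdadm --detail --scan        # prints the current ARRAY line(s)
    nano /etc/mdadm/mdadm.conf   # replace the stale ARRAY entry with the new one
    update-initramfs -u          # keep the boot configuration in sync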


  • These steps are not necessary, and some don't apply to OP's situation at all.


    A RAID member (disk that is part of an array) can be failed & removed in one single step from the OMV GUI. When that is done, just shut the server down and replace the disk with a good one.


    It isn't necessary to copy anything from the failed drive - mdadm will automatically start rebuilding the array using the new disk and rewrite any data and/or parity information.
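
    For reference, the command-line equivalent of that single GUI step is the combined fail + remove below (array and device names are examples, adapt them):

    Code
    mdadm /dev/md0 --fail /dev/sdd --remove /dev/sdd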


    One question for LkS45 though: RAID 1 implies two drives as a mirror of each other, but your screenshot is showing four drives. Are you running two separate RAID 1 arrays, or just using two additional drives individually along with one RAID 1?

  • cubemin: the array has not failed (as far as we know), it only has bad sectors. There is no hurry.


    I looked at my own server with RAID1 to see how this could be done in the OMV interface.

    First, you need the openmediavault-md plugin. On my system it was not installed by default, so I had to install it via the OMV interface or from the command line:


    Code
    apt-get install openmediavault-md

    Beyond that, the procedure is not straightforward. My RAID array is fully synchronized, and I see no way to remove a disc from the web interface while the array is in sync (which seems to be your case, since you only have bad sectors), so I'll let other forum members answer and guide you.



    You probably need to follow cubemin's advice: shut down, replace the faulty disc, and reboot.


    I only issue A BIG WARNING: when you reboot your NAS with a new disc, the device naming might change. For example, /dev/sdd might become /dev/sda and so on. This is one reason why people wipe the wrong disc when working only on the command line. Use the OMV interface to identify the device name based on the disc's serial number.


    Before everything, make a backup. Good luck.


    • Official Post

    cubemin: the array has not failed (as far as we know), it only has bad sectors. There is no hurry.

    Whether there is any hurry is another question (would you keep using a disk whose bad sectors cannot be reallocated?).

    But you do have to fail the disk from the OMV GUI if you want to exchange it for another one, as cubemin wrote.

  • Dear Macom, how do I fail a disc from the GUI?


  • Thanks. It is not very intuitive to have to select the RAID and click "Remove", because on the command line you don't remove the RAID, you only fail a disc.


  • It is not very intuitive to have to select the RAID and click "Remove", because on the command line you don't remove the RAID, you only fail a disc.

    Good point, but OMV wouldn't just let you remove the entire RAID while it's in use. Deleting the whole array is what the Delete button (the trash-can symbol) is for, and you'll notice it is greyed out.

  • Got it, thanks. This is the "Prohibited" circular icon.

