RAID 1 disappears when swapping disks

  • Code
    blkid
    /dev/sdb: UUID="3ca82093-31bd-aba6-e3fc-2da78924c663" UUID_SUB="931edf09-fb4e-0c6c-18d0-3503a69a0410" LABEL="Zangs-NAS:Zangs" TYPE="linux_raid_member"
    /dev/sdc: UUID="3ca82093-31bd-aba6-e3fc-2da78924c663" UUID_SUB="6b249649-ad3c-9a85-6dda-5b49a53af0ae" LABEL="Zangs-NAS:Zangs" TYPE="linux_raid_member"
    /dev/sda1: UUID="E620-721A" TYPE="vfat" PARTUUID="9d5b7707-65db-433f-9b6a-c2cbb79b5dd8"
    /dev/sda2: UUID="0ce16ee2-727d-4ac2-8e28-13fbf48155ab" TYPE="ext4" PARTUUID="5cfcfa16-fe17-4afb-9424-954b3708bc37"
    /dev/sda3: UUID="ec420058-796e-406f-962b-66e7cae4fd39" TYPE="swap" PARTUUID="bab0249f-4d80-404f-85a8-6a10f9139157"
    /dev/md127: LABEL="Raid" UUID="b090d450-8a3c-4b13-845c-b1876a8d7174" TYPE="ext4"


    Code
    mdadm --detail --scan --verbose /dev/md127
    ARRAY /dev/md127 level=raid1 num-devices=2 metadata=1.2 name=Zangs-NAS:Zangs UUID=3ca82093:31bdaba6:e3fc2da7:8924c663
       devices=/dev/sdb,/dev/sdc
  • Code
    mdadm --stop /dev/md127
    mdadm: Cannot get exclusive access to /dev/md127:Perhaps a running process, mounted filesystem or active volume group?

    ?(:S

    • Official post

    With the raid now up and everything off, have you rebooted to try again?


    If the array cannot be stopped, there are two options to sort this out:


    1. Use SystemRescueCD, but that would mean using it for each drive change, as you have to shut down to replace a drive, restart, then go back into SystemRescueCD.


    2. Not sure about this: create a clean install of OMV on a new USB flash drive, boot with that and check that it's working, shut down, plug the raid back in and see if the raid is initialised. Will this work? I don't know. You would then have to put your previous boot drive back in and hopefully it will all function with the new drives. The problem with this is that if it doesn't work, you have a new raid set up but your current OMV setup/config doesn't work; this is unknown territory for me.


    I have helped a few with raid problems and usually don't have a problem, but this is like a tunnel with no light at the end.

    • Official post

    ASRock J4105M with a Celeron J4105.

    That should not be the problem then. I still believe this is related to something running that is not allowing the raid to be stopped.


    Option 1 above:


    Install SystemRescueCD in OMV-Extras; this takes time as it has to be downloaded. Read the information under SystemRescueCD (another option is to download SystemRescueCD and create a CD from the ISO). The OMV-Extras option allows you to boot into it once only, so you would have to use this option twice. I have not used this!! It allows you to access OMV via the CLI without anything running; I believe that's the idea.


    SystemRescueCD


    Stop the raid: mdadm --stop /dev/md127. If this works it will return a confirmation.
    Remove a drive from the array (/dev/sd[bc]): mdadm --fail /dev/md127 /dev/sdc, then cat /proc/mdstat should show (F) next to sdc. Then mdadm --remove /dev/md127 /dev/sdc, and cat /proc/mdstat should confirm the removal of the drive.
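The fail/remove sequence above can be sketched as a dry-run shell script (device names /dev/md127 and /dev/sdc are taken from this thread; the `run` wrapper only prints each command instead of executing it, so nothing here touches a real array):

```shell
#!/bin/sh
# Dry-run sketch of the drive-removal steps above.
# `run` echoes each command instead of executing it, so this is safe to try.
run() { echo "+ $*"; }

ARRAY=/dev/md127
DISK=/dev/sdc

run mdadm --fail "$ARRAY" "$DISK"     # mark the member as faulty -> (F) in /proc/mdstat
run mdadm --remove "$ARRAY" "$DISK"   # hot-remove the faulty member from the array
run cat /proc/mdstat                  # verify the drive is gone from the md127 line
```

On the real machine you would drop the `run` prefix and execute the mdadm commands directly, checking /proc/mdstat after each step.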


    This is the point where you would shut down, remove sdc and install a new drive, then reboot. If using the OMV-Extras route you can check the new drive's reference under Storage > Disks -> sd? (? being a new letter); make a note, then boot into SystemRescueCD from OMV-Extras.
    Would you need to shut down? According to your m'board the ports are hot-swap. You could remove sdc from the machine and run fdisk -l to see if /dev/sdc is still listed; if not, hot swap works! Plug in the new drive and run fdisk -l again to find the new drive's reference; the drive size here will be the giveaway.
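Finding the new drive's letter amounts to diffing the device list before and after plugging it in. A small sketch, simulated here with sample device lists (on the real machine you would capture `lsblk -dno NAME` or `fdisk -l` output before and after instead):

```shell
#!/bin/sh
# Sketch: spot the newly plugged-in drive by comparing device lists.
# "before" and "after" are sample lists standing in for real lsblk output.
before="sda
sdb"
after="sda
sdb
sdc"

printf '%s\n' "$before" | sort > /tmp/disks-before.txt
printf '%s\n' "$after"  | sort > /tmp/disks-after.txt

# comm -13 prints lines unique to the second file: the new drive.
new_drive=$(comm -13 /tmp/disks-before.txt /tmp/disks-after.txt)
echo "new drive: $new_drive"
```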


    Whatever the outcome of the above, once a new drive is installed:


    wipefs -a /dev/sd? (replace ? with the drive reference letter); this will wipe the drive's old signatures. Note that wipefs without -a only lists signatures and wipes nothing.
    mdadm --add /dev/md127 /dev/sd? (replace ? with the drive reference letter).
    cat /proc/mdstat will display the raid status; it will either be syncing or it has failed. If it fails, it means creating the exact partition layout on the new drive, but today this should not be necessary.
    If it syncs you will need to wait for it to finish, then run cat /proc/mdstat to confirm, then mdadm --detail /dev/md127 to confirm the raid's configuration.
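The whole per-drive replacement cycle can be sketched the same way as a dry-run (sdX is a placeholder for the new drive's letter; `run` only prints the commands, so this is safe to test):

```shell
#!/bin/sh
# Dry-run sketch of the replacement cycle described above.
# `run` prints each command instead of executing it.
run() { echo "+ $*"; }

ARRAY=/dev/md127
NEW=/dev/sdX   # placeholder: substitute the new drive's real letter

run wipefs -a "$NEW"              # clear old filesystem/raid signatures
run mdadm --add "$ARRAY" "$NEW"   # add the blank drive; md starts recovery
run cat /proc/mdstat              # watch the "recovery = ..." progress line
run mdadm --detail "$ARRAY"       # after the sync: confirm a clean 2-device array
```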


    IF by some miracle the above works, it will have to be done again to replace the second hard drive; you will then need to grow the array and resize the filesystem. Will this work? I have no idea. I am at a loss as to why the current raid will not stop; something is accessing it. TBH, in the time spent on this a new raid setup could have been completed and your data restored; whilst that might be a PIA, it's straightforward.


    For 2 drives a better option is one for data and one running rsync, with a spare on hand should a drive fail.
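The rsync alternative could look something like this nightly mirror job. The paths are hypothetical examples (OMV-style labels), and the `run` wrapper only prints the command, so this is a sketch rather than a live copy:

```shell
#!/bin/sh
# Sketch of the one-data-disk-plus-rsync-mirror alternative mentioned above.
# Paths are hypothetical; `run` prints the command instead of running it.
run() { echo "+ $*"; }

SRC=/srv/dev-disk-by-label-Data/    # data disk (trailing slash: copy contents)
DST=/srv/dev-disk-by-label-Backup   # second disk acting as the mirror

# -a preserves permissions/times, --delete keeps the mirror exact;
# schedule it via cron or OMV's built-in rsync jobs for a nightly copy.
run rsync -a --delete "$SRC" "$DST"
```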

  • SystemRescueCD


    Stop the raid mdadm --stop /dev/md127 if this works it will return a confirmation....


    Yeah, you're the best! :thumbup:

    Code
    root@sysresccd /root % mdadm --stop /dev/md127
    mdadm: stopped /dev/md127


    I now connect mouse, keyboard and monitor so I can get the bootloader.
    That should boot from the SystemRescueCD, not a problem.

  • OK, part one is done! I have booted the SystemRescueCD.


    RAID is alive

    Code
    cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : active raid1 sdb[2] sda[1]
          1953383512 blocks super 1.2 [2/2] [UU]
          bitmap: 0/15 pages [0KB], 65536KB chunk
    
    
    unused devices: <none>


    find serial number first



    Code
    /sbin/udevadm info --query=property --name=sda | grep ID_SERIAL
    ID_SERIAL=ST2000VN001-1T4164_W520BPCG
    ID_SERIAL_SHORT=W520BPCG
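Since the serial number is what identifies which physical drive to pull, the lookup can be sketched as a small text extraction. The sample lines are copied from the udevadm output above, so this runs without hardware; on the real box you would pipe `udevadm info --query=property --name=sda` in instead:

```shell
#!/bin/sh
# Sketch: extract the short serial from udevadm-style output so you know
# which physical drive to pull. Sample lines copied from this thread.
sample='ID_SERIAL=ST2000VN001-1T4164_W520BPCG
ID_SERIAL_SHORT=W520BPCG'

serial=$(printf '%s\n' "$sample" | sed -n 's/^ID_SERIAL_SHORT=//p')
echo "pull the drive with serial: $serial"
```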
  • remove sda from RAID


    Code
    mdadm --fail /dev/md127 /dev/sda
    mdadm: set /dev/sda faulty in /dev/md127



    Code
    cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : active raid1 sdb[2] sda[1](F)
          1953383512 blocks super 1.2 [2/1] [U_]
          bitmap: 0/15 pages [0KB], 65536KB chunk
    
    
    unused devices: <none>
    Code
    mdadm --remove /dev/md127 /dev/sda
    mdadm: hot removed /dev/sda from /dev/md127
    Code
    cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : active raid1 sdb[2]
          1953383512 blocks super 1.2 [2/1] [U_]
          bitmap: 0/15 pages [0KB], 65536KB chunk
    
    
    unused devices: <none>


    now the disk gets swapped

    • Official post


    Hm, bit of smart work there, I would have had a sticker on each drive edge, so I knew which was which.

  • ok, hotswap works. I do not have to reboot. The new disk is recognized immediately and shows up as sda.


    Code
    mdadm --add /dev/md127 /dev/sda
    mdadm: added /dev/sda
    Code
    cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : active raid1 sda[3] sdb[2]
          1953383512 blocks super 1.2 [2/1] [U_]
          [>....................]  recovery =  0.3% (6768064/1953383512) finish=204.4min speed=158703K/sec
          bitmap: 0/15 pages [0KB], 65536KB chunk
    
    
    unused devices: <none>


    Now the sync must complete. Previously, after rebooting at this point, the raid was gone. Now I will change the second disk without rebooting, right?

  • Sorry, I had forgotten something. Current status:

    • Official post

    Previously, after rebooting at this point, the raid was gone.

    Are you referring to previous attempts?


    ok, hotswap works. I do not have to reboot. The new disk is recognized immediately and shows up as sda

    Ok, that's good, did you wipe it before adding it?


    Now I will change the second disk without rebooting, right?

    If the first drive change has worked, yes, but ensure you keep the two old drives just in case, because they are the current raid drives.


    The thing to remember is to double-check everything you type on the command line before executing.

  • Are you referring to previous attempts?

    yes



    Ok, that's good, did you wipe it before adding it?

    yes



    The thing to remember is to double-check everything you type on the command line before executing.


    Of course. Sorry if I express myself misleadingly. Google Translate helps me :)



    The sync will not be done until tomorrow. I will record the status and then continue with swapping the second disk.

    • Official post

    Of course. Sorry if I express myself misleadingly. Google Translate helps me

    :D:D:D:D how do you think I'm managing


    Well, if this works, then growing the raid and resizing the filesystem can, I hope, be done in the GUI.

  • I'm afraid I have a problem again. The partitioning of the disks is faulty.


    Current status is: I have installed the second disk in the RAID following the instructions. sda was the first swapped disk, sdb was the second one. However, fdisk -l reports an error.

    Code
    cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : active raid1 sdb[2] sda[3]
          1953383512 blocks super 1.2 [2/2] [UU]
          bitmap: 0/15 pages [0KB], 65536KB chunk
    
    
    unused devices: <none>


    I suspect we first need to partition the disks with a clean MBR?
    I'm worried that the RAID will be inactive again when I reboot ...
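Whether an array is healthy or degraded after a reboot can be read straight from the `[UU]`/`[U_]` field in /proc/mdstat. A small check over that field (the sample line is copied from the mdstat output above, so this runs without a real array):

```shell
#!/bin/sh
# Sketch: read the member status from an mdstat blocks line.
# [UU] = both members present, [U_] = one member missing (degraded).
# Sample line copied from the cat /proc/mdstat output in this thread.
status_line='      1953383512 blocks super 1.2 [2/2] [UU]'

case "$status_line" in
  *_*) state=degraded ;;   # any underscore in [..] means a missing member
  *)   state=healthy  ;;
esac
echo "array state: $state"
```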
