Lost all drives

  • root@helios4:~# cat /proc/mdstat
    Personalities : [raid10]
    md0 : active (auto-read-only) raid10 sdc[6] sdd[5] sda[7] sdb[4]
    15627790336 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
    bitmap: 0/117 pages [0KB], 65536KB chunk


    unused devices: <none>

  • root@helios4:~# cat /proc/mdstat
    Personalities : [raid10]
    md0 : active raid10 sdc[6] sdd[5] sda[7]
    15627790336 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]
    bitmap: 10/117 pages [40KB], 65536KB chunk


    unused devices: <none>


    THE FILES ARE BACK!!!


    Thank you, thank you, thank you!


    I did notice that as the system came back up it ran an fsck on the raid and corrected errors.


    Now that it's back, can you give me a synopsis of what happened? Note: I was a UNIX admin *years and years* ago, but I got away from it, and from UNIX/Linux in general, for a while, so I'd really like to understand this better. Again, thank you.

    • Official Post

    I did notice that as the system came back up it ran an fsck on the raid and corrected errors.

    Sometimes a reboot just resolves what appears to be the obvious.


    Now that it's back, can you give me a synopsis of what happened?

    The power outage caused the problem, probably combined with poorly seated connections. Because mdadm is software RAID, when something goes wrong it can't tell you what happened the way a hardware RAID controller can, so you have to run a few commands to find the cause. The main one here is cat /proc/mdstat, which gives you the state of the array; blkid tells you which drives are part of it. From there you can make a start on putting it back together, roughly as sketched below.
    The most common causes of RAID trouble are power loss, and a user assuming that mdadm is 'hot swap', which it isn't: a drive swapped out without first being removed from mdadm leaves the array inactive.
    In the UK power cuts are rare, but I know from US users that a UPS is a recommended necessity.
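
    A minimal sketch of that diagnostic sequence, assuming the four members are /dev/sda-/dev/sdd as in this thread (commands are illustrative, not a script to paste blindly):

    # 1. Kernel's view of the array: state, missing members, resync progress
    cat /proc/mdstat

    # 2. Which disks carry the md superblock for this array
    blkid | grep linux_raid_member
    mdadm --examine /dev/sd[abcd]

    # 3. If the array did not come up on boot, try assembling it from its members
    mdadm --assemble --scan

    # 4. Detailed view of the assembled array
    mdadm --detail /dev/md0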


    But at least it's running again :thumbup:

  • The sad part is that the system IS on a UPS. Apparently I have it set wrong, as it shut down within seconds of the power blip (see the note on UPS shutdown timers below).


    Any way to get that B drive back in without wiping it first?
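
    A note on the UPS shutdown timer: going down within seconds of a blip usually means the daemon is configured to shut the host down as soon as it sees "on battery", rather than at a low-battery threshold. Purely as an illustration, and assuming the classic apcupsd daemon (NUT-based setups, such as the openmediavault NUT plugin, expose equivalent options), the relevant knobs in /etc/apcupsd/apcupsd.conf look like this:

    # Illustrative values only, not taken from this system
    TIMEOUT 0          # 0 = never shut down merely because N seconds passed on battery
    BATTERYLEVEL 10    # shut down when the battery charge drops below 10%
    MINUTES 5          # shut down when the estimated runtime drops below 5 minutes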

  • root@helios4:~# blkid
    /dev/mmcblk0p1: UUID="1f489a8c-b3a3-4218-b92b-9f1999841c52" TYPE="ext4" PARTUUID="7fb57f23-01"
    /dev/sda: UUID="d1e18bf2-0b0e-760b-84be-c773f4dbf945" UUID_SUB="9495186e-6df6-a7b1-c67b-4fd4ca1d6468" LABEL="helios4:Store" TYPE="linux_raid_member"
    /dev/sdb: UUID="d1e18bf2-0b0e-760b-84be-c773f4dbf945" UUID_SUB="253f9091-6914-fe71-ab40-68961aa3dbb6" LABEL="helios4:Store" TYPE="linux_raid_member"
    /dev/sdc: UUID="d1e18bf2-0b0e-760b-84be-c773f4dbf945" UUID_SUB="3186ee11-0837-b283-c653-37e39d1923d8" LABEL="helios4:Store" TYPE="linux_raid_member"
    /dev/md0: UUID="GmgEll-khiX-a7DB-5HNZ-KGRm-5vGq-1vPV4w" TYPE="LVM2_member"
    /dev/sdd: UUID="d1e18bf2-0b0e-760b-84be-c773f4dbf945" UUID_SUB="0da721df-e67c-8141-cc93-afe7e2e66f7a" LABEL="helios4:Store" TYPE="linux_raid_member"
    /dev/mapper/Store-Store: LABEL="Store" UUID="6c7b4b44-4cae-4169-95fe-d9a14d04e814" TYPE="ext4"
    /dev/zram0: UUID="3537eff4-7cb1-46ad-8814-b3d735002195" TYPE="swap"
    /dev/zram1: UUID="a00f6e11-8359-4182-beae-058c4ccb0375" TYPE="swap"
    /dev/mmcblk0: PTUUID="7fb57f23" PTTYPE="dos"
    /dev/mmcblk0p2: PARTUUID="7fb57f23-02"


    root@helios4:~# mdadm --detail /dev/md0
    /dev/md0:
    Version : 1.2
    Creation Time : Sun Feb 18 14:53:39 2018
    Raid Level : raid10
    Array Size : 15627790336 (14903.82 GiB 16002.86 GB)
    Used Dev Size : 7813895168 (7451.91 GiB 8001.43 GB)
    Raid Devices : 4
    Total Devices : 3
    Persistence : Superblock is persistent


    Intent Bitmap : Internal


    Update Time : Sun Jan 26 10:07:54 2020
    State : clean, degraded
    Active Devices : 3
    Working Devices : 3
    Failed Devices : 0
    Spare Devices : 0


    Layout : near=2
    Chunk Size : 512K


    Name : helios4:Store (local to host helios4)
    UUID : d1e18bf2:0b0e760b:84bec773:f4dbf945
    Events : 116671


    Number   Major   Minor   RaidDevice State
       6       8       32        0      active sync set-A   /dev/sdc
       -       0        0        1      removed
       7       8        0        2      active sync set-A   /dev/sda
       5       8       48        3      active sync set-B   /dev/sdd
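
    As a hedged next step before touching anything (device names taken from the output above): comparing the dropped disk's superblock with a healthy member shows whether /dev/sdb is still recognisable as part of this array and how far its event counter lags. With the internal write-intent bitmap shown above, a small lag normally means only the changed blocks need resyncing.

    # Compare the dropped member against a healthy one
    mdadm --examine /dev/sdb | grep -E 'Array UUID|Events|State'
    mdadm --examine /dev/sdc | grep -E 'Array UUID|Events|State'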

    • Official Post

    I've had this happen before and from the gui I can only get the missing drive back in if I wipe it first.

    That's the correct way to do it, but we could try this first:


    mdadm --stop /dev/md0


    mdadm --add /dev/md0 /dev/sdb
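
    For completeness, a hedged sketch of how that sequence typically plays out (device names taken from this thread; --re-add reuses the existing superblock and bitmap where possible, so usually only the blocks written since the drop get resynced):

    # If the array was stopped, assemble it again first; it will come up degraded
    mdadm --assemble /dev/md0 /dev/sd[acd]

    # Put the old member back; plain --add also works but may trigger a full rebuild
    mdadm --re-add /dev/md0 /dev/sdb

    # Watch the recovery
    cat /proc/mdstat
    mdadm --detail /dev/md0 | grep -E 'State|Rebuild'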
