Posts by penguinpages

    Thanks for the help. Great to see such a responsive / active forum ...


    So far... this OSS project seems to be very good.


    As a side note "...Raid is not a backup! Would you go skydiving without a parachute?..." Sure.. Once :)


    But to that point: my concern is more about ransomware than disk failure. Will this project support snapshot logic?


    It would be cool to see an option, when the RAID is built, to set something like 80% for data in the File Systems step, leave 15% for snapshots, and 5% for "oh crap, I ran myself out of space and need breathing room".
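    To make that concrete: a rough sketch of the kind of carve-up I mean, done by hand with LVM on top of the md device. This is not something I know OMV exposes today; the volume group and LV names are placeholders.

    # hypothetical layout: leave ~20% of the VG unallocated for snapshots / headroom
    pvcreate /dev/md0
    vgcreate vg_data /dev/md0
    lvcreate -l 80%VG -n lv_data vg_data     # 80% for data
    mkfs.ext4 /dev/vg_data/lv_data
    # later, snapshots draw from the free 20% of the VG
    lvcreate -s -L 20G -n lv_data_snap /dev/vg_data/lv_data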

    Can you post the output of your configuration?


    Do you have mission-critical services running such that restarting services and the RAID would be an issue? (i.e., what suggestions that impact data can be made?)


    # root cause a bit before restart

    lsblk

    mdadm -D /dev/md0

    cat /proc/mdstat


    # restart if this is an option, e.g. 4 drives in RAID 5

    mdadm --stop /dev/md0

    mdadm --assemble --run --force --update=resync /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde


    # Post the output of "tail -f /var/log/syslog" run in a separate shell while doing the above

    tail -f /var/log/syslog


    Let's see if something rears its head as the issue.


    PS: What are the disk drive sizes... and do you KNOW they are good drives?

    So my path to corrective action was / is as expected...


    Questions:

    1) Are there plans to add a "start / stop" services option in the UI? E.g. start / stop mdadm to see details / errors, and start / stop services (SMB, NFS, FTP, etc.) once the volume is back online. (Shell fallback sketched after question 2.)


    2) Can the array utility "upgrade capacity"? I started testing this by adding a new 150GB drive alongside the other three 100GB drives, to see if it would expand /dev/md0 and then the filesystem, etc.
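    For both questions, the shell fallback I have been using while the UI catches up (service / unit names are my assumptions for a stock Debian install, and the new disk name below is hypothetical):

    # 1) start / stop / inspect services by hand
    systemctl status mdmonitor.service       # may be mdadm.service depending on packaging
    systemctl restart smbd nmbd              # SMB
    systemctl restart nfs-server             # NFS
    journalctl -u smbd --since "1 hour ago"

    # 2) grow the array after adding a disk, then the filesystem
    mdadm --add /dev/md0 /dev/sdf            # hypothetical new disk
    mdadm --grow /dev/md0 --raid-devices=5
    cat /proc/mdstat                         # wait for the reshape to finish
    resize2fs /dev/md0                       # assuming ext4 directly on md0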

    I am waiting for my hardware appliance parts to show up, so I built OpenMediaVault as a VM on my VMware cluster.


    1CPU

    1GB RAM

    2 x 1Gb NIC

    1 x 64GB SSD for OS

    3 x 100GB SSD for RAID 5


    Setup went fine, besides the expected quirks with SMB / NFS permission interactions (still working on those, but kind of expected, as ACL to POSIX mapping is a PITA).


    My goal, though, was to test out the RAID recovery steps and how clean they are / what tools are needed to do that.


    First Test: Add a new hard disk and expand

    Super easy... 4.5 out of 5 on how simple this was. Only note is that it took me a few minutes to find the rebuild % details, i.e. when the new array size would be ready.
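    For anyone else hunting for it, the rebuild percentage is easy to watch from the shell:

    # watch resync / reshape progress until the new size is usable
    watch -n 5 cat /proc/mdstat
    mdadm -D /dev/md0 | grep -E 'State|Rebuild Status|Reshape Status'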



    Second Test: Replace a Drive

    While the system is running... pull a hard disk and then recover.


    At first this went OK... I removed disk /dev/sdc and the array still worked, just noted a critical state. But then I added back a new 150GB disk and it showed up as /dev/sdc (I expected it as /dev/sde, but... meh).


    Rebuild -> Wizard was fine.. seemed to be chugging along ok.
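    For comparison, the shell steps I was checking the wizard against (device names match my test, but treat them as examples):

    # after physically swapping the disk, clear the newcomer and add it back
    mdadm /dev/md0 --remove detached         # drop the pulled disk if md still lists it
    wipefs -a /dev/sdc                       # the new disk came up as /dev/sdc here
    mdadm --add /dev/md0 /dev/sdc
    cat /proc/mdstat                         # rebuild kicks off automatically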



    Third Test: Hard Power during rebuild.


    This is the last test I typically do: while the array is in rebuild mode, just pull power on the server. It then has to handle OS boot recovery as well as decide what to do with the array (e.g. /dev/md0) when it is still critical and the rebuild has started but not completed.


    Boot state showed the file system offline.


    But more concerning: it is not even listing the RAID device.


    This is where things went off the rails:

    root@pandora:~# cat /proc/mdstat

    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]

    md0 : inactive sde[3] sdd[2] sdb[0] sdc[4]

    471599104 blocks super 1.2


    unused devices: <none>




    1) OS Boot... fsck repaired file system on /dev/sda boot drive partitions - Good

    2) System booted up and UI launched etc... - Good

    3) All disks showing up - Good

    4) /dev/md0 repair continues and, while that is happening... services come online - Fail


    I am trying to root-cause this as well as document how to recover the system. Maybe this is already posted as a "how to repair a RAID state" document flow or video... but I did not see it.


    ## dmesg state, <snip> down to just the drive messages

    [ 1.660977] ata1.00: ATAPI: VMware Virtual IDE CDROM Drive, 00000001, max UDMA/33

    [ 1.666086] scsi 1:0:0:0: CD-ROM NECVMWar VMware IDE CDR00 1.00 PQ: 0 ANSI: 5

    [ 1.675319] sd 0:0:0:0: [sda] 134217728 512-byte logical blocks: (68.7 GB/64.0 GiB)

    [ 1.675333] sd 0:0:0:0: [sda] Write Protect is off

    [ 1.675335] sd 0:0:0:0: [sda] Mode Sense: 61 00 00 00

    [ 1.675349] sd 0:0:0:0: [sda] Cache data unavailable

    [ 1.675351] sd 0:0:0:0: [sda] Assuming drive cache: write through

    [ 1.675614] sd 0:0:1:0: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)

    [ 1.675627] sd 0:0:1:0: [sdb] Write Protect is off

    [ 1.675629] sd 0:0:1:0: [sdb] Mode Sense: 61 00 00 00

    [ 1.675641] sd 0:0:1:0: [sdb] Cache data unavailable

    [ 1.675643] sd 0:0:1:0: [sdb] Assuming drive cache: write through

    [ 1.675861] sd 0:0:2:0: [sdc] 314572800 512-byte logical blocks: (161 GB/150 GiB)

    [ 1.675874] sd 0:0:2:0: [sdc] Write Protect is off

    [ 1.675875] sd 0:0:2:0: [sdc] Mode Sense: 61 00 00 00

    [ 1.675887] sd 0:0:2:0: [sdc] Cache data unavailable

    [ 1.675888] sd 0:0:2:0: [sdc] Assuming drive cache: write through

    [ 1.676098] sd 0:0:3:0: [sdd] 209715200 512-byte logical blocks: (107 GB/100 GiB)

    [ 1.676110] sd 0:0:3:0: [sdd] Write Protect is off

    [ 1.676112] sd 0:0:3:0: [sdd] Mode Sense: 61 00 00 00

    [ 1.676123] sd 0:0:3:0: [sdd] Cache data unavailable

    [ 1.676125] sd 0:0:3:0: [sdd] Assuming drive cache: write through

    [ 1.678553] sd 0:0:4:0: [sde] 209715200 512-byte logical blocks: (107 GB/100 GiB)

    [ 1.678576] sd 0:0:4:0: [sde] Write Protect is off

    [ 1.678578] sd 0:0:4:0: [sde] Mode Sense: 61 00 00 00

    [ 1.678600] sd 0:0:4:0: [sde] Cache data unavailable

    [ 1.678602] sd 0:0:4:0: [sde] Assuming drive cache: write through

    [ 1.689953] sda: sda1 sda2 < sda5 >

    [ 1.704065] sd 0:0:2:0: [sdc] Attached SCSI disk

    [ 1.704096] sd 0:0:0:0: [sda] Attached SCSI disk

    [ 1.704133] sd 0:0:1:0: [sdb] Attached SCSI disk

    [ 1.707367] random: fast init done

    [ 1.716059] sd 0:0:3:0: [sdd] Attached SCSI disk

    [ 1.716263] sd 0:0:4:0: [sde] Attached SCSI disk

    [ 1.718926] sr 1:0:0:0: [sr0] scsi3-mmc drive: 1x/1x writer dvd-ram cd/rw xa/form2 cdda tray

    [ 1.718929] cdrom: Uniform CD-ROM driver Revision: 3.20

    [ 1.799911] raid6: sse2x4 gen() 12636 MB/s

    [ 1.867915] raid6: sse2x4 xor() 7265 MB/s

    [ 1.935914] raid6: sse2x2 gen() 13208 MB/s

    [ 2.003916] raid6: sse2x2 xor() 8459 MB/s

    [ 2.071921] raid6: sse2x1 gen() 9450 MB/s

    [ 2.139915] raid6: sse2x1 xor() 6012 MB/s

    [ 2.139921] raid6: using algorithm sse2x2 gen() 13208 MB/s

    [ 2.139922] raid6: .... xor() 8459 MB/s, rmw enabled

    [ 2.139924] raid6: using ssse3x2 recovery algorithm

    [ 2.140928] xor: automatically using best checksumming function avx

    [ 2.141476] async_tx: api initialized (async)

    [ 2.150874] md/raid:md0: not clean -- starting background reconstruction

    [ 2.150895] md/raid:md0: device sde operational as raid disk 3

    [ 2.150897] md/raid:md0: device sdd operational as raid disk 2

    [ 2.150898] md/raid:md0: device sdb operational as raid disk 0

    [ 2.151793] md/raid:md0: cannot start dirty degraded array.

    [ 2.151954] md/raid:md0: failed to run raid set.

    [ 2.151955] md: pers->run() failed ...

    [ 2.172145] sr 1:0:0:0: Attached scsi CD-ROM sr0

    [ 2.332210] Btrfs loaded, crc32c=crc32c-intel, zoned=yes

    [ 2.404732] PM: Image not found (code -22)

    [ 3.594298] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

    [ 3.671619] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.

    [ 3.798104] systemd[1]: Inserted module 'autofs4'


    ## lsblk -->> Uh?!?! no "md0" ?????!!

    root@pandora:~# lsblk

    NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

    sda 8:0 0 64G 0 disk

    ├─sda1 8:1 0 63G 0 part /

    ├─sda2 8:2 0 1K 0 part

    └─sda5 8:5 0 975M 0 part [SWAP]

    sdb 8:16 0 100G 0 disk

    sdc 8:32 0 150G 0 disk

    sdd 8:48 0 100G 0 disk

    sde 8:64 0 100G 0 disk

    sr0 11:0 1 800M 0 rom


    ## mdadm config ** Not sure about it listing spares=1: with 3 x 100GB and 1 x 150GB disks the array shows ~294GB usable capacity, which implies all four disks are members... so I think the spare count is not correct (see the check after the config below)

    root@pandora:~# cat /etc/mdadm/mdadm.conf

    # This file is auto-generated by openmediavault (https://www.openmediavault.org)

    # WARNING: Do not edit this file, your changes will get lost.


    # mdadm.conf

    # Please refer to mdadm.conf(5) for information about this file.

    # by default, scan all partitions (/proc/partitions) for MD superblocks.

    # alternatively, specify devices to scan, using wildcards if desired.

    # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.

    # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is

    # used if no RAID devices are configured.

    DEVICE partitions

    # auto-create devices with Debian standard permissions

    CREATE owner=root group=disk mode=0660 auto=yes

    # automatically tag new arrays as belonging to the local system

    HOMEHOST <system>

    # definitions of existing MD arrays

    ARRAY /dev/md0 metadata=1.2 spares=1 name=pandora:0 UUID=ec57ee88:ec272e66:dba45014:18600360
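    My guess is the spares=1 in that ARRAY line just reflects the moment OMV generated the file, when the replacement disk was still rebuilding and therefore counted as a spare. Once the rebuild finishes, this can be cross-checked (the file is auto-generated, so fix it through the UI rather than editing it):

    # compare live state vs. the generated config
    mdadm --detail --scan
    grep ARRAY /etc/mdadm/mdadm.conf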


    ## RAID status

    root@pandora:~# mdadm -D /dev/md0

    /dev/md0:

    Version : 1.2

    Creation Time : Thu Dec 2 12:13:02 2021

    Raid Level : raid5

    Used Dev Size : 104791040 (99.94 GiB 107.31 GB)

    Raid Devices : 4

    Total Devices : 4

    Persistence : Superblock is persistent


    Update Time : Sun Dec 5 23:28:18 2021

    State : active, FAILED, Not Started

    Active Devices : 3

    Working Devices : 4

    Failed Devices : 0

    Spare Devices : 1


    Layout : left-symmetric

    Chunk Size : 512K


    Consistency Policy : unknown


    Name : pandora:0 (local to host pandora)

    UUID : ec57ee88:ec272e66:dba45014:18600360

    Events : 1129


    Number Major Minor RaidDevice State

    - 0 0 0 removed

    - 0 0 1 removed

    - 0 0 2 removed

    - 0 0 3 removed


    - 8 64 3 sync /dev/sde

    - 8 32 1 spare rebuilding /dev/sdc

    - 8 48 2 sync /dev/sdd

    - 8 16 0 sync /dev/sdb

    root@pandora:~#



    # Is the RAID still recovering, so... just wait? Nope... I left this run all night and no change

    root@pandora:~# cat /proc/mdstat

    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]

    md0 : inactive sde[3] sdd[2] sdb[0] sdc[4]

    471599104 blocks super 1.2


    unused devices: <none>



    # Kick the rebuild in the pants (no option in the GUI, as it did not list the md device)

    root@pandora:~# mdadm --assemble --run --force --update=resync /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde

    mdadm: Marking array /dev/md0 as 'clean'

    mdadm: /dev/md0 has been started with 3 drives (out of 4) and 1 rebuilding.


    # Recheck block device listing... /dev/md0 is back

    root@pandora:~# lsblk

    NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

    sda 8:0 0 64G 0 disk

    ├─sda1 8:1 0 63G 0 part /

    ├─sda2 8:2 0 1K 0 part

    └─sda5 8:5 0 975M 0 part [SWAP]

    sdb 8:16 0 100G 0 disk

    └─md0 9:0 0 299.8G 0 raid5

    sdc 8:32 0 150G 0 disk

    └─md0 9:0 0 299.8G 0 raid5

    sdd 8:48 0 100G 0 disk

    └─md0 9:0 0 299.8G 0 raid5

    sde 8:64 0 100G 0 disk

    └─md0 9:0 0 299.8G 0 raid5



    Now back in GUI

    Now that looks better



    # With the RAID back online... do we restart the file system or UI services so they pick up the now-rebuilding status?

    *Maybe the above error is not related... this is back to the quirks of ACL vs POSIX; I have not vetted out how the system is handling this.
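    What I poked at from the shell once the array was assembled again, hedged because I am not sure which unit names OMV actually drives (these are stock Debian guesses):

    # remount whatever in fstab points at the array, then bounce the shares
    mount -a
    df -h | grep md0
    systemctl restart smbd nmbd nfs-server   # assumed unit names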


    Questions:

    1) I know this is a bit hostile, but to trust my data, I need to know recovery expectations and paths. Why did it get stuck, and is there a misconfiguration of the UI / tool set on my part?

    2) Is there a better "flow diagram" of repair pinned / posted that helps people work from base hardware to the full volume being back online?



    Thanks,

    I think the focus of this distro / OSS project is "robust NAS, easy to use", so I don't think it's fair to ask for features outside those core tenets of the project.


    But that being said..


    1) Assuming bare metal + base OS on a USB key: is there a published answer file (such as a Kickstart) that gets the deployment far enough to complete setup over SSH? (Ex: walk through a bare-metal install manually; this saves a ks.cfg in /root; I copy that back to the boot USB key I installed from and modify GRUB to use it, editing it to wipe the OS boot drive. I now have a re-deployment image from bare metal, as well as the configuration codified in code.) A rough sketch of what I mean follows question 2.


    2) Once the install from (1) is done, with HTTP / SSH to a single defined IP... could whatever the GUI does be saved as YAML files, to allow redeployment of all the post-configuration steps? Ex: I go into the GUI, set up a four-drive RAID 5, and create a file system. This seems to be shell commands underneath, but if on the back end it is just using Ansible etc., then it would be helpful to have that so I could deploy IaC vs web-click / GUI.
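    For question 1: the Debian-side analogue of a Kickstart answer file is a preseed file. A minimal sketch of the idea, with placeholder values only, assuming the stock Debian installer path (I have not confirmed how the OMV ISO wires this up):

    # boot the installer with something like:
    #   auto=true priority=critical url=http://<server>/preseed.cfg
    # minimal preseed.cfg sketch (placeholder values)
    d-i debian-installer/locale string en_US
    d-i netcfg/choose_interface select auto
    d-i netcfg/get_hostname string omv-node
    d-i partman-auto/disk string /dev/sda
    d-i partman-auto/method string regular
    d-i passwd/root-password password changeme
    d-i passwd/root-password-again password changeme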



    Again... just curious if that exists so I can lean on it, knowing that the "OpenMediaVault" project is focused on GUI-level administration.

    <update>

    Rebuilding with one NIC, no bond, just VLANs worked... so there is some issue with the way VMware handles the bonding handshake vs. the OS. As this is a test of method and process before the appliance shows up, I will let this go; the hardware system will have full LACP bindings on the switch side.


    I will just consider this topic closed for the time being.



    ##

    I googled around and am seeing bits and pieces... but is there an Ansible module or playbook / YAML example set for deploying this as IaC?


    I realize the target customer for this is "ops" based with a GUI, but I am trying to build / maintain lab IaC... so just curious if this is in the plan.

    Thanks for the notes:

    I ran the debug output into a file so I could scrape it: netplan --debug generate > netplan_debug.out

    cat netplan_debug.out


    netplan apply

    ip ad


    I also noted a curious state change in the vSwitch: "blocked"... so let me do some root-causing. I am going to reinstall and this time do VLANs without bonding to rule that out, i.e. confirm tagging passes OK and it is a bonding-mode issue.



    One thing that would help would be a script to return to factory default... or even better: the base load can only bind an IP, so back that state up, and once I go "fancy"... have a means to roll back. I guess it would be something like deleting /etc/netplan and putting the initial OS install file back.


    Ex:

    <os base install with single nic untagged mode connection static IP>

    cp /etc/network/interfaces /root/interfaces.bak

    <use UI to generate bond + NICs + teams + binding of static IPs>

    <test>

    If it fails and I need to get back to the initial setup:

    cp /root/interfaces.bak /etc/network/interfaces

    rm -rf /etc/netplan


    Not sure how, outside of a reboot, to HUP that so the network reloads... I tried restarting networking, but I think this distro has modified a few things so that does not work.
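    The candidates I would try for reloading the network without a full reboot, depending on which backend this distro actually uses (I have not confirmed which one it is):

    # ifupdown-style
    systemctl restart networking
    # netplan / systemd-networkd style (matches the netplan commands used earlier)
    netplan apply
    systemctl restart systemd-networkd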



    thanks,

    For years I have used the NAS function on my router, but they decided to push their cloud thing... so I have to find a new home for my NAS data. Most of my lab is Linux and so NFS, but I do have to back up other things, so SMB is also helpful. Torrent / object storage is icing on the cake.


    4 x 2TB Samsung SSD in RAID 5 with

    -> two folders /movies /backup export as NFS/ SMB

    -> bonus round: torrent system for downloads vs my Raspberry Pi doing it

    -> bonus round: object-based volume to test backup of my K8s environment



    I purchased a GnuBee Personal Cloud 1 | Crowd Supply because other OEM NAS units were too $$ and had too many software lock-in issues.


    My lab has 10Gb for data such as vSAN and K8s clusters (VLAN 101, 172.16.101.0/24) and 1Gb for VM / general data (VLAN 100, 172.16.100.0/24).


    My plan was to deploy a VM under my VMware cluster to document deployment and setup methods, as well as kick the tires on day-2 admin tasks of the platform while I await my new hardware. So I built a VM: 2 vCPU, 2GB RAM, 1 x 64GB (OS), 3 x 100GB for data, 2 x NICs (i.e. the same as the hardware I ordered).


    I deployed the OS with an IP bound to the first NIC, ens192. Then I logged into the GUI -> created a bond with just the second NIC, ens224 -> created both VLANs, but bound the storage IP to just VLAN 101 -> apply.


    Test that the trunk to the VM works. <ok>

    Second phase: remove ens192 -> add it to the bond -> bind an IP to VLAN 100... apply. <ok>


    Both IPs now running over the VMware trunk to the bond, with VLANs and tags...
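    For reference, roughly what I would expect the equivalent netplan config to look like for that bond + VLAN layout. This is my own hand-written approximation, not what OMV generates; the host addresses are placeholders within my lab subnets:

    network:
      version: 2
      ethernets:
        ens192: {}
        ens224: {}
      bonds:
        bond0:
          interfaces: [ens192, ens224]
          parameters:
            mode: 802.3ad
      vlans:
        bond0.100:
          id: 100
          link: bond0
          addresses: [172.16.100.10/24]
        bond0.101:
          id: 101
          link: bond0
          addresses: [172.16.101.10/24]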


    Now create self-signed keys and flip management to HTTPS only. <ok>


    Now... reboot to test that this holds across a reboot. <fail>


    Host is offline... no ARP... it "used to work pre-reboot"...


    But it is hard to debug when I am not sure where Debian / this distro placed the network configuration files.



    Nothing in /etc/network/interfaces or in the include path /etc/network/interfaces.d.
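    Places I still need to check, assuming the netplan / systemd-networkd stack is what this distro drives:

    # netplan source files
    ls -l /etc/netplan/
    # what the generator rendered for the backend
    ls -l /run/systemd/network/ /etc/systemd/network/
    networkctl list
    networkctl status ens224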



    Question:

    1) Where are the network configuration files stored? The base deployment had the configuration there, but once the bond / VLANs were created, did they move?

    2) On this NAS Linux distro in general: I purchased that appliance because of too many quirks with Raspberry Pi 3 and 4 (not stable, and drive issues with USB-attached SATA). Any other words of wisdom? This NAS Linux project looks solid, but I want to make sure my plans are not overreaching.