Posts by penguinpages

    Thanks for the help. Great to see such a responsive / active forum ...


    So far... this OSS project seems to be very good.


    As a side note "...Raid is not a backup! Would you go skydiving without a parachute?..." Sure.. Once :)


    But to that point: my concern is more about ransomware than disk failure. Will this project support snapshot logic?


    It would be cool to see an option, when the RAID is built, to set something like 80% for data in the File Systems step, leave 15% for snapshots, and 5% for "oh crap, I ran myself out of space and need breathing room".
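    To make that concrete: a rough sketch of the kind of carve-up I mean, done by hand with LVM on top of the md device. This is not something I know OMV exposes today; the volume group and LV names are placeholders.

    # hypothetical layout: leave ~20% of the VG unallocated for snapshots / headroom
    pvcreate /dev/md0
    vgcreate vg_data /dev/md0
    lvcreate -l 80%VG -n lv_data vg_data     # 80% for data
    mkfs.ext4 /dev/vg_data/lv_data
    # later, snapshots draw from the free 20% of the VG
    lvcreate -s -L 20G -n lv_data_snap /dev/vg_data/lv_data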

    Can you post the output of your configuration?


    Do you have mission-critical services running such that restarting services and the RAID would be an issue? (i.e., what suggestions that impact data can be made?)


    # root cause a bit before restart

    lsblk

    mdadm -D /dev/md0

    cat /proc/mdstat


    # restart if this is an option, e.g. 4 drives in RAID 5

    mdadm --stop /dev/md0

    mdadm --assemble --run --force --update=resync /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde


    # Post the output of "tail -f /var/log/syslog" run in a separate shell while doing the above

    tail -f /var/log/syslog


    Let's see if something rears its head as the issue.


    PS: What are the disk drive sizes... and do you KNOW they are good drives?

    So my path to corrective action was / is as expected...


    Questions:

    1) Are there plans to add a "start / stop" services option in the UI? E.g. start / stop mdadm to see details / errors, and start / stop services (SMB, NFS, FTP, etc.) once the volume is back online. (Shell fallback sketched after question 2.)


    2) Can the array utility "upgrade capacity"? I started testing this by adding a new 150GB drive alongside the other three 100GB drives, to see if it would expand /dev/md0 and then the filesystem, etc.
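    For both questions, the shell fallback I have been using while the UI catches up (service / unit names are my assumptions for a stock Debian install, and the new disk name below is hypothetical):

    # 1) start / stop / inspect services by hand
    systemctl status mdmonitor.service       # may be mdadm.service depending on packaging
    systemctl restart smbd nmbd              # SMB
    systemctl restart nfs-server             # NFS
    journalctl -u smbd --since "1 hour ago"

    # 2) grow the array after adding a disk, then the filesystem
    mdadm --add /dev/md0 /dev/sdf            # hypothetical new disk
    mdadm --grow /dev/md0 --raid-devices=5
    cat /proc/mdstat                         # wait for the reshape to finish
    resize2fs /dev/md0                       # assuming ext4 directly on md0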

    I am waiting for my hardware appliance parts to show up, so I built OpenMediaVault as a VM on my VMware cluster.


    1CPU

    1GB RAM

    2 x 1Gb NIC

    1 x 64GB SSD for OS

    3 x 100GB SSD for RAID 5


    Setup went fine, besides the expected quirks with SMB / NFS permission interactions (still working on those, but kind of expected, as ACL to POSIX mapping is a PITA).


    My goal, though, was to test out the RAID recovery steps and how clean they are / what tools are needed to do that.


    First Test: Add a new hard disk and expand

    Super easy... 4.5 out of 5 on how simple this was. Only note is that it took me a few minutes to find the rebuild % details, i.e. when the new array size would be ready.
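    For anyone else hunting for it, the rebuild percentage is easy to watch from the shell:

    # watch resync / reshape progress until the new size is usable
    watch -n 5 cat /proc/mdstat
    mdadm -D /dev/md0 | grep -E 'State|Rebuild Status|Reshape Status'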



    Second Test: Replace a Drive

    While the system is running... pull a hard disk and then recover.


    At first this went OK... I removed disk /dev/sdc and the array still worked, just noted a critical state. But then I added back a new 150GB disk and it showed up as /dev/sdc (I expected it as /dev/sde, but... meh).


    Rebuild -> Wizard was fine.. seemed to be chugging along ok.
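    For comparison, the shell steps I was checking the wizard against (device names match my test, but treat them as examples):

    # after physically swapping the disk, clear the newcomer and add it back
    mdadm /dev/md0 --remove detached         # drop the pulled disk if md still lists it
    wipefs -a /dev/sdc                       # the new disk came up as /dev/sdc here
    mdadm --add /dev/md0 /dev/sdc
    cat /proc/mdstat                         # rebuild kicks off automatically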



    Third Test: Hard Power during rebuild.


    This is the last test I typically do: while the array is in rebuild mode, just pull power on the server. It then has to handle OS boot recovery as well as decide what to do with the array (e.g. /dev/md0) when it is still critical and the rebuild has started but not completed.


    Boot state showed the file system offline.


    But more concerning: it is not even listing the RAID device.


    This is where things went off the rails:

    root@pandora:~# cat /proc/mdstat

    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]

    md0 : inactive sde[3] sdd[2] sdb[0] sdc[4]

    471599104 blocks super 1.2


    unused devices: <none>




    1) OS Boot... fsck repaired file system on /dev/sda boot drive partitions - Good

    2) System booted up and UI launched etc... - Good

    3) All disks showing up - Good

    4) /dev/md0 repair continues and, while that is happening... services come online - Fail


    I am trying to root-cause this as well as document how to recover the system. Maybe this is already posted as a "how to repair a RAID state" document flow or video... but I did not see it.


    ## dmesg state, <snip> down to just the drive messages

    [ 1.660977] ata1.00: ATAPI: VMware Virtual IDE CDROM Drive, 00000001, max UDMA/33

    [ 1.666086] scsi 1:0:0:0: CD-ROM NECVMWar VMware IDE CDR00 1.00 PQ: 0 ANSI: 5

    [ 1.675319] sd 0:0:0:0: [sda] 134217728 512-byte logical blocks: (68.7 GB/64.0 GiB)

    [ 1.675333] sd 0:0:0:0: [sda] Write Protect is off

    [ 1.675335] sd 0:0:0:0: [sda] Mode Sense: 61 00 00 00

    [ 1.675349] sd 0:0:0:0: [sda] Cache data unavailable

    [ 1.675351] sd 0:0:0:0: [sda] Assuming drive cache: write through

    [ 1.675614] sd 0:0:1:0: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)

    [ 1.675627] sd 0:0:1:0: [sdb] Write Protect is off

    [ 1.675629] sd 0:0:1:0: [sdb] Mode Sense: 61 00 00 00

    [ 1.675641] sd 0:0:1:0: [sdb] Cache data unavailable

    [ 1.675643] sd 0:0:1:0: [sdb] Assuming drive cache: write through

    [ 1.675861] sd 0:0:2:0: [sdc] 314572800 512-byte logical blocks: (161 GB/150 GiB)

    [ 1.675874] sd 0:0:2:0: [sdc] Write Protect is off

    [ 1.675875] sd 0:0:2:0: [sdc] Mode Sense: 61 00 00 00

    [ 1.675887] sd 0:0:2:0: [sdc] Cache data unavailable

    [ 1.675888] sd 0:0:2:0: [sdc] Assuming drive cache: write through

    [ 1.676098] sd 0:0:3:0: [sdd] 209715200 512-byte logical blocks: (107 GB/100 GiB)

    [ 1.676110] sd 0:0:3:0: [sdd] Write Protect is off

    [ 1.676112] sd 0:0:3:0: [sdd] Mode Sense: 61 00 00 00

    [ 1.676123] sd 0:0:3:0: [sdd] Cache data unavailable

    [ 1.676125] sd 0:0:3:0: [sdd] Assuming drive cache: write through

    [ 1.678553] sd 0:0:4:0: [sde] 209715200 512-byte logical blocks: (107 GB/100 GiB)

    [ 1.678576] sd 0:0:4:0: [sde] Write Protect is off

    [ 1.678578] sd 0:0:4:0: [sde] Mode Sense: 61 00 00 00

    [ 1.678600] sd 0:0:4:0: [sde] Cache data unavailable

    [ 1.678602] sd 0:0:4:0: [sde] Assuming drive cache: write through

    [ 1.689953] sda: sda1 sda2 < sda5 >

    [ 1.704065] sd 0:0:2:0: [sdc] Attached SCSI disk

    [ 1.704096] sd 0:0:0:0: [sda] Attached SCSI disk

    [ 1.704133] sd 0:0:1:0: [sdb] Attached SCSI disk

    [ 1.707367] random: fast init done

    [ 1.716059] sd 0:0:3:0: [sdd] Attached SCSI disk

    [ 1.716263] sd 0:0:4:0: [sde] Attached SCSI disk

    [ 1.718926] sr 1:0:0:0: [sr0] scsi3-mmc drive: 1x/1x writer dvd-ram cd/rw xa/form2 cdda tray

    [ 1.718929] cdrom: Uniform CD-ROM driver Revision: 3.20

    [ 1.799911] raid6: sse2x4 gen() 12636 MB/s

    [ 1.867915] raid6: sse2x4 xor() 7265 MB/s

    [ 1.935914] raid6: sse2x2 gen() 13208 MB/s

    [ 2.003916] raid6: sse2x2 xor() 8459 MB/s

    [ 2.071921] raid6: sse2x1 gen() 9450 MB/s

    [ 2.139915] raid6: sse2x1 xor() 6012 MB/s

    [ 2.139921] raid6: using algorithm sse2x2 gen() 13208 MB/s

    [ 2.139922] raid6: .... xor() 8459 MB/s, rmw enabled

    [ 2.139924] raid6: using ssse3x2 recovery algorithm

    [ 2.140928] xor: automatically using best checksumming function avx

    [ 2.141476] async_tx: api initialized (async)

    [ 2.150874] md/raid:md0: not clean -- starting background reconstruction

    [ 2.150895] md/raid:md0: device sde operational as raid disk 3

    [ 2.150897] md/raid:md0: device sdd operational as raid disk 2

    [ 2.150898] md/raid:md0: device sdb operational as raid disk 0

    [ 2.151793] md/raid:md0: cannot start dirty degraded array.

    [ 2.151954] md/raid:md0: failed to run raid set.

    [ 2.151955] md: pers->run() failed ...

    [ 2.172145] sr 1:0:0:0: Attached scsi CD-ROM sr0

    [ 2.332210] Btrfs loaded, crc32c=crc32c-intel, zoned=yes

    [ 2.404732] PM: Image not found (code -22)

    [ 3.594298] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.

    [ 3.671619] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.

    [ 3.798104] systemd[1]: Inserted module 'autofs4'


    ## lsblk -->> Uh?!?! no "md0" ?????!!

    root@pandora:~# lsblk

    NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

    sda 8:0 0 64G 0 disk

    ├─sda1 8:1 0 63G 0 part /

    ├─sda2 8:2 0 1K 0 part

    └─sda5 8:5 0 975M 0 part [SWAP]

    sdb 8:16 0 100G 0 disk

    sdc 8:32 0 150G 0 disk

    sdd 8:48 0 100G 0 disk

    sde 8:64 0 100G 0 disk

    sr0 11:0 1 800M 0 rom


    ## mdadm config ** Not sure about it listing spares=1: with 3 x 100GB and 1 x 150GB disks the array shows ~294GB usable capacity, which implies all four disks are members... so I think the spare count is not correct (see the check after the config below)

    root@pandora:~# cat /etc/mdadm/mdadm.conf

    # This file is auto-generated by openmediavault (https://www.openmediavault.org)

    # WARNING: Do not edit this file, your changes will get lost.


    # mdadm.conf

    # Please refer to mdadm.conf(5) for information about this file.

    # by default, scan all partitions (/proc/partitions) for MD superblocks.

    # alternatively, specify devices to scan, using wildcards if desired.

    # Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.

    # To avoid the auto-assembly of RAID devices a pattern that CAN'T match is

    # used if no RAID devices are configured.

    DEVICE partitions

    # auto-create devices with Debian standard permissions

    CREATE owner=root group=disk mode=0660 auto=yes

    # automatically tag new arrays as belonging to the local system

    HOMEHOST <system>

    # definitions of existing MD arrays

    ARRAY /dev/md0 metadata=1.2 spares=1 name=pandora:0 UUID=ec57ee88:ec272e66:dba45014:18600360
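    My guess is the spares=1 in that ARRAY line just reflects the moment OMV generated the file, when the replacement disk was still rebuilding and therefore counted as a spare. Once the rebuild finishes, this can be cross-checked (the file is auto-generated, so fix it through the UI rather than editing it):

    # compare live state vs. the generated config
    mdadm --detail --scan
    grep ARRAY /etc/mdadm/mdadm.conf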


    ## RAID status

    root@pandora:~# mdadm -D /dev/md0

    /dev/md0:

    Version : 1.2

    Creation Time : Thu Dec 2 12:13:02 2021

    Raid Level : raid5

    Used Dev Size : 104791040 (99.94 GiB 107.31 GB)

    Raid Devices : 4

    Total Devices : 4

    Persistence : Superblock is persistent


    Update Time : Sun Dec 5 23:28:18 2021

    State : active, FAILED, Not Started

    Active Devices : 3

    Working Devices : 4

    Failed Devices : 0

    Spare Devices : 1


    Layout : left-symmetric

    Chunk Size : 512K


    Consistency Policy : unknown


    Name : pandora:0 (local to host pandora)

    UUID : ec57ee88:ec272e66:dba45014:18600360

    Events : 1129


    Number Major Minor RaidDevice State

    - 0 0 0 removed

    - 0 0 1 removed

    - 0 0 2 removed

    - 0 0 3 removed


    - 8 64 3 sync /dev/sde

    - 8 32 1 spare rebuilding /dev/sdc

    - 8 48 2 sync /dev/sdd

    - 8 16 0 sync /dev/sdb

    root@pandora:~#



    # Is the RAID still recovering, so... just wait? Nope... I left this run all night and no change

    root@pandora:~# cat /proc/mdstat

    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]

    md0 : inactive sde[3] sdd[2] sdb[0] sdc[4]

    471599104 blocks super 1.2


    unused devices: <none>



    # Kick the rebuild in the pants (no option in the GUI, as it did not list the md device)

    root@pandora:~# mdadm --assemble --run --force --update=resync /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde

    mdadm: Marking array /dev/md0 as 'clean'

    mdadm: /dev/md0 has been started with 3 drives (out of 4) and 1 rebuilding.


    # Recheck block device listing... /dev/md0 is back

    root@pandora:~# lsblk

    NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT

    sda 8:0 0 64G 0 disk

    ├─sda1 8:1 0 63G 0 part /

    ├─sda2 8:2 0 1K 0 part

    └─sda5 8:5 0 975M 0 part [SWAP]

    sdb 8:16 0 100G 0 disk

    └─md0 9:0 0 299.8G 0 raid5

    sdc 8:32 0 150G 0 disk

    └─md0 9:0 0 299.8G 0 raid5

    sdd 8:48 0 100G 0 disk

    └─md0 9:0 0 299.8G 0 raid5

    sde 8:64 0 100G 0 disk

    └─md0 9:0 0 299.8G 0 raid5



    Now back in GUI

    Now that looks better



    # With the RAID back online... do we restart the file system or UI services so they pick up the now-rebuilding status?

    *Maybe the above error is not related... this is back to the quirks of ACL vs POSIX; I have not vetted out how the system is handling this.
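    What I poked at from the shell once the array was assembled again, hedged because I am not sure which unit names OMV actually drives (these are stock Debian guesses):

    # remount whatever in fstab points at the array, then bounce the shares
    mount -a
    df -h | grep md0
    systemctl restart smbd nmbd nfs-server   # assumed unit names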


    Questions:

    1) I know this is a bit hostile, but to trust my data, I need to know recovery expectations and paths. Why did it get stuck, and is there a misconfiguration of the UI / tool set on my part?

    2) Is there a better "flow diagram" of repair pinned / posted that helps people work from base hardware to the full volume being back online?



    Thanks,

    I think the focus of this distro / OSS project is "robust NAS, easy to use", so I don't think it's fair to ask for features outside those core tenets of the project.


    But that being said..


    1) Assuming bare metal + base OS on a USB key: is there a published answer file (such as a Kickstart) that gets the deployment far enough to complete setup over SSH? (Ex: walk through a bare-metal install manually; this saves a ks.cfg in /root; I copy that back to the boot USB key I installed from and modify GRUB to use it, editing it to wipe the OS boot drive. I now have a re-deployment image from bare metal, as well as the configuration codified in code.) A rough sketch of what I mean follows question 2.


    2) Once the install from (1) is done, with HTTP / SSH to a single defined IP... could whatever the GUI does be saved as YAML files, to allow redeployment of all the post-configuration steps? Ex: I go into the GUI, set up a four-drive RAID 5, and create a file system. This seems to be shell commands underneath, but if on the back end it is just using Ansible etc., then it would be helpful to have that so I could deploy IaC vs web-click / GUI.
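    For question 1: the Debian-side analogue of a Kickstart answer file is a preseed file. A minimal sketch of the idea, with placeholder values only, assuming the stock Debian installer path (I have not confirmed how the OMV ISO wires this up):

    # boot the installer with something like:
    #   auto=true priority=critical url=http://<server>/preseed.cfg
    # minimal preseed.cfg sketch (placeholder values)
    d-i debian-installer/locale string en_US
    d-i netcfg/choose_interface select auto
    d-i netcfg/get_hostname string omv-node
    d-i partman-auto/disk string /dev/sda
    d-i partman-auto/method string regular
    d-i passwd/root-password password changeme
    d-i passwd/root-password-again password changeme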



    Again... just curious if that exists so I can lean on it, knowing that the "OpenMediaVault" project is focused on GUI-level administration.

    <update>

    Rebuilding with one NIC, no bond, just VLANs worked... so there is some issue with the way VMware handles the bonding handshake vs. the OS. As this is a test of method and process before the appliance shows up, I will let this go; the hardware system will have full LACP bindings on the switch side.


    I will just consider this topic closed for the time being.



    ##

    I googled around and am seeing bits and pieces... but is there an Ansible module or playbook / YAML example set for deploying this as IaC?


    I realize the target customer for this is "ops" based with a GUI, but I am trying to build / maintain lab IaC... so just curious if this is in the plan.

    Thanks for the notes:

    I ran the debug output into a file so I could scrape it: netplan --debug generate > netplan_debug.out

    cat netplan_debug.out


    netplan apply

    ip ad


    I also noted a curious state change in the vSwitch: "blocked"... so let me do some root-causing. I am going to reinstall and this time do VLANs without bonding to rule that out, i.e. confirm tagging passes OK and it is a bonding-mode issue.



    One thing that would help would be a script to return to factory default... or even better: the base load can only bind an IP, so back that state up, and once I go "fancy"... have a means to roll back. I guess it would be something like deleting /etc/netplan and putting the initial OS install file back.


    Ex:

    <os base install with single nic untagged mode connection static IP>

    cp /etc/network/interfaces /root/interfaces.bak

    <use UI to generate bond + NICs + teams + binding of static IPs>

    <test>

    If it fails and I need to get back to the initial setup:

    cp /root/interfaces.bak /etc/network/interfaces

    rm -rf /etc/netplan


    Not sure how, outside of a reboot, to HUP that so the network reloads... I tried restarting networking, but I think this distro has modified a few things so that does not work.
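    The candidates I would try for reloading the network without a full reboot, depending on which backend this distro actually uses (I have not confirmed which one it is):

    # ifupdown-style
    systemctl restart networking
    # netplan / systemd-networkd style (matches the netplan commands used earlier)
    netplan apply
    systemctl restart systemd-networkd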



    thanks,

    For years I have used the NAS function on my router, but they decided to push their cloud thing... so I have to find a new home for my NAS data. Most of my lab is Linux and so NFS, but I do have to back up other things, so SMB is also helpful. Torrent / object storage is icing on the cake.


    4 x 2TB Samsung SSD in RAID 5 with

    -> two folders /movies /backup export as NFS/ SMB

    -> bonus round: torrent system for downloads vs my Raspberry Pi doing it

    -> bonus round: object-based volume to test backup of my K8s environment



    I purchased a GnuBee Personal Cloud 1 | Crowd Supply because other OEM NAS units were too $$ and had too many software lock-in issues.


    My lab has 10Gb for data such as vSAN and K8s clusters (VLAN 101, 172.16.101.0/24) and 1Gb for VM / general data (VLAN 100, 172.16.100.0/24).


    My plan was to deploy a VM under my VMware cluster to document deployment and setup methods, as well as kick the tires on day-2 admin tasks of the platform while I await my new hardware. So I built a VM: 2 vCPU, 2GB RAM, 1 x 64GB (OS), 3 x 100GB for data, 2 x NICs (i.e. the same as the hardware I ordered).


    I deployed the OS with an IP bound to the first NIC, ens192. Then I logged into the GUI -> created a bond with just the second NIC, ens224 -> created both VLANs, but bound the storage IP to just VLAN 101 -> apply.


    Test that the trunk to the VM works. <ok>

    Second phase: remove ens192 -> add it to the bond -> bind an IP to VLAN 100... apply. <ok>


    Both IPs now running over the VMware trunk to the bond, with VLANs and tags...
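    For reference, roughly what I would expect the equivalent netplan config to look like for that bond + VLAN layout. This is my own hand-written approximation, not what OMV generates; the host addresses are placeholders within my lab subnets:

    network:
      version: 2
      ethernets:
        ens192: {}
        ens224: {}
      bonds:
        bond0:
          interfaces: [ens192, ens224]
          parameters:
            mode: 802.3ad
      vlans:
        bond0.100:
          id: 100
          link: bond0
          addresses: [172.16.100.10/24]
        bond0.101:
          id: 101
          link: bond0
          addresses: [172.16.101.10/24]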


    Now create self-signed keys and flip management to HTTPS only. <ok>


    Now... reboot to test that this holds across a reboot. <fail>


    Host is offline... no ARP... it "used to work pre-reboot"...


    But it is hard to debug when I am not sure where Debian / this distro placed the network configuration files.



    Nothing in /etc/network/interfaces or in the include path /etc/network/interfaces.d.
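    Places I still need to check, assuming the netplan / systemd-networkd stack is what this distro drives:

    # netplan source files
    ls -l /etc/netplan/
    # what the generator rendered for the backend
    ls -l /run/systemd/network/ /etc/systemd/network/
    networkctl list
    networkctl status ens224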



    Question:

    1) Where are the network configuration files stored? The base deployment had the configuration there, but once the bond / VLANs were created, did they move?

    2) On this NAS Linux distro in general: I purchased that appliance because of too many quirks with Raspberry Pi 3 and 4 (not stable, and drive issues with USB-attached SATA). Any other words of wisdom? This NAS Linux project looks solid, but I want to make sure my plans are not overreaching.