Posts by vspeed

    Interesting, I've never seen that one before. I'm assuming that by cloning the drives you're informing mdadm that there are no bad blocks on those drives.

    You only have to do that if the array is active (auto-read-only)


    What you've done is an interesting way of solving your problem :thumbup:

    Giving this an update: there were still bad blocks after cloning the drives. They seem to have been logical bad blocks, or whatever you call them. The array would not mount when that command finished. journalctl showed several corrupt files and an inconsistent filesystem, which I had to try to recover with fsck. After this I could only recover about 60% of the data; most files were corrupted.


    After this I thought that maybe the cloning had not been so successful. Having 60% of the data back was better than 0, but I thought, OK, let me try this with the original drives, including the WD RED (which apparently is an SMR drive and therefore not suitable for any NAS RAID. According to the experts, you should avoid these, especially a mix of them). read this

    So I again used mdadm --assemble --update=force-no-bbl --force /dev/md127 /dev/sda /dev/sdd /dev/sde /dev/sdc and assembled the array as before, but ignoring the bad blocks. Now I think I have all the data intact and the RAID is healthy. No more filesystem corruption or need to run fsck.
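
    A quick way to double-check that --update=force-no-bbl really cleared the lists is to re-examine every member; a minimal sketch (same device names as above):

    Code
    # each member should now report an empty bad-blocks list
    for d in /dev/sda /dev/sdd /dev/sde /dev/sdc; do
        mdadm --examine-badblocks "$d"
    done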

    Going to back up everything to an 8TB drive plus some 500GB drives that I have lying around, and then I'm going to rebuild the entire array, probably with only the Toshiba N300 drives.
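
    For the copy itself, something like rsync keeps permissions, hard links, ACLs and extended attributes intact; just a sketch, the paths are made up:

    Code
    # source and destination paths are placeholders, not the real mount points
    rsync -aHAX --info=progress2 /srv/raid-data/ /srv/backup-8tb/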


    I hope that my mistakes and the time I lost on this save someone else from the same hassle.

    Hey, been a long time.

    Here is what I did.


    Bought 4 proper Toshiba N300 4TB drives and cloned the original WDs to them so I wouldn't mess with the original data any further.
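
    One common way to do such a clone is GNU ddrescue; this is only a sketch, with example device names rather than the real ones:

    Code
    # first pass, skipping the slow scraping of bad areas; the map file allows resuming
    ddrescue -f -n /dev/sdX /dev/sdY /root/sdX-to-sdY.map
    # second pass, retrying the bad areas up to 3 times
    ddrescue -f -r3 /dev/sdX /dev/sdY /root/sdX-to-sdY.map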

    In another machine (I didn't want to mess up the OMV install more than it already was), I plugged in the drives and checked the status. It was clean, degraded.


    Then, after reading this, I assembled the array as having no bad blocks.


    Code
    root@openmediavault:~# mdadm --stop --force /dev/md127
    mdadm: stopped /dev/md127
    root@openmediavault:~# mdadm --detail /dev/md127
    mdadm: cannot open /dev/md127: No such file or directory
    root@openmediavault:~# mdadm --assemble --update=force-no-bbl --force /dev/md127 /dev/sda /dev/sdd /dev/sde /dev/sdc
    mdadm: Marking array /dev/md127 as 'clean'
    mdadm: /dev/md127 has been started with 4 drives (out of 5).
    root@openmediavault:~# mdadm --readwrite /dev/md127

    Now the array is reshaping back from 5 to 4 drives and it's progressing fine.


    Code
    root@openmediavault:~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : active raid5 sda[6] sde[4] sdc[7] sdd[5]
          11720661504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/4] [UUUU_]
          [>....................]  reshape =  0.7% (30500352/3906887168) finish=969.8min speed=66612K/sec
          bitmap: 6/30 pages [24KB], 65536KB chunk

    So maybe I could just have used this command from the beginning and continued with the grow to the 5 drives.


    Let's hope it finishes successfully.



    As lessons learned: maybe before growing an array, it's better to check for bad blocks and check the health of the array with:

    Code
    root@openmediavault:~# mdadm --examine-badblocks /dev/sdd
    root@openmediavault:~# echo check > /sys/block/md127/md/sync_action
    # or: /usr/share/mdadm/checkarray -a /dev/md127
    root@openmediavault:~# cat /proc/mdstat
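
    One extra number worth reading once such a check has completed is the mismatch count (just a suggestion, not something I used here):

    Code
    # a non-zero value after a completed check points at inconsistent stripes
    cat /sys/block/md127/md/mismatch_cnt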

    I read that post more carefully and it seems I am more or less in the same boat.

    Code
    root@openmediavault:~# mdadm --examine-badblocks /dev/sda
    Bad-blocks list is empty in /dev/sda
    root@openmediavault:~# mdadm --examine-badblocks /dev/sdc
    Bad-blocks list is empty in /dev/sdc
    root@openmediavault:~# mdadm --examine-badblocks /dev/sdd
    Bad-blocks on /dev/sdd:
    4074832 for 8 sectors
    root@openmediavault:~# mdadm --examine-badblocks /dev/sde
    Bad-blocks on /dev/sde:
    4074832 for 8 sectors

    I have some doubts about how to proceed... the procedure seems kind of risky.

    I also saw that he somehow solved the problem with some kernel patch, and I also found this post. It mentions a backup file, "--backup-file /root/mdadm-backup". Which file is this?
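
    From what I could gather, it seems to be a file you name yourself so that mdadm can stash the reshape's critical-section data in it and resume from it after an interruption; roughly used like this (a sketch, the path is only an example):

    Code
    # the backup file is created by mdadm itself; you only choose the path
    mdadm --grow /dev/md127 --raid-devices=5 --backup-file=/root/mdadm-backup
    # if the reshape is interrupted, the same file is passed again on assembly
    mdadm --assemble /dev/md127 --backup-file=/root/mdadm-backup /dev/sd[abcde]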

    It hung again... more or less at the same block as before. I'm losing hope on this. SMART shows that the drives are fine.

    The speed keeps dropping until it reaches 0 and the block remains the same.


    Any way of checking for drive errors?

    Code
    root@openmediavault:~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md127 : active raid5 sda[6] sde[4] sdd[7] sdb[5]
          11720661504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/4] [UUUU_]
          [>....................]  reshape =  0.0% (2303100/3906887168) finish=3315.9min speed=19624K/sec
          bitmap: 6/30 pages [24KB], 65536KB chunk
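
    So far the only thing I know to check is SMART itself, e.g. the drives' own error and self-test logs with smartctl (device names below are just examples):

    Code
    # overall health plus the drive's internal error log
    smartctl -H -l error /dev/sda
    # start an extended self-test, then read the result later
    smartctl -t long /dev/sda
    smartctl -l selftest /dev/sda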

    Now it's set to clean, degraded. How do I rebuild this, or how can I access the data? I was using LVM.
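
    For the LVM part, I guess that once the md device is back the volume group only needs to be re-activated, something like this (placeholder names, not my actual VG/LV):

    Code
    pvscan                      # the md device should show up as a physical volume
    vgchange -ay                # activate every volume group that is found
    lvs                         # list the logical volumes that came back
    mount /dev/<vg>/<lv> /mnt   # placeholders for the actual VG/LV names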

    With just the 4 original HDDs, it is now in an inactive state. What are my options now?

    Should I use: mdadm --assemble --force /dev/md127 /dev/sd[abde] ?
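
    Before forcing anything, I guess it makes sense to look at what each superblock still thinks, something like:

    Code
    # compare event counters, roles and array state across the four members
    mdadm --examine /dev/sd[abde] | grep -E '^/dev/|Events|Device Role|Array State'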

    Installed the 5 drives in a different machine to rule out any hardware issue. mdadm --detail detected the array with the 5 drives and the state was reshaping, paused, or something like that. I resumed it with mdadm --readwrite /dev/md127 and it started for a couple of seconds, just to stop again, more or less at the same block...


    Still cannot figure out what is wrong.


    Will try to recover with just the 4 drives.

    I'm not on a SystemRescue CD. Never used it, actually. Is it in any way useful in this situation?

    How can I reboot into normal Debian/OMV when I'm unable to freeze the reshaping and it won't let me shut down?

    I tried the command but I see no difference. This reshape is hung. This is not a speed problem; look at the block. It has been stuck at 1923544/3906887168 since it started.

    Code
    root@openmediavault:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md127 : active raid5 sda[4] sde[7] sdb[6] sdf[8] sdc[5]
          11720661504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
          [>....................]  reshape =  0.0% (1923544/3906887168) finish=15987659.9min speed=4K/sec
          bitmap: 6/30 pages [24KB], 65536KB chunk



    Found this post but I'm having a hard time making sense of it.


    The command "echo frozen > /sys/block/md127/md/sync_action" just hangs the terminal with no output. I have tried it before.

    I've also found that, but it might not be the solution; however, I did find this, and it would suggest echo max > /sys/block/md127/md/sync_max
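
    Something along these lines, if I'm reading it right (just a sketch):

    Code
    # see where the reshape window currently ends
    cat /sys/block/md127/md/sync_min /sys/block/md127/md/sync_max
    # lift the upper bound so the reshape can run to the end of the array
    echo max > /sys/block/md127/md/sync_max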


    TBH I've never encountered this personally so anything I find is going to be from searching

    To execute that "echo max", should the reshape be stopped?

    How can I stop it? I can't seem to stop it with any command.

    Is there any information in the syslog? Run journalctl -f for some time.
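
    For example, filtering the kernel messages for the array and the drives (adjust the patterns as needed):

    Code
    # kernel messages only, followed live and filtered for md/ata errors
    journalctl -k -f | grep -i -E 'md127|ata[0-9]|i/o error'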

    Not really.




    Here is the complete journal

    Drive inserted at:

    Code
    Jan 10 16:27:21 openmediavault smartd[1187]: Device: /dev/disk/by-id/ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K0KL21HZ [SAT], state written to /var/lib/smartmontools/smartd.WDC_WD40EZRZ_00GXCB0-WD_WCC7K0KL21HZ.ata.state

    Reshape started:

    Code
    Jan 10 16:56:32 openmediavault kernel: md: reshape of RAID array md127

    The SATA controller was working fine with another HDD inserted in the same port. How can I go back to 4x4TB?

    What are my options here? Can I start over?


    Just pulled the power on the server to see if it would restart the reshaping. It seems it did, but it hung again.


    Code
    root@openmediavault:~# cat /proc/mdstat
    Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0] [raid1] [raid10]
    md127 : active raid5 sda[4] sde[7] sdb[6] sdf[8] sdc[5]
          11720661504 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/5] [UUUUU]
          [>....................]  reshape =  0.0% (1923544/3906887168) finish=241411.6min speed=269K/sec
          bitmap: 6/30 pages [24KB], 65536KB chunk

    No, it's not. These drives were once installed in a Helios4, which died. They are now (and have been for 2 years already, working like a charm) on a custom motherboard with an i7-3630QM and a PCIe x4 SATA controller with 10 SATA ports.

    The reshaping is hung. The output of "cat /proc/mdstat" hasn't shown any difference for hours. mdadm is using 100% CPU but is doing nothing, and there are no HDD activity LEDs blinking. I'm sure this has hung.

    I just added a new 4TB drive to the existing 4x4TB drives, then in OMV I simply hit Expand in the "Software RAID" tab.

    I saw the HDD LEDs blink for some seconds but it then stopped. I cannot stop the RAID or the reshaping; it's simply stuck. I even rebooted the server (probably shouldn't have done that, as now I see no LVM and I cannot access any data).


    The new drive is sdi.
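
    From what I can tell, the Expand button boils down to roughly this mdadm sequence (a sketch, not necessarily exactly what OMV runs):

    Code
    # add the new disk as a spare, then grow the RAID5 onto it
    mdadm --add /dev/md127 /dev/sdi
    mdadm --grow /dev/md127 --raid-devices=5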


    How can I recover from this?


    Yes, that is the real problem.

    What do you mean by this? Do you mean that Portainer/OMV cannot connect to the internet because Pi-hole is running as a container?

    I should add that I have DHCP disabled on my main router and I'm using Pi-hole's DHCP server.


    All the other containers are running fine and have access to the internet.


    I was able to recreate and pull the images from Portainer by editing /etc/resolv.conf and adding nameserver 8.8.8.8; however, something keeps deleting this entry and reverting it to the default, which is just this:


    I think you can ignore all of this mess :)

    I added the 8.8.8.8 DNS server IP in the advanced settings of the OMV network page and it seems to be working fine now.