Posts by spigs

    On rare occasions I lose my network connection and cannot log on from another PC. Restarting ifup@enp3s0.service resolves it, but I need to attach a keyboard and monitor to do so.


    While trying to diagnose the problem I sometimes see after a server restart, like below, that ifup@enp3s0.service has failed. It doesn't give much information other than `code=exited, status=1`.


    At this point OMV is working fine, but I believe the DHCP renewal then fails the next day. I cannot think of a way to fix this. I've been tempted to just restart it every day, but that would be a crude workaround for a problem I don't understand.


    Anyone seen anything like this before please?


    After a restart all is well
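
    A less crude option than a blind daily restart might be a small watchdog that restarts the service only when the interface has actually lost its IPv4 address. This is only a sketch: the function names and the idea of running it from root's crontab are my own assumptions, and it does not explain why the DHCP renewal fails in the first place. For that, `journalctl -b -u ifup@enp3s0.service` is the first place I would look.

```shell
#!/bin/bash
# Sketch of a watchdog for the interface named in the post.
# Assumption: run as root, e.g. every few minutes from cron.

IFACE="enp3s0"

# Return 0 if the given `ip -4 -o addr show` output contains an IPv4 address.
has_ipv4() {
    printf '%s\n' "$1" | grep -Eq 'inet [0-9]+\.[0-9]+\.[0-9]+\.[0-9]+'
}

# Restart ifup@IFACE.service only when the interface has no IPv4 address.
check_and_restart() {
    out="$(ip -4 -o addr show dev "$IFACE" 2>/dev/null)"
    if has_ipv4 "$out"; then
        return 0
    fi
    logger -t net-watchdog "$IFACE has no IPv4 address, restarting ifup@$IFACE.service"
    systemctl restart "ifup@$IFACE.service"
}
```

    Calling `check_and_restart` at the end of the script and scheduling it from cron would then recover the box without a keyboard and monitor, and the `logger` line leaves a timestamp in the journal showing when the address was lost.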

    My LSI 9207-8 arrived a couple of days ago. This was via ebay for GBP48.80 / USD56.27 and included the 2*4 cables. It was reported as actually being unused, and I didn't see any evidence that this old card had been used. I switched to it from the BEYIMEI PCIE card for my two drives and all was okay. I then added the further four Samsung disks which had been a problem for the BEYIMEI. This time a single disk is reported as having a problem and no other disks. (As reported above, errors were seen across multiple disks last time.)


    Code
    [Sat Mar 11 15:28:55 2023] sd 0:0:5:0: task abort: SUCCESS scmd(0x000000008e71f524)
    [Sat Mar 11 15:28:57 2023] sd 0:0:5:0: [sdf] tag#7569 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
    [Sat Mar 11 15:28:57 2023] sd 0:0:5:0: [sdf] tag#7569 Sense Key : Not Ready [current]
    [Sat Mar 11 15:28:57 2023] sd 0:0:5:0: [sdf] tag#7569 Add. Sense: Logical unit not ready, cause not reportable
    [Sat Mar 11 15:28:57 2023] sd 0:0:5:0: [sdf] tag#7569 CDB: Read(10) 28 00 74 70 6d 00 00 00 08 00
    [Sat Mar 11 15:28:57 2023] I/O error, dev sdf, sector 1953524992 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2

    I took this sdf disk out and no further problems have been reported. So it seems that my cheaper PCIE controller is bad at identifying which disk has the problem.
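
    As a cross-check on the log above, the CDB line can be decoded by hand. In a SCSI Read(10) command the first byte is the opcode (0x28), bytes 2 to 5 are the big-endian logical block address and bytes 7 to 8 are the number of blocks to transfer. The helper name below is just for illustration.

```shell
#!/bin/bash
# Decode the ten hex bytes of a Read(10) CDB as printed by the kernel.
decode_read10() {
    [ "$#" -eq 10 ] || return 1      # a Read(10) CDB is exactly ten bytes
    [ "$1" = "28" ] || return 1      # 0x28 is the Read(10) opcode
    lba=$(( 0x$3$4$5$6 ))            # bytes 2-5: logical block address
    blocks=$(( 0x$8$9 ))             # bytes 7-8: transfer length in blocks
    echo "lba=$lba blocks=$blocks"
}

decode_read10 28 00 74 70 6d 00 00 00 08 00
# prints: lba=1953524992 blocks=8
```

    The decoded address matches the `sector 1953524992` in the I/O error line, so both messages describe the same failed 8-block read on sdf.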


    This card is seen in lspci as

    01:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)


    $ sudo dmesg -T | grep LSI

    [Sat Mar 11 15:58:15 2023] mpt2sas_cm0: LSISAS2308: FWVersion(20.00.07.00), ChipRevision(0x05), BiosVersion(07.39.02.00)


    It completely rearranges the HCTL: Host:Channel:Target:Lun values putting its own disks on Host 0 before the motherboard hosted disks starting here at line 8.
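
    Because both the HCTL values and the sdX letters move around when the controller changes, it can help to look disks up by their WWN instead. A minimal sketch, assuming udev has created the usual `wwn-` links under /dev/disk/by-id (the function name and the optional second argument are my own, the latter only so it can be tried against a scratch directory):

```shell
#!/bin/bash
# Resolve a WWN, as printed in the lsblk WWN column, to the current device node.
# $1: WWN, e.g. 0x50024e900392fa0f
# $2: optional directory holding the links (defaults to /dev/disk/by-id)
wwn_to_dev() {
    wwn="$1"
    dir="${2:-/dev/disk/by-id}"
    link="$dir/wwn-$wwn"
    [ -e "$link" ] || return 1       # no such disk attached right now
    readlink -f "$link"
}
```

    For example, `wwn_to_dev 0x50024e900392fa0f` should print whichever /dev/sdX the Samsung HD103SJ currently has, regardless of which controller it is plugged into.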


    I thought I would write up my notes on using a cheap PCIE card to add extra disks to my Open Media Vault setup. I have had a few problems with this type of disk controller but could not categorically prove it was at fault. Maybe other people who are using a similar controller might find these notes helpful. I have found I can use two disks, but if I try to add more I get I/O errors, even on the two disks that work fine when the later disks are not attached.


    The controller I picked was this quite cheap one:

    BEYIMEI PCIE SATA Card 8 Port, 6 Gbit/s SATA 3.0 PCI-E Card 4X with 8 SATA Cables, Power Splitter Cable, JMB575 + ASM1166 PCIe to SATA Controller Expansion Card, Non-Raid, Boot as System Hard Drive
    www.amazon.co.uk

    BEYIMEI PCIE SATA Card 8 Port, 6 Gbit/s SATA 3.0 PCI-E Card 4X

    Chipset: ASMedia ASM1166


    From an OMV / Debian / Linux perspective it is presented through this SATA controller:


    Code
    root@omv23:~# lspci | grep ASM
    01:00.0 SATA controller: ASMedia Technology Inc. Device 1166 (rev 02)
    04:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller




    My Open Media Vault version is up to date:

    Code
    root@omv23:~# dpkg -l | grep openmediavault
    ii  openmediavault                 6.3.2-2                        all          openmediavault - The open network attached storage solution
    ii  openmediavault-kernel          6.4.5                          all          kernel package
    ii  openmediavault-keyring         1.0                            all          GnuPG archive keys of the OpenMediaVault archive
    ii  openmediavault-omvextrasorg    6.1.1                          all          OMV-Extras.org Package Repositories for OpenMediaVault
    ii  openmediavault-zfs             6.0.12                         amd64        OpenMediaVault plugin for ZFS

    The original two disks I added look like this

    Code
    lsblk -o NAME,MODEL,SERIAL,WWN,HCTL,MOUNTPOINT,SIZE,FSTYPE,UUID,PARTUUID
    
    
    sdh    TOSHIBA_HDWE160         10GAK1QDFB8G   0x50000399cbb81261 36:0:0:0                5.5T
    ├─sdh1                                        0x50000399cbb81261                         5.5T zfs_member 17579695501947784507                 de287461-af87-2849-a0d6-1cd58c90a959
    └─sdh9                                        0x50000399cbb81261                           8M                                                 0547f32f-48e5-4345-896f-b1a3b733724d
    sdi    TOSHIBA_HDWE160         10GXK1B6FB8G   0x50000399cc700b52 37:0:0:0                5.5T
    ├─sdi1                                        0x50000399cc700b52                         5.5T zfs_member 17579695501947784507                 6d50b91d-825a-1647-84dd-23005db0caaf
    └─sdi9                                        0x50000399cc700b52                           8M                                                 d0671e81-7697-7247-9fd1-84cad7c99b9b

    The additional four disks I was trying to add are very old: three SAMSUNG SpinPoint F1 DT and one SAMSUNG SpinPoint F3.

    Code
    sdd    SAMSUNG_HD103UJ         S13PJDWS228239 0x50000f0000fe02b9 10:0:0:0              931.5G
    sde    SAMSUNG_HD103UJ         S13PJDWS228237 0x50000f0000fe02ab 11:0:0:0              931.5G
    sdf    SAMSUNG_HD103SJ         S246JDWZ525473 0x50024e900392fa0f 34:0:0:0              931.5G
    sdg    SAMSUNG_HD103UJ         S13PJDWS228229 0x50000f0000fe026b 35:0:0:0              931.5G


    I tested with the two Toshiba drives for weeks with no issues at all. When I add the four old Samsung disks (they were unformatted, from a QNAP NAS where they had been working perfectly fine) we see IO errors within a few minutes. These IO errors are also seen on the Toshiba drives, which made me think this is a controller issue.
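
    To see at a glance which disks the errors land on, the kernel log can be tallied per device. This is a rough sketch that assumes the `I/O error, dev sdX, sector ...` message format shown in my kernel log; the function name is made up.

```shell
#!/bin/bash
# Count "I/O error, dev <name>," lines per device from kernel log text on stdin.
# Prints one "<device> <count>" line per device, sorted by device name.
count_io_errors() {
    sed -n 's/.*I\/O error, dev \([^,]*\),.*/\1/p' \
        | sort \
        | uniq -c \
        | awk '{print $2, $1}'
}
```

    Running `dmesg -T | count_io_errors` then shows whether the errors are confined to one disk or spread across several, which is the pattern that points at the controller rather than a single drive.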


    There are many errors but to summarize they look like this




    I restarted the machine with just the four Samsung disks attached and they were fine on reboot. The sd d,e,f,g labels are now rearranged

    Code
    lsblk -o NAME,MODEL,SERIAL,WWN,HCTL,MOUNTPOINT,SIZE,FSTYPE,UUID,PARTUUID
    
    ...
    sdd    SAMSUNG_HD103UJ          S13PJDWS228239 0x50000f0000fe02b9 10:0:0:0              931.5G
    sde    SAMSUNG_HD103UJ          S13PJDWS228237 0x50000f0000fe02ab 11:0:0:0              931.5G
    sdf    SAMSUNG_HD103SJ          S246JDWZ525473 0x50024e900392fa0f 34:0:0:0              931.5G
    sdg    SAMSUNG_HD103UJ          S13PJDWS228229 0x50000f0000fe026b 35:0:0:0              931.5G

    and they did not show any problems in their SMART reports. NB: these SAMSUNG SpinPoint disks make an almost comic "Whoooop" sound on startup and have a fair bit of clicking - but I understand this is normal for these disks and they have always done this.


    At some point we do get IO errors reported, and smartctl reports the following for all but the sdd disk on 10:0:0:0, where that is the HCTL (Host:Channel:Target:Lun) identifier.

    Code
    root@omv23:~# smartctl --device=ata -H /dev/sdf
    smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.1.10-1-pve] (local build)
    Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
    
    Read Device Identity failed: Input/output error
    
    A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
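
    When checking several disks in a row it is easier to look at smartctl's exit status than to read each report. The status is a bit mask, and the meanings below are taken from the EXIT STATUS section of the smartctl man page (paraphrased). The function name is my own, and I have not recorded which value the failing disks above actually returned.

```shell
#!/bin/bash
# Explain a smartctl exit status (a bit mask) in words, one line per set bit.
explain_smartctl_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no problems flagged"
        return 0
    fi
    bit=0
    while [ "$bit" -le 7 ]; do
        if [ $(( status & (1 << bit) )) -ne 0 ]; then
            case "$bit" in
                0) msg="command line did not parse" ;;
                1) msg="device open failed or device did not return identity data" ;;
                2) msg="a SMART or ATA command to the disk failed, or a checksum error" ;;
                3) msg="SMART status check returned DISK FAILING" ;;
                4) msg="prefail attributes at or below threshold" ;;
                5) msg="some attributes were at or below threshold in the past" ;;
                6) msg="device error log contains records of errors" ;;
                7) msg="device self-test log contains records of errors" ;;
            esac
            echo "bit $bit: $msg"
        fi
        bit=$(( bit + 1 ))
    done
}
```

    Usage would be along the lines of `smartctl --device=ata -H /dev/sdf; explain_smartctl_status $?`, repeated for each of the Samsung disks.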


    I bought brand new SATA leads and re-seated all power leads and was once again able to run long SMART tests with no issue. After many hours I added the Toshiba disks and once again got IO errors within ten minutes.


    I've read reports that some people think PCIE SATA cards are not a good idea

    They state they

    • can have buggy firmware
    • have issues with more than two SATA ports

    but I could find no backing for this in any other sources.

    They recommend a Serial Attached SCSI (SAS) Host Bus Adapter (HBA) instead. I could not find many examples of these for sale.


    For the time being I'm using my motherboard's four SATA ports and two on the PCIE controller. Everything is working well. As my Samsung drives are very old I think I will just give up on them but I'm quite confident they are fine.

    I really miss my own custom bash settings when logged on as root on my own machine.


    As a workaround I've put bind commands into /root/.bashrc to get the behaviour I want


    e.g. to be able to search my history backwards on press of Page-Up I have


    Code
    bind '"\e[5~": history-search-backward'

    Update

    I've just noticed that the Anacron job 'cron.weekly' will email you a warning if you make this change.

    Code
    /etc/cron.weekly/openmediavault-update-smart-drivedb:
    /root/.bashrc: line 30: bind: warning: line editing not enabled


    To prevent that warning we could use

    Code
    # Rest of script above
    
    # https://www.gnu.org/software/bash/manual/bash.html#Is-this-Shell-Interactive_003f
    if [ -z "$PS1" ]; then
        # This shell is not interactive
        :
    else
        # This shell is interactive
        bind '"\e[5~": history-search-backward'
    fi
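
    The same section of the bash manual also describes checking the `$-` special parameter, which contains an `i` when the shell is interactive. A sketch of that variant is below; the helper function is my own naming, and taking the flags as an argument just makes it easy to try out by hand.

```shell
#!/bin/bash
# Return 0 if the given shell option flags (normally "$-") include "i",
# which bash sets only for interactive shells.
is_interactive_flags() {
    case "$1" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

if is_interactive_flags "$-"; then
    # Only touch readline key bindings when line editing can be enabled
    bind '"\e[5~": history-search-backward'
fi
```

    Either test should keep the cron.weekly job quiet, since the `bind` is skipped whenever .bashrc is read by a non-interactive shell.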