Posts by atomicrabbit

    Well, I'm not sure. And today I'm away, so I can't test on my VM.

    Well... I finally got a spare HDD to copy all the files from the RAID array, and about 10% into the copy it looks like the RAID drives are failing. The shared drive is offline, and when I access the File Systems or RAID Management section in the OMV UI, I get "Communication Error".


    I feel like a drive (or multiple drives) has failed. How can I confirm that?


    I've attached the output from the VM while it was trying to shut down -- it's been a long time and it still hasn't been able to shut down.


    EDIT: After rebooting OMV, it seemed to start up, but there's no RAID array in the RAID Management section. All the disks appear in the Disks section and they're all green in the SMART section.

    I don't think it's a power issue, because the tower (a Dell PowerEdge T110 II) has 4 HDD bays plus 2 bays at the top for more HDDs or optical drives, so it was made to support that many devices. I don't think it's a problem with the cable, since I already replaced it. I don't think it's the LSI 9211-8i card, because its BIOS page can see all the drives, and OMV can see all the drives and show the SMART status for all of them. And when I initially set up the RAID array, I was able to select all 4 drives and create the RAID10 array. Sure, it's possible that some hardware failed afterward, but based on the information I have, I really don't think that's the case.

    At this point I'm kind of out of solutions, as I don't know enough about OMV or mdadm yet and I haven't found a fix. The HDD that all my data was previously on was failing badly, I barely managed to copy the data off of it, and I no longer have a separate HDD that can hold the 3TB I copied to the RAID array. So this is my plan:


    • Buy a new 4TB drive (or maybe a fifth 8TB WD80EFZX for when one of the drives in the array fails)
    • Copy all the data from the RAID array to the new drive temporarily
    • Kill the RAID array and recreate it (roughly what I have in mind is sketched below)
    • Copy all the data back

    If you have any other suggestions before I do this, please let me know.
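
    For the rebuild step, this is roughly the sequence I have in mind, pieced together from the mdadm man page -- just a sketch, using the device names (/dev/sd[b-e]) and array name (/dev/md0) from my setup, so I'd double-check everything before actually running it:

    Code
    # Stop the existing (degraded) array once everything has been copied off.
    mdadm --stop /dev/md0
    # Wipe the old RAID superblocks from each member disk (this destroys the array metadata).
    mdadm --zero-superblock /dev/sdb /dev/sdc /dev/sdd /dev/sde
    # Recreate the 4-disk RAID10 array from scratch.
    mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    # Record the new array so it assembles at boot, then create a filesystem and copy the data back.
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf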

    Well, I meant “removed” as shown in the UI. The removed state was showing before I touched any of the cables. That being said, I did switch the cables afterward, and the sda, sdb, sdc... mappings seem to be different now, but 2 drives are always removed. When I switched the SATA cables, I checked which HDD serial numbers were mapped to which port, so I could uniquely identify each drive.


    Is there anything I can do now?


    Can I delete the RAID Array and re-add the drives and still keep all the data on the drives?
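
    From what I've read so far, it might be possible to stop the array and re-assemble it from the existing superblocks without touching the data -- this is just a sketch of what I found, using my device names, and I haven't tried it yet:

    Code
    # Check that each disk still has intact RAID metadata (the Array UUID should match on all four).
    mdadm --examine /dev/sdb /dev/sdc /dev/sdd /dev/sde
    # Stop whatever is currently assembled (unmount the filesystem on /dev/md0 first).
    mdadm --stop /dev/md0
    # Try re-assembling the array from all four members.
    mdadm --assemble /dev/md0 /dev/sdb /dev/sdc /dev/sdd /dev/sde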

    Haha. Gotta keep those kittens clear of the equipment!


    Thanks again for the help in troubleshooting! My biggest concern is that there were little to no warning flags in the OMV UI. I would expect that if 1 (or more) drives were failing or not connected, OMV would be screaming bloody murder. Instead I had to do a whole bunch of investigation, and even then it really wasn't obvious what had occurred. The shares were still working, which was good, but it would be nice to have some big red warnings next time. Any idea if this is just some configuration I need to set? I'm still new to OMV.
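
    In the meantime, the closest thing I've found to a "big red warning" is mdadm's own monitoring/mail alerts. Here's a rough sketch of what I'm planning to try (the email address is a placeholder, and I'm not sure yet how OMV's own notification settings tie into this):

    Code
    # Have mdadm email someone when an array degrades or a disk fails
    # (appends to /etc/mdadm/mdadm.conf; the address below is just a placeholder).
    echo "MAILADDR admin@example.com" >> /etc/mdadm/mdadm.conf
    # Send a test alert for every array to confirm mail delivery works.
    mdadm --monitor --scan --oneshot --test
    # Or keep a monitor running in the background, checking every 5 minutes.
    mdadm --monitor --scan --daemonise --delay=300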

    I just tested swapping the SATA connectors on the SFF-8087 breakout cable I have, and it confirms that the cable seems to be faulty.


    Before swap:


    Port   Dev        Status
    0      /dev/sdb   Removed
    1      /dev/sdc   Connected
    2      /dev/sdd   Removed
    3      /dev/sde   Connected



    After swap:


    Port   Dev        Status
    0      /dev/sde   Removed
    1      /dev/sdd   Connected
    2      /dev/sdc   Removed
    3      /dev/sdb   Connected



    Based on that alone, I think I can safely say it's a problem with the cable.
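
    For reference, this is roughly how I kept track of which physical drive was on which port while swapping (the serial numbers stay the same even when the /dev/sdX names move around):

    Code
    # List device names alongside drive serial numbers and sizes.
    lsblk -o NAME,SERIAL,SIZE
    # Or pull the serial from SMART for a single drive (sdb is just an example).
    smartctl -i /dev/sdb | grep -i serial
    # The by-id symlinks also encode model + serial and follow the physical drive.
    ls -l /dev/disk/by-id/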


    EDIT: I'm also concerned about the SMART errors in the logs. Any thoughts on that?

    As I said in my last post, the controller is already in IT (Initiator Target) mode; I purchased it like that. Also, I was able to see all 4 disks in OMV when I initially set up the RAID array, and I can still see all 4 disks in the Disks section of OMV.


    In the Details popup of the RAID Management section, I see this. What does it mean when it says "Raid Devices: 4" and "Total Devices: 2", and then below it says "Active Devices: 2" and "Working Devices: 2", but "Failed Devices" says 0?



    I also checked the SMART logs and I see a couple entries like this:

    Code
    ATA error count increased from 64475 to 64477

    and this one on a different HDD:

    Code
    SMART Usage Attribute: 199 UDMA_CRC_Error_Count changed from 200 to 198
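
    In case it matters, this is roughly how I've been digging into those entries from the command line (sdc is just an example device):

    Code
    # Full SMART attribute table -- the raw value of 199 UDMA_CRC_Error_Count is the
    # interesting one here, since CRC errors usually point at cabling rather than the disk itself.
    smartctl -A /dev/sdc
    # The drive's logged ATA errors, most recent first.
    smartctl -l error /dev/sdc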

    I think the problem is that I don't know what I SHOULD be seeing in OMV. What should Raid Devices, Total Devices, Active Devices and Working Devices show in a proper, healthy environment? Should all the devices be listed in the RAID Management section? This is all new to me, so I don't have a benchmark for what is OK and what isn't.

    Why is everything seemingly still working? The only reason I noticed the "clean, degraded" state was that the read/write speeds were a tad slow and I did some digging, but other than a few not-obvious warning flags, the OMV UI looks like business as usual. There's no big red warning saying one or more drives are failing or that something is wrong; I had to dig into the details. This seems kind of backwards to me.
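
    My best guess from the mdadm man page is that on a healthy 4-disk array all of those counters should simply match the number of disks, something like this excerpt from mdadm --detail /dev/md0 (values assumed, not taken from my box):

    Code
    # Expected on a healthy 4-disk RAID10 (assumed, not my actual output):
       Raid Devices : 4
      Total Devices : 4
              State : clean
     Active Devices : 4
    Working Devices : 4
     Failed Devices : 0
      Spare Devices : 0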

    Could you be more specific about the hardware?


    Mobo, CPU, RAM...


    We can't guess and compare between a VM and something real.

    Sorry, I forgot that while adding all the other details.


    ESXi is running on a Dell PowerEdge T110 II with a Xeon E3-1230 V2 and 32GB of ECC RAM. The OMV VM has 2GB of RAM and 4 vCPUs. The four 8TB HDDs are connected via an LSI 9211-8i HBA in IT mode and passed through directly to the VM.
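
    If it helps to verify the passthrough, these are the kinds of checks I can run inside the VM (commands only; I can paste the output if needed):

    Code
    # Confirm the passed-through SAS HBA is visible to the guest.
    lspci | grep -i sas
    # Confirm the mpt2sas/mpt3sas driver bound to it and enumerated the disks.
    dmesg | grep -i mpt
    lsblk -d -o NAME,SIZE,MODEL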


    EDIT: What exactly does "degraded" mean in OMV?

    I just set up OMV in a VM on ESXi a couple of weeks ago, and the RAID array state is now showing "clean, degraded". I'm using 4 x 8TB WD80EFZX 5400RPM drives in RAID10. I'm not sure what happened for this to occur. I only recently copied all the data to it, a couple of days ago; about 3TB of the 14.4TB is being used. Here's the output from the various commands requested in the pinned post. If you need any more information, please let me know. The SMART status of all the drives is showing green/good.


    cat /proc/mdstat

    Code
    Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
    md0 : active raid10 sde[3] sdc[1]
    15627790336 blocks super 1.2 512K chunks 2 near-copies [4/2] [_U_U]
    bitmap: 117/117 pages [468KB], 65536KB chunk


    blkid

    Code
    /dev/sda1: UUID="5fd7c9d7-d9b4-4c03-ba51-7017ae8018fa" TYPE="ext4" PARTUUID="9856ddaf-01"
    /dev/sda5: UUID="1671fa99-4244-4fab-9cad-975e63b1b012" TYPE="swap" PARTUUID="9856ddaf-05"
    /dev/sdc: UUID="d856f092-8499-7949-b3f9-705e26a12002" UUID_SUB="3e5f75fb-76e5-6ebb-bd0c-77473af3ad0f" LABEL="acmomv:acmraid" TYPE="linux_raid_member"
    /dev/sdb: UUID="d856f092-8499-7949-b3f9-705e26a12002" UUID_SUB="56d31a5b-4c6d-258d-b997-5e807d463250" LABEL="acmomv:acmraid" TYPE="linux_raid_member"
    /dev/sdd: UUID="d856f092-8499-7949-b3f9-705e26a12002" UUID_SUB="944b81c7-e25c-988d-971f-2d1f5a3cc058" LABEL="acmomv:acmraid" TYPE="linux_raid_member"
    /dev/md0: LABEL="acmraid" UUID="3301097f-2458-4e36-94e5-e633cd21dfcc" TYPE="ext4"
    /dev/sde: UUID="d856f092-8499-7949-b3f9-705e26a12002" UUID_SUB="6158fc62-6e62-d339-ecb2-ca05b94822b9" LABEL="acmomv:acmraid" TYPE="linux_raid_member"

    fdisk -l | grep "Disk "

    Code
    Disk /dev/sda: 8 GiB, 8589934592 bytes, 16777216 sectors
    Disk identifier: 0x9856ddaf
    Disk /dev/sdc: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
    Disk /dev/sdb: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
    Disk /dev/sdd: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
    Disk /dev/sde: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
    Disk /dev/md0: 14.6 TiB, 16002857304064 bytes, 31255580672 sectors

    cat /etc/mdadm/mdadm.conf

    mdadm --detail --scan --verbose

    Code
    ARRAY /dev/md0 level=raid10 num-devices=4 metadata=1.2 name=acmomv:acmraid UUID=d856f092:84997949:b3f9705e:26a12002
    devices=/dev/sdc,/dev/sde

    I noticed that when I click Details in the RAID Management section, I see this. What does "removed" mean?

    Code
    Number   Major   Minor   RaidDevice   State
       -       0       0        0         removed
       1       8      32        1         active sync set-B   /dev/sdc
       -       0       0        2         removed
       3       8      64        3         active sync set-B   /dev/sde


    I also noticed that my read/write speeds are nothing to rave about: about 67 MB/sec write and 95 MB/sec read on average. But since this is the first time I've ever set up RAID, I don't know what to expect. Any help is appreciated.

    That's normal for 1 HDD.

    So you're saying that if I had 100 MB/sec write speeds on a local HDD, I should get 200 MB/sec write speeds on the RAID? That's not the case, as you can see above. What kind of diagnostics can I do to investigate this?
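
    Here's what I was planning to run for some basic numbers -- just a rough sketch, and the mount path is a placeholder for wherever OMV mounted the array:

    Code
    # Raw sequential read speed of a single member disk vs. the whole array.
    hdparm -t /dev/sdc
    hdparm -t /dev/md0
    # Sequential write test through the filesystem: write a 1 GiB file and sync it.
    # /path/to/array-mount is a placeholder for the array's mount point.
    dd if=/dev/zero of=/path/to/array-mount/testfile bs=1M count=1024 conv=fdatasync
    rm /path/to/array-mount/testfile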


    Also, when I was looking around the OMV UI, I noticed that on the RAID Management page my RAID is showing a "clean, degraded" state. What does "degraded" mean? I just set up this RAID 2 weeks ago with brand-new HDDs, and I checked the SMART values of the drives and they're all green/good.


    Here are some outputs:


    cat /proc/mdstat


    Code
    Personalities : [raid10] [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
    md0 : active raid10 sde[3] sdc[1]
    15627790336 blocks super 1.2 512K chunks 2 near-copies [4/2] [_U_U]
    bitmap: 117/117 pages [468KB], 65536KB chunk
    unused devices: <none>

    Anything else I can provide?

    I'm new to OMV and RAID in general; this is really my first setup. I set up OMV as a VM in ESXi and connected four 8TB WD80EFZX 5400RPM HDDs in RAID10. What kind of read/write speeds should I expect? I tried NAS performance tester 1.7 and got an average write speed of 67 MB/sec and an average read speed of 95 MB/sec. This seems a bit slow to me, but I don't really have anything to compare it to other than copying directly to an HDD. Help is appreciated.