OMV does not boot when all RAID drives are connected

  • I cannot enter new commands or see any progress

    Then there is a serious problem somewhere, as you should be back at the root@nas-Jonathan:~# prompt so that you can run cat /proc/mdstat


    Do you have a backup of your data?

    Raid is not a backup! Would you go skydiving without a parachute?

  • Do you have a backup of your data?

    Sadly, I am not back at root@nas-Jonathan:~# :(

    My last backup is about a year old, which is not too bad for me, but it would definitely be nicer if I could get the RAID back.

    Do you think there is no realistic chance of getting it back?

  • Do you think there is no realistic chance of getting it back

    Something is wrong; what, I don't know. The --readwrite option should put you back at the command prompt, and that happens instantly after executing it.


    The only other option at this stage is to download a systemrescuecd, create boot media from it and boot from that. The issue is either hardware related, i.e. your m'board, SATA cables or power supply, or the drives themselves. The systemrescuecd takes OMV out of the equation, and from it you could test each of the drives using smartmontools.


    The alternative is to start again: disconnect the drives, reinstall OMV, then connect each drive one at a time and wipe it. If you connect all the drives together, it will probably detect the RAID signatures on the drives.


    TBH I don't know what else to suggest; it's as if the machine and/or OMV is frozen.


  • I have checked all drives with smartctl and did not find any errors.

    I then found the following link: https://unix.stackexchange.com…shed-linux-md-raid5-array

    I connected only 3 out of 4 RAID drives and used the following command:

    Code
    mdadm --create /dev/md127 --assume-clean -l5 -n4 -c512 /dev/sd[bcd] missing

    The RAID now shows up in OMV RAID management, but I still cannot access any files, nor can I mount the RAID in the OMV GUI.

    RAID details in OMV:

    I would like to either try to repair the RAID or pull a backup and then wipe the drives and build a new RAID.

  • I'm sorry, but I am somewhat at a loss as to why you have posted back here after five days. I suggested you use a systemrescuecd to run commands from the CLI in an attempt to ascertain what might be wrong. After suggesting that, I was waiting for feedback before proceeding, but as I said, this could be hardware related.


    I have checked all drives with smartctl and did not find any errors

    How, where, long or short test? Do you know what specific errors you're looking for?

    I then found the following link: https://unix.stackexchange.com…shed-linux-md-raid5-array

    I connected only 3 out of 4 RAID drives and used the following command

    I used this once on here with a user. It is a last-resort option, as it can lose the data on the array; in that case it didn't work and he had to start again anyway.
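    As far as I understand it, --create only has a chance of working if the RAID level, chunk size, layout, device order and data offset all match the original array exactly, because it writes new superblocks over the old ones. A minimal sketch, assuming bash and mdadm 1.x metadata, of saving each member's mdadm --examine output before trying anything like that; save_superblocks, md_geometry and the examine-*.txt file names are hypothetical.

```shell
# Hypothetical helper: save the full superblock dump of each member device
# (e.g. /dev/sdb -> examine-sdb.txt in the current directory). Read-only.
save_superblocks() {
    local dev
    for dev in "$@"; do
        mdadm --examine "$dev" > "examine-$(basename "$dev").txt"
    done
}

# Keep only the geometry fields from "mdadm --examine" output on stdin;
# these are the values a --create would have to reproduce exactly.
md_geometry() {
    grep -E '^ *(Raid Level|Raid Devices|Chunk Size|Layout|Data Offset|Device Role) :'
}
```

    Usage would be something like save_superblocks /dev/sdb /dev/sdc /dev/sdd, then md_geometry < examine-sdb.txt to compare the members side by side.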


    Going back to #1: you removed one of the drives and the system booted. This was an error on my part, as this thread should have stopped there; the error you posted there suggests a hardware problem. Using the systemrescuecd along with the m'board's BIOS may have been enough to locate the issue.

    Removing any drive from an mdadm array causes the array to go inactive at boot. Because it is software RAID, mdadm knows a drive is missing, and it knows which drive, but it doesn't know why.
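    That state is visible in /proc/mdstat (for example from the systemrescuecd shell): an inactive array says so on its header line, and a missing member shows up as an "_" in the member map, e.g. [UUU_]. A minimal sketch, assuming bash, awk and the usual mdstat layout; md_problems is a hypothetical name.

```shell
# Print md arrays from /proc/mdstat (or a saved copy, or "-" for stdin) that
# are inactive or missing members ("_" in the [UUU_] member map).
md_problems() {
    awk '
        # Array header line, e.g. "md127 : active raid5 sdb[0] sdc[1] sdd[2]"
        /^md[0-9]+ : / {
            name = $1
            if ($3 == "inactive") print name ": inactive"
            next
        }
        # Status line, e.g. "... [4/3] [UUU_]"; "_" marks a missing member
        name != "" && match($0, /\[[U_]+]/) {
            map = substr($0, RSTART, RLENGTH)
            if (index(map, "_") > 0) print name ": degraded " map
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

    Run with no argument it reads the live /proc/mdstat; a healthy array prints nothing.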

    By 'pulling' a drive from your system, you had no idea whether that was the drive causing the system not to boot; it could be one of the other three, or it could be another problem with the m'board.


    If you want confirmation regarding the error you posted in #1, have a look here


  • I'm sorry, but I am somewhat at a loss as to why you have posted back here after five days. I suggested you use a systemrescuecd to run commands from the CLI in an attempt to ascertain what might be wrong. After suggesting that, I was waiting for feedback before proceeding, but as I said, this could be hardware related.

    I did set up the systemrescuecd, but was not sure which commands I should run.

    Because of that, I ran the following command on all drives from the OMV CLI, which took me several days, as each run took about 6 hours.

    Code
    smartctl -s on -t long /dev/sdX

    All of these tests were "Completed without error".
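    On the timing: smartctl -t long returns as soon as the test is started and the test itself runs inside each drive's firmware, so, as far as I know, all drives can run their long tests at the same time instead of one after another. A minimal sketch, assuming bash and a standard smartctl -l selftest log layout; the function names and the RUN=echo dry-run switch are hypothetical.

```shell
# Start an extended (long) SMART self-test on every drive given as an
# argument, as in the command above. Set RUN=echo to only print the commands.
start_long_tests() {
    local dev
    for dev in "$@"; do
        $RUN smartctl -s on -t long "$dev"
    done
}

# Succeed if the newest entry ("# 1") of "smartctl -l selftest" output read
# from stdin says "Completed without error"; fail otherwise.
last_selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Hypothetical wrapper: report the newest self-test result for each drive.
check_selftests() {
    local dev
    for dev in "$@"; do
        if smartctl -l selftest "$dev" | last_selftest_ok; then
            echo "$dev: OK"
        else
            echo "$dev: check the self-test log"
        fi
    done
}
```

    Once the drives report the tests as finished, check_selftests /dev/sdb /dev/sdc /dev/sdd gives one line per drive.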


    Quote

    If you want confirmation regarding the error you posted in #1 have a look here

    I have checked in the BIOS and all drives look normal (a size of 3 TB is shown).

    The smartctl -H /dev/sdX results for the 3 currently connected drives are "PASSED".


    If it is of any help, I can post the results of smartctl --attributes -H /dev/sdX as well.


    The only reason I have not given up yet is that I can actually see the RAID in the OMV GUI again, after it had not shown up for weeks.


    If you tell me that I have ruined my array anyway by using the mdadm --create command, I will just go ahead and do this:

    Quote

    The alternative is to start again: disconnect the drives, reinstall OMV, then connect each drive one at a time and wipe it. If you connect all the drives together, it will probably detect the RAID signatures on the drives.

  • RAID is not my thing and I'm coming in halfway through, so:

    If it is of any help, I can post the results of smartctl --attributes -H /dev/sdX as well.

    The attributes to look at, for each spinning drive, are:


    SMART 5 – Reallocated_Sector_Count.

    SMART 187 – Reported_Uncorrectable_Errors.

    SMART 188 – Command_Timeout.

    SMART 197 – Current_Pending_Sector_Count.

    SMART 198 – Offline_Uncorrectable.


    The following is usually related to cabling or interfaces, but it's worth mentioning because an array may kick out a drive with CRC errors.

    SMART 199 – UltraDMA CRC errors.
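    To pull just those attributes out of the smartctl --attributes table, a rough sketch, assuming bash, awk and the usual table layout (ID in the first column, RAW_VALUE starting in the tenth); smart_watchlist is a hypothetical name. Attributes a drive doesn't report simply don't appear.

```shell
# Print the watch-list SMART attributes (5, 187, 188, 197, 198, 199) whose
# raw value is non-zero, reading "smartctl -A" output from stdin.
smart_watchlist() {
    awk '
        # Attribute rows start with a numeric ID; $10 is the first token of RAW_VALUE
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            if ($1 == 5 || $1 == 187 || $1 == 188 || $1 == 197 || $1 == 198 || $1 == 199) {
                if ($10 + 0 != 0) printf "%s %s %s\n", $1, $2, $10
            }
        }
    '
}
```

    For example: smartctl -A /dev/sdd | smart_watchlist. No output means none of those counters are above zero.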
    __________________________________________________________________________

    If you tell me that I have ruined my array anyway by using the mdadm --create command, I will just go ahead and do this:

    I don't understand this. Have you run mdadm --create?


    **Edit:** Also, to confirm: you say you've tested each drive in the array with a SMART LONG drive test, correct? smartctl -s on -t long /dev/sd?

    The only reason I have not given up yet is that I can actually see the RAID in the OMV GUI again, after it had not shown up for weeks.

    There is an option you could try before going down the complete testing route, which will take time.


    Recreate/rebuild OMV on a USB flash drive or small hard drive: disconnect the data drives, do a clean install, update, shut down, then connect all the data drives. The new install should pick up the array from the drive signatures. If that fails, then you're back to testing and the systemrescuecd.


  • My results when executing smartctl --attributes /dev/sdX (SMART 187 and 188 were missing from the table):


    Have you run mdadm --create?

    I ran mdadm --create /dev/md1 --assume-clean -l5 -n4 -c512 /dev/sd[bcd] missing with only three of the four drives plugged in.

    Also, to confirm: you say you've tested each drive in the array with a SMART LONG drive test, correct? smartctl -s on -t long /dev/sd?

    That is correct. All drives were tested and these tests completed without error.

    There is an option you could try before going down the complete testing route, which will take time.

    Recreate/rebuild OMV on a USB flash drive or small hard drive: disconnect the data drives, do a clean install, update, shut down, then connect all the data drives. The new install should pick up the array from the drive signatures. If that fails, then you're back to testing and the systemrescuecd.

    I have reinstalled OMV the way you described. The OMV GUI looks the same as it did before. It recognizes my newly created RAID as shown in #25, but I cannot access any data. I could only set up a new file system on that RAID, which, as I understand it, would delete all my data as well.


    I have set up a systemrescuecd USB drive, but I am not sure what to do, or whether it is worth trying after all the tests I have already conducted. If you see a chance of still getting it to work that way, I will try it, but I am pretty close to giving up, reinstalling OMV once again, wiping all disks, setting up a new RAID and just losing the data since my backup.

  • Based on what I'm seeing, /dev/sdd is probably the one being kicked out of the array. This is speculation on my part, but...

    It looks like you may have an actual hardware problem. SMART 199 is typically associated with a bad SATA/SAS cable, intermittent contact at the SATA connections at either end of the cable, a hardware interface issue (one drive's SATA interface board), or the motherboard's drive controller.


    While it is more pronounced on one drive (/dev/sdd), you have CRC errors on ALL drives, which is something I haven't seen before. You could try replacing all the cables; that's the cheapest route, as they're relatively low cost on eBay.

    However, it is statistically unlikely that all cables and all interface boards would go bad at the same time (unless there was a power surge or a lightning strike). The only thing ALL drives have in common is the drive controller. If the controller is integrated into the motherboard, that's not a good state of affairs....
    ________________________________________________


    It recognizes my newly created RAID as shown in #25, but I cannot access any data.

    I'm going to assume geaves walked you through mounting the array on the command line. With the array mounted, can you navigate to the array and into its file structure? (If you're not comfortable navigating on the Linux command line, Midnight Commander or WinSCP might be helpful.)

  • I'm going to assume geaves walked you through mounting the array on the command line

    Nope, I've actually given up on this. Why?


    1) mdadm --create -> a bad idea; it should only ever be used as a last resort, because it can cause data loss

    2) reinstall on new hardware and connect all data drives: did that happen? -> no, we're still using the same drives from the mdadm --create


    You could try mount -a, which will probably fail, or mount /dev/md127 /srv, which will most likely throw a filesystem error.
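    If that attempt is made, a more cautious variant avoids writing anything to the array: check the filesystem in no-change mode, then mount it read-only without replaying the journal. A sketch only, assuming the array holds ext4 directly on /dev/md127; the /mnt/recovery mount point, the function name and the RUN=echo dry-run switch are hypothetical.

```shell
# Hypothetical helper, assuming ext4 directly on the md device.
# Nothing here should write to the array. Set RUN=echo to only print commands.
try_ro_mount() {
    local dev=${1:-/dev/md127}
    local mnt=${2:-/mnt/recovery}
    $RUN e2fsck -n "$dev"                    # -n: open read-only, answer "no" to every fix
    $RUN mkdir -p "$mnt"
    $RUN mount -o ro,noload "$dev" "$mnt"    # noload: don't replay the ext4 journal
}
```

    If e2fsck -n already reports that it cannot find a valid superblock, the mount will almost certainly fail too.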


    You could try mount -a, which will probably fail, or mount /dev/md127 /srv, which will most likely throw a filesystem error.

    jonni

    If the array can't be mounted on the command line, in my opinion, it's over. You could try a Live Distro like Knoppix. It wouldn't hurt to try, but I doubt that you'll be able to pull anything off of the array after running the "create" command. Even if you could pull something off of the array, you'd need a destination drive/array to store data as it spools out.

    With CRC errors on all drives, I'd wonder how long that's been going on. Was it a recent event, or has it been happening over a longer period of time? If you had E-mail reports set up, as recommended in the User Guide, you'd have been aware of the problem as it started to develop. To avoid getting into a situation like this again, you might consider a 100% backup, perhaps even setting up an SBC or an old workstation as a Backup Server. (Your call.)


    I'll get out of this thread (and out of geaves' business, and go back to minding mine) but, on a departing note, I'd look really hard at your motherboard before trusting it with another (data-filled) array.

    Sorry that I don't have more to offer.

  • I have decided to give up and have done the following:

    The alternative is to start again: disconnect the drives, reinstall OMV, then connect each drive one at a time and wipe it. If you connect all the drives together, it will probably detect the RAID signatures on the drives.

    I have also set up notifications as recommended by you:

    With CRC errors on all drives, I'd wonder how long that's been going on. Was it a recent event, or has it been happening over a longer period of time? If you had E-mail reports set up, as recommended in the User Guide, you'd have been aware of the problem as it started to develop.

    I hope that I will see errors early if they occur in the future.

    Thank you guys for your help and time.

  • geaves

    Added the Label resolved
