System hangs again

  • Hello,


    My start with a new OMV-based NAS did not go well:
    http://phpbb.openmediavault.or…966a4f0bd09d558b435b583a3


    A few months ago, the cause finally turned out to be the CPU.


    For the past few days I have been getting exactly the same error again. The whole system freezes, with colored dots on the screen as in the topic above. The system is then unreachable, neither via the web interface nor via SSH.
    One disk showed a SMART error count, so I decided to replace it.


    But why would the whole system freeze when an HDD fails?
    My OS is on a 32 GB SSD. The data disks are 4x 2 TB WD Red, set up as two RAID1 arrays with two HDDs each.


    I could also see that the RAID1 was resyncing. The freeze occurred somewhere between 10% and 30%.


    I have now added a new disk to the RAID, but the freezes still occur.
    My question: can the system freeze because of a damaged HDD?
    Should I check whether the SSD has errors? The SMART short tests pass. Only one RAID HDD shows errors.
    Could the CPU be damaged again? That would be very suspicious.


    My motherboard, as in the topic above:
    http://www.amazon.de/dp/B00J0D…171_37038021_TE_M3T1_dp_1
    The CPU:
    http://www.amazon.de/gp/produc…age_o05_s00?ie=UTF8&psc=1


    So much time lost again...


    Greets

  • Yes, your system can freeze because of hard disk errors. If the filesystem is damaged, you can get a kernel error, which can definitely freeze your system.


    Does your SSD support TRIM? Please post your syslog and kern.log from a time when your system froze. A kernel error will, in most cases but not always, be written to the logs.
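    One way to answer the TRIM question on Linux is to read the kernel's `discard_max_bytes` attribute for each block device; a value of 0 means the device does not support discard (TRIM). The sketch below is a minimal Python example that assumes the standard sysfs layout under /sys/block; both function names are hypothetical helpers, not part of OMV.

```python
from pathlib import Path


def discard_supported(discard_max_bytes: str) -> bool:
    """Interpret the contents of a queue/discard_max_bytes sysfs file.

    0 means the device does not support discard (TRIM);
    any positive value means discard requests can be sent to it.
    """
    return int(discard_max_bytes.strip()) > 0


def discard_support(sys_block: Path = Path("/sys/block")) -> dict[str, bool]:
    """Map each block device under /sys/block to its discard (TRIM) support."""
    result = {}
    for dev in sorted(sys_block.iterdir()):
        attr = dev / "queue" / "discard_max_bytes"
        if attr.is_file():
            result[dev.name] = discard_supported(attr.read_text())
    return result
```

    On the NAS, calling `discard_support()` should return a mapping from device names (sda, sdb, ...) to True/False. Device support alone does not mean TRIM is actually used; the filesystem still needs the `discard` mount option or a periodic `fstrim` run.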


    Greetings
    David

    "Well... lately this forum has become support for everything except omv" [...] "And is like someone is banning Google from their browsers"


    Only two things are infinite, the universe and human stupidity, and I'm not sure about the former.

    Upload Logfile via WebGUI/CLI
    #openmediavault on freenode IRC | German & English | GMT+1
    Absolutely no Support via PM!

  • Hello,


    I really hope it's not the CPU again.
    Yesterday I replaced the defective disk. Since then I have had no further SMART errors. I removed all partitions from the new disk and was then able to select it for my RAID1.
    The status is "clean,degraded, recovering (unknown)". When I double-click this entry, it currently shows 45%.


    BUT.
    During this recovery, both yesterday and today, I also got freezes???


    The last one was today at 6:14.
    The autoshutdown script normally works well: when my computer is off, the NAS shuts itself down after 1 minute. I saw that this did not happen, but I had to go to work, so I left it running.
    At 16:11 I came back and, as expected, found a frozen system.
    Only a hard power-off and restart helped. Since then it has continued recovering, so far without freezes???


    I have attached the log files you asked for to this thread, trimmed to the period from 6:14 to 16:11.


    Thank you...
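    Cutting syslog and kern.log down to a window such as 6:14 to 16:11 only needs a small filter over the classic syslog timestamp (the `Aug 13 06:21:53 host ...` format seen in the log line quoted later in this thread). This is a minimal Python sketch under that assumption; it does not handle ISO 8601 timestamps, and the function name is hypothetical.

```python
from datetime import time


def lines_in_window(lines, month, day, start, end):
    """Yield classic syslog lines ("Aug 13 06:21:53 host ...") logged on
    `month` `day` whose timestamp lies between `start` and `end` (inclusive)."""
    for line in lines:
        # "Aug  3 ..." (padded day) also splits cleanly into month/day/time.
        parts = line.split(maxsplit=3)
        if len(parts) < 3 or parts[0] != month or parts[1] != str(day):
            continue
        try:
            stamp = time.fromisoformat(parts[2])
        except ValueError:
            continue  # third field is not an HH:MM:SS timestamp
        if start <= stamp <= end:
            yield line
```

    For example, `lines_in_window(open("/var/log/kern.log"), "Aug", 13, time(6, 14), time(16, 11))` yields the kern.log lines from that period; the date Aug 13 is only an assumption for illustration.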

  • I can't see any noticeable errors besides the degraded RAID ones, but I'm not sooo familiar with RAID and mdadm.
    Maybe a newer kernel (3.2 for 0.5, 3.13 for 1.0.x) will help if the hardware is brand new.

  • Code
    Aug 13 06:21:53 sdev-nas-omv mountd[1590]: authenticated unmount request from 192.168.178.11:720 for /export/data (/export/data)


    That's what I spotted... It seems to be NFS- and/or autofs-related.
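    To see how often this happens, one can count these mountd lines per client and export. With `--timeout=5` in the client's /etc/auto.master (posted later in the thread), autofs unmounts an idle share after 5 seconds, so a line like this is probably logged after each idle period. A minimal Python sketch; the regex assumes the whole message sits on one syslog line, and the names are hypothetical.

```python
import re
from collections import Counter

# Matches mountd messages such as:
# "mountd[1590]: authenticated unmount request from 192.168.178.11:720 for /export/data (/export/data)"
UNMOUNT_RE = re.compile(
    r"mountd\[\d+\]: authenticated unmount request from "
    r"(?P<client>[\d.]+):\d+ for (?P<path>\S+)"
)


def count_unmounts(lines):
    """Count NFS unmount requests per (client IP, export path)."""
    counts = Counter()
    for line in lines:
        match = UNMOUNT_RE.search(line)
        if match:
            counts[(match.group("client"), match.group("path"))] += 1
    return counts
```

    Feeding it the syslog from a freeze period shows whether the client was mounting and unmounting the share constantly right before the hang.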


    Greetings
    David

  • Hi,


    You think it's the NFS connection to my main computer? How can you tell that I use AutoFS?


    I have used AutoFS with the same config since day one; I'll post it now. Everything has run fine for the last 3 months:


    Code
    /etc/auto.master
    
    
    /media/omv /etc/auto.omv --timeout=5 --ghost


    and


    Code
    /etc/auto.omv
    
    
    data -fstype=nfs,rw,soft,tcp,rsize=32768,wsize=32768 sdev-nas-omv:/export/data


    Meanwhile the resync is at 75%. But when I came back an hour ago, the system was frozen again.
    However, when I left, a file on the AutoFS mount was open.


    Now I'll try to get the second HDD fully synced without any mounts.
    Following your idea, as I understand it, this should work.


    Greets

  • Quote
    Hi,
    you think it's the NFS connection to my main computer? How can you tell that I use AutoFS?
    [...]


    Because if you google that error, the first hit says it's related to AutoFS.


    Greetings
    David

  • I'm really desperate. I let the system run overnight with every device that connects to the NAS turned off: no SMB, no NFS, nothing else.
    I noticed one or two spontaneous restarts, as in my last post. I could hear them from the HDDs spinning up, because after a hard shutdown or failure the system runs a quota check on boot.


    So I now have two problems: spontaneous restarts and freezes. These are the same errors as a few months ago, when the CPU was damaged, and they match the graphical glitches on screen described in the linked posts above.


    Meanwhile, my RAID1 is still not ready; syncing is at 89%. My second RAID now also needs a resync, probably because of all the failures and restarts; that sync is at 2%. After the system runs for a few minutes, it freezes again.


    The hard disk that had SMART errors is now removed. I will send it back.
    I think the best course, as last time, is to send the CPU back. Last time I ran a few tests (memtest and so on), all fine. I also swapped the mainboard, without effect. Only replacing the CPU helped last time.


    Is this hardware configuration not recommended? Is anyone running the same mainboard and CPU as in the first post? This is the second time the CPU has broken down within a few weeks.

  • Maybe you should try replacing your PSU. It sounds like it could be the cause of your errors.


    A faulty PSU can damage all of your system components.


    Greetings
    David

  • I checked the PSU again with my tester; it showed no fault.


    What do you make of this log:


    Three of my 4 WD Reds have such logs, and no self-test log.


    It seems the freezes depend on the RAIDs? If I remove one disk from each RAID1, the system runs for hours...

  • Okay, update: I installed memtest86 and booted the system with it.


    At 5% the system hangs, but without the graphical glitches.
    With the first RAM module removed, the test runs for a while, then the system reboots (4 times).
    With the first module inserted and the second removed: same result (3 times).


    With only one drive in each RAID, everything runs well.


    In the meantime, every one of my HDDs has one or more log entries like the one above:

    Code
    Error 2 [1] occurred at disk power-on lifetime: 50 hours (2 days + 2 hours)
      When the command that caused the error occurred, the device was active or idle.


    I then ran the system for hours with only one HDD in each RAID. I plugged the other HDDs back in and rebuilt the RAIDs... same failure as in the first post.


    I think it's best to wait for the new CPU.
    But what about all the SMART logs? On every disk?
    It only says that when the command arrived, the device was active or idle. Judging from that alone, this shouldn't be a disk failure, or should it??? Should I send back all 4 drives???


    What have I done to be punished like this with this NAS???

  • You were getting errors, hangs and reboots with no hard drives connected???? Your hard drives will not work correctly if you have bad RAM modules. You should have 0 errors when you run memtest. There is no point in connecting drives while you have errors.

  • Yes. Hangs and reboots also happen without hard disks. Two 4 GB RAM modules are installed.
    With both installed, memtest hangs every time at 5%.
    With only one of the two installed, the system restarts.


    I don't know what to do. Okay, waiting for the new CPU. Based on Google results, I think memtest also runs the CPU at 100%?

  • The RAM is either bad or not suitable for that motherboard. Everything else is probably OK. You cannot run your system until the RAM is stable, or it will corrupt any data on the hard drives. Tell us your motherboard and RAM model so we can verify it is the right kind of RAM for your motherboard. If it is the right kind, you need to return the RAM for new sticks.

  • Did you try moving the RAM to another bank/channel?


    Greetings
    David

  • Hello all,


    here is the latest on my NAS, which is running without trouble again.


    First, my configuration:
    The mainboard
    The CPU
    and the RAM


    Yesterday the new CPU arrived, because this was the same failure as about 3 months ago. I swapped the CPU, and less than 10 minutes later the same error occurred.


    With memtest86 the test hangs at 5%. With a single RAM module, the system resets on its own during the test.
    With memtest86+, however, the tests ran to completion (see memtest410.jpg in the attachment).


    No errors were found either, but the image shows very strange settings:
    CAS 1-3-3-3 with DDR2??? According to the RAM specification this should be 9-9-9-24, and it's DDR3.


    Okay. I swapped the 2x 4 GB modules for the 2x 4 GB modules from my main computer:
    RAM from main computer
    On paper, the two RAM kits should be identical.


    But with these modules, the NAS now works correctly, without freezes or reboots.


    Memtest86+ shows the same settings as with the other modules, and in the BIOS everything is set up correctly. Anyway, it works.


    I put the NAS's RAM modules into the main computer so that it has 16 GB again. Its ASUS mainboard indicated via an LED that the RAM is incompatible. After pressing the button as described in the manual, everything is fine now.


    So the actual cause of this failure was that the RAM was not compatible with the mainboard.


    I hope this thread helps someone else; I spent many hours solving this problem.
    I also hope the NAS now runs longer than 3 months.


    My final question: what should I do with all 4 HDDs (2 TB WD Red)?
    Each has one or more SMART error log entries:

    Code
    Error 2 [1] occurred at disk power-on lifetime: 50 hours (2 days + 2 hours)
      When the command that caused the error occurred, the device was active or idle.


    Should I replace them? Or are these entries a result of the unstable system and RAM modules? Can I remove these entries from the logs?
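    One way to approach the question is to check whether each drive's logged errors cluster in the period when the incompatible RAM was installed. Every entry records the drive's power-on lifetime in hours, so it can be compared with a window of power-on hours. A minimal Python sketch; the window bounds are hypothetical and would have to come from the drive's Power_On_Hours (SMART attribute 9) noted when the RAM was installed and when it was swapped. Clustering inside the window would be consistent with, but not proof of, errors caused by the unstable system.

```python
import re

# Matches smartctl error-log headers, with or without the bracketed index, e.g.
# "Error 2 [1] occurred at disk power-on lifetime: 50 hours (2 days + 2 hours)"
ERROR_RE = re.compile(
    r"Error (?P<num>\d+)(?: \[\d+\])? occurred at disk power-on lifetime: "
    r"(?P<hours>\d+) hours"
)


def smart_error_hours(smartctl_output):
    """Return (error number, power-on hour) pairs from smartctl's error log text."""
    return [
        (int(m.group("num")), int(m.group("hours")))
        for m in ERROR_RE.finditer(smartctl_output)
    ]


def split_by_window(errors, first_hour, last_hour):
    """Split errors into those logged inside / outside a power-on-hour window."""
    inside = [e for e in errors if first_hour <= e[1] <= last_hour]
    outside = [e for e in errors if not first_hour <= e[1] <= last_hour]
    return inside, outside
```

    Run it once per drive on the saved smartctl output; any errors in the `outside` list happened when the faulty RAM was (by the chosen window) not installed and deserve a closer look.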


    Greets
