Lately my server (a Dell T1700) has started crashing about once a day, and so hard that it takes my entire network down until I reboot it. When I look at the screen I see a variety of error messages, but the ones I see most often and that stand out to me are:
watchdog BUG: soft lockup - CPU#3 stuck for 22s!
rcu INFO: rcu_sched detected stall on CPU
(see attachment)
My friend Google wasn't much help. It sounds like a wide variety of things could cause problems like this. I had a few ideas and was wondering if people could help me narrow them down or suggest something else:
Failing HDD
- Not likely, as I recently replaced it and have scanned it several times with no issues.
Failing RAM
- Ran memtest86+ for over 24 hours with no issues.
Failing MB/CPU
- Not sure, but not likely, and if it is then I'm more or less SOL.
Possible Failing PSU
- Some people online said replacing it fixed the problem for them, others said it didn't make any difference. It would probably be good to get rid of the OEM power supply (which probably isn't great quality), but I hate spending money when I don't have to.
Corrupted Kernel files
- Possible, but the only solution would be a complete reinstall, which would take a really long time (I don't have a backup that I know is clean).
Anything else I haven't thought of?
Thanks!