Kernel rrdtool segfault error every 15 minutes
-
- OMV 4.x
- Shinobi
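To check whether the segfaults really arrive every 15 minutes, the timestamps can be pulled out of the kernel log. This is a minimal sketch. It assumes the log is read with `journalctl -k -o short-iso` and that each crash produces the kernel's usual `rrdtool[PID]: segfault at ...` line. The function name `rrd_segfault_times` is hypothetical.

```shell
# Hypothetical helper: print the timestamp of every rrdtool segfault line.
# Expects kernel log lines on stdin, assumed to look like:
#   <ISO timestamp> <host> kernel: rrdtool[PID]: segfault at ...
rrd_segfault_times() {
    # "\[" is a literal "[", "[0-9]*" the PID, and "]" a literal "]" in a BRE
    grep 'rrdtool\[[0-9]*]: segfault' | cut -d ' ' -f 1
}

# Usage on the affected box (kernel messages from the current boot):
#   journalctl -k -b -o short-iso | rrd_segfault_times
```

Comparing the printed timestamps shows whether the interval is really a steady 15 minutes.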
-
-
I would make sure your OS disk (is it flash memory?) isn't failing, and check your RAM.
-
Thanks. The NVMe SSD (and the whole system) is quite new. I actually had RAM problems at the very beginning and resolved them by replacing the RAM.
The NVMe is a Corsair Force MP510. Any ideas how I could check it?
-
-
SMART. And I would check the RAM again. Are you running the flashmemory plugin? NVMe or not, it still has limited write endurance.
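To put "limited writes" into numbers: the NVMe SMART/Health log reports Data Units Read/Written in thousands of 512-byte units, so one reported unit is 512,000 bytes. This is a minimal sketch of that conversion. The name `nvme_units_to_bytes` is hypothetical.

```shell
# Hypothetical helper: convert an NVMe "Data Units" counter to bytes.
# One data unit = 1000 * 512 bytes (NVMe SMART/Health Information log).
# Thousands separators such as "2,948,543" are stripped first.
nvme_units_to_bytes() {
    units=$(printf '%s' "$1" | tr -d ',')
    echo $(( units * 1000 * 512 ))
}
```

For example, `nvme_units_to_bytes 2,948,543` prints `1509654016000`, about 1.5 TB. That matches the `[1.50 TB]` smartctl shows next to Data Units Written for this drive later in the thread.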
-
I cannot activate SMART for the NVMe drive and I'm not sure why. Yes, I'm running the flashmemory plugin. Would it make sense to disable it temporarily for testing? Can I simply re-enable it afterwards?
-
Read NVMe SMART/Health Information failed: NVMe Status 0x2002
-
-
Would it make sense to disable it temporarily for testing?
Definitely not. If SMART isn't working, you may have a tougher time diagnosing this.
-
It works in the shell with this command:
smartctl -x /dev/nvme0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 49 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 2%
Data Units Read: 1,599,392 [818 GB]
Data Units Written: 2,948,543 [1.50 TB]
Host Read Commands: 17,379,756
Host Write Commands: 24,023,608
Controller Busy Time: 119
Power Cycles: 96
Power On Hours: 3,522
Unsafe Shutdowns: 62
Media and Data Integrity Errors: 0
Error Information Log Entries: 176
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, max 63 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 176 0 0x0015 0x4004 0x004 0 1 -
1 175 0 0x0015 0x4004 0x004 0 1 -
2 174 0 0x001d 0x4004 0x004 0 1 -
3 173 0 0x001d 0x4004 0x004 0 1 -
4 172 0 0x0004 0x4004 0x004 0 1 -
5 171 0 0x0004 0x4004 0x004 0 1 -
6 170 0 0x001d 0x4004 0x004 0 1 -
7 169 0 0x001d 0x4004 0x004 0 1 -
8 168 0 0x001c 0x4004 0x004 0 1 -
9 167 0 0x001d 0x4004 0x004 0 1 -
10 166 0 0x0015 0x4004 0x004 0 1 -
11 165 0 0x001d 0x4004 0x004 0 1 -
12 164 0 0x001c 0x4004 0x004 0 1 -
13 163 0 0x0015 0x4004 0x004 0 1 -
14 162 0 0x001c 0x4004 0x004 0 1 -
15 161 0 0x001c 0x4004 0x004 0 1 -
... (47 entries not shown)
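As a rough aid for reading the error log above, this minimal sketch splits a Status value into its NVMe fields. It assumes the Status column is the raw 16-bit Status Field of each Error Information log entry, with the phase tag in bit 0. `nvme_status_decode` is a hypothetical name, and only three Generic Command Status codes are named. Under that assumption, 0x4004 decodes to 0x2002, the same number as in the `Read NVMe SMART/Health Information failed` message, with Status Code 0x02 (Invalid Field in Command). That would point to commands the drive rejected rather than to media errors, but it is an interpretation, not a diagnosis.

```shell
# Hypothetical helper: decode a value from the "Status" column of the
# smartctl NVMe error log. Assumption: it is the raw 16-bit Status Field
# of an Error Information log entry, where bit 0 is the phase tag.
nvme_status_decode() {
    raw=$(( $1 ))                    # accepts hex input such as 0x4004
    sf=$(( raw >> 1 ))               # drop the phase tag bit
    sc=$(( sf & 0xff ))              # Status Code, bits 7:0
    sct=$(( (sf >> 8) & 0x7 ))       # Status Code Type, bits 10:8
    more=$(( (sf >> 13) & 1 ))       # More bit
    dnr=$(( (sf >> 14) & 1 ))        # Do Not Retry bit
    # Names for a few Generic Command Status codes (SCT 0) only.
    desc="unknown"
    if [ "$sct" -eq 0 ]; then
        case $sc in
            0) desc="Successful Completion" ;;
            1) desc="Invalid Command Opcode" ;;
            2) desc="Invalid Field in Command" ;;
        esac
    fi
    printf 'status=0x%04x sct=%d sc=0x%02x more=%d dnr=%d (%s)\n' \
        "$sf" "$sct" "$sc" "$more" "$dnr" "$desc"
}
```

For example, `nvme_status_decode 0x4004` prints `status=0x2002 sct=0 sc=0x02 more=1 dnr=0 (Invalid Field in Command)`.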
Maybe the NVMe drive is getting hot (especially when the system is under load?) and misbehaving? Otherwise, I would go back to the RAM check.
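The output above already shows Warning Comp. Temperature Time and Critical Comp. Temperature Time at 0. That means no minutes were logged above the drive's own warning or critical thresholds. To watch the temperature over time anyway, here is a minimal sketch that extracts the reading from `smartctl -A` output. It assumes the `Temperature: 49 Celsius` line format shown above. `nvme_temp_c` and the log path are hypothetical.

```shell
# Hypothetical helper: print the composite temperature (degrees Celsius)
# from smartctl NVMe output on stdin, e.g. the line "Temperature: 49 Celsius".
# Lines such as "Temperature Sensor 1: ..." do not match the exact field name.
nvme_temp_c() {
    awk -F ':' '$1 == "Temperature" { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Usage sketch: append a timestamped reading every minute (stop with Ctrl+C).
#   while true; do
#       printf '%s %s\n' "$(date -Iseconds)" "$(smartctl -A /dev/nvme0 | nvme_temp_c)"
#       sleep 60
#   done >> /tmp/nvme-temp.log
```

Lining this log up against the segfault times would show whether the crashes coincide with temperature peaks.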
-
-
The system is never really under load... it's waaaayyyy overpowered, but I had most of the components lying around. Maybe I will get some tools and check the RAM. Is there something I can install and test within OMV?
-
Nothing that can do as good a job as memtest86+.
-
This specific SMART issue occurs because smartmontools is not up to date in OMV5. The bug has been fixed by the smartmontools team, but the updated version is not showing up yet. I only found this out a little while ago.
NVMe SMART status reading failure (wrong drive name)
-
What about reinstalling rrdtool? I would like to try it, but only if it doesn't interfere with OMV. Can it be done easily?