Hi Forum,
Today I noticed that my NAS became unreachable roughly an hour after I started it and copied a file onto it.
Well, while I was writing this post and collecting all the logs, I found the error:
kern.log.1
[...]
Dec 29 04:02:29 chap kernel: [ 91.155474] ip6_tables: (C) 2000-2006 Netfilter Core Team
Dec 29 04:06:02 chap kernel: [ 304.067797] EXT4-fs (sda1): error count: 1
Dec 29 04:06:02 chap kernel: [ 304.067802] EXT4-fs (sda1): initial error at 1385238319: ext4_lookup:1050: inode 1587518
Dec 29 04:06:02 chap kernel: [ 304.067807] EXT4-fs (sda1): last error at 1385238319: ext4_lookup:1050: inode 1587518
kern.log
[...]
Dec 30 03:56:49 chap kernel: [ 304.071969] EXT4-fs (sda1): error count: 1
Dec 30 03:56:49 chap kernel: [ 304.071976] EXT4-fs (sda1): initial error at 1385238319: ext4_lookup:1050: inode 1587518
Dec 30 03:56:49 chap kernel: [ 304.071982] EXT4-fs (sda1): last error at 1385238319: ext4_lookup:1050: inode 1587518
root@chap:/var/log# blkid
/dev/sdb1: LABEL="RAID5_9TB" UUID="d3c21c66-af4b-41d4-b098-462e83fa641d" TYPE="xfs"
/dev/sda1: UUID="1eed4e13-6904-4af5-9b0d-8d01093f9c38" TYPE="ext4"
/dev/sda3: UUID="d0c9a972-fcd3-4980-929b-2d1bfda52076" TYPE="xfs"
/dev/sda5: UUID="2b60b699-b2e1-49f1-b9f0-b1f38300ba75" TYPE="swap"
/dev/sdc1: LABEL="Sammy1" UUID="ca425484-1be3-47f1-b7bb-8f0785c9ea5b" TYPE="xfs"
root@chap:/var/log# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 29G 7,0G 21G 26% /
tmpfs 2,0G 20K 2,0G 1% /lib/init/rw
udev 1,9G 208K 1,9G 1% /dev
tmpfs 2,0G 4,0K 2,0G 1% /dev/shm
/dev/sdb1 11T 8,2T 2,8T 75% /media/d3c21c66-af4b-41d4-b098-462e83fa641d
/dev/sda3 429G 5,6G 424G 2% /media/d0c9a972-fcd3-4980-929b-2d1bfda52076
/dev/sdc1 1,4T 471G 926G 34% /media/ca425484-1be3-47f1-b7bb-8f0785c9ea5b
root@chap:/var/log#
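For reference, the "initial error at" / "last error at" fields in the EXT4 kernel lines are plain Unix timestamps. Below is a minimal Python sketch that decodes them. The function name is my own, and treating CET as a fixed UTC+1 offset is an assumption that only holds outside summer time. For 1385238319 it gives 23 Nov 2013, so the error recorded on sda1 is older than both the Dec 29 and Dec 30 boots in which it was reported.

```python
from datetime import datetime, timezone, timedelta

# Value copied from the "initial error at ..." / "last error at ..." kernel lines.
EXT4_ERROR_TIME = 1385238319

# Assumption: CET modelled as a fixed UTC+1 offset (correct for late November, no DST).
CET = timezone(timedelta(hours=1), "CET")


def decode_ext4_error_time(epoch_seconds: int) -> tuple[str, str]:
    """Return the error time as (UTC, CET) strings, assuming seconds since the Unix epoch."""
    utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    cet = utc.astimezone(CET)
    fmt = "%Y-%m-%d %H:%M:%S %Z"
    return utc.strftime(fmt), cet.strftime(fmt)


if __name__ == "__main__":
    utc_str, cet_str = decode_ext4_error_time(EXT4_ERROR_TIME)
    print(utc_str)  # 2013-11-23 20:25:19 UTC
    print(cet_str)  # 2013-11-23 21:25:19 CET
```

The same function can be pointed at the "last error at" value; here both are identical, which fits the "error count: 1" line.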
So obviously the filesystem on my OS drive (sda1) is in fact damaged, although SMART doesn't show anything indicating that sectors were reallocated:
root@chap:/var/log# smartctl /dev/sda -a
smartctl 5.40 2010-07-12 r3124 [x86_64-unknown-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: SAMSUNG SpinPoint F3 EG series
Device Model: SAMSUNG HD503HI
Serial Number: S23CJ9DZ602665
Firmware Version: 1AJ10001
User Capacity: 500.107.862.016 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 6
Local Time is: Mon Dec 30 05:17:29 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (6000) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 100) minutes.
SCT capabilities: (0x003f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 051 Pre-fail Always - 0
2 Throughput_Performance 0x0026 252 252 000 Old_age Always - 0
3 Spin_Up_Time 0x0023 085 084 025 Pre-fail Always - 4742
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 875
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 252 252 051 Old_age Always - 0
8 Seek_Time_Performance 0x0024 252 252 015 Old_age Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 6200
10 Spin_Retry_Count 0x0032 252 252 051 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 1
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 890
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 8251
192 Power-Off_Retract_Count 0x0022 252 252 000 Old_age Always - 0
194 Temperature_Celsius 0x0002 064 047 000 Old_age Always - 32 (Lifetime Min/Max 8/53)
195 Hardware_ECC_Recovered 0x003a 100 100 000 Old_age Always - 0
196 Reallocated_Event_Count 0x0032 252 252 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 252 252 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 252 252 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0036 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 62
223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 1
225 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 909
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
Note: selective self-test log revision number (0) not 1 implies that no selective self-test has ever been run
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Completed [00% left] (0-65535)
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
The attributes that stand out:
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 1
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 8251
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 62
From what I found on Google, none of these is cause for concern as long as the normalized VALUE isn't heading towards the THRESH value.
What would you check for this kind of error?
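The "value vs. threshold" rule above can be checked mechanically. Below is a minimal Python sketch that parses attribute rows in the `smartctl -a` format shown above and reports the headroom between WORST and THRESH. The names `parse_attrs`, `is_failing` and `headroom` are hypothetical. Treating a THRESH of 000 as "no threshold" is my assumption, not something taken from the smartctl documentation. Raw values are kept as text because they are vendor-specific.

```python
from dataclasses import dataclass

# Attribute rows copied verbatim from the smartctl -a output above.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 252 252 010 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 1
191 G-Sense_Error_Rate 0x0022 100 100 000 Old_age Always - 8251
194 Temperature_Celsius 0x0002 064 047 000 Old_age Always - 32 (Lifetime Min/Max 8/53)
200 Multi_Zone_Error_Rate 0x002a 100 100 000 Old_age Always - 62
"""


@dataclass
class SmartAttr:
    attr_id: int
    name: str
    value: int   # current normalized value
    worst: int   # lowest normalized value recorded so far
    thresh: int  # normalized failure threshold
    raw: str     # vendor-specific raw value, kept as text


def parse_attrs(text: str) -> list[SmartAttr]:
    """Parse rows of the form: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW."""
    attrs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10 or not parts[0].isdigit():
            continue  # skip the header and anything that is not an attribute row
        attrs.append(SmartAttr(
            attr_id=int(parts[0]), name=parts[1],
            value=int(parts[3]), worst=int(parts[4]), thresh=int(parts[5]),
            raw=" ".join(parts[9:]),  # the raw field may contain spaces (temperature)
        ))
    return attrs


def is_failing(attr: SmartAttr) -> bool:
    # Usual SMART rule: failing once the normalized value is at or below THRESH.
    # Assumption: THRESH 000 is treated as "no threshold" (informational only).
    return attr.thresh > 0 and attr.value <= attr.thresh


def headroom(attr: SmartAttr) -> int:
    """Distance between the worst normalized value seen and the threshold."""
    return attr.worst - attr.thresh


if __name__ == "__main__":
    for a in parse_attrs(SAMPLE):
        status = "FAILING" if is_failing(a) else "ok"
        print(f"{a.attr_id:>3} {a.name:<24} headroom={headroom(a):>3} raw={a.raw} {status}")
```

Note that this only looks at the normalized columns. Raw counters like G-Sense_Error_Rate (8251) are not compared against anything, which matches the "watch the value, not the raw number" advice quoted above.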
Greetings
David
---- you can ignore everything below this line ----
Here is what I did: I started my NAS at 03:53:03 and copied a file onto it via SMB at approximately 03:59. I then left the NAS alone, intending to watch the file from bed at 04:35, but by then my Pi couldn't reach the NAS. Even after attaching a USB keyboard and my TV, no console appeared on the screen, so I reset the machine.
Judging from the temperature script for my NAS drives, the NAS went down between 04:00:02 and 04:03:02, because the 04:00 entry is the last one written to its log.
These are the last messages in the syslog; nothing in them looks relevant to me. They show that the system was running at least until 04:01:25, with nothing logged after that until the hard reboot at 04:42:47:
Dec 30 04:00:01 chap /USR/SBIN/CRON[4395]: (root) CMD (/var/lib/openmediavault/cron.d/userdefined-311f8b5e-9be4-4dfa-b4ea-a1bc50d2e37f >/dev/null 2>&1)
Dec 30 04:00:01 chap /USR/SBIN/CRON[4396]: (root) CMD (test -x /usr/sbin/cron-apt && /usr/sbin/cron-apt)
Dec 30 04:00:01 chap /USR/SBIN/CRON[4397]: (root) CMD (/usr/sbin/omv-mkgraph >/dev/null 2>&1)
Dec 30 04:00:24 chap dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 5
Dec 30 04:00:29 chap dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 5
Dec 30 04:00:34 chap dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 9
Dec 30 04:00:43 chap dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 11
Dec 30 04:00:54 chap dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 19
Dec 30 04:01:13 chap dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 12
Dec 30 04:01:25 chap dhclient: No DHCPOFFERS received.
Dec 30 04:01:25 chap dhclient: No working leases in persistent database - sleeping.
Dec 30 04:42:47 chap kernel: imklog 4.6.4, log source = /proc/kmsg started.
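The "point of no return" reasoning rests on the silence between 04:01:25 and 04:42:47. Below is a minimal Python sketch that finds the longest gap between consecutive syslog lines. The function names are hypothetical. The year has to be supplied by hand because classic syslog timestamps omit it, and the excerpt is just the tail of the log above. Run over the full syslog, it would point at the same kind of silence without reading through it by eye.

```python
from datetime import datetime

# Assumption: classic syslog lines carry no year, so it is supplied here.
YEAR = 2013

# Tail of the syslog excerpt above.
SYSLOG_TAIL = """\
Dec 30 04:00:54 chap dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 19
Dec 30 04:01:13 chap dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 12
Dec 30 04:01:25 chap dhclient: No DHCPOFFERS received.
Dec 30 04:01:25 chap dhclient: No working leases in persistent database - sleeping.
Dec 30 04:42:47 chap kernel: imklog 4.6.4, log source = /proc/kmsg started.
"""


def parse_stamp(line: str, year: int = YEAR) -> datetime:
    # The first 15 characters hold the "Mon DD HH:MM:SS" timestamp.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def largest_gap(text: str) -> tuple[datetime, datetime]:
    """Return the (before, after) timestamps of the longest silence in the log."""
    stamps = [parse_stamp(line) for line in text.splitlines() if line.strip()]
    if len(stamps) < 2:
        raise ValueError("need at least two log lines to measure a gap")
    return max(zip(stamps, stamps[1:]), key=lambda pair: pair[1] - pair[0])


if __name__ == "__main__":
    before, after = largest_gap(SYSLOG_TAIL)
    print(before, "->", after, "silence:", after - before)
    # 2013-12-30 04:01:25 -> 2013-12-30 04:42:47 silence: 0:41:22
```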
Full syslog on Pastebin: http://pastebin.com/BGaAcMjL
I think this may already have happened one or two days ago, but I'm not sure. Since I don't see anything in the syslog, I doubt it is a kernel panic or something similar; it seems the system freezes completely, without being able to log anything after that point (of no return).
Any ideas where I should look for errors? The Samba cores directory is empty, and the Samba log for my desktop computer is empty as well.
log.smbd
[2013/12/30 03:53:04, 0] smbd/server.c:1123(main)
smbd version 3.5.6 started.
Copyright Andrew Tridgell and the Samba Team 1992-2010
[2013/12/30 04:42:48, 0] smbd/server.c:1123(main)
smbd version 3.5.6 started.
Copyright Andrew Tridgell and the Samba Team 1992-2010
log.nmbd