Hi All,
My current setup is an R510 with 6x 4TB SAS/SATA HDDs and 2x 2TB SAS/SATA HDDs. OMV 4 is running from a 128GB SSD.
2x 4TB HDDs are devoted to SnapRAID parity, with 2x 2TB and 4x 4TB as data drives. The total available space on the SnapRAID data drives is combined with UnionFS into a single datastore. 4x SMB shares live on the datastore.
Tonight I noticed an odd sound, visually checked the server, and saw that a single HDD light was flashing repeatedly when there should have been little activity. I went to the GUI and it would not open the Disks, SMART, or File Systems tabs; attempting this caused a communication failure.
Syslog shows this (give or take the first line), repeated over and over:
Feb 3 23:17:38 openmediavault omv-engined[28250]: Failed to write to socket: Broken pipe
Feb 3 23:17:39 openmediavault kernel: [733118.908897] mpt2sas_cm0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
Feb 3 23:17:40 openmediavault kernel: [733120.158433] mpt2sas_cm0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
Feb 3 23:17:50 openmediavault monit[909]: 'openmediavault' loadavg(5min) of 9.9 matches resource limit [loadavg(5min)>8.0]
Feb 3 23:18:21 openmediavault monit[909]: 'openmediavault' loadavg(5min) of 9.0 matches resource limit [loadavg(5min)>8.0]
Feb 3 23:18:51 openmediavault monit[909]: 'openmediavault' loadavg(5min) of 8.1 matches resource limit [loadavg(5min)>8.0]
Feb 3 23:18:56 openmediavault kernel: [733195.909012] mpt2sas_cm0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
Feb 3 23:18:57 openmediavault kernel: [733197.158512] mpt2sas_cm0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
Feb 3 23:18:59 openmediavault kernel: [733198.408473] mpt2sas_cm0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
Feb 3 23:19:00 openmediavault kernel: [733199.658488] mpt2sas_cm0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
Feb 3 23:19:01 openmediavault kernel: [733200.908946] mpt2sas_cm0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)
Feb 3 23:19:02 openmediavault kernel: [733202.158502] sd 0:0:7:0: [sdh] tag#1904 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Feb 3 23:19:02 openmediavault kernel: [733202.158508] sd 0:0:7:0: [sdh] tag#1904 CDB: Read(16) 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
Feb 3 23:19:02 openmediavault kernel: [733202.158511] print_req_error: I/O error, dev sdh, sector 7814036992
Feb 3 23:19:02 openmediavault kernel: [733202.160207] Buffer I/O error on dev sdh1, logical block 976754368, async page read
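To sanity-check that I'm reading the last few log lines correctly, here is a rough Python sketch that decodes the Read(16) CDB and compares it with the sector and logical block numbers the kernel printed. The byte layout is the standard Read(16) one (bytes 2-9 are the LBA, bytes 10-13 the transfer length). The 512-byte sector size, the 4 KiB block size, and the partition start at sector 2048 are my assumptions; none of them appear in the log.

```python
from typing import Tuple


def decode_read16(cdb_hex: str) -> Tuple[int, int]:
    """Return (starting LBA, block count) for a SCSI Read(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte Read(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")     # bytes 2-9: logical block address
    count = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, count


# CDB copied from the "[sdh] tag#1904 CDB: Read(16)" syslog line.
lba, count = decode_read16("88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00")

# Assumptions (not in the log): 512-byte sectors, 4 KiB buffer blocks,
# and a partition sdh1 that starts at sector 2048.
SECTORS_PER_BLOCK = 4096 // 512
ASSUMED_PARTITION_START = 2048
sector_from_block = 976754368 * SECTORS_PER_BLOCK + ASSUMED_PARTITION_START

print(lba, count, sector_from_block)
```

Under those assumptions the decoded LBA, the "I/O error ... sector" value, and the sector derived from the "Buffer I/O error" logical block all come out as 7814036992, with a transfer length of 8 sectors. So the three kernel lines describe one and the same failed 4 KiB read on sdh rather than three separate problems.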
As the system was beginning to get unresponsive, I opted to try and free up the GUI. From what I read online, the communication failure may have been caused by a command being sent just before the drive possibly failed, with OMV then timing out while waiting for the response. I rebooted from the GUI.
After the restart (and a minor coronary!) the GUI was working again, but I'm still unable to load SMART or Disks. I was able to load the SnapRAID page enough to check the names of the disks in each of the data and parity sections, and it appears that it's one of the two 4TB drives used for parity that has died or is dying.
As such, the USB backup that runs each time the disk is connected (so it started on reboot) doesn't worry me as much as I initially thought (I was super worried it would sync lots of corrupted changes).
I've been stupid and added files over the last week and not synced in 8 days (that I know of; I don't know how to check whether something is scheduled, as things seem to happen on the server that I can't find GUI schedules for!).
My questions are: how do I confirm what's wrong with the unresponsive disk (it's a Dell Constellation 4TB SAS) when I can't run a SMART check? I assume it's completely toast?
If the above is correct, should I shut down ASAP and remove the drive so the system doesn't thrash?
My tiny brain is looking at http://www.snapraid.it/manual and isn't sure how to proceed with replacing the drive. Is that possible from the OMV GUI?
Thank you in advance for any assistance, and sorry for the long post.
** Edit: I can't access the Disks or SMART tabs. Error:
Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; blockdev --getsize64 '/dev/sdg' 2>&1' with exit code '1': blockdev: cannot open /dev/sdg: No such device or address
Error #0:
OMV\ExecException: Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; blockdev --getsize64 '/dev/sdg' 2>&1' with exit code '1': blockdev: cannot open /dev/sdg: No such device or address in /usr/share/php/openmediavault/system/process.inc:182
Stack trace:
#0 /usr/share/php/openmediavault/system/blockdevice.inc(258): OMV\System\Process->execute(Array)
#1 /usr/share/openmediavault/engined/rpc/smart.inc(96): OMV\System\BlockDevice->getSize()
#2 /usr/share/php/openmediavault/rpc/serviceabstract.inc(612): OMVRpcServiceSmart->{closure}()
#3 /usr/share/openmediavault/engined/rpc/smart.inc(104): OMV\Rpc\ServiceAbstract->asyncProc(Object(Closure))
#4 [internal function]: OMVRpcServiceSmart->enumerateDevices(NULL, Array)
#5 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
#6 /usr/share/openmediavault/engined/rpc/smart.inc(171): OMV\Rpc\ServiceAbstract->callMethod('enumerateDevice...', NULL, Array)
#7 [internal function]: OMVRpcServiceSmart->getList(Array, Array)
#8 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
#9 /usr/share/php/openmediavault/rpc/serviceabstract.inc(149): OMV\Rpc\ServiceAbstract->callMethod('getList', Array, Array)
#10 /usr/share/php/openmediavault/rpc/serviceabstract.inc(565): OMV\Rpc\ServiceAbstract->OMV\Rpc\{closure}('/tmp/bgstatusAE...', '/tmp/bgoutput5h...')
#11 /usr/share/php/openmediavault/rpc/serviceabstract.inc(159): OMV\Rpc\ServiceAbstract->execBgProc(Object(Closure))
#12 /usr/share/openmediavault/engined/rpc/smart.inc(206): OMV\Rpc\ServiceAbstract->callMethodBg('getList', Array, Array)
#13 [internal function]: OMVRpcServiceSmart->getListBg(Array, Array)
#14 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
#15 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('getListBg', Array, Array)
#16 /usr/sbin/omv-engined(537): OMV\Rpc\Rpc::call('Smart', 'getListBg', Array, Array, 1)
#17 {main}
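One thing I noticed while pasting this: syslog before the reboot complains about sdh, but the error above is about /dev/sdg. Kernel names like sdX are not guaranteed to stay the same across reboots, so these may or may not be the same physical drive. Below is a minimal Python sketch for listing which persistent ID currently points at which kernel name, assuming udev populates /dev/disk/by-id on this box. The function name map_disk_ids is just something I made up for illustration.

```python
import os


def map_disk_ids(by_id_dir: str = "/dev/disk/by-id") -> dict:
    """Map each persistent disk ID to the kernel device it currently points at."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            # realpath resolves a link target such as ../../sdh to /dev/sdh
            mapping[name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    # Only print on systems that actually have the by-id directory.
    if os.path.isdir("/dev/disk/by-id"):
        for disk_id, dev in map_disk_ids().items():
            print(f"{dev}\t{disk_id}")
```

The IDs usually include the model and serial number, so comparing the output against the drive labels should show which bay holds the suspect disk. A drive that has dropped off the bus completely may simply be missing from the list, which would itself be a hint.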