Hi all,
This is my first post here. I have looked through the forum but haven't found a solution to my issue.
I have been using OMV since June, having switched from Unraid, and it has been great. However, over the last few weeks I have been having intermittent disconnect issues.
The drives in my MergerFS pool randomly unmount, which causes the pool itself to unmount. This has happened a few times over the last 3 weeks, and it seems to be getting more frequent. It happened last night; I restarted OMV and everything worked again, but 12 hours later it happened again.
I am pretty new to this, so I don't really know how to troubleshoot it.
I have a Dell LSI 9240-8i with 7 HDDs powered by a standalone PSU, and 3 SSDs connected directly to the motherboard.
Six drives are in the MergerFS pool and one is used for parity with SnapRAID.
I'm not sure what information I should provide for troubleshooting, so please let me know if there is anything else I can add.
I have attached the logs as there is a 10k character limit to paste here, but I have included snippets below.
This is the syslog where the issue starts.
Nov 17 00:43:23 node kernel: [114222.808999] sd 0:0:0:0: device_block, handle(0x0009)
Nov 17 00:43:23 node kernel: [114222.809057] sd 0:0:1:0: device_block, handle(0x000a)
Nov 17 00:43:23 node kernel: [114222.809110] sd 0:0:2:0: device_block, handle(0x000b)
Nov 17 00:43:23 node kernel: [114222.809168] sd 0:0:5:0: device_block, handle(0x000e)
Nov 17 00:43:23 node kernel: [114222.809234] sd 0:0:6:0: device_block, handle(0x000f)
Nov 17 00:43:25 node kernel: [114225.058990] sd 0:0:0:0: device_unblock and setting to running, handle(0x0009)
Nov 17 00:43:25 node kernel: [114225.059040] sd 0:0:1:0: device_unblock and setting to running, handle(0x000a)
Nov 17 00:43:25 node kernel: [114225.059064] sd 0:0:2:0: device_unblock and setting to running, handle(0x000b)
Nov 17 00:43:25 node kernel: [114225.059109] sd 0:0:5:0: device_unblock and setting to running, handle(0x000e)
Nov 17 00:43:25 node kernel: [114225.059147] sd 0:0:6:0: device_unblock and setting to running, handle(0x000f)
Nov 17 00:43:25 node kernel: [114225.063625] device offline error, dev sdb, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
Nov 17 00:43:25 node systemd[1]: Unmounting MergerFS mount for Node...
Nov 17 00:43:25 node systemd[1]: srv-mergerfs-Node.mount: Succeeded.
Nov 17 00:43:25 node systemd[1]: srv-mergerfs-Node.mount: Unit process 1376 (mergerfs) remains running after unit stopped.
Nov 17 00:43:25 node systemd[1]: Unmounted MergerFS mount for Node.
Nov 17 00:43:25 node systemd[1]: srv-mergerfs-Node.mount: Consumed 26min 35.173s CPU time.
Nov 17 00:43:25 node systemd[1]: Unmounting /srv/dev-disk-by-uuid-49a8b411-4885-4edc-a278-806a252bca98...
Nov 17 00:43:25 node kernel: [114225.137471] sd 0:0:0:0: [sdb] Synchronizing SCSI cache
Nov 17 00:43:25 node kernel: [114225.137678] sd 0:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov 17 00:43:25 node kernel: [114225.138355] mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221100000000)
Nov 17 00:43:25 node kernel: [114225.138363] mpt2sas_cm0: removing handle(0x0009), sas_addr(0x4433221100000000)
Nov 17 00:43:25 node kernel: [114225.138368] mpt2sas_cm0: enclosure logical id(0x590b11c028a4c100), slot(3)
Nov 17 00:43:25 node kernel: [114225.139507] device offline error, dev sda, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
Nov 17 00:43:25 node kernel: [114225.162103] Aborting journal on device sdb1-8.
Nov 17 00:43:25 node kernel: [114225.162142] Buffer I/O error on dev sdb1, logical block 976781312, lost sync page write
Nov 17 00:43:25 node kernel: [114225.162175] JBD2: Error -5 detected when updating journal superblock for sdb1-8.
Nov 17 00:43:25 node kernel: [114225.162895] EXT4-fs error (device sdb1): ext4_journal_check_start:83: comm umount: Detected aborted journal
Nov 17 00:43:25 node kernel: [114225.162947] EXT4-fs (sdb1): Remounting filesystem read-only
Nov 17 00:43:25 node systemd[1]: srv-dev\x2ddisk\x2dby\x2duuid\x2d49a8b411\x2d4885\x2d4edc\x2da278\x2d806a252bca98.mount: Succeeded.
Nov 17 00:43:25 node systemd[1]: Unmounted /srv/dev-disk-by-uuid-49a8b411-4885-4edc-a278-806a252bca98.
Nov 17 00:43:25 node systemd[1]: systemd-fsck@dev-disk-by\x2duuid-49a8b411\x2d4885\x2d4edc\x2da278\x2d806a252bca98.service: Succeeded.
Nov 17 00:43:25 node systemd[1]: Stopped File System Check on /dev/disk/by-uuid/49a8b411-4885-4edc-a278-806a252bca98.
Nov 17 00:43:26 node kernel: [114225.666205] device offline error, dev sda, sector 5607819520 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Nov 17 00:43:26 node kernel: [114225.666250] EXT4-fs error (device sda1): __ext4_find_entry:1635: inode #87621633: comm mergerfs: reading directory lblock 0
Nov 17 00:43:26 node kernel: [114225.666383] device offline error, dev sda, sector 5607819520 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
Nov 17 00:43:26 node kernel: [114225.666408] EXT4-fs error (device sda1): __ext4_find_entry:1635: inode #87621633: comm mergerfs: reading directory lblock 0
Nov 17 00:43:27 node systemd[1]: Unmounting /srv/dev-disk-by-uuid-a9a3995c-2ada-4376-8f5e-918434d665c8...
Nov 17 00:43:27 node systemd[1]: srv-dev\x2ddisk\x2dby\x2duuid\x2da9a3995c\x2d2ada\x2d4376\x2d8f5e\x2d918434d665c8.mount: Mount process exited, code=exited, sta
And then here is the same syslog for when the unmount happened 12 hours later.
Nov 17 12:36:28 node kernel: [34772.059525] sd 0:0:0:0: device_block, handle(0x0009)
Nov 17 12:36:28 node kernel: [34772.059592] sd 0:0:1:0: device_block, handle(0x000a)
Nov 17 12:36:28 node kernel: [34772.059641] sd 0:0:2:0: device_block, handle(0x000b)
Nov 17 12:36:28 node kernel: [34772.059699] sd 0:0:4:0: device_block, handle(0x000d)
Nov 17 12:36:28 node kernel: [34772.059757] sd 0:0:5:0: device_block, handle(0x000e)
Nov 17 12:36:30 node kernel: [34773.809518] sd 0:0:0:0: device_unblock and setting to running, handle(0x0009)
Nov 17 12:36:30 node kernel: [34773.809547] sd 0:0:1:0: device_unblock and setting to running, handle(0x000a)
Nov 17 12:36:30 node kernel: [34773.809604] sd 0:0:2:0: device_unblock and setting to running, handle(0x000b)
Nov 17 12:36:30 node kernel: [34773.809653] sd 0:0:4:0: device_unblock and setting to running, handle(0x000d)
Nov 17 12:36:30 node kernel: [34773.809676] sd 0:0:5:0: device_unblock and setting to running, handle(0x000e)
Nov 17 12:36:30 node kernel: [34773.813916] device offline error, dev sda, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
Nov 17 12:36:30 node systemd[1]: Unmounting MergerFS mount for Node...
Nov 17 12:36:30 node systemd[1]: srv-mergerfs-Node.mount: Succeeded.
Nov 17 12:36:30 node systemd[1]: srv-mergerfs-Node.mount: Unit process 1375 (mergerfs) remains running after unit stopped.
Nov 17 12:36:30 node systemd[1]: Unmounted MergerFS mount for Node.
Nov 17 12:36:30 node systemd[1]: srv-mergerfs-Node.mount: Consumed 2min 35.497s CPU time.
Nov 17 12:36:30 node systemd[1]: Unmounting /srv/dev-disk-by-uuid-49a8b411-4885-4edc-a278-806a252bca98...
Nov 17 12:36:30 node kernel: [34773.898335] sd 0:0:0:0: [sda] Synchronizing SCSI cache
Nov 17 12:36:30 node kernel: [34773.898550] sd 0:0:0:0: [sda] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Nov 17 12:36:30 node kernel: [34773.899793] mpt2sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221100000000)
Nov 17 12:36:30 node kernel: [34773.899804] mpt2sas_cm0: removing handle(0x0009), sas_addr(0x4433221100000000)
Nov 17 12:36:30 node kernel: [34773.899811] mpt2sas_cm0: enclosure logical id(0x590b11c028a4c100), slot(3)
Nov 17 12:36:30 node kernel: [34773.901435] device offline error, dev sdb, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
Nov 17 12:36:30 node kernel: [34773.911020] Aborting journal on device sda1-8.
Nov 17 12:36:30 node kernel: [34773.911085] Buffer I/O error on dev sda1, logical block 976781312, lost sync page write
Nov 17 12:36:30 node kernel: [34773.911157] JBD2: Error -5 detected when updating journal superblock for sda1-8.
Nov 17 12:36:30 node kernel: [34773.912078] EXT4-fs error (device sda1): ext4_journal_check_start:83: comm umount: Detected aborted journal
Nov 17 12:36:30 node kernel: [34773.912154] EXT4-fs (sda1): Remounting filesystem read-only
Nov 17 12:36:30 node systemd[1]: srv-dev\x2ddisk\x2dby\x2duuid\x2d49a8b411\x2d4885\x2d4edc\x2da278\x2d806a252bca98.mount: Succeeded.
Nov 17 12:36:30 node systemd[1]: Unmounted /srv/dev-disk-by-uuid-49a8b411-4885-4edc-a278-806a252bca98.
Nov 17 12:36:30 node systemd[1]: systemd-fsck@dev-disk-by\x2duuid-49a8b411\x2d4885\x2d4edc\x2da278\x2d806a252bca98.service: Succeeded.
Nov 17 12:36:30 node systemd[1]: Stopped File System Check on /dev/disk/by-uuid/49a8b411-4885-4edc-a278-806a252bca98.
Nov 17 12:36:31 node systemd[1]: Unmounting /srv/dev-disk-by-uuid-a9a3995c-2ada-4376-8f5e-918434d665c8...
Nov 17 12:36:31 node kernel: [34774.420455] EXT4-fs error (device sdb1): ext4_get_inode_loc:4576: inode #14: block 1989: comm umount: unable to read itable block
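In case it helps with comparing the two events: in the snippets above, the first event blocks SCSI targets 0, 1, 2, 5 and 6, while the second blocks 0, 1, 2, 4 and 5. Below is a rough Python sketch I could use to pull the `device_block` lines out of the full log and group them by timestamp. The function name `blocked_devices` and the regular expression are my own, and it assumes every line follows the same format as the excerpts above; it is not an OMV or kernel tool.

```python
import re
from collections import defaultdict

# Matches lines like:
#   "Nov 17 00:43:23 node kernel: [114222.808999] sd 0:0:0:0: device_block, handle(0x0009)"
# The trailing comma after "device_block" keeps "device_unblock and ..." lines out.
BLOCK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}) .*"
    r"sd (?P<scsi>\d+:\d+:\d+:\d+): device_block, "
    r"handle\((?P<handle>0x[0-9a-f]+)\)"
)


def blocked_devices(lines):
    """Group device_block events by syslog timestamp (hypothetical helper)."""
    events = defaultdict(list)
    for line in lines:
        match = BLOCK_RE.search(line)
        if match:
            events[match.group("ts")].append(
                (match.group("scsi"), match.group("handle"))
            )
    return dict(events)


# A few lines copied from the excerpts above, used as sample input.
sample = [
    "Nov 17 00:43:23 node kernel: [114222.808999] sd 0:0:0:0: device_block, handle(0x0009)",
    "Nov 17 00:43:23 node kernel: [114222.809168] sd 0:0:5:0: device_block, handle(0x000e)",
    "Nov 17 00:43:25 node kernel: [114225.058990] sd 0:0:0:0: device_unblock and setting to running, handle(0x0009)",
    "Nov 17 12:36:28 node kernel: [34772.059699] sd 0:0:4:0: device_block, handle(0x000d)",
]

for timestamp, devices in blocked_devices(sample).items():
    print(timestamp, devices)
```

To run it against the real log, the `sample` list could be replaced with an open file handle, for example `open("/var/log/syslog")`; that path is an assumption about where the syslog lives on my system.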
Here is some additional info.