RAID5 with unexplained error

    • Hello,

      I have a RAID 5 array with three 3TB drives.

      I keep receiving the message below. I have run fsck and it fixes things, but the message keeps reappearing.
      Is there a way to understand its cause and to fix it?

      Please let me know if there is additional data I need to provide.

      Source Code

      1. Nov 7 12:23:45 openmediavault kernel: [310678.492008] EXT4-fs (md127): error count since last fsck: 4
      2. Nov 7 12:23:45 openmediavault kernel: [310678.492010] EXT4-fs (md127): initial error at time 1509773902: ext4_iget:4641: inode 98893575: block 11
      3. Nov 7 12:23:45 openmediavault kernel: [310678.492013] EXT4-fs (md127): last error at time 1510032951: ext4_iget:4641: inode 98893575: block 11
    • Hi,
      I would appreciate help reading the S.M.A.R.T. reports for my three HDDs.

      Thank you
      Images
      • Disk 1.jpg (480×271)
      • Disk 1S.jpg (612×500)
      • Disk 2.jpg (347×187)
      • Disk 2S.jpg (600×507)
      • Disk 3.jpg (324×194)
      • Disk 3S.jpg (600×512)
    • I am not a SMART expert, but to me the reports look fine. Only the load cycle count is quite high, which is probably a result of aggressive APM settings. Your drives are around 300k; for my drives 600k are specified. I had rapidly increasing load cycle counts in the past and solved it by disabling hdparm and using hd-idle instead.

      Concerning your first post, I found this with Google. Maybe it helps you.

      Hopefully somebody else can give you a better response.
      BananaPi - armbian - OMV4.x | Asrock Q1900DC-ITX - 16GB - 2x Seagate ST3000VN000 - 1x Intenso SSD 120GB - OMV3.x 64bit
    • macom wrote:

      I am not a SMART expert, but to me the reports look fine. Only the load cycle count is quite high, which is probably a result of aggressive APM settings. Your drives are around 300k; for my drives 600k are specified. I had rapidly increasing load cycle counts in the past and solved it by disabling hdparm and using hd-idle instead.

      Concerning your first post, I found this with Google. Maybe it helps you.

      Hopefully somebody else can give you a better response.


      Thank you for the quick reply. I will read through the link you sent and will also be glad to receive additional feedback on the SMART results.

      Could you explain the APM parameter you mentioned in more detail? My current configuration is attached.


    • APM 1 is the most aggressive setting: maximum power saving, but this results in many load cycles. I would try APM 127 and check whether the number of load cycles per operating hour decreases.

      You can check the datasheet of your drive for the specified number of load cycles. You have probably reached around 50% of the specified value.
      BananaPi - armbian - OMV4.x | Asrock Q1900DC-ITX - 16GB - 2x Seagate ST3000VN000 - 1x Intenso SSD 120GB - OMV3.x 64bit
    • For reference, if a drive is heading toward failure, the following SMART stats start incrementing:

      • SMART 5 – Reallocated_Sector_Count.
      • SMART 187 – Reported_Uncorrectable_Errors.
      • SMART 188 – Command_Timeout.
      • SMART 197 – Current_Pending_Sector_Count.
      • SMART 198 – Offline_Uncorrectable.
      Where the remainder of the stat categories are concerned, some of the counts and their meanings (raw values) can vary between drive OEMs.
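      If you only want to eyeball those five failure-predicting attributes, a small filter can be wrapped around the smartctl output. (A minimal sketch; the function name filter_smart is made up for illustration, and it assumes the standard `smartctl -A` attribute-table layout.)

```shell
# filter_smart: keep only the failure-predicting attribute lines
# (IDs 5, 187, 188, 197, 198) from `smartctl -A` output on stdin.
filter_smart() {
  grep -E '^ *(5|187|188|197|198) '
}

# Typical use (requires smartmontools and root):
#   smartctl -A /dev/sda | filter_smart
```

      Anything those lines report beyond a raw value of 0 is worth watching.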

      If you want to get a complete picture of each drive:

      On the command line, do:
      apt-get install curl

      After the curl install finishes:
      (In the line below, the ? wildcard matches every /dev/sd* drive; replace it with a specific letter to check a single disk)
      for disk in /dev/sd? ; do smartctl -x $disk ; done | curl -F 'sprunge=<-' sprunge.us

      The above line returns a URL. Copy and paste the URL into the address bar of a web browser.
      ___________________________________________________________________________

      You could copy and post the URLs generated from the above into this thread, but I'm guessing that your drives have some age on them. Note that RAID is not kind to the remaining drives in the array if a failed drive is replaced: "resilvering" a new drive can cause another drive failure during the process. Two failures and it's over.

      If you don't have a backup of the data stored in your array, I'd encourage you to give it some thought.
      Good backup takes the "drama" out of computing
      ____________________________________
      OMV 3.0.90 Erasmus
      ThinkServer TS140, 12GB ECC / 32GB USB3.0
      4TB SG+4TB TS ZFS mirror/ 3TB TS

      OMV 3.0.81 Erasmus - Rsync'ed Backup Server
      R-PI 2 $29 / 16GB SD Card $8 / Real Time Clock $1.86
      4TB WD My Passport $119
    • Your drives appear to be OK, if a bit on the warm side. The drives are about a year old and it appears you leave your server on most of the time. The seek error rate can be related to platter thermal expansion (cooling, then heating up). As macom noted, using "spin down" for power savings might be hard on your drives. They're designed to be NAS drives (on 24x7), so take his recommendation of APM 127.

      So, from your code box you're using mdadm RAID and Ext4. (There's another forum member who is having a similar but slightly different problem. He's getting file system errors and the array is resyncing once in a while.)
      ________________________
      (The command line below was provided to me and others by one of the forum moderators, tkaiser.)

      The place to start is in your log files.
      You can narrow down the vast amount of info in ALL of your syslog files in /var/log, in one shot, using the following on the CLI.


      Source Code

      1. zgrep -A2 -B2 -E "EXT4-fs|md127" /var/log/syslog* >/ext4logs.txt
      Between the quotes are the search terms (pulled from your code block above), separated by a pipe (alternation), with the output redirected to a file deposited at the root: /ext4logs.txt

      (After you look over your results:)
      You could modify the same command line to search within the output file and narrow the search further, using additional terms of interest. (The term "warning" may have no significance; it's used only as an example. You'll need to review your results and make an appropriate choice.)

      Source Code

      1. zgrep -A2 -B2 -E "warning" /ext4logs.txt >/ext4logs-2.txt
      The idea in the above is to look at error entries and the associated events around their time frame. Things to look for are patterns: can it be isolated to a specific drive, is it always the same block, is there some event that triggers an error, etc.
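      The -B2/-A2 context idea can be sanity-checked on a few fabricated log lines first (illustrative text only, not output from the actual server):

```shell
# grep -B2 -A2 prints two lines of context before and after each match,
# which helps spot events surrounding an EXT4 error entry.
printf '%s\n' 'boot' 'mount ok' \
  'EXT4-fs (md127): error count since last fsck: 4' \
  'resync start' 'idle' \
  | grep -B2 -A2 -E 'EXT4-fs|md127' > /tmp/ext4logs-demo.txt
cat /tmp/ext4logs-demo.txt
```

      All five sample lines land in the output file, because the match sits in the middle and the context windows cover its neighbors.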
      _____________________________________________________________________________

      As far as a real fix goes, can I talk you into using ZFS? :)

      OMV has a ZFS plugin, for the Web GUI, that will allow you to create a raidz1 array, the functional equivalent of RAID5, but with a lot of extra benefits, including checksums for "self-healing" files.

      If you have backup, to put your data back onto a newly created ZFS array, give it some thought.

      Happy Hunting!
      Good backup takes the "drama" out of computing
      ____________________________________
      OMV 3.0.90 Erasmus
      ThinkServer TS140, 12GB ECC / 32GB USB3.0
      4TB SG+4TB TS ZFS mirror/ 3TB TS

      OMV 3.0.81 Erasmus - Rsync'ed Backup Server
      R-PI 2 $29 / 16GB SD Card $8 / Real Time Clock $1.86
      4TB WD My Passport $119

    • danieliod wrote:

      Is there a way to understand the cause for it and to fix it?
      Sure, but that would require getting a bit into storage details. And especially if you don't have backup and only play RAID (RAID5 in 2017 -- insane), you can't sleep well later (do you really want this?)

      Anyway: there's a problem with one specific inode (reported daily), so the first step would be to search for the affected file. Check 'df' output for the mountpoint of your md device (e.g. /srv/foo/bar) and then

      Source Code

      1. find /srv/foo/bar -inum 98893575
      If you have a backup simply try to delete the file, restore it from backup and see what's happening. If you don't have a backup you're doing something seriously wrong and can't be supported anyway.
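      For reference, the find -inum technique can be tried safely on a scratch directory before pointing it at the array (a throwaway sketch; the temporary path and file name here are illustrative, not the actual mountpoint or inode):

```shell
# Map an inode number back to its path with find -inum,
# demonstrated on a temporary file instead of the real array.
tmpdir=$(mktemp -d)
touch "$tmpdir/example.txt"
inum=$(stat -c %i "$tmpdir/example.txt")   # inode number of the file
found=$(find "$tmpdir" -inum "$inum")      # path(s) owning that inode
echo "$found"
rm -rf "$tmpdir"
```

      On the real array you would substitute the mountpoint from 'df' and the inode number from the kernel message.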
      'OMV problems' with XU4 and Cloudshell 2? Nope, read this first. 'OMV problems' with Cloudshell 1? Nope, just Ohm's law or queue size.
    • Hello,

      Thank you very much for the inputs, I really appreciate it.
      As I understand from the responses, I need to upgrade my file system.

      I would very much like to do that and would appreciate your guidance.
      How do I know which file system is best for me?

      Regarding the current issue, I have received the following:

      Source Code

      1. /srv/dev-disk-by-label-RAID/lost+found/#98893575': Structure needs cleaning
      Can I just delete it?
      I also got this message during boot; I tried to run the find command on these and got the same "structure needs cleaning" output.

    • danieliod wrote:

      Hi,

      I'm ready with all the needed backups of my data.
      Please advise on the next steps to improve my file system.

      Thank you
      If you have backup, you're way ahead of the game. It's amazing how many users don't bother, until it's too late. (I hope you'll maintain a regularly scheduled backup going forward.) While ZFS is good, it is NOT backup.

      The safest bet of the advanced file systems available, at this point in time, is a ZFS mirror, the rough equivalent of RAID1. (That's just my opinion, but there are others who agree.) To get an understanding of ZFS, here's a good primer that provides an overview -> ZFS. On the other hand, with good backup that you know can be restored (read: tested), raidz1 is fine. (raidz1 is the rough equivalent of RAID5.) Read the provided link and think it over.

      With the disks you have (3x3TB) you can have a 6TB raidz1 array, or a 3TB ZFS mirror with a hot spare. Either option can be set up in OMV's Web GUI. I can walk you through the process.
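      The capacity arithmetic above can be checked quickly (a trivial sketch; the disk count and size are the 3x3TB from this thread):

```shell
# Usable capacity: raidz1 over n disks keeps (n-1) disks of data;
# a two-way mirror keeps one disk, with the third acting as hot spare.
n=3; size_tb=3
raidz1_tb=$(( (n - 1) * size_tb ))
mirror_tb=$size_tb
echo "raidz1: ${raidz1_tb} TB, mirror: ${mirror_tb} TB"
```

      (Real-world usable space will be a bit less after ZFS metadata and TB-vs-TiB accounting.)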

      To do this cleanly, rebuilding OMV from scratch might be the best way to go. (It doesn't take that long.) Or, if you want to keep your current build, realize that the process will mean deleting Samba shares, base shares, file systems, and so on. I can give you a note or two on that as well.

      Again, give it some thought.
      Good backup takes the "drama" out of computing
      ____________________________________
      OMV 3.0.90 Erasmus
      ThinkServer TS140, 12GB ECC / 32GB USB3.0
      4TB SG+4TB TS ZFS mirror/ 3TB TS

      OMV 3.0.81 Erasmus - Rsync'ed Backup Server
      R-PI 2 $29 / 16GB SD Card $8 / Real Time Clock $1.86
      4TB WD My Passport $119

    • Hi,

      Thank you for all the details. I will need to take some time and learn it carefully, but my preference is the 6TB raidz1 array.
      I have just recently installed OMV from scratch due to the upgrade from OMV 2 to 3.

      In the meantime, I would appreciate input regarding the current FS issue I mentioned above: is it safe to delete that inode in lost+found?
    • danieliod wrote:


      In the meantime, I would appreciate input regarding the current FS issue I mentioned above: is it safe to delete that inode in lost+found?
      I haven't done it before, so I'd refer you to tkaiser's post above. It all comes back to the backup. With a current backup you have options and can take the risk, because the file can be replaced/repaired if there is an ill effect.

      On the other hand, if you plan to upgrade to ZFS, I'd endure the error messages if they're not critical. (And they don't appear to be.)
      Good backup takes the "drama" out of computing
      ____________________________________
      OMV 3.0.90 Erasmus
      ThinkServer TS140, 12GB ECC / 32GB USB3.0
      4TB SG+4TB TS ZFS mirror/ 3TB TS

      OMV 3.0.81 Erasmus - Rsync'ed Backup Server
      R-PI 2 $29 / 16GB SD Card $8 / Real Time Clock $1.86
      4TB WD My Passport $119
    • danieliod wrote:

      /srv/dev-disk-by-label-RAID/lost+found/#98893575
      It's already where it belongs: fsck sent it to this directory, and somewhere else this data is missing (please search the web for »what is 'lost+found'«). So maybe the above syslog message is just telling you that there is a file you should keep an eye on. (No idea; I haven't used ext4 for valuable data in years, even though it can be considered one of the most robust filesystems around.)
      'OMV problems' with XU4 and Cloudshell 2? Nope, read this first. 'OMV problems' with Cloudshell 1? Nope, just Ohm's law or queue size.