Repair SnapRAID array with error after syncing

chente · 12 January 2021

Hi.

First, sorry for my English, and thanks for the help.

After syncing a SnapRAID array, it gives me the following error.

Code

Error #0:
OMV\ExecException: Failed to execute command 'export PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin; export LANG=C.UTF-8; export SHELL=/bin/sh; sudo --shell --non-interactive --user='root' -- /var/lib/openmediavault/cron.d/userdefined-825cbd03-2094-4867-8200-21b6663130f6 2>&1' with exit code '1': Self test...
Loading state from /srv/dev-disk-by-uuid-d198490f-27bf-4e04-803b-1ac5bfb7549f/snapraid.content...
Scanning disk D1...
Scanning disk D2...
Scanning disk D3...
Scanning disk D4...
Using 785 MiB of memory for the file-system.
Initializing...
Resizing...
Saving state to /srv/dev-disk-by-uuid-d198490f-27bf-4e04-803b-1ac5bfb7549f/snapraid.content...
Saving state to /srv/dev-disk-by-uuid-5500857a-1cf2-420b-ad39-8edf801effa5/snapraid.content...
Saving state to /srv/dev-disk-by-uuid-62118653-0966-42ec-a0e9-ffe855f6baec/snapraid.content...
Saving state to /srv/dev-disk-by-uuid-0345ab3d-f539-4df9-871a-03673490fcb4/snapraid.content...
Verifying /srv/dev-disk-by-uuid-d198490f-27bf-4e04-803b-1ac5bfb7549f/snapraid.content...
Verifying /srv/dev-disk-by-uuid-5500857a-1cf2-420b-ad39-8edf801effa5/snapraid.content...
Verifying /srv/dev-disk-by-uuid-62118653-0966-42ec-a0e9-ffe855f6baec/snapraid.content...
Verifying /srv/dev-disk-by-uuid-0345ab3d-f539-4df9-871a-03673490fcb4/snapraid.content...
Verified /srv/dev-disk-by-uuid-0345ab3d-f539-4df9-871a-03673490fcb4/snapraid.content in 6 seconds
Verified /srv/dev-disk-by-uuid-62118653-0966-42ec-a0e9-ffe855f6baec/snapraid.content in 7 seconds
Verified /srv/dev-disk-by-uuid-d198490f-27bf-4e04-803b-1ac5bfb7549f/snapraid.content in 8 seconds
Verified /srv/dev-disk-by-uuid-5500857a-1cf2-420b-ad39-8edf801effa5/snapraid.content in 9 seconds
Syncing...
Using 40 MiB of memory for 32 cached blocks.
0%, 1 MB
0%, 45 MB
0%, 581 MB
0%, 1117 MB
0%, 1655 MB
0%, 2188 MB, 535 MB/s, 511 block/s, CPU 19%, 4:18 ETA
0%, 2654 MB, 521 MB/s, 497 block/s, CPU 19%, 4:25 ETA
0%, 3168 MB, 520 MB/s, 496 block/s, CPU 19%, 4:25 ETA
0%, 3646 MB, 514 MB/s, 491 block/s, CPU 19%, 4:28 ETA
0%, 4180 MB, 516 MB/s, 493 block/s, CPU 19%, 4:27 ETA
0%, 4716 MB, 516 MB/s, 493 block/s, CPU 19%, 4:27 ETA
0%, 5197 MB, 512 MB/s, 489 block/s, CPU 18%, 4:29 ETA
0%, 5742 MB, 516 MB/s, 492 block/s, CPU 18%, 4:27 ETA

...

99%, 7867352 MB, 313 MB/s, 419 block/s, CPU 14%, 0:00 ETA
99%, 7867583 MB, 312 MB/s, 419 block/s, CPU 14%, 0:00 ETA
99%, 7867825 MB, 312 MB/s, 419 block/s, CPU 14%, 0:00 ETA
99%, 7868082 MB, 311 MB/s, 419 block/s, CPU 14%, 0:00 ETA
99%, 7868322 MB, 310 MB/s, 419 block/s, CPU 14%, 0:00 ETA
99%, 7868579 MB, 309 MB/s, 419 block/s, CPU 14%, 0:00 ETA
99%, 7868824 MB, 309 MB/s, 419 block/s, CPU 14%, 0:00 ETA
99%, 7868887 MB, 300 MB/s, 408 block/s, CPU 17%, 0:00 ETA
100% completed, 7868900 MB accessed in 5:29

     D1  8% | *****
     D2 48% | *****************************
     D3  3% | *
     D4  1% |
 parity 23% | **************
   raid  4% | **
   hash  9% | ******
  sched  1% |
   misc  0% |
            |______________________________________________________________
                           wait time (total, less is better)


       0 file errors
       0 io errors
       1 data errors
DANGER! Unexpected data errors! The failing blocks are now marked as bad!
Use 'snapraid status' to list the bad blocks.
Use 'snapraid -e fix' to recover.
Saving state to /srv/dev-disk-by-uuid-d198490f-27bf-4e04-803b-1ac5bfb7549f/snapraid.content...
Saving state to /srv/dev-disk-by-uuid-5500857a-1cf2-420b-ad39-8edf801effa5/snapraid.content...
Saving state to /srv/dev-disk-by-uuid-62118653-0966-42ec-a0e9-ffe855f6baec/snapraid.content...
Saving state to /srv/dev-disk-by-uuid-0345ab3d-f539-4df9-871a-03673490fcb4/snapraid.content...
Verifying /srv/dev-disk-by-uuid-d198490f-27bf-4e04-803b-1ac5bfb7549f/snapraid.content...
Verifying /srv/dev-disk-by-uuid-5500857a-1cf2-420b-ad39-8edf801effa5/snapraid.content...
Verifying /srv/dev-disk-by-uuid-62118653-0966-42ec-a0e9-ffe855f6baec/snapraid.content...
Verifying /srv/dev-disk-by-uuid-0345ab3d-f539-4df9-871a-03673490fcb4/snapraid.content...
Verified /srv/dev-disk-by-uuid-0345ab3d-f539-4df9-871a-03673490fcb4/snapraid.content in 6 seconds
Verified /srv/dev-disk-by-uuid-62118653-0966-42ec-a0e9-ffe855f6baec/snapraid.content in 14 seconds
Verified /srv/dev-disk-by-uuid-d198490f-27bf-4e04-803b-1ac5bfb7549f/snapraid.content in 17 seconds
Verified /srv/dev-disk-by-uuid-5500857a-1cf2-420b-ad39-8edf801effa5/snapraid.content in 19 seconds in /usr/share/openmediavault/engined/rpc/cron.inc:185
Stack trace:
#0 /usr/share/php/openmediavault/rpc/serviceabstract.inc(588): Engined\Rpc\Cron->Engined\Rpc\{closure}('/tmp/bgstatusbq...', '/tmp/bgoutputFL...')
#1 /usr/share/openmediavault/engined/rpc/cron.inc(189): OMV\Rpc\ServiceAbstract->execBgProc(Object(Closure))
#2 [internal function]: Engined\Rpc\Cron->execute(Array, Array)
#3 /usr/share/php/openmediavault/rpc/serviceabstract.inc(123): call_user_func_array(Array, Array)
#4 /usr/share/php/openmediavault/rpc/rpc.inc(86): OMV\Rpc\ServiceAbstract->callMethod('execute', Array, Array)
#5 /usr/sbin/omv-engined(537): OMV\Rpc\Rpc::call('Cron', 'execute', Array, Array, 1)
#6 {main}

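Because the sync runs as an OMV scheduled job, the entire SnapRAID log ends up inside the exception text, as above. A minimal Python sketch that pulls the file/io/data error counts out of such a log, e.g. so a script could alert on data errors. It assumes every count is printed as `<number> <kind> errors`, as in this log; `parse_sync_summary` is a hypothetical helper, not part of SnapRAID or OMV:

```python
import re

# Matches summary lines such as "       1 data errors" from the sync log above.
# Assumption: every count is printed as "<number> <kind> errors", as in this log.
SUMMARY_RE = re.compile(r"^\s*(\d+)\s+(file|io|data)\s+errors\s*$", re.MULTILINE)


def parse_sync_summary(log: str) -> dict[str, int]:
    """Return the file/io/data error counts found in a 'snapraid sync' log."""
    return {kind: int(count) for count, kind in SUMMARY_RE.findall(log)}


sample = """\
       0 file errors
       0 io errors
       1 data errors
DANGER! Unexpected data errors! The failing blocks are now marked as bad!
"""
counts = parse_sync_summary(sample)
if counts.get("data", 0) > 0:
    print("Data errors found, see 'snapraid status':", counts)
```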

After that, I ran "snapraid status", and the output was:

Code

Self test...
Loading state from /srv/dev-disk-by-uuid-d198490f-27bf-4e04-803b-1ac5bfb7549f/snapraid.content...
Using 737 MiB of memory for the file-system.
SnapRAID status report:

   Files Fragmented Excess  Wasted  Used    Free  Use Name
            Files  Fragments  GB      GB      GB
   27053      16    2862     4.8    4161     793  83% D1
   18443      20    1046       -    1649     317  83% D2
   32658      22    4388       -    2070    1865  52% D3
   28506       0       0       -    2072    1863  52% D4
 --------------------------------------------------------------------------
  106660      58    8296     4.8    9953    4840  67%


 49%|                                                                   o  
    |                                                                   o  
    |                                                                   o  
    |                                                                   o  
    |                                                                   o  
    |o                                                                  o  
    |o                                                                  o  
 24%|o                                                                  o  
    |o                                                                  o  
    |o                         o                                        o  
    |o                         o                                        o  
    |o                         o                                        o  
    |o                         o                                        o  
    |o                         o                                        o  
  0%|o_________________________o__________o_____________________________o_o
     6                    days ago of the last scrub/sync                 0

The oldest block was scrubbed 6 days ago, the median 3, the newest 0.

WARNING! The array is NOT fully synced.
You have a sync in progress at 99%.
The 87% of the array is not scrubbed.
You have 1 files with zero sub-second timestamp.
Run the 'touch' command to set it to a not zero value.
No rehash is in progress or needed.
DANGER! In the array there are 1 errors!

They are from block 5918679 to 5918679, specifically at blocks: 5918679

To fix them use the command 'snapraid -e fix'.
The errors will disappear from the 'status' at the next 'scrub' command.

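The status report names the failing block by number (5918679 here). A minimal Python sketch that extracts those numbers, e.g. to check whether the same block reappears after every fix/scrub round. It assumes the line format shown above, with several blocks separated by spaces; `bad_blocks` is a hypothetical helper:

```python
import re

# Assumption: 'snapraid status' lists the bad blocks on one line, space-separated,
# after "specifically at blocks:", as in the report above.
BLOCKS_RE = re.compile(r"specifically at blocks: *([\d ]+)")


def bad_blocks(status_output: str) -> list[int]:
    """Return the block numbers that 'snapraid status' reports as bad."""
    match = BLOCKS_RE.search(status_output)
    if match is None:
        return []
    return [int(block) for block in match.group(1).split()]


report_line = "They are from block 5918679 to 5918679, specifically at blocks: 5918679"
print(bad_blocks(report_line))  # [5918679]
```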

I looked for information in the forum and thought the solution was to run "snapraid --filter-error fix", with the following result:

Code

Self test...
Loading state from /srv/dev-disk-by-uuid-d198490f-27bf-4e04-803b-1ac5bfb7549f/snapraid.content...
Searching disk D1...
Searching disk D2...
Searching disk D3...
Searching disk D4...
Filtering...
Using 763 MiB of memory for the file-system.
Initializing...
Fixing...
100% completed, 2 MB accessed in 0:00

       2 errors
       0 recovered errors
       1 UNRECOVERABLE errors
DANGER! There are unrecoverable errors!


Now I'm stuck and I don't dare do anything else. I'd appreciate any help.

Before all this, I replaced a 2TB disk with a 4TB one (D3) and added another 4TB disk (D4). After these operations, the sync completed correctly. Later I copied files from one disk to another to empty the full ones, and the next sync gave me the error. I don't know if this is relevant.

chente · 12 January 2021

It would also help to know if this question is not appropriate for this forum and, if so, where I should ask.

Thanks again and sorry for the inconvenience.

geaves · 13 January 2021

I'm surprised no one has answered. I don't use SnapRAID any more, but when I did, I just went with the suggestion and ran snapraid -e fix whenever I had an error warning. One time it came back with 66 errors, but I never had any unrecoverable errors.

chente · 13 January 2021

Thank you very much for answering. I am also surprised nobody answered; I thought I was being ignored for asking such basic questions. I'm new to OMV and I still don't dare touch things without understanding them well.

I had already used that command, although I hadn't mentioned it. Here is the result; I still have the error.

Code

Self test...
Loading state from /srv/dev-disk-by-uuid-d198490f-27bf-4e04-803b-1ac5bfb7549f/snapraid.content...
Searching disk D1...
Searching disk D2...
Searching disk D3...
Searching disk D4...
Filtering...
Using 763 MiB of memory for the file-system.
Initializing...
Fixing...
100% completed, 2 MB accessed in 0:00

       2 errors
       0 recovered errors
       1 UNRECOVERABLE errors
DANGER! There are unrecoverable errors!


I want to give OMV a chance. I really like it, but I find it difficult to understand the information available for managing it. I managed to install OMV and get it working without bothering anyone, but I don't know how to solve this. I get the impression that this forum is for users much more advanced than me; if so, tell me and I will look for help elsewhere, although it is difficult to find. Thanks again.

geaves · 13 January 2021

The first question, then, is why have you chosen to use SnapRAID? For me, the choice was because 95% of my files are media, which is what SnapRAID is designed for.

A few people on here use it, but not many, and most would refer you to the SnapRAID manual. I can't remember whether the reported errors came after a scrub or a sync; whichever it was, after running fix I would then run the scrub or sync.

If the manual doesn't help, try a search: there are a few hits relating to unrecoverable errors, and they seem to point to the content file.

chente · 13 January 2021

Quote

The first question, then, is why have you chosen to use SnapRAID? For me, the choice was because 95% of my files are media, which is what SnapRAID is designed for.

Indeed, I use SnapRAID because this server only holds multimedia files, which change little or not at all.

Quote

A few people on here use it, but not many, and most would refer you to the SnapRAID manual. I can't remember whether the reported errors came after a scrub or a sync; whichever it was, after running fix I would then run the scrub or sync.

If the manual doesn't help, try a search: there are a few hits relating to unrecoverable errors, and they seem to point to the content file.

The reported errors came after a sync.

The system is stuck in a loop I don't know how to get out of. After running "snapraid -e fix", it tells me there is an unrecoverable error. I run "snapraid scrub", the error remains, and it tells me to run "snapraid status", then "snapraid -e fix", and then "snapraid -p bad scrub". After that, the error is still there and I'm back at the beginning.
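The loop described above is the sequence SnapRAID itself suggests. A minimal Python sketch of that sequence, assuming `snapraid` is on the PATH and run as root; by default it only prints the commands. `run_recovery` and `RECOVERY_STEPS` are hypothetical names, not part of SnapRAID or OMV:

```python
import subprocess

# The order SnapRAID suggested in this thread.
# "-e" is the short form of "--filter-error"; "-p bad" selects the "bad" scrub plan.
RECOVERY_STEPS = [
    ["snapraid", "status"],
    ["snapraid", "-e", "fix"],
    ["snapraid", "-p", "bad", "scrub"],
    ["snapraid", "status"],
]


def run_recovery(dry_run: bool = True) -> list[str]:
    """Print (default) or run each step in order, stopping at the first failure."""
    done = []
    for cmd in RECOVERY_STEPS:
        line = " ".join(cmd)
        done.append(line)
        if dry_run:
            print("would run:", line)
            continue
        # Assumption: a failing step exits non-zero (the cron sync above exited with 1);
        # check=True turns that into a CalledProcessError and stops the loop.
        subprocess.run(cmd, check=True)
    return done


run_recovery()
```

As long as a block stays unrecoverable, running this again simply repeats the loop; the sketch only makes the order of the steps explicit.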

geaves · 13 January 2021

Well, you're using SnapRAID for what it was meant for. As to the error, you've done exactly what the manual instructs. It's not in a loop as such; it's just reporting that it has found an unrecoverable error. How to stop it or remove it, I don't know. From recollection, there is a tab in the plugin for excludes; perhaps adding the unrecoverable one in the correct format will stop it from being reported.

chente · 13 January 2021

I assume you mean a tab that says "fix silent" (my OMV is installed in Spanish). I cannot find that command in the SnapRAID manual and I don't know what it does. Should I apply it? Will it solve the problem or hide it? Is the problem in the SnapRAID sync, or in my data and my hard drives?

geaves · 13 January 2021

Quote from chente

Will it solve the problem or hide it?

I think it will hide it. I used SnapRAID for about 12 months, primarily because I had mismatched drive sizes. I've since replaced two of my drives and finally completed a clean install of V5, but I've deployed ZFS, only because I have used it before in another OS.

Quote from chente

Is the problem in the SnapRAID sync, or in my data and my hard drives?

That's what I've been trying to find out, without success. I'm wondering whether removing the parity drive, wiping it, formatting it, and then adding it back would resolve the error.

The next question is: do you have a backup of the data? This should not be overlooked. In my case, everything got backed up once every two weeks, except for Docker containers and configs, which are on a separate drive.

chente · 13 January 2021

Quote from geaves

I'm wondering whether removing the parity drive, wiping it, formatting it, and then adding it back would resolve the error.

No. Would it help?

Quote from geaves

The next question is: do you have a backup of the data?

Unfortunately not right now; I used the backup disks to build this server. I am waiting for a 12TB disk that I will use for backups. I should have bought that disk earlier, I know, don't remind me... My personal photos are backed up; the rest is not.

I applied "fix silent", and it keeps giving the error...

Code

Self test...
Loading state from /srv/dev-disk-by-uuid-d198490f-27bf-4e04-803b-1ac5bfb7549f/snapraid.content...
Searching disk D1...
Searching disk D2...
Searching disk D3...
Searching disk D4...
Filtering...
Using 763 MiB of memory for the file-system.
Initializing...
Fixing...
100%, 1 MB          
100% completed, 2 MB accessed in 0:00    

       2 errors
       0 recovered errors
       1 UNRECOVERABLE errors


And after "status":

Code

Self test...
Loading state from /srv/dev-disk-by-uuid-d198490f-27bf-4e04-803b-1ac5bfb7549f/snapraid.content...
Using 737 MiB of memory for the file-system.
SnapRAID status report:

   Files Fragmented Excess  Wasted  Used    Free  Use Name
            Files  Fragments  GB      GB      GB
   27053      16    2862     4.8    4161     793  83% D1
   18443      20    1046       -    1649     317  83% D2
   32658      22    4388       -    2070    1865  52% D3
   28506       0       0       -    2072    1863  52% D4
 --------------------------------------------------------------------------
  106660      58    8296     4.8    9953    4840  67%


 49%|                                                                   o  
    |                                                                   o  
    |                                                                   o  
    |                                                                   o  
    |                                                                   o  
    |o                                                                  o  
    |o                                                                  o  
 24%|o                                                                  o  
    |o                                                                  o  
    |o                         o                                        o  
    |o                         o                                        o  
    |o                         o                                        o  
    |o                         o                                        o  
    |o                         o                                        o  
  0%|o_________________________o__________o_____________________________o_o
     8                    days ago of the last scrub/sync                 1

The oldest block was scrubbed 8 days ago, the median 4, the newest 1.

WARNING! The array is NOT fully synced.
You have a sync in progress at 99%.
The 87% of the array is not scrubbed.
No file has a zero sub-second timestamp.
No rehash is in progress or needed.
DANGER! In the array there are 1 errors!

They are from block 5918679 to 5918679, specifically at blocks: 5918679

To fix them use the command 'snapraid -e fix'.
The errors will disappear from the 'status' at the next 'scrub' command.


We are back at the beginning...

geaves · 13 January 2021

We are. I've contacted someone and will await a reply.

chente · 13 January 2021

Thanks for your time 👍👍

macom · 13 January 2021

crashtest · 13 January 2021

Have you checked the SMART stats on your drives? Generally speaking, if hardware errors are involved, that might be why you can't complete a sync. Under Storage > SMART, enable SMART, then run some device tests. In your case, offline LONG tests might be in order.

Some details on SMART are in the current User Guide, under Hard Drive Health and SMART.

chente · 13 January 2021

Quote from crashtest

Have you checked the SMART stats on your drives? Generally speaking, if hardware errors are involved, that might be why you can't complete a sync. Under Storage > SMART, enable SMART, then run some device tests. In your case, offline LONG tests might be in order.

Some details on SMART are in the current User Guide, under Hard Drive Health and SMART.

I have the short test scheduled weekly and see no errors in the results. I don't think it would be wise to run a long test now, because I have no backup; I will wait for the hard disk I bought to arrive.

I downloaded that guide a month ago, and it helped me a lot to configure OMV and understand how it works.

Thanks for your interest.

crashtest · 14 January 2021

The latest Guide, updated in the last week, has some very minor changes and was reorganized a bit. (I've been going through it.)

Quote from chente

I don't think it would be wise to run a long test now, because I have no backup; I will wait for the hard disk I bought to arrive.

That makes sense to me. A long test does a surface scan, which might detect bad sectors that are present but not yet detected. But a long test is a workout for a drive.

Have you looked at these SMART stats? Any counts?

SMART 5 – Reallocated_Sector_Count

SMART 187 – Reported_Uncorrectable_Errors

SMART 188 – Command_Timeout

SMART 197 – Current_Pending_Sector_Count

SMART 198 – Offline_Uncorrectable
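The check crashtest asks for amounts to "is the raw value of any of these attributes non-zero?". A minimal Python sketch of that check, assuming you have the text printed by `smartctl -A /dev/sdX` (smartmontools) for each disk; `/dev/sdX`, `nonzero_watched` and the sample table are placeholders, not output from the disks in this thread:

```python
# SMART attribute IDs crashtest asked about. Matching is by ID only, because
# smartctl often abbreviates the names (e.g. "Reallocated_Sector_Ct").
WATCHED = {
    5: "Reallocated_Sector_Count",
    187: "Reported_Uncorrectable_Errors",
    188: "Command_Timeout",
    197: "Current_Pending_Sector_Count",
    198: "Offline_Uncorrectable",
}


def nonzero_watched(smartctl_attrs: str) -> dict[int, int]:
    """Return {attribute ID: raw value} for watched attributes whose raw value is not 0.

    Assumes the attribute table printed by `smartctl -A /dev/sdX`, whose columns are:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    Attributes a drive does not report (like 187/188 on some disks) are simply absent.
    """
    found = {}
    for line in smartctl_attrs.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank or unrelated line
        attr_id = int(fields[0])
        if attr_id not in WATCHED:
            continue
        try:
            raw = int(fields[9])  # first token of RAW_VALUE
        except ValueError:
            continue  # vendor-specific raw formats are skipped in this sketch
        if raw != 0:
            found[attr_id] = raw
    return found


# Made-up sample, not output from chente's disks.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""
for attr_id, raw in nonzero_watched(SAMPLE).items():
    print(f"SMART {attr_id} ({WATCHED[attr_id]}): raw value {raw}")
```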

Are you using the UnionFS plugin?

chente · 14 January 2021

Quote from crashtest

The latest Guide, updated in the last week, has some very minor changes and was reorganized a bit. (I've been going through it.)

OK, I'll check the new version in case it helps, thanks.

Quote from crashtest

Have you looked at these SMART stats? Any counts?

SMART 5 – Reallocated_Sector_Count
SMART 187 – Reported_Uncorrectable_Errors
SMART 188 – Command_Timeout
SMART 197 – Current_Pending_Sector_Count
SMART 198 – Offline_Uncorrectable


All disks report zero, but 3 disks do not report values for 187 and 188.

Quote from crashtest

Are you using the UnionFS plugin?

Yes, my disk configuration is:

Parity - 5TB

Data 1 (D1) - 5TB

Data 2 (D2) - 2TB

Data 3 (D3) - 4TB

Data 4 (D4) - 4TB

Data disks 1 to 4 are pooled with UnionFS.
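SnapRAID requires each parity disk to be at least as large as the largest data disk, which is worth confirming after D3 was upgraded from 2TB to 4TB. A minimal Python sketch of that check using the nominal sizes listed above; `parity_is_large_enough` is a hypothetical helper, not a SnapRAID command:

```python
# Nominal sizes in TB from chente's post. Real capacities are not whole TB,
# so this only compares the listed numbers.
PARITY_TB = 5
DATA_TB = {"D1": 5, "D2": 2, "D3": 4, "D4": 4}


def parity_is_large_enough(parity_tb: float, data_tb: dict[str, float]) -> bool:
    """True if the parity disk is at least as large as the largest data disk."""
    return parity_tb >= max(data_tb.values())


print(parity_is_large_enough(PARITY_TB, DATA_TB))  # True (5 >= 5)
```

Here D1 and the parity disk are both listed as 5TB, so the check passes with no margin; comparing exact byte capacities would be more reliable than nominal sizes.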

crashtest · 14 January 2021

Quote from chente

OK, I'll check the new version in case it helps, thanks.

Not to worry about that. There are no functional differences; it's just a bit more readable. Just get a new copy for reference.

Quote from chente

All disks report zero, but 3 disks do not report values for 187 and 188.

That's good news, but the stats may change with long tests.

There's nothing to worry about with stats that are not available. Stats 5 and 197 are a couple of the most worrisome.

_________________________________________________________________________

I chatted with geaves about your situation. There's no obvious reason for this fault and I've never seen exactly what you're dealing with.
(I finally have a marginal 2TB hard drive to throw into a SnapRAID array for some testing, but results may take months.)

While waiting for your backup drive, you might consider copying some of your posts above into the SnapRAID forum. The people there have a lot of cumulative experience with odd errors in SnapRAID arrays.

geaves · 14 January 2021

Quote from crashtest

There's no obvious reason for this fault and I've never seen exactly what you're dealing with.

It's the unrecoverable error. I can't work out whether it's related to a physical drive or to the parity, and if you do a search, there is very little out there.

That's why one option I suggested was to remove the parity drive, wipe it, format it, and add it back, but that approach is only prudent after the data has been backed up.

crashtest · 14 January 2021

chente, I forgot to ask: what file system are you using? EXT4?

____________________________________________________________________________________

Quote from geaves

I can't work out whether it's related to a physical drive or to the parity

I agree that there's no way, that I know of, to make that determination. But we both know that SMART stats can lag somewhat behind the actual physical condition of the disk. A long test (after a backup) would be the best way to dig out a potential drive problem.

Quote from geaves

That's why one option I suggested was to remove the parity drive, wipe it, format it, and add it back, but that approach is only prudent after the data has been backed up.

I agree that wiping the parity drive is an action that could be tried, after he backs up.
__________________________________________________________________________

While others may see it differently, I see the health and age of the parity drive as the most important factors in a SnapRAID array. A parity drive with an odd issue has the potential to create a lot of problems in recovery. For those reasons, the parity drive should be the newest in the array, with zero health issues.

On the other hand, with a good backup in hand, a parity drive problem is one of the easier issues to look at. Wipe it, format it (EXT4), run a long test to be sure of health, and run a sync.
