Posts by calvin940

    I am now back up and running. I am using OMV 4.0.5 and remade my RAID 6 with md on the 6x8TB drives, using the pure JBOD 8-port SATA card with no shenanigans.


    Things are running pretty smoothly. I must say I am enjoying OMV on top of Stretch, and re-installing all my stuff was a breeze.


    Thanks again everyone for all your input/help. Very much appreciated.

    So, after recovery efforts completed, I managed to lose only about 1TB worth of data from a pretty much full-capacity 16TB array. Any recovery at all was miraculous, but getting all of that back probably means I wasn't a douchebag in my previous life :)


    So, here I am on OMV 4.0.5, wanting to create my MD RAID 6 array out of the 6x8TB drives.


    Can someone please point me to resources on how I should configure this before I start? Specifically, I am looking for advice on any controller settings I should be changing/tweaking, or drive settings I should be applying (write-through, write-back, etc.). When I created my previous raid I am not sure I paid attention to any of that stuff, so this time I want to understand more about the process and settings. Also, can I make those changes after the raid is built, or must I do these things beforehand?
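
    If it helps frame the question: I assume that for plain SATA drives behind a dumb HBA the write-back vs. write-through choice mostly comes down to the drives' own write cache, which I could check and toggle with something like this (device name is just an example):

    Code
    # Show whether the drive's volatile write cache is currently enabled
    hdparm -W /dev/sdb
    # -W1 turns write-back caching on (faster); -W0 turns it off (safer on sudden power loss)
    hdparm -W0 /dev/sdb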


    Also, last time I used the OMV interface to create the raid and then did the volume configuration in that same interface. Does it make sense to continue with that approach this time around?


    Thanks again to all of you for all the help you have provided thus far. It was invaluable and I am back on track. In the end, I don't mind this "refresh" as I get to use new versions of software.

    The plan is definitely to do the fsck after the first pass. I am shocked it worked as well as it did. I'll take what I can get and then go from there.


    I moved to software raid specifically so that I wouldn't be tied to one vendor's proprietary hardware. I moved completely away from a QNAP box, which is mostly proprietary, to a Linux system and software raid, so basically all the equipment can be replaced with newer/non-specific hardware to rebuild a Linux box and get access to my raid again. I feel I got screwed over again by Adaptec's proprietary nature with that geometry change. Had I used a straight 8-port SATA controller (like the Supermicro I just bought) I would have been laughing. Everything I had done was proper and worked perfectly. I would not have been in this situation if it wasn't for that Adaptec raid controller card not giving me a JBOD without putting a bunch of shit in there to make it their own.


    I like software raid. I like MD. It's hardware agnostic. That is really important to me. It has always been my friend. All the problems I have experienced over time have not been due to software raid, but rather the hardware.


    With this final change to the supermicro card, I should have now effectively removed the last piece of the proprietary puzzle.

    So, I did what I said I was going to do:


    • Powered down NAS
    • Yanked all 6 x 8TB drives
    • Pulled the Supermicro 8 port SATA controller and put back the Adaptec
    • Put 5 of the 6 original 4TB drives back into the bays
    • Booted into the OS. Raid / Volume not available
    • Assembled the raid with 5 of 6 disks
    • Rebooted to mount the volume
    • Got access to the file system, but there were errors; fsck did not complete on boot-up and asked me to go into maintenance mode to check
    • Decided to unmount and remount as read-only (roughly the commands sketched below)
    • Am now copying as much data as I can from the raid (there are occasional file system errors in places, and certain files aren't recoverable)
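
    For reference, the assemble and read-only remount were roughly along these lines (device list and mount point are placeholders for my setup):

    Code
    # Force-assemble the degraded array from the five remaining members
    mdadm --assemble --force --verbose /dev/md127 /dev/sd[bcdef]
    # Remount the filesystem read-only so the recovery copies can't write to the damaged volume
    mount -o remount,ro /srv/dev-disk-by-label-data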


    So, was I right not to try to correct the file system using fsck? I was worried that writing to the array could potentially muck everything up, so I am just trying to salvage as much as possible in read-only mode on the first pass.


    When reassembling there were a bunch of event-count differences between the drives, but it seemed to assemble, at least in this state.
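
    In case it helps anyone else, the event counters can be compared with something like this (device names are from my setup):

    Code
    # Show the event counter recorded in each member's superblock; a large spread means more divergence
    mdadm --examine /dev/sd[bcdef] | egrep 'Events|/dev/sd'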


    So I am doing some targeted rsyncing to external USB drives to recover what I can.
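
    Nothing fancy; roughly this per directory (paths are just examples from my layout):

    Code
    # Archive mode preserves permissions/timestamps/hard links; rsync reports but skips files it cannot read
    rsync -aHv --progress /srv/raid/photos/ /media/usb-backup/photos/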


    Then I will format everything and install 4.0.1 and build a full 6 x 8TB raid 6 from scratch.


    Should I be doing anything else?


    Thanks a lot for all of your collective help. It was/is very much appreciated. I am hopeful at least for partial recovery where I had felt before all hope was lost.

    This does sound like an excellent plan. I would love to be able to provide something valuable and consistent when I get into these types of situations, instead of fumbling around with piecemeal, ad hoc queries/commands.


    I don't know where I will get in my current situation but if I have to rebuild my raid from scratch I will be implementing this the next time around for sure.


    I'll let you know how plan B goes (but really, I think the probability is not on my side). If it doesn't work, I'll just wipe everything and begin anew with OMV 3.x I guess (OR... SHOULD I GO 4.x??)

    Yep, but all disks show up as 15628053168 sectors in size. So in case you attach them back to the Adaptec, please check the log and update the thread (or, even better, record the output of 'gdisk -l /dev/sdb' with the Marvell controller now and again with the Adaptec later).

    Attaching the 8TB drives back to the Adaptec will not give me anything, as the drives aren't being exposed to the Debian OS. When I rebooted my NAS they showed up in the Adaptec POST messages but were not passed through to the OS (Debian only saw my boot drive and 2 USB drives). It did not see the 8TB disks because they would need to be initialized as JBOD and thus get erased.


    Damn Adaptec and its bullshit of changing the drives before presenting them to an OS rather than simply passing them through as generic discrete disks like a normal controller. The whole point of moving from a QNAP to a DIY NAS/RAID system was to try to remove proprietary crap for the sake of safety and recovery. And I ordered Adaptec only because they had a JBOD option and I *THOUGHT* that it was straightforward. Once again, Adaptec has screwed me over (it did so a number of years ago and, against my better judgement, I chose it again). *sigh*.


    I think my only option to try to recover my raid is to power down, take the 8TB drives out, put the Adaptec back in, put 5 of the 6 original 4TB drives back in, and try to recover my raid that way. If I can, then I will back up the contents to 2 of the 8TB drives, then power down, replace the Adaptec with the new Supermicro, and build my OMV box (and raid) anew from scratch with 3.x.


    I think I'll start that tonight or tomorrow unless I hear any other hail-Mary ideas...

    Is it possible that the Adaptec changed the disk geometry (possibly hiding sectors at the beginning of the disk since they're used 'internally')? Just asking since I'm currently fighting a stupid USB-to-SATA bridge which does the same but from the disk's end (so it doesn't hurt as much; only the backup GPT is corrupted).
    @calvin940 In case your syslogs date back to before the controller change, it might help to provide the output from

    Code
    zgrep 'LBA48 NCQ ' /var/log/syslog* | curl -F 'sprunge=<-' http://sprunge.us


    http://sprunge.us/gOjj


    Not a lot there unfortunately. I don't reboot my NAS much.

    Sorry, I really don't have any ideas now. Maybe connect the drives back to the old raid card??

    The problem I had originally was that after these shenanigans, when I rebooted, the drives were not showing up at all in Debian. Apparently the Adaptec doesn't simply present a new disk as a drive; you need to configure the disks either into a raid or as JBOD, and only then does it present them to the OS. Hot-swapping the drives seems to have caused the issue. When I went into the Adaptec controller config (CTRL-A), it saw the raw devices, but when I went to manage JBODs, it said no JBODs were found. Creating JBODs initializes the disks, so that would mean wiping the drives (also not productive).


    So... uhm, what about a really far out idea?


    Given that I replaced the drives as I stated (one by one) and rebuilt onto each new drive, what about using the original drives? I have 5 of the 6 original 4TB drives that I swapped out. Could I put the Adaptec back, slide those 5 in, try to re-assemble the raid based on those, get a 6th one in there to bring it back to my original 6x4TB, and then start the whole process over again? There have been changes to the filesystem since (i.e. a disparity between the event counts on each drive for sure), but what is the likelihood I could get data back?

    Not good. I'm betting it says there is no superblock on the other drives as well. Try:


    mdadm --examine /dev/sd[bcdefg]


    Is there any array assembled in the output of cat /proc/mdstat?
    If not, then try assembling it without sdb:
    mdadm --assemble --force --verbose /dev/md127 /dev/sd[cdefg]

    All fine except your boot disk, which is IMO the next failure candidate (check SMART attribute 193; be aware that the specs talk about 600,000, and do a web search for 'wdd lcc problem').


    I have 2 of these boot drives and they were both refurbs from NewEgg. I assume that would be factored into whether that parameter is of concern or not? I don't know what the process is when they perform a refurb and whether or not they would reset those values.


    I clone one drive to the other and take the clone offline whenever I make any significant changes, in case of failure; but I appreciate you pointing it out to me. I really should pay more attention to these values.
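
    Note to self: checking that attribute should just be something like this (assuming /dev/sda is the boot disk):

    Code
    # Attribute 193 is Load_Cycle_Count; compare the raw value against the ~600,000 spec
    smartctl -A /dev/sda | grep -i load_cycle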


    So, having said all this, any ideas on where I go from here?

    @Dropkick Murphy


    I was thinking about this too, but I can't imagine that sdb is any different from all the other drives in terms of state. It wouldn't make sense to me, so I suspect that even if I excluded sdb, I'd then get the same error on sdc when trying to reassemble without that drive.



    Log file excerpts aren't useful since they're filtered. And full SMART info for one drive is also somewhat useless when it's about checking all disks :)
    I already posted two commands above, and the one below, for example, would put the SMART info of all of your drives on an online pasteboard service:


    Code
    for disk in /dev/sd? ; do smartctl -q noserial -a $disk ; done | curl -F 'sprunge=<-' http://sprunge.us


    Here is the information asked for in the raid degraded help thread, consolidated:



    Here is a repost of my response to what you asked for before:




    Here is the link to the full SMART info dump (thanks for letting me know about the site):


    http://sprunge.us/EdgM

    Also took an excerpt from the initial Syslog from the first startup after changing the controller:


    I am not exactly sure how to read through it, nor do I see any obvious issue (at least to my eye).


    What other information can I provide?


    Thanks all for helping me.


    New SATA Controller card:
    SUPERMICRO AOC-SAS2LP-MV8 PCI-Express 2.0 x8 SATA / SAS 8-Port Controller Card



    Old Controller card:
    Adaptec RAID 6805E 2271800-R 6Gb/s SATA/SAS 8 Internal Ports w/ 128MB Cache Memory Controller Card, Kit
    (originally configured with 6 x 4TB drives as JBOD)


    All drives are/were WD REDs.


    First 6 x 4TB WD Reds; now 6 x 8TB WD Reds.



    Here is the smartctl output from one of the drives:




    SAS cables (2x 4-SATA hydra) in a Silverstone 8-bay hot-swap PC case with a backplane.

    I am not afraid of taking my time. I can have patience for a long process if the outcome has a higher probability of success. But from the vibe I get from you, it would seem that this is my only option and that the success rate is not good. And that action likely isn't recoverable from either. Fair assessment?

    OMV 2.1


    Had 6 x 4TB drives.


    Adaptec controller configured disks as JBOD


    Raid running fine.


    Then one by one, replaced disks with 8TB - never rebooting NAS in between.


    Grew the raid, LVM, etc. until I had to grow the filesystem. Found out that OMV 2.1 (Wheezy) ships an older resize2fs that doesn't support growing the filesystem beyond 16TB.
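
    For context, the grow sequence was roughly along these lines (md device, VG and LV names here are placeholders, not my exact ones):

    Code
    # Let the array use the full size of the larger replacement disks
    mdadm --grow /dev/md0 --size=max
    # Grow the LVM physical volume sitting on the array
    pvresize /dev/md0
    # Grow the logical volume into the new free space
    lvextend -l +100%FREE /dev/vg0/lv0
    # Grow the ext4 filesystem; this is the step the Wheezy-era resize2fs couldn't do past 16TB
    resize2fs /dev/vg0/lv0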


    I downloaded a Debian 9 live image onto USB, with a newer core, to use its resize2fs on my volume. I restarted the NAS, missed the boot option, and it went back into OMV. However, the raid is missing now; /dev/sd[b-g] are no longer in the dev list.


    I rebooted again and hit Ctrl-A, and no JBODs are showing up any longer. It looks like Adaptec actually does something to the disks to make them JBOD.


    So I bought a pure 8-port SATA card and replaced the Adaptec with this Supermicro card. It only does JBOD. Now when I boot up OMV I see the drives /dev/sd[b-g]; however, syslog still says:


    Sep 19 20:08:34 CHOMEOMV anacron[2496]: Anacron 2.3 started on 2017-09-19
    Sep 19 20:08:35 CHOMEOMV mdadm[2533]: DeviceDisappeared event detected on md device /dev/md127



    What do I do now? Can I force assemble the raid array?


    Any help would be appreciated.

    So I tried the command yesterday (the --level=5 option was not supported on the assemble command, btw, so I skipped that one).


    It did not work (devices were all busy).
    .
    .
    .

    I had this initially as well. This happened because my removed drives somehow got assigned to a different raid (my original raid was /dev/md0, but after the failures /dev/sdf, /dev/sdg, and /dev/sdh were all assigned to some phantom raid /dev/md127 - I could see this with cat /proc/mdstat).


    So first I stopped /dev/md127 to free the drives up. Then I executed the assemble command, and that worked (roughly as sketched below).
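
    In concrete terms it was roughly this (array and device names are from my case; yours will differ):

    Code
    # Stop the phantom array so its member disks are no longer busy
    mdadm --stop /dev/md127
    # Then force-assemble the real array from those members
    mdadm --assemble --force --verbose /dev/md0 /dev/sd[fgh]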


    Adding /dev/sda the way you did is, I believe, what caused the issue you are seeing.


    I hate to make a recommendation here without having as much knowledge as the other good folks, but I think you should:


    cat /proc/mdstat


    and stop all the raids you see there (md127, md128, md0, or whatever).


    Then I think you should assemble your raid using the disks that were reported as UP previously. That looks like this:


    mdadm --assemble /dev/md127 /dev/sd[bcd] --verbose --force


    I am excluding sda because I think it is a problem, so the plan would be to assemble based on the drives that appeared to be fine. I'd focus on getting the critical data off first, before you attempt a full array recovery.
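
    Once it assembles, something like this would let you copy data off without risking further writes (assuming the filesystem sits directly on the md device; the mount point is just an example):

    Code
    # Mount the assembled array read-only somewhere temporary and copy the important data off first
    mkdir -p /mnt/recovery
    mount -o ro /dev/md127 /mnt/recovery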


    I would caution you that I don't have as much experience and knowledge as others so you might want to wait for one of those folks to help.