RAID 0 working as "clean", but filesystem is "missing"

kokodin · 5 February 2024

Hello, I'm just a stupid admin / electrician, so what I say might be incoherent to an actual tech person. Please forgive my lack of technical language.

After a power loss, my homebrew fileserver at work has a missing filesystem. I am pretty sure it has something to do with the SMART status of one drive being "few bad sectors", but the RAID does assemble and works with "clean" status and no drives missing; only the filesystem is missing.

Required command outputs

Code

root@sejf:/# cat /proc/mdstat
Personalities : [raid0] [linear] [multipath] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid0 sdc[0] sdb[3] sdd[2] sde[1]
      1953017856 blocks super 1.2 512k chunks

unused devices: <none>

Code

root@sejf:/# blkid
/dev/sde: UUID="fa9d9871-e08e-0991-28b9-1686674acb8f" UUID_SUB="29e259c0-8d5c-d4fe-ec7d-2377526a6590" LABEL="sejf.local:magazyn" TYPE="linux_raid_member"
/dev/sdb: UUID="fa9d9871-e08e-0991-28b9-1686674acb8f" UUID_SUB="4bdc5a68-0a08-e92b-acbd-54f80eac2716" LABEL="sejf.local:magazyn" TYPE="linux_raid_member"
/dev/sda1: UUID="390b1de1-90ce-4894-95a4-90e34d2aec8c" TYPE="ext4" PARTUUID="6e05c9d1-01"
/dev/sda5: UUID="4d880235-05ac-4272-b613-34b130d10b07" TYPE="swap" PARTUUID="6e05c9d1-05"
/dev/sdc: UUID="fa9d9871-e08e-0991-28b9-1686674acb8f" UUID_SUB="9ff6f7d0-f639-6510-3928-587cd931b21c" LABEL="sejf.local:magazyn" TYPE="linux_raid_member"
/dev/sdd: UUID="fa9d9871-e08e-0991-28b9-1686674acb8f" UUID_SUB="3b7daa3f-36ed-50f8-b00b-feae1dff8b4c" LABEL="sejf.local:magazyn" TYPE="linux_raid_member"

Code

root@sejf:/# fdisk -l | grep "Disk "
Disk /dev/sde: 465,8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: WDC WD5000AAKX-0
Disk /dev/sdb: 465,8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: SAMSUNG HD502HJ
Disk /dev/sda: 74,5 GiB, 80026361856 bytes, 156301488 sectors
Disk model: WDC WD800JD-00MS
Disk identifier: 0x6e05c9d1
Disk /dev/sdc: 465,8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: SAMSUNG HD502HJ
Disk /dev/sdd: 465,8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: SAMSUNG HD502HJ


Code

root@sejf:/# cat /etc/mdadm/mdadm.conf
# This file is auto-generated by openmediavault (https://www.openmediavault.org)
# WARNING: Do not edit this file, your changes will get lost.

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
# Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
# To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
# used if no RAID devices are configured.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# definitions of existing MD arrays
ARRAY /dev/md0 metadata=1.2 name=sejf.local:magazyn UUID=fa9d9871:e08e0991:28b91686:674acb8f


Code

root@sejf:/# mdadm --detail --scan --verbose
ARRAY /dev/md0 level=raid0 num-devices=4 metadata=1.2 name=sejf.local:magazyn UUID=fa9d9871:e08e0991:28b91686:674acb8f
   devices=/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde

Code

root@sejf:/# mdadm -D /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Mon Aug 16 09:38:40 2021
        Raid Level : raid0
        Array Size : 1953017856 (1862.54 GiB 1999.89 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

       Update Time : Mon Aug 16 09:38:40 2021
             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

        Chunk Size : 512K

Consistency Policy : none

              Name : sejf.local:magazyn
              UUID : fa9d9871:e08e0991:28b91686:674acb8f
            Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       16        3      active sync   /dev/sdb


It is a simple 4-drive striped volume. I wanted to pull the data from it and rebuild it with a new set of drives, but if I can't mount the filesystem, the data is as good as lost. It is not critical storage, but some data might need to be online by next week from a local computer backup. I have no idea what users put there, though.

It is used as a fileserver for a small school, to share files between computer classrooms, so students or teachers might have put something up there without me knowing, and there is no backup of that.

One thing that might be important: there are no error messages in the terminal window, but on the local monitor there is one repeated message:

Code

blk_update_request: i/o error, dev/sdc sector 264192 op 0x0 read flags 0x80700 phys seg 1 prio class 0
blk_update_request: i/o error, dev/sdc sector 264192 op 0x0 read flags 0x0 phys seg 1 prio class 0
buffer i/o error on dev md0 logical block 0 async page read

I assume it is bad since this RAID has no redundancy, but I am not a Linux-educated person, so there might be a simple fix like scandisk, pray, copy, and run away.

BernH · 5 February 2024

I can’t help much with the recovery. geaves is the resident mdadm expert, but I can tell you about raid levels and a serious mistake you made.

RAID0 (stripe) is a horrible decision for a server array. It offers absolutely no protection against a drive failing. If you lose one drive you lose everything with no possibility to recover. Every other RAID level offers some kind of data redundancy at the expense of drive space.

If you don’t require the high availability feature of a RAID, you would have been better off using mergerfs. It would pool the drives into a single volume, making it appear the same as the RAID0 in size, but each drive would be independent, so losing one does not lose everything. Adding snapraid on top can add some RAID-like redundancy while still keeping the drives independent from each other.

Krisbee · 5 February 2024

kokodin As you realise, RAID0 is living dangerously: if one drive is kicked out of the array you can kiss it and the data goodbye. But your outputs show the array is still up. If the filesystem on /dev/md0 is not mounting due to errors on the drive currently lettered as /dev/sdc, it's possible that a filesystem check might revive it. A filesystem check must be run on an unmounted filesystem, which is the current state if it's shown as missing in the WebUI, and it must be run against the RAID device, not the individual disks. So I believe the command should be fsck -y /dev/md0
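Since -y lets fsck apply every repair without asking, it can be worth seeing first what a read-only pass reports. A minimal sketch, assuming the filesystem is ext4 and e2fsprogs is installed; the helper name fsck_plan is made up for illustration, and it only prints the commands so they can be reviewed before anything touches the array:

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a cautious check sequence for one device.
fsck_plan() {
    dev="$1"
    # -n opens the filesystem read-only and answers "no" to every question
    echo "e2fsck -n $dev"
    # -f forces a full check, -y answers "yes" to every repair question
    echo "e2fsck -f -y $dev"
}

fsck_plan /dev/md0
```

If the read-only pass already fails with an I/O error, the problem is likely below the filesystem (a member disk), and repairing with -y is unlikely to help until the disk issue is dealt with.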

Added to the risk of running RAID0 is using ten-year-old Samsung SpinPoint drives. With a total capacity of only 2TB, a single 2TB drive in the server, routinely backed up to a 2TB external drive, would be a way safer bet than RAID0.

(I don't think geaves is available at the moment)

kokodin · 6 February 2024

I am pretty sure the data is bye-bye then. fsck thinks it is ext4, as it should, but then spits out an input/output error while trying to open /dev/md0. I think that unless there is a way to safely probe the drive outside the RAID or clone it raw to another drive, it will simply refuse access due to being difficult :], some Samsung self-defense, or simply the drive being broken. It may just be that SMART makes it slow (some kind of recovery mode) and unable to sync up with the remaining 3, while it still works as a basic storage device, so the RAID sees it as working.

I totally agree it is stupid to make a RAID volume like this, and I knew the risks while making it. But it was made to test structural network throughput across the school and just kind of remained active because Docker serves some other functions on the same server. It mostly works as a Clonezilla short-term backup server for images of other computers and as a file server providing installation images. So I won't lie, I am a bit salty about it not working, but it might be time to build recovery images from scratch after 5 years of junk accumulation. The only thing that might have been valuable is the stuff that some teacher might have put there for safekeeping, ironically.

Thanks for the help anyway.

I will remain open to any ideas till Thursday if there is anything else we can do with it, like a special blend of Clonezilla or something to clone it to a single drive if possible, but I think on Friday the case will be closed and I will have to put something else in place.

Krisbee · 6 February 2024

kokodin I don't know if you ever checked the SMART data on /dev/sdc to see what the counts were for Reallocated/Pending/Uncorrectable sectors. Running mdadm -E /dev/sdc would also show whether bad blocks are recorded in the "bad block log" of the RAID member.
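To pull just those three counters out of the SMART attribute table, something like the following could be used. It is a sketch, not an OMV feature: the helper name smart_sector_counts is made up, and it assumes the raw value is the last column for these attributes, which holds for the usual smartctl attribute layouts.

```shell
#!/bin/sh
# Hypothetical helper: read smartctl attribute lines on stdin and print
# "ID NAME RAW" for the three sector-health counters:
#   5   Reallocated_Sector_Ct
#   197 Current_Pending_Sector
#   198 Offline_Uncorrectable
smart_sector_counts() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $2 ~ /^[A-Za-z_]+$/ { print $1, $2, $NF }'
}

# Typical use (needs root and smartmontools):
#   smartctl -A /dev/sdc | smart_sector_counts
```

Any non-zero raw value for these attributes is a reason to look closer at the drive.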

AFAIK, as you say, there is no safe way to remove the faulty disk to test it outside the array. For a redundant array the typical procedure is simply to fail and remove a suspect drive from the array and replace it with a suitable spare drive. Redundancy being a fundamental reason for using RAID in the first place, along with combining individual drive storage into a greater whole. Again, AFAIK, for a redundant array, a tool like ddrescue can be used to image a failing drive to a new one ( exact copy minus the bad blocks ) and then there's a chance you might reconstruct a working array.

I came across this thread about removing a drive from a RAID0 array, which uses some trickery to change it to a level 4, adding a temporary non-striped parity drive etc. before reverting to RAID0. But the starting point is a RAID0 array with a working filesystem. In the absence of other ideas, it might be worth a shot. https://serverfault.com/questi…dm-remove-disk-from-raid0

Krisbee · 6 February 2024

@kokodin I need to correct a previous statement. There is hope. I'd confused what you can do while the system remains online with what you can do once it's shut down. You can use ddrescue on a single RAID0 member to image a disk to a new drive once OMV is shut down, with the help of a SystemRescueCD. But it does of course mean you need a suitable spare drive, supposedly identical in size/make/model to the faulty drive.

See these two refs:

Using ddrescue to recover a RAID disk with many bad blocks (kb.promise.com)

https://linuxconfig.org/how-to-repair-and-clone-disk-with-ddrescue

Add the spare to the system, then when you boot the system with SystemRescueCD, choose the option not to activate any MD RAID. Keep careful track of the drives in the system to ensure you use ddrescue on the faulty disk. As you've no spare space on your boot drive, you'd have to clone directly from the faulty drive (/dev/sdX) to the spare drive (/dev/sdY), e.g.:

Code

ddrescue -d -f /dev/sdX /dev/sdY clone.logfile

Change X & Y according to your device lettering.
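Because -f will overwrite whatever /dev/sdY is, a small guard before cloning can help. The following is only a sketch under assumptions: the helper names target_fits and ddrescue_plan are made up, the two-pass split (-n first, then -r3) is one common way to run ddrescue rather than something prescribed above, and the plan is printed instead of executed.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the target is at least as large as the source.
# Sizes are in bytes, e.g. from: blockdev --getsize64 /dev/sdX
target_fits() {
    src_bytes="$1"
    dst_bytes="$2"
    [ "$dst_bytes" -ge "$src_bytes" ]
}

# Hypothetical helper: print a two-pass ddrescue plan without running it.
ddrescue_plan() {
    src="$1"; dst="$2"; map="$3"
    # Pass 1: copy everything that reads easily, skip the slow scraping phase (-n)
    echo "ddrescue -d -f -n $src $dst $map"
    # Pass 2: return to the bad areas recorded in the mapfile, retrying up to 3 times
    echo "ddrescue -d -f -r3 $src $dst $map"
}

# Example with the byte count fdisk reported for the 500 GB members
if target_fits 500107862016 500107862016; then
    ddrescue_plan /dev/sdX /dev/sdY clone.logfile
fi
```

Reusing the same logfile (mapfile) on the second pass is what lets ddrescue resume instead of starting over.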

kokodin · 7 February 2024

mdadm -E /dev/sdc

Code

root@sejf:/# mdadm -E /dev/sdc
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : fa9d9871:e08e0991:28b91686:674acb8f
           Name : sejf.local:magazyn
  Creation Time : Mon Aug 16 09:38:40 2021
     Raid Level : raid0
   Raid Devices : 4

 Avail Dev Size : 976508976 (465.64 GiB 499.97 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : 9ff6f7d0:f6396510:3928587c:d931b21c

    Update Time : Mon Aug 16 09:38:40 2021
  Bad Block Log : 512 entries available at offset 8 sectors
       Checksum : ea05aeb5 - correct
         Events : 0

     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)


I will try that rescue CD, but I don't have an identical drive. I do, however, have a Western Digital drive identical to the one already in the array, with 900 hours online and working fine, so it may be a good option. Since they are all striped equally, the valid data area on each drive should be the same size as on the smallest one in the array. But they are both exactly the same size, judging by the SMART report.
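The size reasoning can be checked with the numbers already posted in this thread. A small sketch, assuming 512-byte sectors and four equal members (the function names are made up for illustration):

```shell
#!/bin/sh
# Numbers copied from the mdadm -E and /proc/mdstat outputs in this thread.
data_offset=264192        # "Data Offset", in sectors
avail_size=976508976      # "Avail Dev Size", in sectors
chunk_sectors=1024        # 512K chunk = 1024 sectors of 512 bytes
members=4

# Sectors a replacement disk must provide to hold the md metadata plus the data area
needed_sectors() {
    echo $(( data_offset + avail_size ))
}

# Array size in KiB: each member is rounded down to whole chunks, then summed
array_kib() {
    per_member=$(( avail_size / chunk_sectors * chunk_sectors ))
    echo $(( per_member * members / 2 ))
}

needed_sectors
array_kib
```

The first figure matches the 976773168 sectors fdisk reports per member, and the second matches the 1953017856 blocks shown in /proc/mdstat, so a replacement with at least that many sectors should be able to hold a full clone.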

Here is some information I can gather from the SMART page in the GUI (only part of it, because the full output goes over the character limit of a single post):

Code

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.10.0-0.bpo.15-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F3
Device Model:     SAMSUNG HD502HJ
Serial Number:    S20BJ9GB601983
LU WWN Device Id: 5 0024e9 20575d939
Firmware Version: 1AJ10001
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 1.5 Gb/s
Local Time is:    Wed Feb  7 10:17:35 2024 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     254 (maximum performance), recommended: 254
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         ( 4740) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  79) minutes.
SCT capabilities:            (0x003f)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   100   051    -    7195
  2 Throughput_Performance  -OS--K   252   252   000    -    0
  3 Spin_Up_Time            PO---K   083   073   025    -    5402
  4 Start_Stop_Count        -O--CK   100   100   000    -    249
  5 Reallocated_Sector_Ct   PO--CK   252   252   010    -    0
  7 Seek_Error_Rate         -OSR-K   252   252   051    -    0
  8 Seek_Time_Performance   --S--K   252   252   015    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    67731
 10 Spin_Retry_Count        -O--CK   252   252   051    -    0
 11 Calibration_Retry_Count -O--CK   252   252   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    382
191 G-Sense_Error_Rate      -O---K   100   100   000    -    8
192 Power-Off_Retract_Count -O---K   252   252   000    -    0
194 Temperature_Celsius     -O----   064   050   000    -    27 (Min/Max 14/56)
195 Hardware_ECC_Recovered  -O-RCK   100   100   000    -    0
196 Reallocated_Event_Count -O--CK   252   252   000    -    0
197 Current_Pending_Sector  -O--CK   100   100   000    -    8
198 Offline_Uncorrectable   ----CK   252   252   000    -    0
199 UDMA_CRC_Error_Count    -OS-CK   091   091   000    -    5113
200 Multi_Zone_Error_Rate   -O-R-K   100   100   000    -    1852
223 Load_Retry_Count        -O--CK   252   252   000    -    0
225 Load_Cycle_Count        -O--CK   100   100   000    -    397
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      2  Comprehensive SMART error log
0x03       GPL     R/O      2  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      2  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
Device Error Count: 11273 (device log contains only the most recent 8 errors)
    CR     = Command Register
    FEATR  = Features Register
    COUNT  = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    LH     = LBA High (was: Cylinder High) Register    ]   LBA
    LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    LL     = LBA Low (was: Sector Number) Register     ]
    DV     = Device (was: Device/Head) Register
    DC     = Device Control Register
    ER     = Error register
    ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 11273 [0] occurred at disk power-on lifetime: 2195 hours (91 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 08 00 00 00 04 08 00 e0 00  Error: UNC 8 sectors at LBA = 0x00040800 = 264192

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  c8 00 00 00 08 00 00 00 04 08 00 e0 08     00:00:01.002  READ DMA
  b0 00 da 00 00 00 00 00 c2 4f 00 00 08     00:00:01.002  SMART RETURN STATUS
  27 00 00 00 00 00 00 00 00 00 00 e0 08     00:00:01.002  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 00 00 00 00 a0 08     00:00:01.002  IDENTIFY DEVICE
  ef 00 03 00 46 00 00 00 00 00 00 a0 08     00:00:01.002  SET FEATURES [Set transfer mode]


Krisbee · 7 February 2024

kokodin Honestly, I think I've been only half awake in the last couple of days, as this thread has turned out backwards. I should have asked you for the SMART data upfront. Did you ever bother to set up scheduled SMART tests on your OMV server along with notifications? You can see that the errant disk has a "Current_Pending_Sector" count of eight, showing there's some kind of read problem ("Error: UNC 8 sectors at LBA = 0x00040800 = 264192"), as also shown by the error messages on the local monitor you posted at #1 above. The disk has had it.
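Why eight unreadable sectors are enough to make the whole filesystem "missing" can be worked out from the mdadm -E output. The sketch below maps a sector on one member to a sector of the array; it assumes equal-size members (a single stripe zone), and the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map a sector on one RAID0 member to a sector of the array.
# All units are 512-byte sectors; values come from the mdadm -E output in this thread.
member_to_array_sector() {
    lba="$1"; role="$2"
    data_offset=264192   # "Data Offset"
    chunk=1024           # 512K chunk
    members=4
    rel=$(( lba - data_offset ))      # position inside the member's data area
    stripe=$(( rel / chunk ))         # which stripe the sector belongs to
    within=$(( rel % chunk ))         # position inside that chunk
    echo $(( (stripe * members + role) * chunk + within ))
}

member_to_array_sector 264192 0   # LBA from the SMART error log, device role 0; prints 0
```

The failing LBA equals the data offset on the member with device role 0, so the unreadable 8 sectors are the first 4 KiB of /dev/md0. That matches the "logical block 0" message on the monitor, and it is where the primary ext4 superblock lives (byte offset 1024), which would explain why the array is "clean" while the filesystem cannot be found.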

You can't attempt to correct those read errors non-destructively. You need to write-read-verify the bad sectors, which is typically done by writing zeros to the whole disk before deciding if it's good enough to (re)use. So it's inevitable that there's been some data loss.

Ideally, if you can find a second spare drive large enough to hold an image file of the bad disk you would be better to use ddrescue to create an image first before copying the image to the spare WD drive you have.

kokodin · 8 February 2024

Well, ddrescue has been attempted. And I live hard, die fast or something: no image, just a plain disk-to-disk clone.

The drives are identical in size, down to a single byte.

Initially ddrescue designated around 8 MB as problematic, of which it recovered 6, and slightly less than 2 MB was in bad sectors and therefore, I think, gone. How much of it was in free space and how much in the data-holding area I can't tell. But most wear was between 70 and 85% of the drive, where the read error count jumped from 30 to 130, and the first pass finished with only one more read error. (After the advanced data recovery passes, ddrescue reported around 5000 read errors through the whole process.)
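The amount that stayed unreadable can also be totalled from the ddrescue mapfile (the clone.logfile from the command suggested earlier). This is a sketch that assumes the usual GNU ddrescue mapfile layout: comment lines starting with #, one status line, then "position size status" lines where a status of - marks bad sectors. The function name is made up.

```shell
#!/bin/sh
# Hypothetical helper: sum the sizes (in bytes) of areas marked '-' in a ddrescue mapfile.
bad_bytes() (
    total=0
    seen_status=0
    while read -r pos size status; do
        case "$pos" in
            ''|'#'*) continue ;;              # skip blank lines and comments
        esac
        if [ "$seen_status" -eq 0 ]; then     # first data line is the status line
            seen_status=1
            continue
        fi
        if [ "$status" = "-" ]; then
            total=$(( total + $size ))        # sizes are hex, e.g. 0x200
        fi
    done < "$1"
    echo "$total"
)

# Typical use:
#   bad_bytes clone.logfile
```

The positions in the same lines show where on the member the holes are, which is a starting point for working out which files might be affected.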

That being said, we don't care how many errors it had but whether it worked, and yes, it did.

Initially the array self-assembled with the copy but the volume remained missing, but this time running chkdsk (fsck) on it let it do the job, and the filesystem is back, mounted, and the files seem to be on it.

How many of them are corrupted I will only know when I find one, but there is some redundancy and I don't really need everything.

There is one conspicuously empty folder, but it might have been empty for a while. There are no random extra files, no silly names.

Either checkdisk unceremoniously removed all the corrupted files, or I was lucky.

Since schools in my region have a very tight budget for non-essential equipment, I will probably have no budget for new drives until December, but I will make regular backups from now on.

The only other option to replace the drives I have now is to swap them with barely used Toshiba MQ01ABD100 laptop drives, of which I have 15 lying around, but that is a bad idea in itself, even in a RAID 1+0 configuration (the server only has 5 SATA ports and one is for the system boot drive, which is even older, so the size would still be only 1.8 TB). I don't think running laptop drives 24/7 would work long term, and they might not even live until December, but maybe I am biased. Nothing is stopping me from using them as USB backup drives, though.

For now I think the RAID is saved and there is not much more to be done to improve it. I will pull the data off next, through the network, since there is no other way. And if you think RAID 1+0 of slightly used 2.5-inch 1 TB laptop drives is better than running RAID 0 of ancient desktop drives, I might do that.

As for scheduled SMART checks, I had no idea that was a thing until recently, so of course none were ever scheduled or performed. The server wasn't even planned to exist. It was hastily assembled 3 years ago from a junked Vista computer and got 2 extra gigabit NICs (I think at one point it even had 5), but the network is bottlenecked at 1 Gbps by the government program "new network equipment", and we ended up just feeding 3 separate networks at 1 Gbps instead of bundling the bandwidth into one, so the hard drives didn't even utilize the full RAID 0 speed over the network. And the server just kind of remained operating since one of the teachers asked me to keep it. It probably gobbles power like a champ for a job a Raspberry Pi would do faster, too.

Thanks Krisbee, you really saved my bacon this time. If you think my dodgy RAID 1+0 of laptop drives is a better option, let me know, because otherwise they will just rot in the box.

Krisbee · 8 February 2024

kokodin You did a good job on the data recovery. Maybe use the laptop drives as backups over USB, but I wouldn't use them for 24/7 file server duty. If you do use them for backups, test each drive thoroughly first with a full badblocks test (https://wiki.archlinux.org/title/badblocks) and possibly rotate your backups using more than one laptop drive.

kokodin · 9 February 2024

Well, I do use some of those drives as a mobile disk image repository for Clonezilla, but that is 2 out of 15. The rest are dormant in a humidity-controlled room (where the 3D printer filament is), all pulled from fairly new Dell laptops when we discovered you can't really use them with a slow HDD and Windows 10. The laptops were bought in late 2019 and the switch to SSDs happened around summer 2021 (the laptops were barely used because they were infuriatingly slow), so to be honest I wouldn't even check them other than confirming run time and SMART health with Crystal. They have on average about 9 days (200-250 hours) of use, and most of it is on the Dell factory side.

Although using Crystal is scary in itself, since my work computer has another one of those Samsung drives with much more uptime than the server ones, to the point that I thought Crystal was confusing hours with minutes (around 30500 hours with 5000 power cycles), but then I think about how I use my computer and it checks out. The failed Samsung had twice that, though, while the rest of the RAID drives hover between 17000 and 26000, so this one must have been saved from another desktop computer used as a domain controller between around 2010 and 2013. It also had a different firmware revision than all the other Samsung drives I have of this model, so I suspect I am its first owner and it never left my "server room" (more of a storage room for various electrical things). It is funny what you find out when you investigate failed computer components.

Currently the RAID runs on 2 Samsung drives with ~26000 and 19000 hours and 2 identical WDCs with 17000 and 900 hours on the clock. They all run at SATA 3 speed, have the same capacity and cache size, and are of the same firmware revision in pairs.

All the data has been pulled over to external laptop HDDs overnight.

I think if I can hack the eSATA port of the motherboard, I could run the system drive from it and possibly rebuild the RAID as RAID 5 if I find another 500 GB drive in my pile of used computers. That isn't unlikely: I have a surprising amount of junk from retired computers and related electronics, mostly 80-250 GB, though I have some oddball 1 TB drives too, and I am convinced there were 4 identical Samsungs plus the one that failed, and there should be 4 of those WDC drives too. I just need to think where I put them, because I currently don't know where 3 of them are (I have a strong suspicion, though). Living with RAID 5 would be so much less stressful until I can get something new. For obvious reasons RAID 1 or 6 is currently out of the question; I would need a motherboard swap.
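For comparing the options mentioned in this thread, the usable capacity per RAID level is simple arithmetic. A rough sketch, assuming n equal drives of size s and, for RAID 10, two copies of every block; the function name raid_usable is made up, and metadata overhead is ignored:

```shell
#!/bin/sh
# Hypothetical helper: usable capacity for n equal drives of size s (same unit in and out).
raid_usable() {
    level="$1"; n="$2"; s="$3"
    case "$level" in
        0)  echo $(( n * s )) ;;            # striping, no redundancy
        1)  echo "$s" ;;                    # every drive is a full mirror
        5)  echo $(( (n - 1) * s )) ;;      # one drive's worth of parity
        6)  echo $(( (n - 2) * s )) ;;      # two drives' worth of parity
        10) echo $(( n * s / 2 )) ;;        # two copies of every block
        *)  echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

raid_usable 0 4 500     # current array, in GB
raid_usable 5 5 500     # RAID 5 with a fifth 500 GB drive
raid_usable 10 4 1000   # RAID 1+0 of four 1 TB laptop drives
```

All three come out at 2000 GB, so the choice between them is about redundancy and drive wear rather than space.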

Krisbee · 9 February 2024

Linux + HDD + laptop was/is doable, but not Windows 10. With a shoestring budget you have no easy choices. Is an HBA card or a basic 2-port SATA PCIe card feasible in your m/board?

kokodin · 9 February 2024

It is an Intel-made desktop motherboard based on the 45g chipset, so technically I could put in a PCIe controller: 2 x1 slots and one x16 slot are open, plus one PCI. I might have a PCI SATA150 controller somewhere, or I could bodge an IDE>SATA adapter. But financially my hands are tied for non-emergency stuff until April. I did find an eSATA>SATA cable on local eBay for $2, so I just bought it out of my own pocket.

The board itself has 5 SATA 300 ports with some software/hardware RAID hybrid in the BIOS, but only for RAID 0 and 1, and one eSATA port on the back I/O shield, which is on a separate SATA channel, I hope. I also upgraded the southbridge heatsink, since I didn't trust the sensor reading of how hot it ran 3 years ago, but now I have to remove it whenever I'm swapping drives :] it is just 3 mm too high.

I did find one more matching WDC drive, but with bad sectors, so it is relegated to be scrapped along with some other drives I keep for no good reason.

I also dug out three 1 TB drives, each different and each overworked. If one tests OK, I might dump my daily driver Samsung just to have a fifth drive, or I will have to dig deeper until I find out where the missing Samsung and one WDC went :] because they would surely be lower-hour drives. The 1 TB 3.5-inch drives are mostly from 2014 with 5 years online in security cameras, so they are slow and reliable, SMART test OK, but also old and even more worn out than my RAID drives, with the exception of the broken one.

I did clean up the server case though: some cable management, some new thermal paste, relocated the system boot drive away from the array drives and made room for 3 more drives on the sled. Ironically, the new living space is away from the air duct, so it might be warmer, since the Samsungs idle at 26, the WDCs at 27 and the relocated boot drive shows 29 Celsius, so it should be in the low 40s during summer.

Krisbee · 9 February 2024

A 45g chipset? So Pentium 4 era, I think. This really is stretching old computer re-use.

In case it wasn't obvious, never mix a m/board's BIOS fake RAID with Linux software RAID. Finding the m/board's spec/manual should tell you if the eSATA port is shared, if that's not clear in the BIOS settings. Happy juggling.

kokodin · 12 February 2024

Sorry, not 45g but G45; I always get those mixed up. Also, I think the Pentium 4 era was 845G to 945G; there was no 45g alone.

Either way, it is the 4th and last generation of LGA 775 chipsets made by Intel, and the last DDR2 board they made for the desktop.

It was a pain to find a BIOS version OpenMediaVault would not complain about, but that is an entirely different story.

Anyway, I think the topic has come to its natural conclusion and should be wrapped up. RAID 0 was restored and soon it will be gone due to upgrades. Thanks for the help, and I hope we all have a stress-free experience from now on.
