Beiträge von kokodin

kokodin · 12. Februar 2024

Hello . i am here to ask for more help. typical of me .

I am currently trying to migrate my low budget school file server, with docker aplications runing on it from using raid 0 array to raid 5

The problem is only avalable sata port i have left to connect another drive is an esata port.

I would rather not mix esata with internal sata ports in an array so i thought i move system drive to that port, but since this motherboard is made by intel with some crazy ideas, esata port always work in ahci mode while i installed my server in ide mode long ago.

Switching type of operation for the controlled ends up in non bootable system right after grub so i presume the same will be true for external sata port.

is thera a way to fix it for a non linux user like me? I would rather not downsize the array, it is small as it is.

i know how to do it for windows but that won't hepl much.

kokodin · 12. Februar 2024

Sorry not 45g but g45 i always get those mixed up . Also i think pentium 4 era was 845g to 945g there was no 45g alone.

Either way it is the 4th and last generation of lga 775 chipsets made by intel, and the last ddr2 board they made for the desktop.

It was pain to find bios version openmediavault would not complain about but that entirely different story.

Anyway i think the topic came to it's own natural conclusion and should be wraped up. raid 0 was restored and soon it will be gone due to upgrades. thanks for help and i hope we all have stress free experience from now on.

kokodin · 9. Februar 2024

It is a intel made desktop motherboard based on 45g chipset so technically i could put in pcie controller 2 x1 and one x16 slots are open, and one pci. I might have pci sata150 controller somewhere or bodge ide>sata over adapter. But financially my hands are tied for non emergency stuff untill april. i did found the esata>sata cable on local ebay for 2$ so i just bought it from my own pocket.

Board by itself have 5 sata 300 ports with some software/hardware raid hybrid in the bios, but only for raid 0 and 1 and one esata port on the back i/o shield, which is on separate sata channel i hope. i also upgraded southbridge heatsink since i didn't trust sensor reasing of how hot it runs 3 years ago, but now i have to remove it whenever i'm swaping drives :] it is just 3 mm too high.

I did find one more matching wdc drive, but with bad sectors so it is delegated to be scraped along with some other drives i keep for no good reason

i did also dig out 3 - 1tb drives , each different and each overworked, if one would test ok i might dump my daily driver samsung just to have fifth drive, or would have to dig deeper untill i find out where the missing samsung and one wdc went :] because for sure they would be lower hour drives . The 1 tb 3,5 inch drives are mostly from 2014 with 5 years online in security cameras so, those are slow and relible, smart test ok, but also old and even more worn out than my raid drives with exeption to broken one.

i did clean up the server case though, some cable mannagment, some new thermal pase , relocated system boot drive away from array drives and made room for 3 more drives on the sled, ironically new living space is away from air duct so it might be warmer since samsungs idle at 26, wdc's at 27 and relocated boot drive shows 29 celsius so it should be in low 40's during summer

kokodin · 9. Februar 2024

Well i do use some of those drives as mobile disk image repository for clonezilla, but that is 2 out of 15 , rest is dormand state in humidity controlled room (where 3d printer filement is), all pulled from fairly new dell laptops when we discovered you can't really use them with slow hdd and windows 10. Laptops were bought in late 2019, switch over to ssd's happened around summer 2021, (laptops were barely used becuse they were infuriatingly slow) so i wouldn't even check them to be honest other than confirming run time and smart health with crystal .They have like average 9 days (200-250 hours) of use, and most of it is on dell factory side.

Although using crystal is scary on by itself since my work computer has another one of those samsungs drives with much more uptime than a server ones, to the point i think crystal confusing hours with minutes (around 30500 hours with 5000 powercycles)but then i think about how i use my computer and it checks out. Although failed samsung had twice that. While rest of the raid drives hover around 17000 and 26000 so this one must been saved from another desktop computer used as a domain controller around 2010 and 2013 . It also had different firmware revision than all the other samsung drives i have of this model so i suspect i am his first owner, and it never left my "server room"(more of a storage for various electrical things). It is funny what you find out when you investigatefaied computer components.

Currently raid runs on 2 samsung drives with ~26000 and 19000 hours and 2 identical wdc's with 17000 and 900 hours on the clock. They all running on sata 3 speed have the same capacity and cahe size and are of the same firmware revision in pairs.

All the data has been pulled over to external laptop hdd's overnight.

I think if i can hack e-sata port of the motherboard i could run system drive from it and possibly rebuild raid as raid 5 if i find another 500gb drive in my used computers pile. Which isn't unlikely i have surprisingly amount of junk from retired computers and related electronics, mostly 80-250gb though i have some oddball 1tb drives too and i am convinced there was 4 of identical samsung + the one that failed and there should be 4 of those wdc drives too. just need to think where i put them. because i currently don't know where 3 of them are (i have a strong suspicion though) Living with raid 5 would so much less stressfull untill i can get something new. For obvious reason raid 1 or 6 is currently out of the question , I would need a motherboard swap.

kokodin · 8. Februar 2024

Well the ddrescue had been atempted. And i live hard die fast or something , no image just plain disk to disk clone .

Size of the drives is identical to a single byte.

Initially ddrescue designated around 8mb as problematic of which it recovered 6 and slightly less than 2mb was in bad sectors. Therefore i think gone. How much of it was on free space and how much in data holding area i can't tell. But most wear was between 70 and 85% of the drive where read error count jumped from 30 to 130 and first pass finished with only one read error more. (after advanced data recovery pases ddrescue reported around 5000 read errors trough wgole process)

That being said we don't care about how many error it had but did it worked, and yes it did.

Initially array self assembled with the copy but volume remained missing, but this time runing ckdsk on it let it do the job and filesystem is back, mounted and files seems to be on it.

How many of them are corrupted i will only know when i find one, but there is some redundency and i don't really need everything.

There is one conspicuous empty folder, but it might been empty for a while , there is no random extra files, no silly names.

Either checkdisk unceremonially removed all corupted files, or i was lucky

Since schools in my region have very tight budget on non esential equipment, i will probably have no budget for new drives untill december, but i will make regular backups from now on.

The only other option to replace the drives i have now is to swap them with barely used laptop toshibas mq01abd100 of which i have 15 laying around, but that bad idea of itself even in raid 1+0 configuration (server only have 5 sata ports and one is for system boot drive (which is even older) so the size would be still only 1,8TB ) i don't think running laptop drives 24/7 would work long term and they might not even live untill december, but maybe i am biased. Nothind stoping me from using them as backup usb drives though.

For now i think the raid is saved and there is no much more to be done to improve it. I will pull the data of next trough network since there is no other way. And if you think raid 1+0 of 2,5 inch 1tb slightly used laptop drives is better than running raid 0 of ancient desktop drives i might do that.

As for smart sheduled checks i had no idea that is a thing untill recently so of course there was never any of them scheduled or performed. Server wasn't even planed to exist. It was hastly assembled 3 years ago from junked vista computer, got 2 extra gigabit nics, i think at one point it even had 5, but the network is bottlenecked at 1gbps by goverment program "new network equipment", and we ended up with just feeding 3 separate networks at 1gbps instead of bundling the bandwidth into one so harddrives didn't even utilize full raid0 speed over network. And server just kind of remainded operating since one of the teachers asked me to keep it. Probably gobbles power like a champ for a job rawsberry pi would do faster too.

Thanks Krisbee you really saved my bacon this time. if you think my dodgy raid 1+0 of laptop drives is a better option let me know, because otherwise they will just rot in the box

kokodin · 7. Februar 2024

mdadm -E /dev/sdc

Code

root@sejf:/# mdadm -E /dev/sdc
/dev/sdc:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : fa9d9871:e08e0991:28b91686:674acb8f
           Name : sejf.local:magazyn
  Creation Time : Mon Aug 16 09:38:40 2021
     Raid Level : raid0
   Raid Devices : 4

 Avail Dev Size : 976508976 (465.64 GiB 499.97 GB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=0 sectors
          State : clean
    Device UUID : 9ff6f7d0:f6396510:3928587c:d931b21c

    Update Time : Mon Aug 16 09:38:40 2021
  Bad Block Log : 512 entries available at offset 8 sectors
       Checksum : ea05aeb5 - correct
         Events : 0

     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

Alles anzeigen

I will try that rescue cd, but i don't have identical drive, i do however have identical western digital drive with 900 hours online and working fine so it may be a good option. Since they are all striped equaly , valid data block should be the same size on each drive as the smallest one in the array., But they both exacly the same size jurdging by the smart report.

here some informations i can gather from smart page in the gui (part of it , because message go over the character limit of single post)

Code

smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.10.0-0.bpo.15-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F3
Device Model:     SAMSUNG HD502HJ
Serial Number:    S20BJ9GB601983
LU WWN Device Id: 5 0024e9 20575d939
Firmware Version: 1AJ10001
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 1.5 Gb/s
Local Time is:    Wed Feb  7 10:17:35 2024 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     254 (maximum performance), recommended: 254
APM level is:     254 (maximum performance)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         ( 4740) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  79) minutes.
SCT capabilities:            (0x003f)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   100   100   051    -    7195
  2 Throughput_Performance  -OS--K   252   252   000    -    0
  3 Spin_Up_Time            PO---K   083   073   025    -    5402
  4 Start_Stop_Count        -O--CK   100   100   000    -    249
  5 Reallocated_Sector_Ct   PO--CK   252   252   010    -    0
  7 Seek_Error_Rate         -OSR-K   252   252   051    -    0
  8 Seek_Time_Performance   --S--K   252   252   015    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    67731
 10 Spin_Retry_Count        -O--CK   252   252   051    -    0
 11 Calibration_Retry_Count -O--CK   252   252   000    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    382
191 G-Sense_Error_Rate      -O---K   100   100   000    -    8
192 Power-Off_Retract_Count -O---K   252   252   000    -    0
194 Temperature_Celsius     -O----   064   050   000    -    27 (Min/Max 14/56)
195 Hardware_ECC_Recovered  -O-RCK   100   100   000    -    0
196 Reallocated_Event_Count -O--CK   252   252   000    -    0
197 Current_Pending_Sector  -O--CK   100   100   000    -    8
198 Offline_Uncorrectable   ----CK   252   252   000    -    0
199 UDMA_CRC_Error_Count    -OS-CK   091   091   000    -    5113
200 Multi_Zone_Error_Rate   -O-R-K   100   100   000    -    1852
223 Load_Retry_Count        -O--CK   252   252   000    -    0
225 Load_Cycle_Count        -O--CK   100   100   000    -    397
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      2  Comprehensive SMART error log
0x03       GPL     R/O      2  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      2  Extended self-test log
0x08       GPL     R/O      2  Power Conditions log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (2 sectors)
Device Error Count: 11273 (device log contains only the most recent 8 errors)
    CR     = Command Register
    FEATR  = Features Register
    COUNT  = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    LH     = LBA High (was: Cylinder High) Register    ]   LBA
    LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    LL     = LBA Low (was: Sector Number) Register     ]
    DV     = Device (was: Device/Head) Register
    DC     = Device Control Register
    ER     = Error register
    ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 11273 [0] occurred at disk power-on lifetime: 2195 hours (91 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 08 00 00 00 04 08 00 e0 00  Error: UNC 8 sectors at LBA = 0x00040800 = 264192

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  c8 00 00 00 08 00 00 00 04 08 00 e0 08     00:00:01.002  READ DMA
  b0 00 da 00 00 00 00 00 c2 4f 00 00 08     00:00:01.002  SMART RETURN STATUS
  27 00 00 00 00 00 00 00 00 00 00 e0 08     00:00:01.002  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 00 00 00 00 00 a0 08     00:00:01.002  IDENTIFY DEVICE
  ef 00 03 00 46 00 00 00 00 00 00 a0 08     00:00:01.002  SET FEATURES [Set transfer mode]

Alles anzeigen

kokodin · 6. Februar 2024

I am pretty sure the data is byebye then. Because fsck thinks it is ext4 as it should but then spit out input output error while trying to open /dev/md0. I think if there is no way to safely probe the drive outside the raid or clone raw to another drive it will simply refuse access due to being dificult :], some samsung self defense or simply drive being broken. it may be just that smart makes it slow (some kind of recovery mode) and unable to synch up with the remaining 3 while still works as a basic storage device thus raid sees it as working.

I totaly agree it is stupid to make raid volume like this, and i had known the risks while making it. But it was made to test structural network troughput over the school and just kind of remain active due to docker serving some other functions on the same server. It mostly works as a clonezilla short term backup server for images of other computers and file server for providing installation images. So i wouldn't lie i am a bit salty about it not working. but it might be time to build recovery images from scratch after 5 years of junk acumulation. The only thing that might been valuable is the stuff that some teacher might put there for safekeeping , ironically.

thanks for help anyway

I will remain open to any ideas till thursday if there is anything else we can do with it, like special blend of clonezilla or something to a single clone drive if possible , but i think on friday the case will be close and i would have to put in place something else.

kokodin · 5. Februar 2024

Hello i'm just a stupid admin / electrician, so what i say might be incohearent for an actual tech person. Please forgive my lack of technical language.

After powerlose my homebrew fileserver at work have a missing filesystem. I am pretty sure it might have something to do with smart status of one drive being "few bad sectors" but raid does assemble and work as "clean" status with no drives missing, only the filesystem is missing.

Required commands outputs

Code

root@sejf:/# cat /proc/mdstat
Personalities : [raid0] [linear] [multipath] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid0 sdc[0] sdb[3] sdd[2] sde[1]
      1953017856 blocks super 1.2 512k chunks

unused devices: <none>

Code

root@sejf:/# blkid
/dev/sde: UUID="fa9d9871-e08e-0991-28b9-1686674acb8f" UUID_SUB="29e259c0-8d5c-d4fe-ec7d-2377526a6590" LABEL="sejf.local:magazyn" TYPE="linux_raid_member"
/dev/sdb: UUID="fa9d9871-e08e-0991-28b9-1686674acb8f" UUID_SUB="4bdc5a68-0a08-e92b-acbd-54f80eac2716" LABEL="sejf.local:magazyn" TYPE="linux_raid_member"
/dev/sda1: UUID="390b1de1-90ce-4894-95a4-90e34d2aec8c" TYPE="ext4" PARTUUID="6e05c9d1-01"
/dev/sda5: UUID="4d880235-05ac-4272-b613-34b130d10b07" TYPE="swap" PARTUUID="6e05c9d1-05"
/dev/sdc: UUID="fa9d9871-e08e-0991-28b9-1686674acb8f" UUID_SUB="9ff6f7d0-f639-6510-3928-587cd931b21c" LABEL="sejf.local:magazyn" TYPE="linux_raid_member"
/dev/sdd: UUID="fa9d9871-e08e-0991-28b9-1686674acb8f" UUID_SUB="3b7daa3f-36ed-50f8-b00b-feae1dff8b4c" LABEL="sejf.local:magazyn" TYPE="linux_raid_member"

Code

root@sejf:/# fdisk -l | grep "Disk "
Disk /dev/sde: 465,8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: WDC WD5000AAKX-0
Disk /dev/sdb: 465,8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: SAMSUNG HD502HJ
Disk /dev/sda: 74,5 GiB, 80026361856 bytes, 156301488 sectors
Disk model: WDC WD800JD-00MS
Disk identifier: 0x6e05c9d1
Disk /dev/sdc: 465,8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: SAMSUNG HD502HJ
Disk /dev/sdd: 465,8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: SAMSUNG HD502HJ

Alles anzeigen

Code

root@sejf:/# cat /etc/mdadm/mdadm.conf
# This file is auto-generated by openmediavault (https://www.openmediavault.org)
# WARNING: Do not edit this file, your changes will get lost.

# mdadm.conf
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default, scan all partitions (/proc/partitions) for MD superblocks.
# alternatively, specify devices to scan, using wildcards if desired.
# Note, if no DEVICE line is present, then "DEVICE partitions" is assumed.
# To avoid the auto-assembly of RAID devices a pattern that CAN'T match is
# used if no RAID devices are configured.
DEVICE partitions

# auto-create devices with Debian standard permissions
CREATE owner=root group=disk mode=0660 auto=yes

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# definitions of existing MD arrays
ARRAY /dev/md0 metadata=1.2 name=sejf.local:magazyn UUID=fa9d9871:e08e0991:28b91686:674acb8f

Alles anzeigen

Code

root@sejf:/# mdadm --detail --scan --verbose
ARRAY /dev/md0 level=raid0 num-devices=4 metadata=1.2 name=sejf.local:magazyn UUID=fa9d9871:e08e0991:28b91686:674acb8f
   devices=/dev/sdb,/dev/sdc,/dev/sdd,/dev/sde

Code

root@sejf:/# mdadm -D /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Mon Aug 16 09:38:40 2021
        Raid Level : raid0
        Array Size : 1953017856 (1862.54 GiB 1999.89 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

       Update Time : Mon Aug 16 09:38:40 2021
             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

        Chunk Size : 512K

Consistency Policy : none

              Name : sejf.local:magazyn
              UUID : fa9d9871:e08e0991:28b91686:674acb8f
            Events : 0

    Number   Major   Minor   RaidDevice State
       0       8       32        0      active sync   /dev/sdc
       1       8       64        1      active sync   /dev/sde
       2       8       48        2      active sync   /dev/sdd
       3       8       16        3      active sync   /dev/sdb

Alles anzeigen

It is a simple 4 drives striped volume i wanted to pull data from and rebuild it with a new set of drives, but if i can't mount filesystem data is as good as lost. it is not a critical storage, but some data could needs to be onlineby next week from a local computer backup. But i have no idea what users put there.

It is used as a fileserver for a small school, to share files between computer classrooms, so students or teachers might have put something up there without me knowing and there is no backup then.

One thing that might be important i might add, there is no error messages on terminal window, but local monitor there is one repeated message

Code

blk_update_request: i/o error, dev/sdc sector 264192 op 0x0 read flags 0x80700 phys seg 1 prio class 0
blk_update_request: i/o error, dev/sdc sector 264192 op 0x0 read flags 0x0 phys seg 1 prio class 0
buffer i/o error on dev md0 logical block 0 async page read

I assume it is bad since this raid has no redundancy , but i am not linux educated person so there might ba a simple fix like scandisc, pray , copy, and run away.

Beiträge von kokodin

migrating working installation from ide to ahci

Raid 0 working, as clean but filesystem is "missing"

Raid 0 working, as clean but filesystem is "missing"

Raid 0 working, as clean but filesystem is "missing"

Raid 0 working, as clean but filesystem is "missing"

Raid 0 working, as clean but filesystem is "missing"

Raid 0 working, as clean but filesystem is "missing"

Raid 0 working, as clean but filesystem is "missing"