A history of failure


      Hi,


      I just wanted to share the story of "recovering" a RAID5 volume that had become degraded. I am using the NAS described in this post: Homemade Home NAS - (Large pictures!)

      So, a few weeks ago, while I was on holiday, I received a mail from my NAS telling me the RAID5 array was clean but degraded, possibly because of my /dev/sdh disk.


      This is an automatically generated mail message from mdadm
      running on BioNAS

      A Fail event had been detected on md device /dev/md/BioNAS:MasterRaid.

      It could be related to component device /dev/sdh.

      Faithfully yours, etc.

      P.S. The /proc/mdstat file currently contains the following:

      Personalities : [raid6] [raid5] [raid4]
      md127 : active raid5 sdb[0] sdh[6](F) sdg[5] sdf[4] sde[3] sdd[2] sdc[1]
      17581590528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [7/6] [UUUUUU_]

      unused devices: <none>


      I verified the message in the OMV interface and confirmed something was indeed wrong, and since I didn't want to keep running in that state, I chose to shut the NAS down.
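
      For anyone who gets the same mail, the array state can also be double-checked from a shell before deciding anything. A minimal sketch, assuming the array is /dev/md127 as in the mdstat output above:

      # Kernel's view of all software RAID arrays
      cat /proc/mdstat
      # Detailed state of the array: active/failed members and their roles
      mdadm --detail /dev/md127
      # Superblock of the suspected member itself
      mdadm --examine /dev/sdh
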
      Back from my holidays, I went to the computer store and bought a spare 3 TB Seagate disk. At home, I plugged it into my NAS, booted it up and went to another computer in the house (my NAS doesn't have a screen attached to it).

      I could not connect to the web interface. No luck with SSH either... the NAS was not even pingable. I went back to the NAS and plugged a CRT screen into it... black screen. I swapped the CRT for a more recent one that I knew was working... still a black screen.


      OK... I unplugged all my HDDs, removed my two IBM RAID controllers and installed a PCIe graphics card in place of the very old Matrox Millennium. Still NOT BOOTING. Grrr...

      My mainboard has two Ethernet interfaces. I noticed that neither the Ethernet port on the mainboard nor the corresponding port on the switch was blinking! But everything was fine if I switched to the second NIC on the mainboard... I started to worry about some kind of electrical damage (even though all the hardware was plugged into my UPS).


      My last attempt was to remove one of my RAM sticks and... the computer booted. OK, it was a RAM failure; I suppose no HDD was actually bad. I started the recovery process for my RAID volume in the web interface and received this mail:


      This is an automatically generated mail message from mdadm
      running on BioNAS

      A DegradedArray event had been detected on md device /dev/md/BioNAS:MasterRaid.

      Faithfully yours, etc.

      P.S. The /proc/mdstat file currently contains the following:

      Personalities : [raid6] [raid5] [raid4]
      md127 : active raid5 sdf[7] sde[0] sdc[5] sdb[4] sdi[3] sdh[2] sdg[1]
      17581590528 blocks super 1.2 level 5, 512k chunk, algorithm 2 [7/6] [UUUUUU_]
      [==>..................] recovery = 14.1% (414113612/2930265088) finish=214.9min speed=195073K/sec

      unused devices: <none>
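
      For reference, the OMV web interface is driving mdadm underneath; the same re-add can be sketched from a shell, with /dev/sdX standing in for whatever name the new disk gets (assuming a whole-disk member, as in the mdstat output above):

      # Add the fresh disk to the degraded array; the rebuild starts automatically
      mdadm --manage /dev/md127 --add /dev/sdX
      # Watch the rebuild progress (same numbers as in the mail)
      watch cat /proc/mdstat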


      On my 7 x 3 TB RAID5 setup, it took about 5 hours to rebuild onto the spare disk.
      [IMG:http://img4.hostingpics.net/pics/985259recovering1.png]

      I changed the HDD order when I re-plugged them, so the affected disk no longer appears as /dev/sdh.

      [IMG:http://img4.hostingpics.net/pics/715114recovering2.png]
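
      Because the drive letters moved around after re-plugging, the sdX names in mdstat no longer point at the same physical disks. A sketch of how to map a kernel name back to a physical drive (smartctl comes from the smartmontools package; /dev/sdc is just an example name):

      # Stable names built from model and serial number, symlinked to the sdX devices
      ls -l /dev/disk/by-id/
      # Model, serial and firmware of one disk, to match against the sticker on the drive
      smartctl -i /dev/sdc
      # Which slot this member occupies inside the array
      mdadm --examine /dev/sdc | grep -E 'Device Role|Array State'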

      So, everything ended well... Just remember that RAM sticks can fail too! I'll still check the "faulty" HDD, but I'm almost certain it's fine.
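
      For that check, something along these lines should be enough. A sketch using smartctl from smartmontools, with /dev/sdX standing in for the suspected drive; the RAM side is best covered by booting a memtest86+ image and letting it run for a few passes:

      # SMART health summary, attributes and error log
      smartctl -a /dev/sdX
      # Start a long self-test, then read its result once it has finished
      smartctl -t long /dev/sdX
      smartctl -l selftest /dev/sdX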
