What I learnt regarding SSD+TRIM and (mdadm)RAID5+LVM+ext4

  • Have re-purposed 4x WDC WDS100T1B0A (1TB) SSD Drives running (mdadm)RAID5+LVM+ext4.


    This is just a collection of what I learned - so I can find it next time I go looking... and it may also be useful for someone else.


    Using re-purposed consumer gear is normally OK for home use - as long as you test, test, test and test some more.
    Left the disks running in a (mdadm)RAID5-only array for a few months on light duties - in that time one drive controller died; it was replaced under warranty.
    Ran the array for another few months to get past the infant mortality hump.


    Set up the latest OMV - configuring a full-disk (mdadm)RAID5+LVM+ext4 stack with an 800GB LV for VM images.


    However, when checking TRIM support, fstrim reports "the discard operation is not supported"... which is not surprising given what lsblk -D shows for the stack (DISC-GRAN/DISC-MAX=0 means unsupported):
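    A hedged illustration of the symptom (the mount point and device names are examples, not my actual output):

    Code
    root@mama:~# fstrim -v /srv/vms
    fstrim: /srv/vms: the discard operation is not supported
    root@mama:~# lsblk -D /dev/md0
    NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
    md0         0        0B       0B         0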


    The WD drives support Discard:
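    A hedged sketch of how per-drive support can be confirmed (adjust device names; /dev/sd[abcd] here stand for the four member disks):

    Code
    # DISC-GRAN/DISC-MAX should be non-zero on the raw disks:
    lsblk -D /dev/sd[abcd]
    # and hdparm should list "Data Set Management TRIM supported":
    hdparm -I /dev/sda | grep -i trim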

    From google-foo research:
    LVM has supported Discards since: 2011(?)
    (mdadm)RAID456 has supported Discards since: 2016(?)
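    For reference, the versions in play can be checked quickly (a sketch; the package names assume Debian/OMV):

    Code
    uname -r
    dpkg -l lvm2 mdadm | grep ^ii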


    Tried different kernels - lsblk -D displayed differing, confusing results, including sometimes showing support... but fstrim still gave errors.


    A lot of the guides unearthed by Google-foo say that lvm.conf needs to be modified... in reality it does not. issue_discards is only needed if you want Discards to be issued during lvremove/vgremove; it has no bearing on LVM transparently passing Discards down the stack.

    Under testing (with my setup) it takes ~1.5hrs to fstrim an 800GB LV and ~3.5hrs to lvremove a 1.9TB LV, so it is best to only set issue_discards=1 when you _really_ need it, then disable it again (i.e. immediately after creating the RAID5+LVM stack, for a clean/forced-empty array).

    Code
    /etc/lvm/lvm.conf
    issue_discards = 1
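    (issue_discards lives in the devices { } section of lvm.conf.) The value LVM actually sees can be double-checked with lvmconfig - a quick sketch (older LVM2 releases use "lvm dumpconfig" instead):

    Code
    lvmconfig devices/issue_discards
    # expected after the change: issue_discards=1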

    After more google-foo - came across https://current.workingdirectory.net/posts/2016/ssd-discard/ which points out devices_handle_discard_safely=Y
    I also learned that RAID5 disables this by default, as there is no way (mdadm)RAID5 can test the SSDs to confirm they handle Discard correctly, so enabling it is a manual admin action. Verification/testing is the responsibility of the admin.

    Code
    /etc/modprobe.d/raid456.conf
    options raid456 devices_handle_discard_safely=Y
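    If preferred, the same parameter can instead go on the kernel command line, so it applies however the module gets loaded - a hedged sketch for GRUB-based systems (append to the existing GRUB_CMDLINE_LINUX_DEFAULT value):

    Code
    /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="... raid456.devices_handle_discard_safely=Y"
    # then regenerate the grub config:
    update-grub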

    Any changes to /etc/lvm/lvm.conf and/or /etc/modprobe.d/raid456.conf will need the following before any new/different settings take effect:

    Code
    update-initramfs -u
    reboot

    That devices_handle_discard_safely is enabled can be verified with:

    Code
    root@mama:~# cat /sys/module/raid456/parameters/devices_handle_discard_safely
    Y
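    Once the array has been (re)assembled with the parameter in effect, the md device itself should advertise discard - a quick check (md0 assumed, as above):

    Code
    lsblk -D /dev/md0
    cat /sys/block/md0/queue/discard_max_bytes   # non-zero once discard is enabled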

    After setting up the RAID5 array, doing something like the below (with issue_discards = 1 set) will make sure the array is clean/forced empty (it may take a _long_ time to complete, so run it via screen).


    Code
    vgcreate vg_realname /dev/md0
    lvcreate -l 100%FREE -n lv_trimtest vg_realname
    # with issue_discards=1 in lvm.conf, removing the LV discards its entire extent:
    lvremove vg_realname/lv_trimtest
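    A quick functional check that fstrim now works through the whole stack could look something like this (LV name and mount point are just examples):

    Code
    lvcreate -L 10G -n lv_trimcheck vg_realname
    mkfs.ext4 /dev/vg_realname/lv_trimcheck
    mount /dev/vg_realname/lv_trimcheck /mnt
    fstrim -v /mnt        # should report bytes trimmed rather than "not supported"
    umount /mnt && lvremove -y vg_realname/lv_trimcheck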

    Then test, test, test and test some more before committing real data.

  • 4x WDC WDS100T1B0A (1TB) SSD Drives running (mdadm)RAID5+LVM+ext4

    This is really dangerous. You can't apply concepts made for spinning rust to modern flash storage.


    HDDs die for completely different reasons than modern flash storage. An HDD will usually die without warning due to physical damage. Flash storage products will suffer either from firmware bugs, from wear-out, or from physical damage as well (a power spike or something like that).


    The risk of a firmware bug, and the fact that identical SSDs wear out identically under identical access patterns, will result in a bunch of identical SSDs dying at (almost) the same time (at least when running identical firmwares). And that's something traditional/anachronistic RAID won't protect against. The same goes for RAID-1 or mirrors made out of identical SSDs --> bad idea. RAID-5 makes no sense at all, in my opinion...

  • This is really dangerous. You can't adopt concepts made for spinning rust to modern flash storage.
    HDD die for completely different reasons than modern flash storage.

    Currently you'd find that EMC, NetApp, HDS, HP, Huawei & IBM seem to disagree...


    I'd be interested in any real modeling or actual large scale statistics you may have to substantiate your opinion.


    HDDs have firmware bugs too... they always have.


    I agree that HDDs die for different reasons than SSDs - both, though, follow the same bathtub life-cycle curve, which _can_ be modeled the same way for either.


    I have plugged some numbers into a reliability network and I have a result I'm happy with.
    I'm happy to redo if you have any real numbers to challenge.
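    As a minimal sketch of the sort of arithmetic involved (hypothetical numbers, and assuming independent failures - which is exactly the assumption the firmware-bug argument above disputes): if each drive has an annual survival probability R, a 4-drive RAID5 survives the year as long as no more than one drive fails.

    Code
    # P(array survives) = R^4 + 4 * R^3 * (1 - R); e.g. with R = 0.98:
    echo "0.98^4 + 4 * 0.98^3 * 0.02" | bc -l
    # ~0.9977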


    I'll keep doing what I'm doing thanks.

  • I'll keep doing what I'm doing

    • https://www.tomshardware.com/n…g-8mb-firmware,13250.html -- this firmware bug affects SSDs losing power (power loss, UPS failure). Try to imagine what happens if you make up an array out of identical SSDs that are all affected by the same bug
    • https://forums.crucial.com/t5/…-once-an-hour/ta-p/130218 -- this firmware bug affects SSDs that have run for more than 5184 hours, which then become unresponsive until the next power cycle. Try to imagine what happens if you make up an array out of identical SSDs that are all affected by the same bug
    • Insert random firmware bug here. Try to imagine what happens if you make up an array out of identical SSDs that are all affected by the same bug

    Up to you to think about. The contractors we work with always ensure that the HDDs we put into our arrays are from different batches (they learned from two nasty occasions where negotiation problems between Infortrend/EMC RAID controllers and disks led to a bunch of HDDs being kicked out of arrays at the same time). For whatever reason they don't do the same with the caching SSDs (might be one of the areas where people only learn by making their own experiences).
