
Do you have a drive rotation schedule?

24 drives. Same model. Likely the same batch. Similar wear. Imagine most of them failing around the same time, and the rest failing while you rebuild the array under the increased load, because they're already near the same point in their lifespan.

Reliable storage is tricky.
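
To put a rough number on that, here's a small Python sketch with purely illustrative parameters (a ~5-year mean lifetime, ~6 months of lifetime spread for same-batch drives, a 3-day rebuild window; none of these come from real reliability data). It simply asks how often the second-shortest-lived of 24 tightly clustered drives dies within the rebuild window opened by the first failure:

    import random

    # Illustrative assumptions, not measured reliability data:
    # 24 same-batch drives whose lifetimes cluster around ~5 years
    # with only ~6 months of spread, and a 3-day rebuild window.
    DRIVES = 24
    MEAN_LIFE_H = 5 * 365 * 24       # mean lifetime, hours
    SPREAD_H = 0.5 * 365 * 24        # tight spread: same batch, same duty cycle
    REBUILD_H = 72                   # rebuild window after the first death, hours
    TRIALS = 100_000

    def second_failure_during_rebuild() -> bool:
        lifetimes = sorted(random.gauss(MEAN_LIFE_H, SPREAD_H) for _ in range(DRIVES))
        return lifetimes[1] - lifetimes[0] <= REBUILD_H

    hits = sum(second_failure_during_rebuild() for _ in range(TRIALS))
    print(f"second drive dies during rebuild in ~{100 * hits / TRIALS:.1f}% of runs")

The tighter the lifetime spread, the more often the first two deaths land inside the same rebuild window, which is exactly the same-batch problem.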



Reminds me of the HN outage where two SSDs both failed after 40k hours: https://news.ycombinator.com/item?id=32031243


That's a firmware bug, not wear.


Yes, and risk management dictates diversification to mitigate this kind of risk as well.


For one reason or another, the drives tended to age out at the same time. Firmware bugs are just hardware failures for solid state devices.


bug or feature?


Reminds me of the time back in the day when Dell shipped us a server whose drives had consecutive serial numbers.

Of course both failed at the same time, and I spent an all-nighter doing a restore.


I ordered my NAS drives on Amazon. To avoid getting the same batch (all consecutive serial numbers), I used amazon.co.uk for one half and amazon.de for the other half. One could also stage the orders over time.


Back in the day, I remember driving to different Fry's and Central Computers stores to get a mix of manufacturing dates.


Yeah, the risk of the rest of the old drives failing under high load while rebuilding/restoring is also very real, so staging is necessary as well.

I don't exactly hoard data by the dozens of terabytes, but I rotate my backup drives every few years, with a 2-year age difference between them.


This. I had just two drives in RAID 1, and the second drive failed immediately after resilvering a new drive to re-create the array. Very lucky :D


Software bugs might cause that (e.g. a drive failing after exactly 1 billion I/O operations due to some counter overflowing). But hardware wear probably won't be as consistent.
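
A toy Python illustration of that failure mode (the 16-bit power-on-hours counter and the firmware assert are hypothetical, not any specific vendor's bug): every unit running the same code hits the wrap at exactly the same age, regardless of physical wear.

    def power_on_hours(hours: int) -> int:
        """Hypothetical firmware that keeps power-on hours in a signed
        16-bit field and asserts if the value ever goes negative."""
        raw = hours & 0xFFFF                              # truncate to 16 bits
        value = raw - 0x10000 if raw >= 0x8000 else raw   # reinterpret as signed
        if value < 0:
            raise RuntimeError("firmware assert: negative power-on hours")
        return value

    print(power_on_hours(32767))   # fine: just under ~3.7 years powered on
    try:
        power_on_hours(32768)      # one hour later, every same-batch drive trips
    except RuntimeError as e:
        print(e)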


That depends entirely on how good their QA and manufacturing quality is - the better it is, the more likely, eh?

Especially in an array where it’s possible every drive operation will be identical between 2 or 3 different drives.


I've seen this happen to a friend. Back in the noughties they built a home NAS similar to the one in the article, using fewer (smaller) drives. It was in RAID5 configuration. It lasted until one drive died and a second followed it during the rebuild. Granted, it wasn't using ZFS, there was no regular scrubbing, 00s drive failure rates were probably different, and they didn't power it down when not using it. The point is the correlated failure, not the precise cause.

Usual disclaimers, n=1, rando on the internet, etc.


This is the reason why I would always use RAID 6. A second drive failing during a rebuild is a very real possibility.
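
A back-of-the-envelope Python sketch of why (the 2% per-drive chance of dying during the rebuild window is an assumed, illustrative figure, not a vendor spec): single parity is lost if any one survivor fails during the rebuild, while double parity needs two more failures.

    from math import comb

    # Assumptions for illustration only: a 24-drive array with one drive
    # already dead, and a 2% chance that any given survivor dies during
    # the rebuild window, with failures treated as independent.
    survivors = 23
    p = 0.02

    def prob_at_least(k: int, n: int, p: float) -> float:
        """P(at least k of n drives fail), simple binomial model."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    print(f"RAID 5 lost (needs 1 more failure):  {prob_at_least(1, survivors, p):.1%}")
    print(f"RAID 6 lost (needs 2 more failures): {prob_at_least(2, survivors, p):.1%}")

And that binomial model assumes independent failures; correlated same-batch wear, as discussed above, only makes the single-parity case look worse.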


You’re far better off having two RAID arrays: one as a daily backup of progressive snapshots that only turns on occasionally to back up and is off the rest of the time.


I don’t understand how it is better to have an occasional (= significantly time-delayed) backup. You’ll lose all changes since the last backup. And you’re doubling the cost, compared to just one extra hard drive for RAID 6.

Really important stuff is already being backed up to a second location anyway.


Fine for on-site backup, but my NAS is mostly for media storage.

Also, many drives aren't happy being restarted too many times.


I bought the drives in several batches from 2 or 3 different shops.


That's why you buy different drives from different stores: it reduces the chances of getting HDDs from the same batch.


Drive like a maniac to the datacenter to shake 'em up a bit



