
Do you have a drive rotation schedule?

24 drives. Same model. Likely the same batch. Similar wear. Imagine most of them failing around the same time, and the rest failing while you rebuild the array under the increased load, because they're already near the same point in their lifespan.

Reliable storage is tricky.
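
To put a rough number on that, here's a small Python sketch with purely illustrative parameters (a ~5-year mean lifetime, ~6 months of lifetime spread for same-batch drives, a 3-day rebuild window; none of these come from real reliability data). It simply asks how often the second-shortest-lived of 24 tightly clustered drives dies within the rebuild window opened by the first failure:

    import random

    # Illustrative assumptions, not measured reliability data:
    # 24 same-batch drives whose lifetimes cluster around ~5 years
    # with only ~6 months of spread, and a 3-day rebuild window.
    DRIVES = 24
    MEAN_LIFE_H = 5 * 365 * 24       # mean lifetime, hours
    SPREAD_H = 0.5 * 365 * 24        # tight spread: same batch, same duty cycle
    REBUILD_H = 72                   # rebuild window after the first death, hours
    TRIALS = 100_000

    def second_failure_during_rebuild() -> bool:
        lifetimes = sorted(random.gauss(MEAN_LIFE_H, SPREAD_H) for _ in range(DRIVES))
        return lifetimes[1] - lifetimes[0] <= REBUILD_H

    hits = sum(second_failure_during_rebuild() for _ in range(TRIALS))
    print(f"second drive dies during rebuild in ~{100 * hits / TRIALS:.1f}% of runs")

The tighter the lifetime spread, the more often the first two deaths land inside the same rebuild window, which is exactly the same-batch problem.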



Reminds me of the HN outage where two SSDs both failed after 40k hours: https://news.ycombinator.com/item?id=32031243


That's a firmware bug, not wear.


Yes, and risk management dictates diversification to mitigate this kind of risk as well.


For one reason or another, the drives tended to age out at the same time. Firmware bugs are just hardware failures for solid state devices.


bug or feature?


Reminds me of the time back in the day when Dell shipped us a server whose drives had consecutive serial numbers.

Of course both failed at the same time, and I spent an all-nighter doing a restore.


I ordered my NAS drives on Amazon. To avoid getting the same batch (all consecutive serial numbers), I used amazon.co.uk for one half and amazon.de for the other half. One could also stage the orders over time.


Back in the day, I remember driving to different Fry's and Central Computers stores to get a mix of manufacturing dates.


Yeah, the risk of the rest of the old drives failing under high load while rebuilding/restoring is also very real, so staging is necessary as well.

I don't exactly hoard data by the dozens of terabytes, but I rotate my backup drives every few years, with a 2-year age difference between them.


This. I had just two drives in RAID 1, and the second drive failed immediately after resilvering a new drive to re-create the array. Very lucky :D


Software bugs might cause that (e.g. a drive failing after exactly 1 billion I/O operations due to some counter overflowing). But hardware wear probably won't be as consistent.
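
A toy Python illustration of that failure mode (the 16-bit power-on-hours counter and the firmware assert are hypothetical, not any specific vendor's bug): every unit running the same code hits the wrap at exactly the same age, regardless of physical wear.

    def power_on_hours(hours: int) -> int:
        """Hypothetical firmware that keeps power-on hours in a signed
        16-bit field and asserts if the value ever goes negative."""
        raw = hours & 0xFFFF                              # truncate to 16 bits
        value = raw - 0x10000 if raw >= 0x8000 else raw   # reinterpret as signed
        if value < 0:
            raise RuntimeError("firmware assert: negative power-on hours")
        return value

    print(power_on_hours(32767))   # fine: just under ~3.7 years powered on
    try:
        power_on_hours(32768)      # one hour later, every same-batch drive trips
    except RuntimeError as e:
        print(e)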


That depends entirely on how good their QA and manufacturing quality is - the better it is, the more likely, eh?

Especially in an array where it’s possible every drive operation will be identical between 2 or 3 different drives.


I've seen this happen to a friend. Back in the noughties they built a home NAS similar to the one in the article, using fewer (smaller) drives. It was in RAID5 configuration. It lasted until one drive died and a second followed it during the rebuild. Granted, it wasn't using ZFS, there was no regular scrubbing, 00s drive failure rates were probably different, and they didn't power it down when not using it. The point is the correlated failure, not the precise cause.

Usual disclaimers, n=1, rando on the internet, etc.


This is the reason why I would always use RAID 6. A second drive failing during a rebuild is a very real possibility.
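
A back-of-the-envelope Python sketch of why (the 2% per-drive chance of dying during the rebuild window is an assumed, illustrative figure, not a vendor spec): single parity is lost if any one survivor fails during the rebuild, while double parity needs two more failures.

    from math import comb

    # Assumptions for illustration only: a 24-drive array with one drive
    # already dead, and a 2% chance that any given survivor dies during
    # the rebuild window, with failures treated as independent.
    survivors = 23
    p = 0.02

    def prob_at_least(k: int, n: int, p: float) -> float:
        """P(at least k of n drives fail), simple binomial model."""
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    print(f"RAID 5 lost (needs 1 more failure):  {prob_at_least(1, survivors, p):.1%}")
    print(f"RAID 6 lost (needs 2 more failures): {prob_at_least(2, survivors, p):.1%}")

And that binomial model assumes independent failures; correlated same-batch wear, as discussed above, only makes the single-parity case look worse.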


You’re far better off having two RAID arrays: one as a daily backup of progressive snapshots that only turns on occasionally to back up and is off the rest of the time.


I don’t understand how it is better to have an occasional (= significantly time-delayed) backup. You’ll lose all changes since the last backup. And you’re doubling the cost, compared to just one extra hard drive for RAID 6.

Really important stuff is already being backed up to a second location anyway.


Fine for on-site backup, but my NAS is mostly for media storage.

Also, many drives aren't happy being restarted too many times.


I bought the drives in several batches from 2 or 3 different shops.


That's why you buy different drives from different stores: it reduces the chances of getting HDDs from the same batch.


Drive like a maniac to the datacenter to shake 'em up a bit



