Backblaze's environment is abnormal. Among other things, they do very little to dampen vibration, judging by their last pod design. That puts drives under stresses that don't reflect what you're likely to see in your own environment.
What you're seeing is not the most reliable, but the most reliable under their particular set of bad conditions. The reliability under your circumstances might be wildly different.
Their pod approach even results in worse performance: more vibration means more effort spent by the drives trying to keep the head positioned in the right place, increasing latencies and decreasing read performance. The higher density they push for with their pods compounds this. Drives vibrate. Fans vibrate (chassis, PSU, CPU, the works). If you don't pad them all appropriately, the vibration of the components ends up causing the entire rack to vibrate. By the time you've got several dozen drives in the rack, the whole rack will be feeling the effect.
"The reliability under your circumstances might be wildly different."
No. I've seen this argument raised a few times whenever a comparison of hard drive failure rates across manufacturers has come up. (Previously on HN: https://news.ycombinator.com/item?id=7119323)
Backblaze puts more stress on their drives than typical users do, but a drive which works well in their environment is also very likely to do well in a less stressful one. This is how accelerated life testing is done.
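To make the ordering argument concrete, here's a toy sketch (made-up failure rates and a simple proportional-hazards assumption, nobody's real model): if the harsher pod environment multiplies every model's baseline failure rate by roughly the same stress factor, the ranking you observe there carries over to a gentler environment.

    # Toy illustration (hypothetical AFRs, assumed common stress factor k):
    baseline_afr = {"model_a": 0.010, "model_b": 0.025, "model_c": 0.060}
    k = 3.0  # harsh environment multiplies each model's failure rate by k

    harsh = {m: afr * k for m, afr in baseline_afr.items()}
    print(sorted(harsh, key=harsh.get))                # ranking seen in the pods
    print(sorted(baseline_afr, key=baseline_afr.get))  # same ranking at home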
> Their pod approach even results in worse performance
Not once in the data presented in the article is performance mentioned. This is not about performance, it's about reliability. It may be anecdotal, but the numbers I see correlate well with my experience and many others' experiences I've heard of.
More amusingly, I remember coming across a data recovery forum that happened to be split by manufacturer, and noticing that the Seagate section had an overwhelmingly large number of threads from people asking for help with their dead disks, relative to the WD, Toshiba, Hitachi, etc. sections, and out of proportion to Seagate's market share.
Well, if a vibration-free environment led to failures being rare enough that the difference was immaterial, that would be good to know. You could save money or opt for better perf at the same price.
At Google my colleague developed a simple test that made vibration effects very obvious (this was in 2008). In a server with a bunch of disks, do random reads or writes on all but one disk. On the final disk, do sequential reads. Back then you could get on the order of 100MB/s sequential read speed on the outer edge of a hard drive in ideal conditions. Under this test performance would dive to something like 10MB/s or worse. All of the seeking would cause vibration, which caused misreads on the disk under test. When a bad read happens a hard drive has to wait until the next rotation to try again. We eventually fixed this with better damping.
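Not the original tool, but a rough Python sketch of the same experiment (device paths are placeholders; in practice you'd use O_DIRECT or fio so the page cache doesn't mask the effect): random reads hammer every disk except one while you time a sequential read on the disk under test.

    import os, random, threading, time

    # Rough sketch of the test described above (placeholders, not the original
    # tool). Requires root to read raw devices.
    NOISY = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]   # all disks but one
    TARGET = "/dev/sde"                            # the disk under test
    BLOCK, SPAN, SECONDS = 4096, 100 * 2**30, 60   # 4K reads over the first 100GiB

    def random_reads(path, stop):
        fd = os.open(path, os.O_RDONLY)
        try:
            while not stop.is_set():
                os.lseek(fd, random.randrange(0, SPAN, BLOCK), os.SEEK_SET)
                os.read(fd, BLOCK)
        finally:
            os.close(fd)

    stop = threading.Event()
    noise = [threading.Thread(target=random_reads, args=(p, stop)) for p in NOISY]
    for t in noise:
        t.start()

    # Sequential read on the target while the other spindles seek.
    fd = os.open(TARGET, os.O_RDONLY)
    total, start = 0, time.time()
    while time.time() - start < SECONDS:
        total += len(os.read(fd, 2**20))           # 1MiB sequential reads
    os.close(fd)
    print("sequential read: %.1f MB/s" % (total / (time.time() - start) / 1e6))

    stop.set()
    for t in noise:
        t.join()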
I am that colleague. While it's great that Backblaze releases this information, there are many factors you need to keep in mind when looking at this data.
With respect to vibration: we found that vibration caused by adjacent drives in some of our earlier drive chassis could cause off-track writes. These will cause future reads of that data to return uncorrectable read errors. Under Backblaze's methodology they will likely call out such drives as failed based on SMART or RAID/Reed-Solomon sync errors.
>What you're seeing is not the most reliable, but the most reliable under their particular set of bad conditions. The reliability under your circumstances might be wildly different.
You do have a valid point, but I also wanted to add that my hard drive failures in my gentle home environment match Backblaze's more hostile environment. I have 50+ hard drives in various capacities from 2TB to 10TB, and my differences from BB include drives being isolation-mounted on rubber grommets and not running 24/7. (I only power them up when I need archived files off of them.)
Even with those stark operational differences, my hard drive failures exactly match the generalization of Backblaze's data: more failures with WD Greens and Seagates, especially the 3TB Seagates. My HGST drives purchased over the last 4 years have zero failures so far.
> They do very little to dampen vibration amongst other things, based on their last pod design.
Yev from Backblaze -> In Storage Pod 6.0 we actually did a lot more work on the dampening (and in 5.0 as well, but 6.0 builds on it). The drives are all placed into guided rows and are held down by individual lids w/ tension on them. You can take a gander at the latest build here -> https://www.backblaze.com/blog/open-source-data-storage-serv...
Does an average server have much to dampen vibration? Something like [1]. I occasionally replace drives in similar machines, and I haven't noticed anything that looks like it helps with vibration -- but I don't look very much.
Storage servers do tend to do things to dampen vibration, but then it's a question of storage density: something like a Backblaze pod is 60 drives in 4U of space, whereas it'd be unusual to have 60 drives in total in a rack full of normal servers. Now consider 60 drives per 4U, with multiple of these storage pods in a rack. It adds up.
Related topic: I'd like to see an update on the enterprise vs commodity hard drives.
A few years ago, the consensus (and the data) was that they were mostly the same thing, and it was debatable whether paying double for the enterprise one was worth it.
But... in recent years, the consumer market has moved toward ever cheaper, slower, power-saving hard drives (e.g. reduced RPM and spinning the disk down when unused for 30s).
> Related topic: I'd like to see an update on the enterprise vs commodity hard drives.
Yev from Backblaze here -> we wrote about this in 2013 and have honestly not bought too many enterprise drives since, simply because they were more expensive and the benefit was negligible in our usage. Lately though Seagate has had a nice run of enterprise drives and they're well-priced, so we might be giving those more of a shot in the coming months!
Yeah, I think pricing is key, and it looks like the different cloud storage providers all optimize for different types of storage (hot / cold). Backblaze seems to be cheaper when storage is high and data transfer is relatively low (calculated via http://coststorage.com).
How is the benefit negligible? You are running consumer drives that are not designed to be run at 100% load 100% of the time. The drives are not meant to be spun up constantly; in fact, many of them have mechanisms designed to spin them down between uses and lower power consumption. It's like complaining about a drag racer not being able to haul a ton of bricks without breaking the transmission on a regular basis.
Has anyone done a teardown of two drives and compared the materials? Many enterprise and OEM drives really just seem to be the same hardware as consumer drives, only with different firmware (the "power-saving" spindown) and more rigorous QC.
I believe when you're buying as many drives as they do and producing the kind of reports they do, they measure and compare before reaching that conclusion.
The decision can't be made in isolation. They need to ensure that the increased reliability makes up for the extra cost of the drives - and that's not necessarily the case.
Just deleting/re-writing, yes? I remember reading they were good for storage situations where you just fill the drive with writes and pretty much just read from it for life
HGST was originally IBM's hard drive business (IBM sold it to Hitachi, which has now sold it to WDC). There are probably some heavy-duty/reliability traits still left in the manufacturing and company culture.
Yep, it was especially bad because IBM had the gold standard for drives before that point. They were more expensive, but you paid for the reliability, so people got really angry when they started failing regularly.
From what I can tell, they fixed those problems and went back to the high-quality but slightly more expensive drives of old. The most interesting part is that Backblaze still buys mostly Seagates, simply because they are easier to buy in bulk and have a better price. Even though they fail more often, the failure rate isn't bad enough to be a problem. Also, Seagate's failure rate has been decreasing as they release new drives.
Me too. I wonder if there is a cycle where companies get a terrible reputation and then overcompensate; presumably they'll slowly get worse reliability from here on as they focus on costs instead of failure rates.
Hitachi, in general, has always been known for bullet-proof storage. I was extremely sad to see them sell off that business. Hopefully spinning drives are dead before WDC culture can bring HGST back to mediocre reliability.
I've gone all-in on HGST for personal use because I've seen their results from Backblaze and also have good luck with them so far. I have 14 of their 4TB NAS and 22 of the 6TB NAS.
I could see that getting filled up easily with a vast 1080p TV/movie collection. Movies at 1080p eat up disk really quickly.
Uncompressed 4K @ 24fps at 10-bit color comes in at 324MB/s, so just being a wedding videographer could mean 1.1TB per hour of video. Any given small project could eat up 15-20TB.
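For anyone who wants to sanity-check numbers like these, the raw rate is just resolution × frame rate × bits per pixel; a ballpark calculator (assumed parameters, not the parent's exact math):

    # Ballpark raw-video data-rate calculator (assumed parameters, not the
    # parent's exact math): bytes/s = width * height * fps * bits_per_pixel / 8.
    def raw_rate_mb_s(width, height, fps, bit_depth, subsampling="4:2:2"):
        samples_per_pixel = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}[subsampling]
        return width * height * fps * bit_depth * samples_per_pixel / 8 / 1e6

    rate = raw_rate_mb_s(3840, 2160, 24, 10, "4:2:0")   # UHD, 24fps, 10-bit
    print("%.0f MB/s, %.2f TB per hour" % (rate, rate * 3600 / 1e6))
    # Different resolutions and subsampling shift the number; compressed
    # intermediates (ProRes, DNxHR) land well below raw but still add up fast.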
I never said I required this much for personal use...I just have it. :)
I started with 12 x 4TB in one NAS and I filled it in a little over 2 years. I just recently built a second NAS with 20 x 6TB to last me hopefully for the next several years.
The remaining drives are in two workstations for local storage.
Be warned: I got an HGST drive for personal use, happy to pay the premium. It was quite loud; my wife commented from across the room, "What is that?" A periodic loud "tick". Turns out it does some kind of calibration and you can't turn it off. Many reviews mentioned it; I just didn't look.
Anyone working with video or 3D animation, or storing RAW photography (and taking hundreds of photos on a daily basis), and I can imagine a few other purposes. Especially if you want an on-site backup of that data.
I've bought 10TB of storage space every year for the past 3 years and don't see myself stopping. I need the storage space! If these data-heavy hobbies were instead my daytime job I could easily see myself needing 100TB+ in storage (and then 100TB+ of backups). I fill roughly 8TB/year in storage space (4TB of data + 4TB backup data).
Some people believe in redundant backups. I only have redundant backups of extremely important information - which are mostly text documents so don't take up much space. But if someone was storing 3 copies of all of their media assets you end up using a lot of space. For example, 10TB of video turns into 30TB of video. And 180TB is really only 60TB of data, which isn't that much for data heavy hobbies.
From a little googling, 4K video appears to take about 22GB an hour [1], so 188TB would be over 8,500 hours of 4K video [2]. That's almost an entire year of constant 4K video. That's a lot of storage for personal use.
Not just encoding, but it also depends on a number of things like color depth, chroma subsampling (4:2:2), and so on. It's not uncommon for a single film to approach half a petabyte of raw in a digital world with a "typical" setup (4:2:2, ProRes maybe, 10-bit, etc). To use an example, I think I read somewhere that Gone Girl shot a few hundred TB for a technically straightforward film. Some cameras, particularly when you start getting to digital cinema (which is > "4K"), can shoot as much as a terabyte and a half per hour. There are 8K sensors in common availability now; RED has one that can shoot 75fps 8K at 2.4:1, which I don't even want to calculate, but which I know for a fact can't write to its own storage at native bitrate. Productions eat storage these days, and moving that data around is a challenge.
Hint, hint, to clever founders: hard drives in Hollywood.
> From a little googling, 4k video appears to take about 22GB an hour
That's "home delivery" video, not the stuff you work on which would be far less compressed or even uncompressed, and would include reams of data thrown out entirely at the final production stages.
Think about the difference between "work" (mixing/mastering/production) audio data (uncompressed and at 24 or 32 bits) versus "consumer" audio (mp3/aac at 48/16).
Editing also tends to create a lot of huge temporary and intermediate files that will eat up disk space in a hurry. 188TB is a lot of space, but nowhere near unreasonable for someone doing video work. Plus, as other people have mentioned you need to account for filesystem overhead, redundancy (RAID), backups, TB vs. TiB, and so on.
My 1440p 4:4:4 videogame captures at 60 fps come out to about 700 GB per hour, sometimes more depending on the scene complexity. That would fill up a 200 TB array in no time.
I'd record to ProRes or DNxHR if I could, but there's no viable method for doing that on Windows without using an FFMPEG pipeline (which won't do 4:4:4 color sampling with those codecs and simply uses too much CPU at 4:2:2).
I've done the same, in smaller quantity (12x 2TB drives.) I wouldn't buy any other brand, after having been burned by Seagate and WD too many times. These all run 24x7 on my home network.
It's especially interesting because Western Digital bought HGST in 2012. [0]
To my knowledge, there have been innovations in the spinning rust (ahem hard drive) market since 2012. So why is HGST, a WD subsidiary, still making much more reliable drives?
Different design teams? HGST factories instead of WD?
Has anyone made a failure rate vs current price chart for all drives included in these reports to help guide purchasing decisions for normal people? I'd love to find where the dollar per terabyte sweet spot is, with respect to reliability.
Reviewers can't really believe anything the vendors say and can't afford to run hundreds of SSDs for years to gather real data so there's not much they can say. By the time there is real-world experience with an SSD model it's probably obsolete anyway.
There isn't really a compelling reason to run significant storage off 2.5" HDDs anymore -- before SSDs 10k or 15k 2.5" HDDs made sense (access time), but nowadays...
Here's a compelling reason: cheap, portable mass storage. Eg. USB3/Thunderbolt-powered multi-terabyte drives that fit in a backpack and don't require external power.
Yev from Backblaze here w/ some insight -> Yes! They are expensive. We are looking to test some in our environment very soon, but may not have enough (we want at least 45) to truly test them. Once they drop further in price and we can afford to drop them in some of our pods, we'll be all over it!
Since you have many years of data, do you explicitly factor the observed failure rate patterns of particular brands into your cost models when buying drives? (E.g. a higher anticipated replacement cost factor when buying Seagate drives.) Or is the failure data purely for curiosity and not fed back into a spreadsheet? In other words, does every cost projection treat every brand with the same predefined anticipated failure rate?
> Since you have many years of data, do you explicitly factor the observed failure rate patterns of particular brands into your cost models when buying drives?
Sometimes. We do take it into account, but if a hard drive that costs less but fails more comes around, and the failure rate in our environment is within some "tolerance", we'll get that drive, even if we might see some more failures. So we do use the data to inform our purchasing department of the "future costs" of drives, but a good deal's a good deal :)
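For the curious, a back-of-the-envelope version of that tradeoff (made-up prices and rates, not Backblaze's actual model): fold the expected replacements implied by the observed AFR into an effective $/TB and compare.

    # Back-of-the-envelope purchasing tradeoff (made-up numbers, not Backblaze's
    # actual model): expected replacements over the service life, implied by the
    # observed annualized failure rate, folded into an effective $/TB.
    def effective_cost_per_tb(price, tb, afr, years=5, swap_labor=25.0):
        expected_failures = afr * years      # crude; ignores the wear-out curve
        return (price + expected_failures * (price + swap_labor)) / tb

    cheap_but_flaky = effective_cost_per_tb(price=110, tb=4, afr=0.06)
    pricey_but_solid = effective_cost_per_tb(price=150, tb=4, afr=0.01)
    print("%.2f vs %.2f $/TB over 5 years" % (cheap_but_flaky, pricey_but_solid))

With numbers like those, the cheaper, flakier drive can still come out ahead, which is roughly the "a good deal's a good deal" point.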
I was worried that my question wasn't worded well enough to make sense but your calculation answer was exactly what I was looking for. Very insightful and thanks for sharing the thought process.
What about archive disks? You guys are storing people's backup data, and I guess most of them rarely access it, if at all; otherwise you wouldn't need to keep all of the redundant data online all the time. Would it make sense to use cheap SMR disks like the ST8000AS0002?
You'd be surprised! We've restored over 20 Billion files at this point, folks are CONSTANTLY downloading one or two files, and because we encrypt everything before sending it up, when someone browses their tree we have to "find" all of those files, so we do have to have them more readily available!
It's tough b/c we don't really keep track of prices after we buy them. Sometimes we go out and look up different drives on Amazon and add those in, but I don't remember if we keep track of exactly what you're trying to find - sorry! You could always look at the drive model and do a quick search and cross-check with our rates. Just know that your environment is different from ours so it might not be 1:1 :)
Always! But sometimes I miss the comments b/c I don't get an alert when someone responds to me; sorry for the delay, hope you see this. We recycle the drives once we take them out of circulation. First we secure-wipe all of them, and then they get recycled.
How are you generating the raw data? Can you share your scripts that are used to generate it?
I'd be very interested in generating raw data against the disks I maintain as well, and might also be able to share it, which might be interesting since it's primarily SSDs.
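Backblaze's published raw data is, as I understand it, daily snapshots of each drive's SMART attributes; their exact scripts aren't linked here, but a minimal stand-in on Linux is just smartmontools in a loop (device list and output file below are placeholders):

    import csv, datetime, subprocess

    # Minimal stand-in for a SMART snapshot collector using smartmontools
    # (device list and output file are placeholders; Backblaze's own scripts
    # may differ). `smartctl -i -A` prints drive identity plus the raw SMART
    # attribute table; attribute rows start with a numeric ID and end with
    # the raw value.
    DEVICES = ["/dev/sda", "/dev/sdb"]

    with open("smart_snapshot.csv", "a", newline="") as out:
        writer = csv.writer(out)
        for dev in DEVICES:
            report = subprocess.run(["smartctl", "-i", "-A", dev],
                                    capture_output=True, text=True).stdout
            model = serial = ""
            for line in report.splitlines():
                if line.startswith("Device Model:"):
                    model = line.split(":", 1)[1].strip()
                elif line.startswith("Serial Number:"):
                    serial = line.split(":", 1)[1].strip()
                else:
                    fields = line.split()
                    if fields and fields[0].isdigit():
                        writer.writerow([datetime.date.today(), dev, model, serial,
                                         fields[0], fields[1], fields[-1]])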
I love these statistics, but I would just like some clarification on what exactly constitutes a "failure". Does the drive have to stop responding completely, or are a few bad sectors enough to consider the drive "bad"?
Why do they buy such weird proportions of each type of hard drive? Do they just use anything they can get their hands on? If the purpose was actually to test drive failure rates, wouldn't you want to buy similar amounts of every drive?
Drive failure rates can be computed across different batch sizes, provided the batches are big enough to produce meaningful numbers. There's no need for the batches to be exactly the same size to get useful and accurate data.
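Concretely, the usual normalization (and, as I understand it, what Backblaze reports) is to annualize over drive-days rather than drive count, so batches of different sizes and ages land on the same scale:

    # Annualized failure rate normalizes different batch sizes and deployment
    # dates: AFR = failures / (drive_days / 365). A 100-drive batch run for a
    # full year and a 1,000-drive batch run for five weeks become comparable.
    def afr(failures, drive_days):
        return failures / (drive_days / 365.0)

    print("%.2f%%" % (100 * afr(failures=5, drive_days=100 * 365)))   # small batch, 1 year
    print("%.2f%%" % (100 * afr(failures=4, drive_days=1000 * 35)))   # big batch, 5 weeks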
They typically purchase the best $ per MB. Even "shucking" external hard drives bought at Costco after Thailand experienced a flood that took offline several hard drive manufacturing plants. [1]
Does anyone have any experience with the reliability of the Seagate 8TB archive drives? They are incredibly cheap for cold storage, but I'm a bit concerned by the Seagate numbers in the Backblaze stats...
Yes, I run about 800 of them now. They have not been as reliable as the 6TB archive drives. I think this was related to a bad run that Seagate had while manufacturing. Our numbers overall (with 6,000+ drives now, if I remember correctly) are much lower than Backblaze's reported failure rates.
We also classify a drive as failed when it throws ANY SMART error; it's then diagnosed with Seagate's internal tool (which I believe they are open sourcing, if they haven't already) and put back into production.
Seagate has given us good warranties on these drives as well. Yes, a higher immediate failure rate (in the first month) with the 8TB drives was annoying, but they more than made it right by replacing the drives.
My anecdata is that I bought one for home use, and it failed after a couple of months. I sent it back for replacement, and the replacement failed after a couple of months. So I gave up, and got something else instead.
I was not surprised to see Seagate leading the failure rate. I had a portable 2TB Seagate drive fail within weeks. There went my vacation photos and videos!
I believe the SG failure stats are largely dominated by a specific model line which is no longer produced. I bet SG will recover next year when we see these stats again. Keep in mind these stats reflect the AFR of drives, which in some cases can cover several years' worth of data.
I wish backblaze would share some data around accelerated lifecycle testing.
You could argue that Seagate has already recovered. That 4TB Seagate model (ST4000DM000) is doing pretty well. From the chart, it's Backblaze's most popular model by far, and they've been using it for at least 2 years. Its failure rate is lower than most WD models they have.
Yes. From my experience they have, though the performance of those 2TBs definitely hurt their reputation a ton. It might be a while before SG becomes known as the standard for quality again.
I had four of the Seagate 3TB drives that are listed in their ratings with high failure rates. All four of mine died in less than 3 years. They were a really disappointing purchase.
I had 4 Seagate drives of varying capacity and model numbers die on me within 11 weeks of each other, all were less than 3 years old and one of them was in use for less than an hour.
My two Seagate drives in my NAS failed, one in 09/16 and one in 01/17. Thank god for RAID. I replaced them with bigger Seagate drives; maybe I should not have.
Sounds like you might be able to benefit from a cloud backup solution....
In all seriousness though, I use backblaze and while I haven't had any HDD failures I have gone back to check on the backup from time to time and it's always looking good.
All portable HDDs die quickly; they have no fans and they're moved a lot.
My photos are a) still on my SD card until it's full, b) on my raidz1 and raidz2 drives on my NAS, and c) backed up to Flickr's free 1TB of storage using https://github.com/richq/folders2flickr
Having only 1 copy of something that's easy to perfectly replicate is very silly.
What do you mean by "moved a lot"? I assumed the HDD heads are moved to a parking position when the disk is turned off so they don't accidentally slam into the platters.
Portable means the storage has a compact form factor and a case so you can carry it around (turned off). I have never seen anyone actually use a portable disk while moving. Everyone puts them on their desk next to the laptop. The only thing I can imagine is working on a train.