Backblaze Hard Drive Stats for 2016 (backblaze.com)
325 points by ingve on Jan 31, 2017 | 132 comments


Backblaze's environment is abnormal. They do very little to dampen vibration amongst other things, based on their last pod design. That puts drives under stresses that don't reflect what you're likely to see in your environments.

What you're seeing is not the most reliable, but the most reliable under their particular set of bad conditions. The reliability under your circumstances might be wildly different.

Their pod approach even results in worse performance: more vibration results in more effort spent by the drives trying to keep the head positioned in the right places, increasing latencies and decreasing read performance. The higher density they push for with their pods actually compounds this. Drives vibrate. Fans vibrate (chassis, PSU, CPU, the works). If you don't pad them all appropriately, the vibration of the components ends up causing the entire rack to vibrate. By the time you've got several dozen drives in the rack, the whole rack will be feeling the effect.


"The reliability under your circumstances might be wildly different."

No. I've seen this argument raised a few times whenever hard drive failure rates have been compared across manufacturers. (Previously on HN: https://news.ycombinator.com/item?id=7119323 ) Backblaze puts more stress on their drives than typical users do, but a drive which works well in their environment is also very likely to do well in a less stressful one. This is how accelerated life testing is done.

https://en.wikipedia.org/wiki/Accelerated_life_testing

"Their pod approach even results in worse performance"

Not once in the data presented in the article is performance mentioned. This is not about performance, it's about reliability. It may be anecdotal, but the numbers I see correlate well with my experience and many others' experiences I've heard of.

More amusingly, I remember coming across a forum discussing data recovery, which happened to be split by manufacturer, and noticing the Seagate forum had an overwhelmingly large number of topics from people asking for help with their dead disks relative to WD, Toshiba, Hitachi, etc., which was disproportionate to their market share.


That was my understanding too, but are we sure there's no counterintuitive factor at play?


Well, if a vibration-free environment led to failures being rare enough that the difference was immaterial, that would be good to know. You could save money, or opt for better perf at the same price.


At Google my colleague developed a simple test that made vibration effects very obvious (this was in 2008). In a server with a bunch of disks, do random reads or writes on all but one disk. On the final disk, do sequential reads. Back then you could get on the order of 100MB/s sequential read speed on the outer edge of a hard drive in ideal conditions. Under this test performance would dive to something like 10MB/s or worse. All of the seeking would cause vibration, which caused misreads on the disk under test. When a bad read happens a hard drive has to wait until the next rotation to try again. We eventually fixed this with better damping.
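
For the curious, a rough sketch of what such a test could look like (nothing like the actual tooling - the paths, file sizes, and disk count below are assumptions, and a real test would use raw devices or O_DIRECT to keep the page cache out of the measurement):

    import random, threading, time

    # Hypothetical pre-created test files, one per spindle; the last disk is measured.
    NOISY = ["/mnt/disk%d/testfile" % i for i in range(1, 12)]
    VICTIM = "/mnt/disk12/testfile"
    FILE_SIZE = 8 * 1024**3          # assume 8 GiB test files
    stop = threading.Event()

    def random_reader(path):
        # Seek to random offsets and read 4 KiB blocks to keep the heads moving.
        with open(path, "rb", buffering=0) as f:
            while not stop.is_set():
                f.seek(random.randrange(0, FILE_SIZE - 4096))
                f.read(4096)

    def sequential_mb_per_s(path, seconds=30):
        # Stream 1 MiB reads for a fixed time and report throughput.
        deadline = time.time() + seconds
        total = 0
        with open(path, "rb", buffering=0) as f:
            while time.time() < deadline:
                chunk = f.read(1024 * 1024)
                if not chunk:
                    f.seek(0)
                    continue
                total += len(chunk)
        return total / seconds / 1e6

    print("baseline:", sequential_mb_per_s(VICTIM), "MB/s")
    workers = [threading.Thread(target=random_reader, args=(p,), daemon=True) for p in NOISY]
    for w in workers:
        w.start()
    print("under load:", sequential_mb_per_s(VICTIM), "MB/s")
    stop.set()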


I am that colleague. While it's great that Backblaze releases this information, there are many factors that you need to keep in mind when looking at this data.

With respect to vibration: we found vibration caused by adjacent drives in some of our earlier drive chassis could cause off-track writes. This will cause future reads of the data to return uncorrectable read errors. Based on Backblaze's methodology they will likely call out these drives as failed based on SMART or RAID/Reed-Solomon sync errors.


HN is amazing.


Threads like this are what it's all about :)


Does Google use off the shelf drives like these guys?


I don't think I can comment on that! Sorry. Someone who is working on storage at Google today would have a better idea of what's public vs not.


I think it's been known for a while! Here's an old video of a guy inducing read latency spikes by shouting at some servers: https://www.youtube.com/watch?v=tDacjrSCeq4


"a guy" in this case being Brendan Gregg, http://www.brendangregg.com/, one of the prominent experts in server performance!


I had an argument from a user who wanted consumer drives instead of "RAID edition drives". So I ran a test in a 1U server with 8 or so 15k rpm fans.

With consumer drive outside the case, on a bench: http://broadley.org/bill/consumer-no-vibration.png

Inside: http://broadley.org/bill/consumer.png

Server drive inside: http://broadley.org/bill/server.png

We went with the RAID edition.


>What you're seeing is not the most reliable, but the most reliable under their particular set of bad conditions. The reliability under your circumstances might be wildly different.

You do have a valid point, but I also wanted to add that my hard drive failures in my gentle home environment match Backblaze's more hostile environment. I have 50+ hard drives in various capacities from 2TB to 10TB, and my differences from BB include drives being isolation-mounted on rubber grommets, and they also do not run 24/7. (I only power them up when I need archived files off of them.)

Even with those stark operational differences, my hard drive failures exactly match the generalization of Backblaze's data: more failures with WD Greens and Seagates, especially the 3TB drives. My HGST drives purchased over the last 4 years have zero failures so far.


Curious, what are you doing with so much storage at home?



uhh, videos of the adult type...


> They do very little to dampen vibration amongst other things, based on their last pod design.

Yev from Backblaze -> In Storage Pod 6 we actually did a lot more work on the dampening (and in 5.0 as well, but 6.0 builds on it). The drives are all placed into guided rows and are held down by individual lids w/ tension on them. You can take a gander at the latest build here -> https://www.backblaze.com/blog/open-source-data-storage-serv...


Does an average server have much to dampen vibration? Something like [1]. I occasionally replace drives in similar machines, and I haven't noticed anything that looks like it helps with vibration -- but I don't look very much.

[1] http://www.dell.com/us/business/p/poweredge-r530/pd?ref=PD_O...


Average servers don't tend to have much.

Storage servers do tend to do stuff to dampen vibration, but then it's a question of storage density. Something like a Backblaze pod is 60 drives in 4U of space; it'd be unusual to have even 60 drives in total in a rack full of normal servers. Now consider 60 drives per 4U, with multiple of these storage pods in a rack. It adds up.


Where can I see a public stats breakdown of similar magnitude from your preferred environment?


Funny how it is this exact blog post, every year, that really makes it sink in for me how fast the last year has gone by.

Thanks for that, I guess.


Time..../sigh


Moreover, I can always expect "Yev from Backblaze here". Hey Yev, miss you from the mashupfm days!


Heeeey ;-) Yea, I look back fondly on them. Had much less to do, less responsibility - could spend my day frolicking on the internet...no more!


Related topic: I'd like to see an update on the enterprise vs commodity hard drives.

A few years ago, the consensus (and data) was that they were mostly the same thing and it was debatable whether to pay double for the enterprise one.

But... in recent years, the consumer market has moved toward ever cheaper, slower, power-saving hard drives (e.g. reduced RPM, and stopping the disk when unused for 30s).

That calls for a re-evaluation of the situation.


> Related topic: I'd like to see an update on the enterprise vs commodity hard drives.

Yev from Backblaze here -> we wrote about this in 2013 and have honestly not bought too many enterprise drives since, simply because they were more expensive and the benefit was negligible in our usage. Lately though Seagate has had a nice run of enterprise drives and they're well-priced, so we might be giving those more of a shot in the coming months!


Yeah, I think pricing is key, and it looks like the different cloud storage providers all optimize for different types of storage (hot / cold). Backblaze seems to be cheaper when storage is high and data transfer is relatively low (calculated via http://coststorage.com).


How is the benefit negligible? You are running consumer drives that are not designed to be run at 100% load 100% of the time. The drives are not meant to be spun up constantly; in fact, many of them have mechanisms designed for them to spin down between uses and lower power consumption. It's like complaining about a drag racer not being able to haul a ton of bricks without breaking the transmission on a regular basis.


Has anyone done a teardown of two drives and compared the materials? Many enterprise and OEM drives really just seem to be the same hardware as consumer drives, only with different firmware (the "power-saving" spindown) and more rigorous QC.


>How is the benefit negligible?

I believe when you're buying as many drives as they do, and producing the kind of reports they do, they measure and compare before reaching that conclusion.


The decision can't be made in isolation. They need to ensure that the increased reliability makes up for the extra cost of the drives - and that's not necessarily the case.


> The drives are not meant to be spun up constantly;

I think having the drives spin constantly is the best for lifespan, because there are fewer spin-ups and thus the components wear less.


There should be no wear at all once it's running, read: https://en.wikipedia.org/wiki/Air_bearing

So barring any manufacturing defects, which I'm sure would be abundantly clear early on, it's down to logic or electrical failures.


More importantly: shingled recording slows things down as well.


Just deleting/re-writing, yes? I remember reading they were good for storage situations where you just fill the drive with writes and pretty much just read from it for life.


Always amazed at how far ahead HGST is in reliability. What are they doing that is 2-6x better in failure rates?


HGST was originally IBM's hard drive business (IBM sold it to Hitachi, and it's now part of WDC). There are probably some heavy-duty/reliability traits still left in the manufacturing and company culture.


The only thing I remember from IBM is the DeathStar (Deskstar) drives, which ALL failed on me. Even the garbage refurbs they sent back.


Yep, it was especially bad because IBM had the gold standard for drives before that point. They were more expensive, but you paid for the reliability, so people got really angry when they started failing regularly.

From what I can tell, they fixed those problems and went back to the high quality but slightly more expensive drives of old. The most interesting part is that Backblaze still buys mostly Seagates, simply because they are easier to buy in bulk and have a better price. Even though they fail more often, the failure rate isn't bad enough to be a problem. Also, Seagate's failure rate has been decreasing as they release new drives.


Me too. I wonder if there is a cycle where companies get a terrible reputation and then overcompensate; presumably they'll slowly get worse reliability from here on as they focus on costs instead of failure rates.


It would also be interesting to see if the personnel involved in the engineering correlated with failure rates.


Hitachi, in general, has always been known for bullet-proof storage. I was extremely sad to see them sell off that business. Hopefully spinning drives are dead before WDC culture can bring HGST back to mediocre reliability.


I've gone all-in on HGST for personal use because I've seen their results from Backblaze and also have good luck with them so far. I have 14 of their 4TB NAS and 22 of the 6TB NAS.


I'm amazed that personal use requires 188TB of storage.

That's more than my employer's Hadoop cluster...


I could see that getting filled up easily with a vast 1080p TV/movie collection. Movies at 1080p eat up disk really quickly.

Uncompressed 4K @ 24fps at 10-bit color comes in around 324MB/s, so just being a wedding videographer could mean 1.1TB per hour of video. Any given small project could eat up 15-20TB.
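
Quick back-of-the-envelope on those numbers, taking the 324MB/s figure above at face value:

    rate_mb_s = 324                            # uncompressed-ish 4K/24fps/10-bit, per the figure above
    tb_per_hour = rate_mb_s * 3600 / 1e6       # MB/s -> TB per hour
    print(tb_per_hour)                         # ~1.17 TB per hour of footage
    print(15 / tb_per_hour, 20 / tb_per_hour)  # a 15-20TB project is roughly 13-17 hours of material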


I never said I required this much for personal use...I just have it. :)

I started with 12 x 4TB in one NAS and I filled it in a little over 2 years. I just recently built a second NAS with 20 x 6TB to last me hopefully for the next several years.

The other remaining drives unaccounted for are in two workstations for local storage.


Maybe he does his own offsite backups with "personal use" HD's in several physical locations, each in a highly-redundant raid configuration?

Even so it seems like a lot.


Someone's gotta back up that climate data...


Actually...I am doing some of that also.


Small hadron collider?


Perhaps a small hardon collider.

(Context: http://archive.fortune.com/2006/11/30/magazines/fortune/obri... )


I was debating whether I should "favorite" this thread. You made my decision for me, thank you.


Be warned, I got an HGST drive for personal use, happy to pay the premium. It was quite loud; my wife commented from across the room "What is that?". A periodic loud "tick". Turns out it does some kind of calibration and you can't turn it off. Many reviews mentioned it; I just hadn't looked.

I returned it.


WD blacks do that too, although it's more of a grunty, almost snort-like oink.


> 188 TB of storage

> personal use

Who needs that much storage for personal use?


Anyone working with video, 3D animation, storing RAW photography (and taking 100's of photos on a daily basis), and I can imagine a few other purposes. Especially if you want an on-site backup of that data.

I've bought 10TB of storage space every year for the past 3 years and don't see myself stopping. I need the storage space! If these data-heavy hobbies were instead my daytime job I could easily see myself needing 100TB+ in storage (and then 100TB+ of backups). I fill roughly 8TB/year in storage space (4TB of data + 4TB backup data).

Some people believe in redundant backups. I only have redundant backups of extremely important information - which are mostly text documents so don't take up much space. But if someone was storing 3 copies of all of their media assets you end up using a lot of space. For example, 10TB of video turns into 30TB of video. And 180TB is really only 60TB of data, which isn't that much for data heavy hobbies.


Mostly Linux ISOs.


You wouldn't happen to frequent a certain, data focused, subreddit now would you?


All of them?


Someone shooting a lot of 4K or even HD video, for example.


From a little googling, 4K video appears to take about 22GB an hour[1], so 188TB would be over 8,500 hours of 4K video[2]. That's almost an entire year of constant 4K video. That's a lot of storage for personal use.

1: Not sure what encoding

2: Napkin calculation, corrections welcome


Not just encoding, but it also depends on a number of things like color depth, chroma subsampling (4:2:2), and so on. It's not uncommon for a single film to approach half a petabyte of raw in a digital world with a "typical" setup (4:2:2, ProRes maybe, 10-bit, etc). To use an example, I think I read somewhere that Gone Girl shot a few hundred TB for a technically straightforward film. Some cameras, particularly when you start getting to digital cinema (which is > "4K"), can shoot as much as a terabyte and a half per hour. There are 8K sensors in common availability now; RED has one that can shoot 75fps 8K at 2.4:1, which I don't even want to calculate, but that I know for a fact it can't write to its own storage at native bitrate. Productions eat storage these days and moving that data around is a challenge.

Hint, hint, to clever founders: hard drives in Hollywood.


22 GB for an hour of 4k video is fairly compressed. Raw 4k footage runs closer to 300 GB an hour.


Don't forget parity drives for availability take away from usable storage.

NAS #1 is 12 x 4TB with 2 drives used for parity.

NAS #2 is 20 x 6TB with 4 drives used for parity.

That takes roughly 32TB off the raw capacity under ZFS. NAS #1 has 40TB usable. NAS #2 has 96TB usable. Total 136TB usable.

The remainder of the drives not accounted in the list above are in workstations for local storage and other random tasks.
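
For anyone checking the arithmetic, the usable figures above fall out of the usual raidz approximation (ignoring filesystem overhead and TB-vs-TiB):

    def usable_tb(drives, parity, size_tb):
        # raidz-style: roughly (drives - parity) * drive size per vdev
        return (drives - parity) * size_tb

    nas1 = usable_tb(12, 2, 4)       # 40 TB usable out of 48 TB raw
    nas2 = usable_tb(20, 4, 6)       # 96 TB usable out of 120 TB raw
    print(nas1, nas2, nas1 + nas2)   # 40 96 136 -> ~32 TB of raw capacity goes to parity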


> From a little googling, 4k video appears to take about 22GB an hour

That's "home delivery" video, not the stuff you work on which would be far less compressed or even uncompressed, and would include reams of data thrown out entirely at the final production stages.

Think about the difference between "work" (mixing/mastering/production) audio data (uncompressed and at 24 or 32 bits) versus "consumer" audio (mp3/aac at 48/16).


Editing also tends to create a lot of huge temporary and intermediate files that will eat up disk space in a hurry. 188TB is a lot of space, but nowhere near unreasonable for someone doing video work. Plus, as other people have mentioned you need to account for filesystem overhead, redundancy (RAID), backups, TB vs. TiB, and so on.


My 1440p 4:4:4 videogame captures at 60 fps come out to about 700 GB per hour, sometimes more depending on the scene complexity. That would fill up a 200 TB array in no time.

I'd record to ProRes or DNxHR if I could, but there's no viable method for doing that on Windows without using an FFMPEG pipeline (which won't do 4:4:4 color sampling with those codecs and simply uses too much CPU at 4:2:2).


Why do you do that?


I would haircut the number for RAID, then probably halve it if he keeps a backup.


I've done the same, in smaller quantity (12x 2TB drives.) I wouldn't buy any other brand, after having been burned by Seagate and WD too many times. These all run 24x7 on my home network.


It's especially interesting because Western Digital bought HGST in 2012. [0]

To my knowledge, there have been innovations in the spinning rust (ahem hard drive) market since 2012. So why is HGST, a WD subsidiary, still making much more reliable drives?

Different design teams? HGST factories instead of WD?

[0] http://m.theregister.co.uk/2012/03/09/wd_closes_hgst_buy


HGST was acquired by WD, but they have very different manufacturing processes and AFAIK the design teams are still separate.


I'd love to see a set of charts that plot hours of runtime vs. failure rate and data transfer rate vs. failure rate.

These tables tell us little about how well these drives hold up; they tell us more about the churn cost that Backblaze has.


Has anyone made a failure rate vs current price chart for all drives included in these reports to help guide purchasing decisions for normal people? I'd love to find where the dollar per terabyte sweet spot is, with respect to reliability.
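
Something like this would be easy to throw together from the report's AFR column plus current street prices - every number below is a made-up placeholder, not a real price or a Backblaze figure:

    # Fold the annualized failure rate into cost per TB over a planning horizon.
    drives = [
        # (model, street price USD, capacity TB, AFR %) -- hypothetical example values
        ("model_a", 130, 4, 2.8),
        ("model_b", 240, 8, 1.3),
        ("model_c", 190, 6, 0.6),
    ]

    def effective_cost_per_tb(price, tb, afr_pct, years=3):
        # Expected number of purchases over the horizon, assuming a constant AFR
        expected_buys = 1 + (afr_pct / 100.0) * years
        return price * expected_buys / tb

    for model, price, tb, afr in drives:
        print(model, round(effective_cost_per_tb(price, tb, afr), 2), "$/TB over 3 years")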


Are there any reasonably scientific "death matches" like these for 2.5" spinning hard drives?

Edit: clarification


The 2.5" market has gone almost entirely SSD.

I, too, would like to see more regular SSD death matches.


Particularly given that almost no SSD review says a single word about reliability, which is my principal concern.


Reviewers can't really believe anything the vendors say and can't afford to run hundreds of SSDs for years to gather real data so there's not much they can say. By the time there is real-world experience with an SSD model it's probably obsolete anyway.


Yep. But that's a problem...


There isn't really a compelling reason to run significant storage off 2.5" HDDs anymore -- before SSDs 10k or 15k 2.5" HDDs made sense (access time), but nowadays...


Here's a compelling reason: cheap, portable mass storage. Eg. USB3/Thunderbolt-powered multi-terabyte drives that fit in a backpack and don't require external power.


A 3.5" drive fits in most backpacks, and it can be powered of USB 3.


Not easily. A USB 3 port only supplies about 4.5W (900mA at 5V), and even an energy-saving 3.5" drive will probably draw 15 watts of spin-up current - plus it wants 12V for the spindle, which USB doesn't provide.

Though of course you can fit a 12V/2A adapter in the backpack next to the drive.


Does anyone have any insight into why they don't have any 10TB drives running?


Yev from Backblaze here w/ some insight -> Yes! They are expensive. We are looking to test some in our environment very soon, but may not have enough (we want at least 45) to truly test them. Once they drop further in price and we can afford to drop them in some of our pods we'll be all over it!


Since you have many years of data, do you explicitly factor the observed failure rate patterns of particular brands into cost models when buying drives? (E.g. more anticipated replacement cost when buying Seagate drives.) Or is the failure data purely for curiosity and not fed back into a spreadsheet? In other words, does every cost projection use the same predefined anticipated failure rate regardless of brand?


> Since you have many years of data, do you explicitly factor the observed failure rate patterns of particular brands into cost models when buying drives?

Sometimes. We do take it into account, but if a hard drive that costs less but fails more comes around, and the failure rate in our environment is within some "tolerance", we'll get that drive, even if we might see some more failures. So we do use the data to inform our purchasing department of the "future costs" of drives, but a good deal's a good deal :)


I was worried that my question wasn't worded well enough to make sense but your calculation answer was exactly what I was looking for. Very insightful and thanks for sharing the thought process.


I got you :D


What about archive disks? You guys are storing people's backup data, and I'd guess most of them rarely access it, if at all, so you wouldn't need to keep all of the redundant data online all the time. Would it make sense to use cheap SMR disks like the ST8000AS0002?


You'd be surprised! We've restored over 20 Billion files at this point, folks are CONSTANTLY downloading one or two files, and because we encrypt everything before sending it up, when someone browses their tree we have to "find" all of those files, so we do have to have them more readily available!


Thanks Yev! I'm looking forward to those. I didn't realize they were that much more/TB than the 8TB guys.


Yea still pretty pricey! Getting there though! We're waiting :D


Is there a graph somewhere that shows the price/GB vs drive capacity that you would suggest?

I've briefly tried looking for this in the past before but was never able to find a well maintained source.


It's tough b/c we don't really keep track of prices after we buy them. Sometimes we go out and look up different drives on Amazon and add those in, but I don't remember if we keep track of exactly what you're trying to find - sorry! You could always look at the drive model and do a quick search and cross-check with our rates. Just know that your environment is different from ours so it might not be 1:1 :)


If you're still willing to answer questions, what do you do with all the failed drives?


Always! But sometimes I miss the comments b/c I don't get an alert when someone responds to me - sorry for the delay, hope you see this. We recycle the drives once we take them out of circulation. First we secure-wipe all of them, and then they get recycled.


How are you generating the raw data? Can you share your scripts that are used to generate it?

I'd be very interested in generating raw data against the disks I maintain as well, and might also be able to share it, which might be interesting since it's primarily SSDs.


I love these statistics, but I would just like some clarification on what exactly constitutes a "failure". Does the drive have to stop responding completely, or are a few bad sectors enough to consider the drive "bad"?


Yev from Backblaze here -> We wrote a bit about this in our SMART Stats blog post -> https://www.backblaze.com/blog/hard-drive-smart-stats/.
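
For home use, something this small gets you most of the way to the same idea. The attribute IDs below are the handful commonly cited from that post (reallocated sectors, reported uncorrectable, command timeouts, pending/offline uncorrectable), but treat the exact list and the nonzero-raw-value threshold as my assumption, not Backblaze's actual failure criteria:

    import subprocess

    WATCHED = {"5", "187", "188", "197", "198"}   # SMART attribute IDs to check

    def suspicious_attributes(device="/dev/sda"):
        # Parse `smartctl -A` output and flag watched attributes with a nonzero raw value.
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True).stdout
        flagged = []
        for line in out.splitlines():
            fields = line.split()
            if fields and fields[0] in WATCHED:
                raw = fields[-1].split("/")[0]
                if raw.isdigit() and int(raw) > 0:
                    flagged.append((fields[0], fields[1], fields[-1]))
        return flagged

    print(suspicious_attributes())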


Thanks!


Why do they buy such weird proportions of each type of hard drive? Do they just use anything they can get their hands on? If the purpose were actually to test drive failure rates, wouldn't you want to buy similar numbers of every drive?


Drive failure rates can be computed across different batch sizes, provided the batches are big enough to provide meaningful numbers. There's no need to have the batches be exactly the same size to get useful and accurate data.


Sure, but a drive count of 45 is not very reliable.
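
Right - at pod-sized batches, a single failure moves the number a lot. A rough illustration of how noisy a 45-drive sample is over one hypothetical year (Backblaze's AFR is essentially failures per drive-year):

    import math

    def afr_pct(failures, drive_days):
        # annualized failure rate: failures per drive-year, as a percentage
        return 100.0 * failures / (drive_days / 365.0)

    drive_days = 45 * 365                  # one 45-drive pod running a full year
    for failures in range(4):
        print(failures, "failures ->", round(afr_pct(failures, drive_days), 2), "% AFR")

    # Crude Poisson interval on the failure count (normal approximation; shaky for
    # small counts, but it shows how wide the plausible range is):
    k = 2
    lo, hi = max(0.0, k - 1.96 * math.sqrt(k)), k + 1.96 * math.sqrt(k)
    print("2 observed failures is consistent with roughly",
          round(afr_pct(lo, drive_days), 1), "-", round(afr_pct(hi, drive_days), 1), "% AFR")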


A pod was 45 drives, so everything they did was in pod units; they would want at least 45-90 of a drive model to have a complete set.

Pods are 60 drives per unit now.


They typically purchase whatever offers the best $ per MB, even "shucking" external hard drives bought at Costco after a flood in Thailand took several hard drive manufacturing plants offline. [1]

[1] https://www.backblaze.com/blog/backblaze_drive_farming/


Who owns the company that produces the HGST 3.5" drives? (the former Hitachi drives)

Is it Western Digital? Is it Toshiba? Someone else? (AFAIK the HGST 2.5 and 3.5 production lines were sold to different companies - right?)



Thanks a lot, it's indeed complicated.


WDC owns HGST.


Does anyone have any experience with the reliability of the Seagate 8TB archive drives? They are incredibly cheap for cold storage, but I'm a bit concerned by the Seagate numbers in the Backblaze stats...


Yes, I run about 800 of them now. They have not been as reliable as the 6TB archive drives. I think this was related to a bad run that Seagate had while manufacturing. Our numbers overall (with 6,000+ drives now, if I remember correctly) are much lower than Backblaze's reported failures.

We also classify a drive as failed when it throws ANY SMART error; then it's diagnosed with Seagate's internal tool (which I believe they are open sourcing, if they haven't already) and put back into production.

Seagate has given us good warranties on these drives as well. Yes a higher immediate failure rate (first 1 month) with the 8TB drives was annoying, but they more than made it right replacing the drives.


Thanks!


My anecdata is that I bought one for home use, and it failed after a couple of months. I sent it back for replacement, and the replacement failed after a couple of months. So I gave up, and got something else instead.


I remember a bunch of people saying that Seagate's drives with 15% failure rates were a 'one time thing'.


Was not surprised to see Seagate leading the failure rate. Had a portable 2TB Seagate drive fail within weeks. There went my vacation photos and videos!


I believe the SG failure stats are largely dominated by a specific model line which is no longer produced. I bet SG will recover next year when we see these. Keep in mind these stats reflect the AFR of drives, which in some cases can be several years' worth of data.

I wish Backblaze would share some data around accelerated lifecycle testing.


You could argue that Seagate has already recovered. That 4TB Seagate model (ST4000DM000) is doing pretty well. From the chart, it's Backblaze's most popular model by far, and they've been using it for at least 2 years. Its failure rate is lower than most WD models they have.


Yes. From my experience they have, though the performance of those 2TBs definitely hurt their reputation a ton. It might be a while before SG becomes known as the standard for quality again.


I had four of the Seagate 3TB drives that are listed in their ratings with high failure rates. All four of mine died in less than 3 years. They were a really disappointing purchase.


I had 4 Seagate drives of varying capacity and model numbers die on me within 11 weeks of each other, all were less than 3 years old and one of them was in use for less than an hour.

Never again.


Same - had two and both died. No recourse for an obviously defective product.


It looks like more recent models have a lower failure rate than WD's.


My two Seagate drives in my NAS failed, one in 09/16 and one in 01/17. Thank god for RAID. I replaced them with bigger Seagate drives; maybe I should not have.


Remember, RAID != backup. You and everyone who has valuable data should have offsite storage separate from your NAS.


Sounds like you might be able to benefit from a cloud backup solution....

In all seriousness though, I use Backblaze, and while I haven't had any HDD failures, I have gone back to check on the backup from time to time and it's always looking good.


All portable HDDs die quickly; they have no fans and they're moved a lot.

My photos are a) still on my SD card until it's full, b) on my raidz1 and raidz2 drives on my NAS, c) backed up to Flickr's free 1TB of storage using https://github.com/richq/folders2flickr

Having only 1 copy of something that's easy to perfectly replicate is very silly.


What do you mean by “moved a lot”? I assumed the HDD heads are moved to a parking position when the disk is turned off so they don't accidentally slam into the platters.

Portable means the storage has a compact form factor and a case so you can carry it around (turned off). I have never seen anyone actually use a portable disk while moving. Everyone puts them on their desk next to the laptop. The only thing I can imagine is working on a train.


Even the 2.5" ones inside laptops that are basically desktop replacements and run cool enough seem to fail a lot compared to their 3.5" brothers.

Good thing SSDs exist now.


Accidentally knocked, or dragged by the cable because the laptop was moved a bit.


"easy"



