Hacker News
Ask YC: 40Tb in a year. Would you use Amazon?
13 points by inovica on June 26, 2008 | 19 comments
Hi there. We're building a system which stores voice audio files. We're looking at good compression codecs for it (Speex), but we're looking at 30 million minutes of audio a month, which looks like 40TB of data storage a year. These kinds of figures scare me a little!! Wondering if you would use Amazon for this or go another route? Just interested to hear from anyone who's doing anything of a similar scale.
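A quick sanity check on that figure. This sketch assumes Speex at roughly 16 kbit/s (a typical wideband voice setting; the actual rate depends on the quality mode chosen), which lands close to the 40TB/year the poster quotes:

```python
# Back-of-envelope check of the 40 TB/year figure.
minutes_per_month = 30_000_000
bitrate_kbps = 16  # assumed Speex wideband rate, not from the post

seconds_per_year = minutes_per_month * 12 * 60
bytes_per_year = seconds_per_year * bitrate_kbps * 1000 / 8
tb_per_year = bytes_per_year / 1e12
print(f"~{tb_per_year:.0f} TB/year")  # ~43 TB/year
```

So the 40TB estimate is consistent with wideband voice; narrowband Speex (~8 kbit/s) would roughly halve it.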


SmugMug currently hosts 600TB of pictures on Amazon S3.

http://gigaom.com/2008/06/25/structure-08-werner-vogels-amaz...

So yeah, I'd probably use them.


Here's an S3 calculator that can help you figure out costs:

http://calculator.s3.amazonaws.com/calc5.html

A quick check tells me that ~40,000GB storage is about $6k/month, not including xfer costs.
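That $6k/month figure checks out against the 2008-era first-tier rate of about $0.15/GB-month (the rate is an assumption here; tiered discounts would lower it slightly):

```python
# Rough S3 storage cost, excluding transfer, at an assumed
# 2008-era rate of ~$0.15 per GB-month.
gb_stored = 40_000
price_per_gb_month = 0.15  # assumed first-tier rate

monthly_cost = gb_stored * price_per_gb_month
print(f"${monthly_cost:,.0f}/month")  # $6,000/month
```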


My first concern would be access to capital.

There are plenty of options with cloud storage.

I hope you have a way to balance that cash flow.


Thanks for this, and thanks to everyone who commented - I really appreciate it. I'll check out some of the others too. If anyone is interested, I'm starting this with one client already on board, so I'd be using them to fund it initially. I appreciate that this is a dangerous route; however, we don't need to look at loans or other types of funding initially. Thanks again. Ade


40TB?!

Look into other CDNs like Akamai or Limelight. I think the bulk-price deal you'll get with an established CDN will be better than Amazon's flat rates.


If you aren't doing that much traffic, Amazon is a nice option. Otherwise, CDNs will offer you a better price. Check out BitGravity; they're smaller than Limelight and Akamai but may be able to give you better attention. They also have some nice features the other CDNs aren't offering.


Ah, yes, BitGravity was the one I forgot. Diggnation uses them, I believe.


Justin.tv uses BitGravity to store over 60TB (I think it's much higher, but I haven't checked personally in a while). I can't recommend them highly enough; we've had problems at 9pm on a Sunday night and gotten the CTO on the phone personally in 5 minutes. BG is flat out the best of any CDN/storage/server provider we've ever dealt with.


Vimeo currently uses them, and College Humor/Today's Big Thing is in the process of moving over there.


We run a small specialized storage company and the things that seem to matter most are: storage capacity, availability, reliability, transfer rates for both current data usage and new data addition.

40TB can be handled pretty well by S3 and other storage services, and they have pretty good pricing information to model your costs. Note that they don't (yet) provide very specific SLAs for data availability, so keep that in mind when designing your system.

Maintaining your own drives with some sort of redundancy (RAID, automatic copies, etc.) or using something like (bias alert) our open-source project http://allmydata.org which is effectively a software RAID layer both require some IT and systems energy, so this has to be bundled into your operational costs if you choose that route.

Just to emphasize what others have mentioned, it's important to incorporate the new-data influx rate into your model. If you're successful, 40TB this year might turn into 120TB next year, so make sure your cashflow model can support the underlying cost of whatever system you choose.
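The cashflow point matters more than a single monthly quote suggests, because on a pay-per-GB service you pay for everything accumulated so far, every month. A sketch assuming linear growth to 40TB in year one and a flat $0.15/GB-month rate (both figures are assumptions, not quotes):

```python
# Cumulative storage cost when data accrues monthly.
monthly_add_gb = 40_000 / 12   # ~3.3 TB of new audio per month
rate = 0.15                    # $/GB-month, assumed flat

stored = 0.0
total_cost = 0.0
for month in range(12):
    stored += monthly_add_gb
    total_cost += stored * rate  # billed on everything stored so far
print(f"end of year one: {stored:,.0f} GB stored, ${total_cost:,.0f} spent")
```

The year-one bill is well below 12x the year-end monthly rate, but year two starts at the full 40TB baseline and only grows.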


Depending on your projections for future growth and how much cash you have, I'd consider opening my own data center for that kind of storage.....


Isn't 40TB just a paltry 40 hard drives (maybe 50 with redundancy)? A dedicated data center seems a bit overblown for that.


Actually, it's more like 50 hard drives without redundancy (keeping TiB vs. TB in mind), and at least double that if we're talking quality redundancy (assuming this is a for-profit company hosting people's sound clips, they'd better have duplicates of everything).

I don't know enough about their model and what services they provide, though. I'm thinking 40TB a year of storage, but ~100TB of transfer per month or more - which isn't a little.

But my biggest point is future expansion: 40TB in year one... how many in year two?
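The TiB-vs-TB point above is worth making concrete. A "1 TB" drive holds 10^12 bytes, which an OS reports as ~0.91 TiB; if the 40TB estimate is pessimistically read as 40 TiB, the drive count rises before you even account for filesystem overhead or mirroring (all figures here are illustrative assumptions):

```python
# How many nominal 1 TB drives hold 40 TiB of data?
TB = 10**12   # marketing terabyte (decimal)
TiB = 2**40   # binary tebibyte, what the OS reports

data_bytes = 40 * TiB
drive_bytes = 1 * TB
drives = -(-data_bytes // drive_bytes)  # ceiling division
print(drives)  # 44 drives, before overhead or redundancy
```

Double that for simple mirroring and you're near the ~100-drive ballpark the comment implies.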


I have to admit I don't know anything about operating at those dimensions (IANAA - I am not an admin). But HD capacities are growing fast - maybe faster than the data needs of that company? So perhaps they wouldn't have to buy more and more hard drives, only replace the old ones with bigger ones?

All hypothetical, though - personally I think I would go for something like S3.


I haven't done anything similar, but just my 2c - see if you can make use of existing file-sharing systems like RapidShare/Megaupload etc. and link to them. I believe the HotOrNot guys used free Yahoo Photos hosting and just linked to it to save on bandwidth (it worked then).


Rapidshare, Megaupload, etc. don't allow audio streaming. Nor do I want to wait 120 seconds and solve a captcha that has to do with finding which letters have cats attached to them as opposed to dogs to get an audio file.


They have premium accounts that don't have the waiting issues (not sure about the captcha part). You could buy a premium account and just point your streaming player at the Megaupload link?


Why is the parent being voted down?

Sure, you may not agree with his idea and service suggestions, but finding a creative solution to this problem wouldn't hurt. If I remember correctly, the Hot or Not guy did this (mentioned in Jessica's book). I don't have the book with me, so feel free to correct me if I'm wrong.


Do not underestimate the transfer costs. The information you have here - the size of data stored - is only one factor. You need estimates of your transfer in and out, and that will tell you whether Amazon makes sense or you have to go with a CDN.



