For the longest time we tried to convince people that they should have an off-amazon archive of their S3 data ... we even ran an ad to that effect in 2012[1].
The (obvious) reason this isn't compelling is the cost of egress. It's just (relatively) too expensive to offload your S3 assets to some third party on a regular basis.
So if R2 is S3 with no egress, suddenly there is a value proposition again.
Further, unlike in 2012, in 2021 we have really great tooling in the form of 'rclone'[2][3] which allows you to move data from cloud to cloud without involving your own bandwidth.
[1] The tagline was "Your infrastructure is on AWS and your backups are on AWS. You're doing it wrong."
[2] https://rclone.org/
[3] https://www.rsync.net/resources/howto/rclone.html
> So if R2 is S3 with no egress, suddenly there is a value proposition again.
That doesn't appear to be what they're doing, they don't seem to have changed their existing operating model at all:
> R2 will zero-rate infrequent storage operations under a threshold — currently planned to be in the single digit requests per second range. Above this range, R2 will charge significantly less per-operation than the major providers. Our object storage will be extremely inexpensive for infrequent access and yet capable of and cheaper than major incumbent providers at scale.
What I read this as is "we won't bill you until your traffic spikes, then you'll pay us, oh how you'll pay us"
Transparent bandwidth pricing would be a far more interesting announcement. This is the second post I've seen from CloudFlare in recent months throwing bricks at AWS over bandwidth pricing, while failing to mention CloudFlare bandwidth is some of the most expensive available.
The way I read it is that for low-scale users, they're not going to have request pricing. For higher-scale users, "R2 will charge significantly less per-operation than the major providers". AWS charges $0.0004 per thousand GET requests. Let's say that R2 charges $0.0003 per thousand GET requests. That's still cheaper than AWS or Backblaze's B2 (even if just barely) and if they're not charging for bandwidth, then it's really cheap.
The announcement says, three times, that they're eliminating bandwidth charges.
I don't know the whole economics around cloud storage and bandwidth, so maybe this is unrealistic pricing and your suspicions are well founded. However, Backblaze is offering storage at $0.005/GB and bandwidth at $0.01/GB. Cloudflare is charging 3x as much as Backblaze for the storage and $0 for the bandwidth. Given that Cloudflare's costs are probably lower than Backblaze's for bandwidth, that doesn't seem so unreasonable - but I could be very wrong.
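As a rough sanity check on those numbers, here's the break-even in code form (a sketch only, using just the per-GB figures above; it ignores request fees and the free B2-behind-Cloudflare path):

```python
# Assumed prices from the comment above: B2 at $0.005/GB-month storage plus
# $0.01/GB egress, R2 at $0.015/GB-month storage with $0 egress.
def b2_monthly_cost(stored_gb, egress_gb):
    return 0.005 * stored_gb + 0.01 * egress_gb

def r2_monthly_cost(stored_gb, egress_gb):
    return 0.015 * stored_gb  # egress is free regardless of volume

stored = 1_000  # GB kept in the bucket
for ratio in (0.1, 1.0, 5.0):  # monthly egress as a multiple of stored data
    egress = stored * ratio
    print(f"egress = {ratio:>3}x stored: "
          f"B2 ${b2_monthly_cost(stored, egress):8.2f}  "
          f"R2 ${r2_monthly_cost(stored, egress):8.2f}")
# R2 only comes out ahead once you egress more each month than you store.
```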
I think Cloudflare probably sees R2 as something that is sustainable, but creates demand for their enterprise products. You start The NextBigThing with R2 and suddenly your application servers are under attack. You have a relationship with Cloudflare, you're used to their control panel, you trust them, and when you're at the scale that you're getting attacked like this you can drop $10,000/mo because you're bringing in a bunch of revenue - $10,000/mo is less than 1 software engineer in the US.
R2, in a certain way, can be a marketing tool. "Come use our S3 competitor with free bandwidth rather than getting locked into AWS's transfer pricing." 6-12 months go by and you're substantially larger and want more complex stuff and you're already getting emails from Cloudflare about their other offerings, you see them in the control panel, etc.
It seems like Cloudflare might be trying to move in on AWS's market. R2 is an easy way for them to do it. It seems like S3 has high margins. Competing storage services can be a fraction of the cost per GB, and AWS's bandwidth markup is incredibly high. If you're looking to attack a competitor's market, going after one of their highest-margin products could make the most sense. Again, R2 becomes a marketing tool for future cloud offerings.
Part of Cloudflare's strategy might be targeting things that they see very high margins on and being willing to accept lower margins. If something has 50% margins and you're willing to accept 20% margins, you're still doing pretty great. Plus, over time, the cost of hardware comes down and you can keep your prices at the same level once people are happily inside your ecosystem and don't want to deal with migrations.
> CloudFlare bandwidth is some of the most expensive available
It sounds like you might have gotten burned by something with Cloudflare. I don't have any horror stories, but I'm always interested in new data points if you have them.
> Given that Cloudflare's costs are probably lower than Backblaze for bandwidth, that doesn't seem so unreasonable - but I could be very wrong.
At scale, bandwidth capacity purchases are symmetric - you buy the same amount up as you do down. As a provider of DDOS protection services, Cloudflare has to maintain a huge amount of ingress capacity - meaning they have a ton of egress capacity sitting unused.
This is about the operation, not about bandwidth the way that I read it. All providers have prices for bandwidth and prices for different "tiers" of operations (store, retrieve, delete, list, etc). The way I read it is that bandwidth is always 100% free, and storage operations are free under a certain threshold. I hope I'm right ;)
This is correct. Bandwidth (ingress and egress) always free, regardless of volume. Transactions free at low volume (~<1/sec) but we’ll charge at higher volumes. Storage we charge for. For both transactions and storage, we aim to be at least 10% less expensive than S3. And, again, for the sake of absolute clarity: egress/ingress always free.
Does this mean we should contact our enterprise account manager regarding our existing spend? For the sake of absolute clarity: we're currently paying for bandwidth
> So if R2 is S3 with no egress, suddenly there is a value proposition again.
Isn't B2 from Backblaze already filling that need? I mean, more choice is always better for sure, but R2's goal really seems to be more of a CDN than a backup space, and it does feel like their money maker is the CDN part, not the storage part... I feel like trusting them to store it long-term without using the CDN part is a little bit risky.
Huh, thanks for that. I hadn't noticed (was that always there??). So, sustained egress at rate R/sec means I have to use 2500000 * R amount of storage per month, hrm...
Possibly can't use it for one of the "ponies" I was working on, but probably still good as "ye huge media archive".
Wasabi doesn’t charge for egress (‘fair use’ policy applies), but they do have a 3 month minimum for data, including deleted data.
This caught me out when I was transferring 100GB files that only needed to be up for a few hours, and I ended up getting charged as if I had hosted them for 3 months.
(approximately) Cloudflare provides a proxy service that you'd use to access your B2 data from home or other cloud without paying for egress.
They can do this because it costs almost nothing to move data between B2 and Cloudflare, and then from Cloudflare to almost anywhere.
Moving data from B2 to most other places on the internet likely costs them more because Backblaze isn't in a position to negotiate advantageous peering agreements with ISPs.
Note that you can't use a free Cloudflare account just for things like images, video and other binary files, as they'll suspend the account. It must be used primarily for a website, not content hosting. If you only want to use Cloudflare for files, you need a paid account.
Five dollars a month, fixed, yes. fortepan.hu does this, it's a few terabytes of Hungarian historic photos. The site couldn't exist as it does now without Cloudflare essentially donating the bandwidth.
In addition, you need to use Cloudflare Workers if you want any sort of access controls. (I think this is part of why it makes financial sense for Cloudflare to do this.)
Wow! Cool! Very surprised that Cloudflare wouldn't charge an arm and a leg for such a service... considering they're moving the actual bits.
I'm poking around at the Cloudflare website, what's the name of the aforementioned service? What term should I google?
I'm ignorant of "modern Cloudflare" -- other than reading their fantastic technical blog, I've never used them in a professional capacity and don't know their product offering -- other than a cache, CDN, DDOS protection, and a Lambda.
"Bandwidth Alliance" seems to be some sort of B2B consortium.
I'll dig into this more later, but unless I'm missing something obvious (I very well might be...) there's not an easy/inexpensive method for me to sign up and join the "bandwidth alliance", so that data transfer from B2 to my laptop is free.
I have a few VPSs with Linode, which is a member of the "Bandwidth Alliance" but I don't see any details, numbers, prices, specs... Just a bunch of marketing :/
> there's not an easy/inexpensive method for me to sign up and join the "bandwidth alliance"
Not unless you run a data center and intend to peer with other bandwidth providers to join your intranet to the internet. It's intended for large service providers like Cloudflare/Backblaze that do direct bandwidth interconnect at physical locations and don't have to involve other intermediaries (like Level 3) to move data between members.
Otherwise you "join" by hosting services/content with an Alliance member and making sure you only use other services that do the same. Even then, bandwidth isn't always free (Azure and GCP still have some costs, for example, but discounted).
If you set up a free Cloudflare proxy fronting your B2 bucket and then download from that, the egress from B2 is free because it's going to Cloudflare, and the egress from Cloudflare is free because they don't charge for it.
> the Bandwidth Alliance is a group of forward-thinking cloud and networking companies that are committed to discounting or waiving data transfer fees for shared customers.
Now, though, I don't understand the original comment... why care about egress for backup storage? I mean, as long as it's not absurd (and I agree that AWS egress pricing is absurd, though the original comment wasn't complaining about that...), you usually don't expect to have to retrieve it, and if you do need to, you're ready to pay much more for it because it will be worth it.
Frankly, Backblaze looks like an over-specialized player whereas Cloudflare is already used for a lot of stuff.
E.g., my employer already has stuff on Cloudflare; using their services is just as easy as pulling their Terraform provider. OTOH, for Backblaze, I'd have to go through the whole evaluation process, security and legal compliance, etc.
Backblaze has only two locations and they cannot be used with the same account. Your data is (within one account) always just in California or Amsterdam. For many needs, having multiple PoPs is crucial.
Yev from Backblaze here -> we have more than two data center locations, but we do have two regions (US-West, which is spread out across California and Arizona and EU-Central which is in Amsterdam). Slight nuance, but very different!
You're right, but for me as a customer it doesn't matter. I can choose from two geographical locations and cannot use both at once with one account. So there's no option to get data close to my users.
> 'rclone'[2][3] which allows you to move data from cloud to cloud without involving your own bandwidth
Maybe I'm reading this wrong, but the data does pass through the machine where rclone is running. rclone does support remote-to-remote transfers[0], but I believe only for remotes of the same type (ie S3 to S3).
Does this mean "remotes that speak the S3 protocol", or "remotes that are S3"? The former would require S3 supporting induced requests (so it POSTed the data somewhere), the latter would require a "copy" operation on S3. I don't know which one is supported.
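For what it's worth, a server-side copy through the S3 API itself looks like this with boto3 (a sketch; the endpoint and bucket names are placeholders). A CopyObject like this only works when both buckets are reachable through the same provider/endpoint, which is presumably why rclone otherwise has to stream the data through the machine it runs on:

```python
import boto3

# Hypothetical S3-compatible endpoint; the bytes move inside the provider,
# not through the machine making this call.
s3 = boto3.client("s3", endpoint_url="https://s3.example-provider.com")

s3.copy_object(
    Bucket="destination-bucket",
    Key="backups/archive.tar.gz",
    CopySource={"Bucket": "source-bucket", "Key": "archive.tar.gz"},
)
```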
The [3] resource is fantastic! Have you tried sponsoring rclone? I was studying their docs last week, and I'm sure people reading the docs are interested in this use case of moving between clouds without using their own bandwidth.
Really curious to see how this goes. If they live up to the following paragraph, that's pretty game-changing:
>> This cheaper price doesn’t come with reduced scalability. Behind the scenes, R2 automatically and intelligently manages the tiering of data to drive both performance at peak load and low-cost for infrequently requested objects. We’ve gotten rid of complex, manual tiering policies in favor of what developers have always wanted out of object storage: limitless scale at the lowest possible cost.
The amount of effort it takes to understand and account for S3 Intelligent-Tiering is somewhat mind-blowing, so getting rid of all of that (and the corresponding fees) would be really nice and TheWayThingsShouldBe™ for the customer -- on top of that, most users don't even know S3 Intelligent-Tiering exists, so it'll be great if Cloudflare just handles it automatically.
We at https://vantage.sh/ (disclosure, I'm the Co-Founder and CEO) recently launched a cross-provider cost recommendation for CloudFront Egress to Cloudflare which was really popular and I can imagine doing something similar for S3 -> R2 once it is live and we are able to vet it.
Does Vantage offer comparisons for Backblaze B2, OVH etc?
When looking at object storage, tail latency is probably the single most overlooked metric, and the most material differentiator between providers after bandwidth costs. Don't sweat the cent spent on storing an object; worry about the cost of the 6,000,000 copies of it you'll ship after it's stored.
As for bandwidth, CloudFlare becomes uninteresting the moment your account starts to see any real consumption, even AWS are easier to negotiate with.
We are working on a more holistic approach to ingesting and providing recommendations from companies like Backblaze, OVH, etc., in addition to many, many more providers for other AWS services. The goal is that we can give customers the tools they need to get visibility on what options exist and take action themselves from there.
Your average cable modem can't do a good job of hosting much more than 50GB/day. Would you say that 1TB/day is well into 'real' then? You seem dismissive of that much.
I know the Cloudflare team hangs out here, so thanks, and great job! This was absolutely necessary for my line of work. Couple of quick questions/confirmations:
* R2 will support the same object sizes as S3? We have 500GB+ objects and could go to 1TB per object.
* R2 will support HTTP Range GETs, right?
Egress bandwidth for objects on S3 is the biggest line item on the AWS bill for a company I work for, by an order of magnitude, and this will just wipe it off for the most part.
Yes to range requests. Current object limit is smaller than that, but we don't have a fundamental restriction there. Shoot me an email to gmckeon [at] cloudflare.
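For anyone who hasn't used them, a range request against an S3-compatible API is just the standard HTTP Range header; a boto3 sketch with placeholder endpoint, bucket and key:

```python
import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example-provider.com")

resp = s3.get_object(
    Bucket="media",
    Key="huge-dataset.bin",
    Range="bytes=0-1048575",  # fetch only the first 1 MiB (inclusive byte range)
)
chunk = resp["Body"].read()
print(len(chunk), resp["ContentRange"])  # e.g. "bytes 0-1048575/536870912000"
```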
I personally think uploading massive files in one shot is not usually desirable. Better would be an easy way to chunk the file, upload the chunks, and have the server put the file back together, which would increase reliability.
Tus is over-complicated IMO. Why require the server to store state that's already available on the file system? Just support PATCH requests with an offset parameter. If an upload fails partway through, HEAD the file to see how much got copied to disk and resume from there.
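A sketch of what I mean from the client side (the URL and the offset parameter are hypothetical; this is the scheme described above, not anything tus, S3 or R2 actually expose today):

```python
import os
import requests

UPLOAD_URL = "https://uploads.example.com/files/video.mp4"  # hypothetical endpoint
CHUNK = 8 * 1024 * 1024  # 8 MiB per PATCH

def resume_upload(path):
    # Ask the server how many bytes it already has on disk.
    head = requests.head(UPLOAD_URL)
    offset = int(head.headers.get("Content-Length", 0)) if head.ok else 0

    total = os.path.getsize(path)
    with open(path, "rb") as f:
        f.seek(offset)
        while offset < total:
            data = f.read(CHUNK)
            resp = requests.patch(UPLOAD_URL, params={"offset": offset}, data=data)
            resp.raise_for_status()
            offset += len(data)

resume_upload("video.mp4")
```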
Really excited for R2 Storage. I am wondering if R2 can solve a limitation I was running into with BackBlaze.
I had tried using BackBlaze 8 months ago as a much cheaper (especially with B2 and CF's Free Data Transfer partnership) replacement for Amazon S3 and was running into a limitation on B2.
I had a scenario where my users can upload images from the browser to BackBlaze. I wanted the ability to control the file name of the uploaded file. I don't want the user to be able to modify the network request to upload the file with a different file name. Nor do I want the users to be able to upload files with names which would overwrite existing files.
B2 didn't let me upload files with a specific filename in the pre-signed URL.
But this allowed my users to upload files with any name they want. It would also allow them to overwrite existing files (from other users).
My question is more from a security point of view, so preventing one user from overwriting another user's content is crucial. For example, let's say you right click on an image from someone else on Facebook and get the actual image's file name. Now you try to upload an image on Facebook and you edit the network request in the browser's inspector tool to use the image file name which you got for another user. Facebook obviously prevents this in their own way using pre-signed URLs which include the filename in the signature.

However, on BackBlaze if I try this, the "pod" URL which is received doesn't include any file name signature. The pod URL is just where the image gets stored on your end. A user can easily edit the network request and modify the "X-Bz-File-Name" header to another user's filename. This would be a major security vulnerability if I went with BackBlaze.

As a workaround, right now it seems like users would first have to upload files to my own server, then my server would have to upload them to BackBlaze to avoid this issue. This sounded like a hassle.
Amazon S3 solves this problem using createPresignedPost, which includes a signature of the filename in the URL. I contacted BackBlaze's support and got a response that their S3 API doesn't support createPresignedPost:
Is there a way to prevent this on R2? Something where the link provided by b2_get_upload_url (whatever R2's equivalent will be) only works for a specific file name?
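For reference, this is roughly what the S3 flow I'm describing looks like with boto3 (placeholder bucket and key; whether R2 will expose an equivalent is exactly my question):

```python
import boto3

s3 = boto3.client("s3")

post = s3.generate_presigned_post(
    Bucket="user-uploads",                             # placeholder bucket
    Key="users/1234/avatar.png",                       # exact key the client must use
    Conditions=[
        ["content-length-range", 1, 5 * 1024 * 1024],  # also cap uploads at 5 MiB
    ],
    ExpiresIn=300,                                     # policy valid for 5 minutes
)
# post["url"] and post["fields"] go to the browser; the key is baked into the
# signed policy, so uploading under any other file name fails the signature check.
```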
Nilay from Backblaze here. Amongst other things, the solution engineering team reports up to me... and when I saw this comment yesterday, I had them dig into why the B2 S3 API did not work for you.
It turns out that Backblaze B2 absolutely does support uploading objects via S3 style presigned URLs. However, there are a couple of caveats:
1. B2 doesn't currently support PostObject operations. Therefore, you must use PutObject to upload to a pre-signed URL. Many of the AWS SDKs default to PostObject.
2. B2 only supports AWS v4 authentication and not v2, which AWS has deprecated. However, some AWS SDKs default to v2.
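For reference, a v4-signed PUT presign with boto3 against an S3-compatible endpoint looks roughly like this (a sketch; the endpoint, bucket and key below are placeholders):

```python
import boto3
from botocore.config import Config

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.us-west-000.backblazeb2.com",  # example region endpoint
    config=Config(signature_version="s3v4"),                # caveat 2: force SigV4, not V2
)

url = s3.generate_presigned_url(
    "put_object",                                           # caveat 1: PUT, not POST
    Params={"Bucket": "user-uploads", "Key": "users/1234/avatar.png"},
    ExpiresIn=300,
)
# The client then HTTP PUTs the bytes to `url`; the signed key is part of the
# URL, so a different file name won't verify.
```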
I am currently away for this week, so I can't try it this week. Can I reach out to you next week with my findings? Is there an email address I can reach you at?
Just wanted to mention that I had raised my issues on the backblaze subreddit back then:
I have this exact same use case for an app I’m building and would love an answer as well. I built on S3 as a result. When a product says “full S3 api compatibility” this becomes my question.
Going further, I don’t want to become someone else’s back door file repo for illegal shit. So presigned upload URLs with an enforced file name and configurability over the size limit and expiration of the presigned POST (both in terms of time and number of files) are pretty important to me. S3 does a good job here.
I've always found pre-signed URLs fragile and a pain to work with. When we solved this problem we just put Cloudflare Workers in front of B2, so the worker can sign requests to B2 itself and the interface to the user can be simpler. It can just be a straight POST with bytes; the worker then turns that into a PutObject call to the S3 API. It works pretty damn well.
If you are reading files back out again too, you can use Cloudflare caching to speed things up; it's a good combo.
Sounds great (nearly too-good-to-be-true great). Wonder what the SLA will look like. I have been using GCS, S3 and Firestore - and their actual reliability varies significantly, while the advertised SLAs are similar. For instance, with Firestore one has to implement a pretty lenient exponential backoff in case of a timeout, and if the backoff results in the object being retrieved in, say, 2 minutes - that's still OK as per the GCS SLA. It obviously makes it hard to use for user-facing stuff, such as chatbots, where you can't afford to wait that long. In my anecdotal experience of using Firestore for about 10 million operations per day, we usually have a problem like that every few days, and that means user-noticeable failure. It would be great to read more on Cloudflare's approach to reliability defined as 99th-percentile max latency. Can't wait to give it a try with our workloads.
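For concreteness, the shape of backoff I mean (a generic sketch, not tied to Firestore, GCS or R2 specifically):

```python
import random
import time

def with_backoff(op, max_attempts=6, base=0.5, cap=30.0):
    """Retry `op` on timeouts with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return op()
        except TimeoutError:  # substitute whatever your client raises on timeout/5xx
            if attempt == max_attempts - 1:
                raise
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

# Usage: result = with_backoff(lambda: client.get_object(bucket, key))
```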
Having done hundreds of TCO analyses for customers moving object storage providers, this seems like it carves out a very interesting niche for Cloudflare. R2's higher storage costs (roughly triple) also make it a more manageable threat to specialized solutions like Storj DCS, Wasabi and Backblaze B2.
At Taloflow (https://www.taloflow.ai), (disclosure: I'm the CEO/Cofounder) we provide buying insights for cloud object storage (and soon other IaaS/PaaS). We will definitely be adding Cloudflare R2 to the mix.
As someone who took up making travel videos as a hobby, this is definitely on my radar.
Video files are large, although ~20 cents per video streamed for a small website is manageable (S3, Cloud Storage, Azure...), it's the potential for abuse that could drive my bill up that terrifies me, which is why I decided to stick to Hetzner VMs with their 20TB of free egress.
I have, I've also taken a look at Mux. Both would be fantastic options, and I'm still considering them, but I don't have many films and I'm biased towards distributing the original high-quality encodes.
Both of these services significantly reduce the file size with a re-encode, even if they promote an "imperceptible quality loss". They seem to be more suited to high-traffic on-demand streaming for websites, promotional material, etc.
What format, bitrate, and resolution are your video outputs? Or just a length and file size of one that’s handy. (I’m a streaming video engineer and curious.) Reduction in file size from re-encoding doesn’t mean there will be perceivable loss in quality. Your videos should almost certainly not require the kind of bandwidth that you mention unless your deliverables are higher-quality than those from, say, Netflix :)
FYI Mux isn’t an okayish platform for mid-level use-cases, it’s the gold standard for high-performance live streaming and VOD, created by the guys behind a bunch of other stuff pretty central to the streaming industry overall, and is used at the highest level by folks like CBS, Mandolin, etc. Of course you don’t have to use it, but it’s certainly no toy.
Just curious, does your audience expect to download it in full to watch later or on a USB stick as you describe, or is that a side effect due to the large file sizes?
EDIT: I am not a Mux employee, nor is any of this very important, I've just been working in this space for 10 years and very seldom run across something that needs these requirements and I'm curious :)
No worries, I've always been more interested in the technical aspect than the creative aspect of filmmaking, hence my interest in programming. It's a personal project; I've been making films since the age of 11, although I haven't been able to do much since I started my Bachelor's degree...
I encode H.264/AAC for compatibility, usually 2-pass at 8Mbit/s for FHD content or around 30Mbit/s for UHD, with 256-320 kbps for AAC. This gives a size of 1-4 GB per video. My dad worked in broadcast and gave me those figures 10 years ago; I generally stick by them for any work I'm proud of!
You are right, that bitrate absolutely isn't necessary :D , and there are better codecs too. I don't have more than 17 films at the moment, the whole backend costs me about 4 euros a month on Hetzner with an attached 32 GB of block storage, no bandwidth costs (300Mbit/s max), single-core running Node.js and Nginx.
I make films during travel and events, and share them with friends and family who were there and for whom they have some intrinsic value. They're mostly personal, not just for consumption. Short compilations that I would like to keep for the future, like old photos. Hence why people watch (and hopefully rewatch!) them on a TV and don't mind the wait.
Buffer-less streaming is absolutely not a priority (although nowadays I think it's more expected, people have shorter attention spans, despite the hours of work that goes into a short travel film). It's a very niche case, but would have cost me at least $50 in bandwidth alone with the big three. It's not going to break the bank, but it's also unnecessary.
You don't usually notice the drop in quality on YouTube or Netflix, until you actually try working with high quality source footage (high-bitrate from a dedicated camera, not phone footage). Then it's very noticeable (purple/green flickering on even surfaces from chroma subsampling, I'm guessing), and makes you sad when you realise what everyone is missing!
If you're still curious, my website is still live. I beg you not to judge, I started making short films very young and never got very good at them either (I'm studying nanotech & engineering, no film career for me)!
I suggest using constant quality encoding instead of a constant bitrate; that way the encoder will automatically adapt the bitrate to each scene. This is a much better approach - it will give you better quality and a smaller file at the same time.
For example, the encoder might choose a 1Mbps bitrate for a scene which is mostly static and 20Mbps for a part of the video with a lot of movement. Your 8Mbps constant bitrate will be overkill for the first scene and too low for the second. Let the encoder decide the optimal bitrate.
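If you want to try it, a constant-quality H.264 encode just swaps the bitrate flags for a CRF value; a minimal sketch via ffmpeg (the CRF of 20 is an assumption to tune by eye, and lower means higher quality):

```python
import subprocess

subprocess.run(
    [
        "ffmpeg", "-i", "input.mov",
        "-c:v", "libx264", "-preset", "slow", "-crf", "20",  # constant quality, no target bitrate
        "-c:a", "aac", "-b:a", "256k",
        "output.mp4",
    ],
    check=True,
)
```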
Not sure why this is more appealing than Wasabi? As far as I can see, Wasabi is cheaper, has great speeds, fantastic S3 compatibility, their dashboard is a joy to use so what is the actual "special" thing here? I mean sure, it's a good thing to have more competition but the way everyone here is describing the situation makes it seem as if Cloudflare is going to be the cheapest & the best.
>If your monthly egress data transfer is greater than your active storage volume, then your storage use case is not a good fit for Wasabi’s free egress policy
>If your use case exceeds the guidelines of our free egress policy on a regular basis, we reserve the right to limit or suspend your service
Unless I am misreading, wasabi can shut you down if your egress is high.
My understanding with Wasabi is that it is only intended for up to 100% egress. It doesn't look like r2 has this limitation. Though I can't imagine egress is actually completely free beyond a certain point.
1 - Their portfolio is bigger, making it easier for existing clients to adopt. E.g., that's why a lot of companies use AWS.
This product seems like a good fit for their Cloudflare Workers.
2 - Cloudflare is going heavy on egress costs, where they have a lot of spare capacity compared to the big 3 cloud providers, and they are putting that "weight" to good use:
I believe the minimum object retention time is 3 months, which killed it for our use case. I.e. create a file today, delete it today, and pay for it for 3 months.
Nice to see someone flipping the script and encroaching on AWS' territory rather than vice versa
Taking the have-a-much-better-product route to siphoning use from AWS is particularly ambitious. I hope it works out. AWS has had it a little too easy for too long
Actually, many companies are trying or have tried to compete with AWS, especially on egress pricing, but because of a lack of exposure it is very hard to get traction and get big enough.
And there are a lot of things that can be improved - like egress pricing, better support, getting rid of regions completely, etc...
Interesting pricing considering Backblaze is another Bandwidth Alliance member and they only charge $0.005/GB-month (vs. $0.015/GB-month). B2 + CloudFlare gives you a similar deal at a third the cost.
Yes but you can only use B2 via CloudFlare for web pages. Using it as a data storage platform isn't allowed. Unless of course you're willing to pay handsomely via an enterprise contract, but then the pricing changes.
Use of the Services for serving video or a disproportionate percentage of pictures, audio files, or other non-HTML content is prohibited, unless purchased separately as part of a Paid Service or expressly allowed under our Supplemental Terms for a specific Service. [1]
You may use Cloudflare Pages and Workers (whether in conjunction with a storage offering such as Cloudflare Workers KV and Durable Objects or not) to serve HTML content as well as non-HTML content (e.g., image files, audio files) other than video files.
The said limitation should, however, apply to their traditional service (orange cloud on), whether the origin is B2 or not. I am not sure if being a Bandwidth Alliance partner makes a difference.
So the gray area comes from an exception being granted for R2 that is not specified in that linked page. R2, like B2, is part of Cloudflare's Bandwidth Alliance, so is the unwritten exception for R2 or for the Bandwidth Alliance?
I recently started keeping about 30TB of ElasticSearch and Postgres backups in Backblaze B2. The price is great, but getting data in is not particularly easy as the Backblaze S3 API seems to fail a high proportion of requests when under load.
If R2 can be approximately as reliable on ingest as AWS/GCS/Azure is, but without the egress fees of the other major providers, then $0.015/GB-month seems like a pretty good deal.
B2 really is an exercise in making sure your code is robust with respect to external APIs but I'll be damned if it isn't cheap. We ended up building a whole queueing system for it because ad-hoc retry logic stopped being good enough.
I'm excited because while B2 + Cloudflare is great, the speed+latency isn't the greatest for some applications. So there's definitely a place for R2 here to compete more with AWS S3 than B2.
I'm a fan of B2 as well, but for some use-cases they seriously need to up their game. They only have three datacenters (CA, AZ, and Amsterdam), they still don't have a public status page, their admin UI is lacking lots of features (like proper invoices and different 2FA options), their permission system is very basic and inflexible, and they are not integrating compute with storage like AWS already does and Cloudflare will eventually be able to. However they are impossible to beat on cost, and for me their latency has recently improved significantly and has become much more stable, so I'm not going to move anytime soon.
Latency (time to first byte) in serving infrequently accessed images was a big problem for me with B2. The cost was low enough that I've stuck with it, though, and coded the front end of the site to use placeholder images until the real media can be retrieved.
Very much depends on what you're utilizing Backblaze for. It's unusable for volume image hosting, for example. It has severe performance problems with small files, as does much of the object storage field (including DigitalOcean, who directly warn customers not to bother using their object storage for small files if performance is a concern). The CDN doesn't help much; Backblaze chokes if you try to shovel a large number of small files at it (their infrastructure was not designed for that and they admit it in their docs), and/or attempt to pull a lot of small files out of it. AWS is pretty great when it comes to that by contrast, and with the price to go with it. I'm looking forward to finding out what R2 can do.
This is true - most of the object storage providers are really bad at serving data fast; they are more focused on cold storage. Tebi.io works really fast for small files, plus it is geo-distributed object storage, meaning that data is physically stored in different locations around the world, greatly reducing latency.
Backblaze and DO Spaces simply were not designed for this in the first place.
> As transformative as cloud storage has been, a downside emerged: actually getting your data back... When they go to retrieve that data, they're hit with massive egress fees that don't correspond to any customer value — just a tax developers have grown accustomed to paying.
Strategy Letter V, commoditize your competitor's advantages!
> We’ve gotten rid of complex, manual tiering policies in favor of what developers have always wanted out of object storage: limitless scale at the lowest possible cost.
Cloudflare has a clear strategy: Be the simplest cloud platform to deploy to. It has been a breeze as a small dev shop adopting their tech. AWS started with the startups, but have since long struggled to keep up that simplicity in face of supporting what must be a dizzying array of customer requirements. Remains to be seen how Cloudflare fares in that regard. I like my Golang better than Rust.
> Cloudflare R2 will include automatic migration from other S3-compatible cloud storage services. Migrations are designed to be dead simple.
Taking a leaf out of AWS Database Migration Service and its free transfers from elsewhere into RedShift/RDS/Aurora/OpenSearch. Niice.
> ...we designed R2 for data durability and resilience at its core. R2 will provide 99.999999999% (eleven 9’s) of annual durability, which describes the likelihood of data loss... R2 is designed with redundancy across a large number of regions for reliability.
S3 goes up to 16 9s with cross-region replication... so I'm wondering why R2 is still at 11 9s? Maybe the multi-region tiering is just acceleration (a la S3 Transfer Acceleration) and not replication?
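For a sense of scale, my own back-of-the-envelope on what those figures mean (assuming the usual per-object, per-year reading of durability):

```python
def expected_annual_losses(num_objects, nines):
    p_loss = 10 ** -nines  # 11 nines -> 1e-11 chance of losing a given object in a year
    return num_objects * p_loss

print(expected_annual_losses(1_000_000_000, 11))  # ~0.01 objects/year across a billion objects
print(expected_annual_losses(1_000_000_000, 16))  # ~0.0000001 objects/year at 16 nines
```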
> ...bind a Worker to a specific bucket, dynamically transforming objects as they are written to or read from storage buckets.
This is huge, if we could open objects in append-mode. Something that's expensive to do in S3 (download -> append -> upload) even after all these years.
> For example, streaming data from a large number of IoT devices becomes a breeze with R2. Starting with a Worker to transform and manipulate the data, R2 can ingest large volumes of sensor data and store it at low cost.
Very exciting. Object storage is getting really competitive, and I love the naming scheme alliance - S3, B2 (Backblaze) and now R2; who will do one with a “1”?
On a serious note, I’m wondering about the signed URLs and ACL capabilities of the Cloudflare offering, because this is something we use.
I’m also interested in whether R2 replaces S3 and CloudFront at the same time? That’d be nice, and one headache less.
The Cloudflare offering supports their serverless Workers compute which would make signed URLs and ACLs trivial. Cloudflare certainly would also be replacing CloudFront.
> Our vision for R2 includes multi-region storage that automatically replicates objects to the locations they’re frequently requested from.
Seems like they are going to automatically replicate data to other regions. Something tebi.io has been doing for a long time already - it is a geo-distributed S3-compatible storage that replicates data across the world to reduce read/write latency and increase speed.
If it is done right, this might increase download speeds by a lot especially for big infrequently accessed files.
Cost/GB is three times that of Backblaze; bandwidth is generally too cheap to meter (outside AWS, GCP and Azure, who use egress costs for lock-in).
The offer is no doubt competitively priced, but it's surely much more lucrative for them than people using Cloudflare as a CDN for their S3/B2/Azure/whatever.
You need to account for IO load as well: more bandwidth means more requests to the disks, and I assume they will use mechanical HDDs for the storage; SSDs are still too expensive to support $0.015/GB. A good mechanical drive can give you up to 200 IOPS and maybe 200MB/s, so they will have to copy the data a lot to be able to increase capacity on demand. Making it free doesn't seem sustainable at all.
They seem to learn a lot by centralizing a lot of the internet's traffic through their core. Perhaps it makes their DDoS protection more robust when they can train on what "normal usage" looks like for the longer tail.
Perhaps they are building moats around that business and expecting a future when Fortune1000 requires their cdn.
> what does being member of the "bandwidth alliance" mean?
Cloud providers normally have huge margins on egress (outgoing) data and not on incoming data (to attract you to give them your data).
This additionally incentivizes customers to stay within a certain cloud and not use competing products, based on costs (since egress costs to a 3rd party wipe out the price difference).
Cloudflare (I think) started the Bandwidth Alliance to make these prices more fair. As in: work together, have direct network connections and reduce customer billing.
Getting 100k people to download 100GB seems like a rather lofty goal. Has this ever been done via social media? The only case I can think of is a game like Call of Duty, which has a ginormous marketing campaign behind it. Even so the download wasn't linked via social media.
Most people don't even have 100k followers. The follower-to-click ratio isn't going to be 100% in any situation. Click-to-full-download ratio is going to be low too, especially so at 100GB. A lot of people don't have that much free space on their devices even!
I think this scenario is firmly in outlier land, and thus not really relevant to cloudflare's calculations.
Wow, with global replication by default, this looks absolutely perfect for what I'm currently building, even before taking costs into account.
I'm hoping this means what I think it means, that write latencies will be minimal across the globe, since writes will be persisted and ack'd at the closest region and then eventually consistently propagated to other regions?
If so, curious what would happen in a scenario where a region requests an object that has been persisted at another region but not yet propagated? Will it result in a 404, or is the system smart enough to route the request to the region that has the file, at the cost of higher latency?
From my research so far into S3's cross region replication, the latter behavior doesn't seem possible out of the box since requests have to specify a single region (S3 experts, please do correct me if I'm wrong), so I'm hoping CloudFlare with its deep expertise in managing a global network can differentiate here. Even if it's not offered out of the box, due to the lack of egress costs, it's a lot more feasible to build in this behavior in the application layer with R2 by just racing requests across several regions and taking the one that resolves first (or at all), so very promising regardless.
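Something like this is what I have in mind for the racing approach at the application layer (a rough sketch with hypothetical per-region URLs; if egress really is free, the losing responses only cost a request each):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

REGION_URLS = [
    "https://us.example-storage.com/bucket/obj",  # hypothetical regional endpoints
    "https://eu.example-storage.com/bucket/obj",
    "https://ap.example-storage.com/bucket/obj",
]

def fetch(url):
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # treat 404/5xx as "this region doesn't have it yet"
    return resp.content

def race_regions(urls=REGION_URLS):
    pool = ThreadPoolExecutor(max_workers=len(urls))
    futures = [pool.submit(fetch, u) for u in urls]
    pool.shutdown(wait=False)  # don't block on the slower regions
    last_error = None
    for fut in as_completed(futures):
        try:
            return fut.result()  # first successful region wins
        except Exception as exc:
            last_error = exc     # keep waiting on the remaining regions
    raise RuntimeError("object not found in any region") from last_error
```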
Also, would love to hear some numbers on what kinds of write latency to expect. From my experience so far, S3 writes for tiny files in a single region take on the order of 50ms ish even for clients in close physical proximity, which is serviceable for my use case, but seems higher than it needs to be (and every little bit I can shave off on latency helps tremendously for what I'm building). Really looking forward to seeing what the CloudFlare solution is capable of here.
Lastly, S3 didn't advertise and guarantee strong read-after-write consistency for same region read/write until late last year. Will R2 offer this out of the gate?
R2 is still under development, we will see how it will work, but I can tell you how Tebi.io global replication works:
- Writes and reads always go to the closest region; this keeps network latency low. You can define different metadata write concern levels for each bucket, this way you can define how fast or how consistent your writes will be on a global scale. You can even make them asynchronous, meaning that once the data transfer is complete, metadata write and propagation is performed without you waiting for it to complete.
- If you write to one region and someone is trying to read that object from another region - it will be possible as soon as metadata is replicated to that region (usually it takes less than 500ms). If data is not yet replicated to that region, then it will be read from another region. If the data is partially replicated, then that part will be read from the local storage and the rest of the data from another region. Additionally, Tebi supports synchronous replication allowing almost instant data propagation across the world.
- Write latency depends on metadata replication write concern - it can be faster than AWS or slower, you can configure it yourself.
- You can define where you want to store your data and how many copies you need in each region.
> If so, curious what would happen in a scenario where a region requests an object that has been persisted at another region but not yet propagated? Will it result in a 404, or is the system smart enough to route the request to the region that has the file, at the cost of higher latency?
It's eventually consistent for global replication, but additionally there should be a consistent index of things that are available in other regions? I suppose that's plausible. Seems to defeat a lot of what they're avoiding.
Nothing stopping you from doing a scatter-gather on a 404 based on some heuristic in your application code though.
Really interesting, will R2 support lifecycle rules like S3 does? We write around 90 million files per month to S3, if we could replace that with R2 and have the files automatically expire after 30 days that'd be a pretty amazing price reduction for us.
The full lifecycle support s3 has is really powerful. One use case we have for ugc content is disallowing online serving systems the ability to permanently delete an object. Instead the object is marked as deleted and then automatically cleaned up after 30 days.
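For context, the S3 rule we rely on for that today looks like this with boto3 (bucket and prefix are placeholders); the question is whether R2 will offer an equivalent:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="ugc-content",  # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-soft-deleted-after-30-days",
                "Filter": {"Prefix": "deleted/"},   # only objects marked as deleted
                "Status": "Enabled",
                "Expiration": {"Days": 30},         # S3 removes them ~30 days after creation
            }
        ]
    },
)
```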
They appear to be focused on an automatic version:
> Behind the scenes, R2 automatically and intelligently manages the tiering of data to drive both performance at peak load and low-cost for infrequently requested objects. We’ve gotten rid of complex, manual tiering policies in favor of what developers have always wanted out of object storage: limitless scale at the lowest possible cost.
I recently did a pricing comparison of cloud object storage services for my article "How to Create a Very Inexpensive Serverless Database" (https://aws.plainenglish.io/very-inexpensive-serverless-data...). It describes using object storage as an inexpensive serverless key-value database.
Although egress (outbound network) can be a significant part of object storage expenses, if you are reading and writing small objects, per-request expenses can be much bigger. Cloudflare indicates that for low request rates there won't be any request fees, but doesn't state what they will charge for high request rates.
My article points out that the best deal when working with high request rates is to use services that don't charge per request such as DigitalOcean, Linode, and Vultr. If it's S3 that you want, even Amazon has recently joined the budget club with Lightsail Object Storage which has monthly plans of $1, $3, and $5 (250 GB storage and 500 GB egress) with no per-request fees.
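To make the "per-request expenses can be much bigger" point concrete, a toy calculation (the prices are examples in the S3 ballpark - $0.0004 per 1,000 GETs and roughly $0.09/GB egress - so plug in your own):

```python
def monthly_read_cost(requests, avg_object_kb,
                      per_1k_get=0.0004, egress_per_gb=0.09):
    request_cost = requests / 1_000 * per_1k_get
    egress_cost = requests * avg_object_kb / 1_048_576 * egress_per_gb
    return request_cost, egress_cost

for size_kb in (1, 10, 1_000):
    req, egr = monthly_read_cost(100_000_000, size_kb)  # 100M reads/month
    print(f"{size_kb:>5} KB objects: requests ${req:,.2f}, egress ${egr:,.2f}")
# Around 1 KB per object the request fees dominate; by ~10 KB and up, egress does.
```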
Yes, we have read after write consistency across regions today. We're considering relaxing that if we need to reduce latency (one of the things to be worked out in beta!).
> R2 is designed with redundancy across a large number of regions for reliability. We plan on starting from automatic global distribution and adding back region-specific controls for when data has to be stored locally, as described above.
Does that mean automatic caching across regions? Low-latency read access everywhere without an extra CDN in front of it?
We're still deciding on how we want to handle caching. We integrate with Workers, so manually caching is always possible. The catch is we're currently building for strong consistency - if we added a cache in front, we'd weaken that - so it will likely be a configuration option we add later.
Check tebi.io - it is a geo-distributed S3-compatible storage that does exactly that. You can configure global consistency level for each bucket individually.
Cloudflare Pages locks you into git-based deployment, which isn't always practical, especially for sites that are heavy with images and other non-text static assets. I don't want to use git to manage the images for my website and I don't want to have to pay Github every month for Git LFS storage and bandwidth costs.
Am I reading this right that this is pretty much aiming to be a full competitor to S3? Or is this a more limited special purpose tool? I didn't really follow what Cloudflare is doing, so I only know them as a CDN. Are private buckets possible with this?
We all knew that the big players are really maximising their profits on the egress charges, so I can see that this left some potential for someone to step in. No egress charges at all still sound a bit too good to be true, but it would be nice as that's just one parameter less to think about.
Another interesting aspect is Cloudflare Workers. As far as I can tell they're not a full replacement for something like AWS Lambda if, e.g., I need to do a bit heavier stuff on the data in R2. Being able to do heavier processing close to the actual data would be really interesting as well.
There is one major reason S3 remains the king of storage for mobile media uploads: bucket notifications. Does R2 implement this feature? If so, I’m going to have to run some experiments with this...
To be honest, I think the biggest draw will be for companies (like where I work) that put large objects on S3 and distribute it to hundreds / thousands / millions of customers. The egress direct from S3 is on the order of 8 cents a GB, and with Cloudfront in front of it it’s a few cents lower, and you can negotiate pricing a little lower if you’re big enough. But not an order of magnitude.
We’d stick R2 in front of an S3 bucket and wipe off the biggest portion of our bill.
S3 does absolutely have a ton of other stuff like events, lifecycle, Glacier, Lambda etc and is plugged into the AWS ecosystem, so I doubt we’re exiting it completely. But this is a solid option as a front for egress from S3.
We're fully integrated with Workers, so you can write a Worker that calls additional logic when a request is made to the bucket's url.
We have notifications, where a change to the bucket invokes a worker with a specific event, on our roadmap, but have a bunch of other work prioritized ahead of them.
I know this is irrational and not very helpful, but it actually makes me angry when I see a response like "but you can cobble together some janky custom code with Workers to do what you want".
It's like going to a restaurant, asking if they have shrimp scampi, and getting a reply that you can go to a supermarket and buy the ingredients and make the dish and bring it back to the restaurant to have with your meal.
Hi Greg, thanks for the feedback. It’d be great if you could get around to putting up an example of how to do bucket notifications + lifecycle using Workers as a temporary workaround until it’s part of the “core.” I don’t think I’m the only person with this use case, but maybe I’m more of an edge-case minority than I imagine... In any case, a code library / “recipe collection” (do we still call them cookbooks?) would be great when this launches.
Will we be able to serve files from a R2 bucket through a subdomain like static.project.com/filename WITHOUT using a worker that wastes money on every request for no reason?
> Our object storage will be extremely inexpensive for infrequent access and yet capable of and cheaper than major incumbent providers at scale.
How frequent is infrequent? In our case it's "never unless other backups fail" and for that S3 Glacier Deep Archive is still cheaper ($0.00099 per GB).
"At launch, R2 will cost $0.015 per gigabyte of data stored on the service, about half as much as AWS charges for customers on its frequent access tier. Infrequent access to R2 will be free, the company said.
"
So.... I still need to test, but if that is true it can be a game changer. Of course... $0.00099 is almost free, but if everything is done automagically it will be awesome.
I'm not a software developer, so please pardon my ignorance: doesn't this make them basically a "core complete" modern cloud provider? If you fully bought into architecting against them, with their Workers product and all, could you fully build and run on CF?
You could probably build a lot on their infra and maybe small b2b startups could get by on their offering, but you can easily outgrow their current offering. Azure/GCP/AWS have significantly more tools that solve expensive problems (like ML hosting, events, SQL database, search, etc.).
I think they're still PaaS and not IaaS. So yes, you can build fully-fledged app with them if you're willing to change your engineering paradigm and not use any of the usual arbitrary computing workloads.
Nice, and it seems to be quite cheap as well.
It's unfortunate, though, that they don't talk about data residency: where are the servers located? Where will my data be copied?
We just got this big grant that works with healthcare and genomic data for a new type of therapy in Australia, so we have jurisdiction requirements (data needs to stay within Australia; some data needs to stay within New South Wales). We're currently talking with your run-of-the-mill providers, but I'd be pretty excited to try this out with Cloudflare instead... esp. when previous similar projects have been hit with some nasty egress costs.
We plan to add support for data residency requirements on an object-by-object basis. In other words, if you need one object to stay in the EU and another to stay in the US and a third to stay in India you tag each object using the same API and we handle ensuring the residency requirements behind the scenes.
I have been doing work on ensuring my app is compatible with the many Object Storage services. Would be great to get access to this early and make it compatible too.
Love the team at DigitalOcean but Spaces was NOT reliable the last time I played with it. It’s also lacking key features in lifecycle and notification areas. Maybe they’ve gotten things together in recent months/years and I haven’t gotten the update, but it was very “you get what you pay for” when I tried to stage a project using it in 2018/19.
Spaces is not good, in my experience. Their Frankfurt storage just stopped working for like a week. And last time I checked they still didn't support critical features like per-bucket access keys or index.html support.
Scaleway also provides a similar service. Had an issue recently (deleted a bucket, and couldn't create another bucket with the same name) and their support replied back in minutes.
The claim is that the service handles distribution across regions for reliability, so I think the more interesting question is the odds of that _mechanism_ failing when your site would otherwise be up[1].
Similarly, 3-2-1 is a backup strategy and the pricing appears to already include multiple copies using the same mechanism so the correct calculation would be the cost of R2 plus whatever _different_ mechanism you choose for disaster recovery purposes such as on-premise storage or a completely different provider.
1. For example, if you use Cloudflare as your CDN / DNS provider and they have a catastrophic failure, the fact that your storage is inaccessible is just another facet of the same issue.
That's the reason why most enterprises have 4-7 copies of their data... no inherent geo-replication by default, and (as CF shares) it's a "Hotel California" problem: too expensive to egress completely from AWS.
Is this going to be content-neutral, like Cloudflare was when fronting ISIS websites?
Or is this going to be fine-until-bad-PR, like when Cloudflare decided to stop hosting The Daily Stormer?
There is a special kind of lock-in when it comes to object storage, as generally you use something like this when the data is too big to store another copy of locally or at another provider. It's not like you can easily maintain provider independence, and if Cloudflare decides one day that some of your UGC in a bucket isn't something they want to host, what happens then?
Is the data lost forever because your account is nuked? Is there a warning or grace period?
I am hesitant to put any large amount of data into a service without a crystal clear statement on this, so that I can know up front whether or not a business needs to maintain a second, duplicate object store somewhere else for business continuity.
If Cloudflare in practice is going to nuke the account the moment your site ends up hosting something objectionable, this DR requirement (a second provider that also stores all objects) needs to be factored into a customer's costs. (It may be that the bandwidth savings still make it worth it to use Cloudflare even with double storage.)
> I am hesitant to put any large amount of data into a service without a crystal clear statement on this, so that I can know up front whether or not a business needs to maintain a second, duplicate object store somewhere else for business continuity.
It's a mistake to rely on a clear statement when you can't afford to lose your data. Stuff happens all the time... mistakes, malware, an expired credit card, etc. Independently of the provider you decide to use, I'm not sure if a backup is optional in your case.