
Based upon outages when a single AWS zone goes offline, a whole hell of a lot more than side projects aren't taking advantage of multi-zone redundancy.

I think what many fail to realize is that, surprise surprise, it isn't that simple. Many think "if my zone goes out, I'll just migrate to a different zone!" Nope! AWS doesn't have the capacity for it. As we've seen many times, AWS can't seem to handle the migratory load when a zone goes out. Sure, the other zones don't really die, but good luck migrating your workload to them.

But when touting multi zones, no one ever mentions this little nugget of information.



At my previous job we were often able to saturate all similar instance types in a given region.

All AWS had to say was “we are working hard on providing additional capacity”.

The same often happens on Black Friday, when companies scale up their platforms ahead of time just in case, because there might be no capacity left on AWS.


We have that 'issue' with many of our Spark workloads, where none of our desired capacity is available as spot. But we keep a baseline of instances reserved up front for anything realtime anyway, so with a bit of planning it's a non-issue.

It does cost money, but then again, so does not running certain processes. The trick becomes calculating the intersection at which point the costs outweigh the benefits, and that calculation applies everywhere.
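A minimal sketch of that break-even calculation, assuming the decision can be reduced to a reservation cost versus the expected loss from a job not running. The function names and the numbers in the usage lines are hypothetical, picked only to illustrate the comparison:

```python
def expected_loss(unavailable_probability: float, cost_of_not_running: float) -> float:
    """Expected cost of relying on spot alone for one billing period."""
    return unavailable_probability * cost_of_not_running


def break_even_probability(reserved_cost: float, cost_of_not_running: float) -> float:
    """Probability of spot being unavailable at which reserving starts to pay off."""
    if cost_of_not_running <= 0:
        raise ValueError("cost_of_not_running must be positive")
    return reserved_cost / cost_of_not_running


def should_reserve(reserved_cost: float,
                   unavailable_probability: float,
                   cost_of_not_running: float) -> bool:
    """True when the reservation costs less than the expected loss it avoids."""
    return reserved_cost < expected_loss(unavailable_probability, cost_of_not_running)


# Hypothetical figures: a 1000/month reservation versus a 5% chance
# of a 50000 loss when the job cannot run.
print(should_reserve(1000, 0.05, 50000))
print(break_even_probability(1000, 50000))
```

Real workloads would need more than one probability and one cost figure, but the shape of the comparison stays the same.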


If you're in a single AZ, you're not multi-AZ. Migrating between AZs isn't multi-zone either. Running at 130% capacity across three AZs, that is multi-AZ (to us, in our availability configuration). If an AZ goes down (which in some of the regions we use has happened 0 times) we lose about 30% capacity, but since that's our scaling margin anyway we can keep going as-is, even if there were no 15% additional capacity available in the remaining AZs.
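For anyone who wants to sanity-check their own headroom numbers, here is a rough sketch. It assumes capacity is spread evenly across AZs and that a figure like "130%" means total provisioned capacity relative to the load you need to serve; both are assumptions for illustration, and the function names are made up:

```python
def surviving_capacity(provisioned: float, zones: int, failed: int = 1) -> float:
    """Capacity left after `failed` zones drop out, as a multiple of required load.

    `provisioned` is total capacity as a multiple of required load
    (1.3 means 130%). Assumes an even spread across zones.
    """
    if not 0 <= failed < zones:
        raise ValueError("failed must be between 0 and zones - 1")
    return provisioned * (zones - failed) / zones


def required_provisioning(zones: int, failed: int = 1) -> float:
    """Total capacity needed so that losing `failed` zones still leaves 100%."""
    if not 0 <= failed < zones:
        raise ValueError("failed must be between 0 and zones - 1")
    return zones / (zones - failed)


print(surviving_capacity(1.3, zones=3))   # what 130% over three AZs leaves after one loss
print(required_provisioning(zones=3))     # provisioning needed to ride out one AZ loss
```

Under those assumptions, 130% over three AZs leaves roughly 87% after losing one AZ, and riding out a single-AZ loss at full load takes 150%. Whether the lower figure is enough depends on how far typical load sits below the number you sized against.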

Some sort of manual active-standby configuration really doesn't require AWS or a cloud at all. That stuff is the same '90s implementation it has always been, and practically boils down to moving your RAID1 USB HDDs from one PC to another PC and booting that bad boy up as 'failover'. (Yes, that's an example, and yes, it's an extreme one.)

If you do capacity planning, and you plan accordingly, you take service provider limits into account, just like you would with anything else. Having two power feeds into a distribution warehouse doesn't help much if neither can handle 100% of the load in the industrial park. So while having two feeds might seem 'redundant' to a single tenant or customer, it's only really redundant if either feed can supply all the demand of all connected customers.

The same applies to fiber connections. Plenty of connections that customers describe as 'redundant' turn out to terminate at the same PoP, and if that PoP goes down your redundant fibers are worthless. In the same logistics distribution scenario, your trucks can't deliver goods if the destination warehouse itself is offline, and now you need redundant warehouses.

That's obviously a weird thing to do at smaller scales, but the fact remains that AWS having an AZ go down is only a small piece of the puzzle, and only really a problem if you didn't plan for it appropriately.



