Which of the following is the more likely failure mode for a new product:
A.) So many customers wanted it that we couldn't scale fast enough.
B.) We ran out of time or money before we shipped.
The author appears to be worried about A, but in my experience it's B that you need to think about when starting out.
Imagine if, instead of building any of those crazy 30-node architecture diagrams in the article, he'd had one guy build his entire product in a day as the equivalent of the "20 Minute Rails Blog Demo", then shipped it via any of the thousand-odd boring ways to deploy such a thing.
He'd still have the same number of months to worry about stacking all those blocks into that unmaintainable tower of pain, but in the meantime his product would be out in the wild. Possibly even attracting the users that might one day make such a silly architecture necessary.
As it is, he'll still ship one day. But my money is that he'll never see traffic that would overload a single server.
Because that's what happens with 99% of the things one ships. The other 1% you can fix as needed. Possibly using AWS Lambda for the pieces that need it.
I am not worried about A at all. This was a technical article, but here's the explanation I posted to Twitter:
Although I enjoy talking about technology, and I enjoy working with serverless, it's important to note that using serverless for @slackvacation is not a technology decision. It's a business decision. Not because of the auto-scaling; that's nice to have, but it's not a problem startups are facing in their early phase. It's about financial incentives, and also about focus, the ability to move fast and test ideas, and the ability to grow without shooting your future self in the foot.
We still often fail to do things fast enough, but we are aware of the problem and working hard to fix it.
It sounds like part of your decision was that Lambda comes out cheaper for hosting. That also doesn't pass the arithmetic I use to evaluate such things.
A dev, fully loaded, will cost you $200/hour. A half cage at a colo with a really fast machine in it will cost $800/month. So if you choose a stack that adds 4 hours of work each month (or 80 extra hours upfront), you're behind.
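Run the arithmetic yourself (a quick sketch in code; the rates are just my assumptions from above):

```javascript
// Back-of-the-envelope: when does the "cheaper" stack actually cost more?
const devRate = 200;   // $/hour, fully loaded (my assumption above)
const coloCost = 800;  // $/month, half cage with a fast machine

// Monthly break-even: extra dev-hours per month the stack can cost you
const breakEvenHours = coloCost / devRate;          // 4 hours/month

// Upfront: 80 extra hours of work buys this many months of colo hosting
const monthsOfHosting = (80 * devRate) / coloCost;  // 20 months

console.log(`${breakEvenHours} h/month break-even; 80 h upfront = ${monthsOfHosting} months of colo`);
```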
Given that (as I touched on above), your chances of outgrowing a beefy box in a colo (or its managed equivalent) can be thought of as zero until you see the big success event that proves otherwise, my money is still on boring tech and boring hosting.
Granted, Lambda is cool. I love building stuff on it for other people on their dime. But as a guy who also builds businesses on my dime, it remains a tool for tiny niche cases that can safely go down on a saturday morning without making me cancel my weekend.
Lambda I guess would be like using the coffee pot in a hotel room? Don't worry about power, don't worry about beans. Every time you come into your room it's ready to go, just press "on".
The point of serverless is that everything is much simpler: it's just functions! It's more a question of how early you break even on the ROI in simplicity, and maybe that isn't within the first 6 months of a startup's life yet. But in the future the break-even point may be zero minutes.
This, a thousand times. ^^ Running on a single box (or 2 + load balancing if you want HA) on something like DO would be a reasonable start for 99% of startups.
In retrospect, going serverless (using Serverless Framework) has been a terrible decision for us.
First, exposing an API served by AWS Lambda has the infamous cold start problem. It's not fun to wait a couple of seconds for a mobile app to respond just because the request hit at an unfortunate time. One solution we found is to use a Serverless Framework plugin to periodically ping lambdas to keep them hot. But each concurrent lambda execution is a separate container, so you have to anticipate the number of concurrent requests you'll receive at peak, or want to handle without ~1s of latency. Ouch - what happened to effortless scaling? And Amazon API Gateway adds another 100-200ms of latency on top of Lambda.
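For reference, the keep-warm hack boils down to something like this (a sketch, assuming a CloudWatch scheduled rule as the ping source; the 'warmup-ping' marker field is made up - use whatever your plugin actually sends):

```javascript
// handler.js - "keep the lambda hot" workaround, sketched.
// A scheduled CloudWatch Events rule invokes this every few minutes;
// the handler short-circuits so warm-up pings do no real work.
exports.handler = async (event) => {
  if (event.source === 'warmup-ping') { // hypothetical marker field
    return { statusCode: 200, body: 'warm' };
  }
  // ... real request handling from API Gateway goes here ...
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
};
```

And since each concurrent execution is its own container, you need N parallel pings to keep N containers warm, which is exactly the problem described above.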
If you want to use an SQL database, you have to add your lambdas to a VPC, unless you want to expose your database to the Internet. But if you need to access the Internet from your lambdas, then you need a NAT (which you pay for). And you get another couple of seconds to cold starts while AWS attaches a network interface to your lambda. So, if you don't want this, you're stuck with e.g. DynamoDB. DynamoDB is optimized for scaling but it's a poor fit for relational data. Very basic support for indexes, no transactions, and data model migrations are very painful. There is also no spatial data story for DynamoDB, whereas if we had used Postgres, we'd be able to make use of PostGIS which is awesome.
The tooling is terrible. Serverless Framework is very rigid and riddled with bugs - we ended up maintaining our own fork of Serverless alongside forks of 3 plugins and another custom plugin we wrote to support our very simple workflow. There's no faithful offline reproduction of the API Gateway -> Lambda environment. We frequently ran into issues once the code was deployed that wouldn't show up when testing with a "simulation" plugin locally.
There's a lot of other problems we ran into, but these are the biggest ones I could think of. It didn't help that we didn't have much experience with this technology before deciding to serve our API using it (this was probably our biggest mistake). I guess we should have stuck with what we're familiar with - normal, "serverful" apps. And with the DynamoDB provisioned capacity costs, I'm not sure we saved that much in the end.
Unless you are going to run your db on the same server as your server application, are you not going to have a NAT regardless?
The Dynamo complaints aren't a Serverless issue, but a NoSQL issue. Hopefully NoSQL-first/everywhere is finally dying as people realize all the benefits of an RDBMS (and the shortcomings of NoSQL). And if you knew you were doing something with geo data, picking Dynamo was a tool selection mistake from the get-go.
Our geo data needs still don't require a GIS, but we're slowly getting to the point where they do. At the time, it was unclear if a GIS would ever be needed - the company and product have done a 180° turn, as startups usually do. In hindsight, DynamoDB was a terrible choice, but we didn't have that hindsight back then. Had we stuck to the tools that we knew how to use, we'd have made the right choice, which is what I was trying to say in the parent comment.
This is very good insight and I suppose a lot of people would be facing the same issues.
I think latency is an issue when you use Lambdas for sync operations. Fundamentally when using serverless, I think your application should become fully async i.e. event-driven. You may look at the 3factor pattern[1] to see if it's any better. Basically, 3factor says that all you should do in your application is simple operations, like CRUD, and emit events which trigger serverless functions (which create more events and so on). You should asynchronously receive updates via something like realtime GraphQL.
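A rough sketch of that shape (all names here are mine, not from the 3factor spec):

```javascript
// Sketch: the API function does a simple write, emits an event, and
// returns immediately; other functions react to the event asynchronously.
const AWS = require('aws-sdk');
const sns = new AWS.SNS();

exports.createOrder = async (event) => {
  const order = JSON.parse(event.body);
  // 1. simple CRUD write goes here (DynamoDB, RDS, whatever)
  // 2. emit an event instead of doing the slow work inline
  await sns.publish({
    TopicArn: process.env.ORDER_TOPIC_ARN, // assumed env var
    Message: JSON.stringify({ type: 'order_created', order }),
  }).promise();
  // client learns the outcome asynchronously, e.g. via realtime GraphQL
  return { statusCode: 202, body: 'accepted' };
};
```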
It's a managed service by Amazon and you access it using the AWS SDK, for which you need an IAM role, so everything is secure. Whereas with RDS, Amazon maintains a database for you, which you just connect to using ordinary TCP. IIRC it's not even possible to run an RDS database exposed to the Internet (well, you can always install Postgres on an EC2 instance, but that's even crazier).
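Concretely, "access it using the AWS SDK" looks something like this (a sketch; the table name is made up):

```javascript
const AWS = require('aws-sdk');
const db = new AWS.DynamoDB.DocumentClient();

// No connection string and no open port: each call is signed with the
// IAM role's credentials and goes over HTTPS to a managed endpoint.
async function getUser(id) {
  const res = await db.get({ TableName: 'users', Key: { id } }).promise();
  return res.Item;
}
```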
Interesting, looks like that might be possible after all, though the docs are a bit unclear how this would work from Lambda specifically, and it looks like it needs a NAT regardless. Guess I misunderstood the docs at the time. (This ties into my comment about inexperience being our largest mistake. :))
This is great marketing, but suggesting some symbiosis between serverless and startups is odd. Startups for the most part don't have scaling issues that are experienced on the server. They're usually people, process, and financial issues.
I see posts like this and wonder how many people will use some new paradigm because people say "it's fast" when they still have no traction (not suggesting the author doesn't), or they don't even understand the scaling properties of their software or business.
This completely depends on what the startup is doing. For my use case (company name in profile) auto-scaling is a godsend, due to individual users being able to influence how much work our jobs backend needs to do. This is impossible to calculate and plan upfront, so we totally rely on Lambda for this.
I think I fail to understand how Lambda works better for startups. I see primarily 2 arguments.
The first one is simplicity. You pack your Node.js application, upload to lambda and it works. But then in the article I see a chart with API Gateway, then Lambda, then SNS and then Lambda again. How is that simpler than deploying a single Node.js application on DigitalOcean?
The 2nd argument is scalability. I see how this is relevant, I'm wondering though how often it becomes really useful. How many products experience unpredictable spikes in traffic that cannot be handled by a single server costing $250/month? (that's one c5.2xlarge server on demand).
I believe FaaS and Lambda are very useful, but I think I'm missing why people would move their whole applications there.
Have you done much with a framework like Serverless? The simplicity comes in because you can start writing business logic functions immediately. Plain Node typically uses a library like Express or Hapi. While not complicated, it is one more piece of boilerplate that doesn't provide any business value.
If the app needs a queue or messaging, sticking to plain node does not really change the need.
Express and Hapi seem simpler than the Serverless framework you're replacing them with. You have to specify that one more thing[0] anyway, don't you? I honestly don't see how this is simpler.
There are frameworks like Serverless (the kinda default choice, `npm install -g serverless`), Zappa (in Python, not really up to date) and Apex (simple and nice option that is installed as a single Golang executable), that automate the back and forth between the API Gateway and Lambda in a config file.
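For reference, the wiring these tools automate is roughly this much config in Serverless Framework (a minimal sketch; the service and handler names are made up):

```yaml
# serverless.yml - minimal sketch
service: my-api
provider:
  name: aws
  runtime: nodejs8.10
functions:
  hello:
    handler: handler.hello   # exports.hello in handler.js
    events:
      - http:                # this entry creates the API Gateway route
          path: hello
          method: get
```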
I would think the biggest concern with "Serverless" would be vendor lock in. If a large portion of your SaaS product is "serverless" it's going to be very difficult to move when company A raises their prices, or company B comes in with a much more compelling product or price point.
Admittedly I don't know enough about the details to know how big of a deal this could be, or whether there is work toward a universal "serverless" standard. But I don't want Amazon/Google/Whoever suddenly raising their rates 2 years from now causing my startup to go under because the margins are so tight. And for those that say this will never happen, look at what Google just did with their Maps pricing.
In the grand scheme of things, at least for AWS, the only thing you do to make your app “serverless” is add a function with two parameters as your entry point. If you're following standard software engineering principles of keeping your interface and your business logic separate, it's not that hard.
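Something like this (a sketch; greet() stands in for your actual domain logic):

```javascript
// Business logic: knows nothing about Lambda, API Gateway, or AWS.
function greet(name) {
  return `Hello, ${name}!`;
}

// The "serverless" part: one thin adapter with the two-parameter entry point.
exports.handler = async (event, context) => {
  const name = (event.queryStringParameters || {}).name || 'world';
  return { statusCode: 200, body: greet(name) };
};
```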
On the other hand, the fear of “vendor lock in” is overrated. You're always “locked in” to your infrastructure. Companies hardly ever change their infrastructure wholesale. The risk of regressions is too high and it's usually not worth it.
Besides, if you rely on AWS services and are letting AWS do the “undifferentiated heavy lifting”, converting lambdas is the least of your issues.
If you are just using AWS to host a bunch of VMs, congratulations, you now have the worst of all worlds. You're spending more than bare metal and you still have all of the management overhead.
Lock-in is very easy to avoid. Keep your business logic in separate modules and wrap any provider-specific cloud SDK calls in helper functions. Any major move to a new provider should then be much easier. Lock-in is often a non-concern if you stick to basic engineering principles.
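For example (a sketch; the module and bucket variable are made up):

```javascript
// storage.js - the only module that knows we're on AWS.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.saveDocument = (key, body) =>
  s3.putObject({ Bucket: process.env.BUCKET, Key: key, Body: body }).promise();

// Business code only ever calls saveDocument(); moving to GCS or a plain
// filesystem later means rewriting this one module, not the whole codebase.
```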
For a startup, though, if it succeeds, you have to assume that a good portion of the code will eventually be refactored or retired as the business is fleshed out.
I have a feeling we'll also see some projects for transferring Lambdas between IaaS providers, including on-prem, at some point. I'd be more concerned about lock-in at the DB level, but even that could be accounted for with up-front decisions not to use Dynamo, etc. if you were really concerned.
>> But I don't want Amazon/Google/Whoever suddenly raising their rates 2 years from now causing my startup to go under because the margins are so tight.
What kind of businesses run on very tight margins, that even some increase in the IT bill can cause them to fail ?
> What kind of businesses run on very tight margins, that even some increase in the IT bill can cause them to fail ?
Well, Startups! That's sort of common for startups, isn't it? I realize the term is basically being used to describe almost all "small businesses" these days, but I think it's fairly common for startups to have very short runway. That would (ideally) get longer as the startup matures. But I don't think it's uncommon at all.
Besides the nice drawings and good tips on testing, I 100% agree with the message. Lambda has enabled me to run a fledgling business and I'm porting my last Puppeteer EC2 workloads to Lambda as we speak. Node 8 and the new Layers feature made this possible.
I was recently talking with some folks at an incubator about a website I was building, and one of the technical guys suggested I consider switching to serverless before launching (and I thought it was a good suggestion).
Serverless is an easy way for startups to overcome one of the harder technical challenges: scalability. This is purely anecdotal, but the majority of startups I've seen - including my own work - do not have the time or expertise to build out robust auto-scaling systems. They also don't have the money to dump onto a bunch of servers they can fall back to when needed.
But auto-scaling is arguably more important for startups than for well-established companies. Thanks to the unpredictability of some random high-visibility influencer or journalist sharing your product without notifying you, it's easy for smaller startups to suddenly get hit with traffic that they can't handle with whatever infrastructure they have in place. Sometimes you only get one shot, and if a hundred thousand people hit an empty 503 page on their first visit, they may not come back for another try. Serverless design greatly mitigates that problem.
> Serverless is an easy way for startups to overcome one of the harder technical challenges: scalability.
Only as long as you (like Amazon's post) don't have a relational database in that architecture graph.
Serverless makes it way easier to scale your application server logic, but not necessarily your persistence layer. If you choose a relational database, you can run into peak load problems where you can't scale the database fast enough to meet huge spikes of load. It's completely up to you to either set a processing limit somewhere, or avoid persistence layers with peak scale problems.
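"Set a processing limit somewhere" can be as blunt as capping the function's concurrency so a burst can't exhaust the database's connection pool (a sketch via the AWS SDK; the function name and limit are made up):

```javascript
const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

// Cap the API function at 50 concurrent containers, so even in a burst
// the database never sees more than ~50 simultaneous connections.
lambda.putFunctionConcurrency({
  FunctionName: 'my-api-handler',    // hypothetical function name
  ReservedConcurrentExecutions: 50,  // made-up limit; size it to your DB
}, (err) => { if (err) console.error(err); });
```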
I look forward to the day when AWS Aurora Serverless Postgres is in GA, but until then, if you want to deal with really bursty loads (like suddenly going from zero to thousands of simultaneous requests scaled out across Lambda when an ad runs), you either have to manually scale things up ahead of time, or be prepared to deal with Aurora Postgres cluster failovers and outages. You can't go from zero to huge on the relational database side without minutes/hours of infrastructure changes (or huge ongoing bills for capacity you're not using.)
> Serverless is an easy way for startups to overcome one of the harder technical challenges: scalability.
But is that kind of scaling really the hard part? If you're already using AWS, for example, it is trivial to set up an autoscaling group that will add more EC2 instances to keep up with the rate of traffic. IME, scaling data stores is the hard part of scalability.
I'm not saying serverless doesn't make it a little easier, but it is a different tradeoff. AWS Lambda for example has a lot of limitations too. Maybe I'm interpreting your comment wrong, but I don't think it's fair to imply that serverless is the obvious choice for startups just because it helps overcome scalability.
Two things, based on my experience (we migrated from Heroku to Lambda in 2016, so we've been there for a while):
1. You don't need to worry about reserving capacity, so you don't need to pay for growth you expect to happen, or worry about not meeting a spike in demand if it happens.
2. Most of the operations stuff (apart from packaging) is included in the price, so things like monitoring, alerts, dead-letter queues, traffic shifting between canary versions, failovers, balancing... and it's priced per request, so it comes out effectively free if you don't have a lot of traffic.
The big catch is that you don’t control the containers, so there’s no session stickiness. Getting the benefits from Lambda requires re-thinking how you do sessions and storage.
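The usual rethink: keep nothing in the container and fetch the session from an external store on every request (a sketch; the table and key names are made up):

```javascript
const AWS = require('aws-sdk');
const db = new AWS.DynamoDB.DocumentClient();

// No sticky sessions needed: any container can serve any user,
// because the session lives in DynamoDB (or Redis), not in memory.
async function loadSession(sessionId) {
  const res = await db.get({ TableName: 'sessions', Key: { sessionId } }).promise();
  return res.Item || null;
}
```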
That's the thing, though: the dev overhead and limitations for a startup with average compute needs might cost more than the tiny hosting savings (e.g. who cares if it's $200/month vs. $70/month) if it costs $50k extra in dev time to architect for Lambda.
Because you're immediately having to go to a microservices architecture, which immediately adds a whole new layer of complexity, monitoring, network latency, API coordination, and configuration to otherwise simple webapps.
In the case of .Net at least, you write your API like you always do and the SDK provides a wrapper. You can have multiple actions/controllers just like you would with a traditional Web API project.
Having persistent processes? Not needing API Gateway? Not having to learn a whole bunch of AWS services? Not having to work around latency? Having a simple system that's easy to debug?
If by “persistent processes” you mean storing state locally, you usually shouldn't be counting on that anyway. My VMs are just as disposable as lambdas.
API Gateway - we use API Gateway even for our VM hosted APIs for the other features it provides. We also integrate with swagger so creating and maintaining our API Gateway set up is basically automatic.
Not having to learn a bunch of AWS services - if you are using AWS for more than just hosting VMs, you would have to learn AWS anyway. Even if you are just hosting a bunch of VMs, you still need to know about security groups, VPCs, subnets, ELBs, Route 53, autoscaling, etc.
Creating a lambda is adding one function to your code.
Easy to debug? Your code should not know nor care that you’re running serverlessly outside of your lambda handler. You test and debug your lambda code just like people have been testing APIs forever.
You whip up a test harness that calls the entry point of your domain logic. I don't mean a real AAA automated test, just an entry point for manual debugging.
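Something like this (a sketch; the handler path and event shape are assumptions):

```javascript
// debug.js - poor man's harness: invoke the handler locally.
const { handler } = require('./handler'); // assumed module path

handler({ queryStringParameters: { name: 'test' } }, {})
  .then((res) => console.log(res))
  .catch((err) => console.error(err));
```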
There is a startup cost you pay in latency. You have to debug systems in prod, not just in dev, and debugging Lambda used to be a pretty crappy experience. "If you are using AWS for more than just hosting VMs" - that's the thing: the original question was how this is better for the average startup than a simple DO setup.
You debug systems in prod with lambda just like you do with anything else - via a good logging infrastructure.
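Concretely: log structured lines from the handler and let CloudWatch Logs collect stdout (a minimal sketch):

```javascript
exports.handler = async (event, context) => {
  // One JSON line per request; CloudWatch Logs picks up stdout, and
  // context.awsRequestId lets you correlate retries of the same event.
  console.log(JSON.stringify({
    requestId: context.awsRequestId,
    path: event.path,
    msg: 'request received',
  }));
  // ... handle the request ...
  return { statusCode: 200, body: 'ok' };
};
```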
As far as how it’s better for a startup?
With AWS I get hosted Load balancers, multi AZ redundancies, autoscaling, managed queues, databases with failover, a mobile push infrastructure, unlimited storage, etc.
All stuff I don’t have to manage myself. There is absolutely no way that I could both develop and babysit infrastructure if it was all just a bunch of VMs. The money we save by not having a dedicated ops department and just one guy who does the grunt work more than makes up for the cost of AWS.
Not to mention, the speed at which I can stand up infrastructure.
I would never work for a company that wants to save a few bucks by having as part of their process “manual failovers” that might cause me to wake up in the middle of the night when they could have just allowed me to properly architect the thing in the first place.
I've seen people having to wake up way more often running overarchitected systems beyond their ops capability than running a simple solution that requires manual intervention a few times a year. So what would the proper solution be for, say, PG? HAProxy in an HA configuration plus Patroni (so etcd). So from running 2 Postgres instances you've gone to running 3 Postgres instances, a distributed database (etcd), Patroni, and HAProxy in an HA configuration, so you need a floating IP, some heartbeat service, etc.
Sadly, you'll still encounter pages in the middle of the night (well, if you implement the monitoring) when AWS services fail and you need to write up a support ticket.
Granted, they're usually already aware of it, but it can still take tens of minutes to hours to resolve - time during which you are down.
In the case of the parent poster - using AWS to manage load balancers and database failovers, how often would a failure be noticed by the end user?
Yes, I know what happens when client apps/servers cache the DNS entry too long and don’t notice when a failover takes place - been there done that.
And the quickest way to recover from a failure of servers that don’t store state is just to let autoscaling kill it and bring up another instance based on an AMI/startup scripts.
I’m ruthless about not having servers as pets for anything that we have to manage ourselves.
I tell our Devops guy that it is completely useless to keep an inventory of server names, IP addresses, etc. for anything that I’m responsible for.
And too often, developers' (at smaller companies) and ops' (at larger companies) time is considered free, because they are salaried and are expected to work more than 40-45 hours per week.
Yes, earlier in my career I would have accepted that, but I’ve successfully pushed back a few times and said I won’t babysit a process/server etc but I will design a method where it is auto healing, autoscaling, etc.
As someone who has been both the server baby sitter and implementor of auto-healing/scaling, it still puts a smile on my face when a server fails a health check, AWS kills it, starts another one, and moves the work over. All with zero downtime. I just see the emails in the morning that it happened.
The other huge benefit of building this process is we can use dirt cheap spot instances with near abandon. For me, any infrastructure that I can't treat as cattle is on my hit list.
I am just pointing out that "no downtime" applies to failing over very specific services. In general, AWS has had a pretty horrible track record on outages; a single top-tier DC has way better uptime than an AWS region. The multi-region thing is partially BS too: if US East has a total failure, there is not enough spare capacity in the other regions to absorb all of US East's compute.
I do wonder if, in a few years' time, we're going to be seeing a new genre of technical blog posts: "How we migrated off serverless to reduce our costs"
Yes but probably not for the reasons you're thinking.
If you try to get things perfect the first time you'll never ship on time.
Doing the right things in the right order usually involves optimizing different constraints at the start and re-balancing your optimizations as you prove your product idea, learn what you don't yet know, and gain additional resources and time.
Absolutely. Saving 30% on a cloud bill is not a super high priority for new companies. Staying frugal is important, but I find this mostly applies to unnecessary expenses, not necessary but slightly more costly expenses.
Don't get the 27-inch iMac, get a second-hand office chair. This will free up enough cash so you can take the more expensive but quicker hosting/infra options like Heroku and Lambda.
One of the things with Lambda etc. is that while it's cost-effective (and flexible) when a business starts, it can also lead to lock-in (not just with the serverless functions themselves, but with queueing services etc.). A few years down the track, a business may find themselves spending quite a lot of money with Amazon, and unable to move away without a major re-engineering effort.
Then again if you're aiming at high growth, you'll be looking at some degree of re-architecting at least every few years anyway to cope with the 10x system problem.
If using FaaS can be a quick bandaid solution to get you up and running without worrying about having to design your own equivalent of the fail whale, then a certain degree of lock-in might be worth paying.
Well, at least you have a business. I find vendor lock in to be about the lowest priority on my list. Also, the lock in is pretty exaggerated. Just make sure to keep business logic separate from the cloud plumbing, as per the OP example.
It's not just about separating business logic from "cloud plumbing". It's building your infrastructure around services that cannot simply be replaced elsewhere, or cannot be replaced easily - if you want to move from AWS Kinesis to (say) Kafka for message brokering, or from DynamoDB to Postgres.
There is a reason Amazon gives away a lot of free credits to startups, and it's the same reason drug dealers give away free samples. If the economics are right, and growth is strong then you might be okay - but you also don't want to find yourself with a $100k/month AWS bill in a startup with only $2m of annual revenue.
(P.S. Greetings from the other side of the Spree. I love La Lucha on your street!)
The useful thing is that in the case of at least Azure Functions, the back-end is open source. There are plenty of open-source alternatives, so you can migrate to a bare-metal-hosted alternative if it brings down costs and maintains the same level of scalability. This also emphasizes the need for cloud-agnostic frameworks.
Every startup wants to reach the point where vendor lock-in is an issue. IMO, it's a useless boogeyman. If vendor lock-in lets the startup move faster, find the business, and make it 2-3 years down the road, they should pick vendor lock-in every single time.
I think there is a place for all kinds of software and infrastructure, based on your business model and scale.
You can start with serverless, move to normal cloud instances and finally to bare metal or the other way around!
For example, the core value provided by an ERP software is not performance but the business logic. Serverless is a good fit for startups creating such software. If you are running a conference call business or serving tons of video like Netflix, a different architecture better suits your need. Such as having edge servers with every ISP.