I really wonder how an HTTP application doesn't suffer from performance hits when it's based on 2,000 microservices. Let's say even just 30 of those get used by a call to monzo.com. How does this not cause a delay of at least, say, ~300ms (30 calls at ~10ms each, if they run sequentially)? I guess the calls are actually served from a local in-memory cache and there is almost never a real HTTP call made to these microservices. Otherwise I have no idea how microservices are ever viable.
Some of the calls can fan out in parallel, but in my experience it's not good for performance, even with fewer services. The remote call overhead certainly adds up, but another issue is that each service does redundant work. E.g. each service might have to fetch settings for the user in order to respond. You can refactor to eliminate this (e.g. fetch settings once and pass as a parameter to each call), but it's a lot more work to make this change across many services.
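A minimal sketch of those two mitigations, assuming Go and entirely made-up downstream calls (fetchSettings, feed, balance, cards are hypothetical, not anything Monzo actually runs): fan the independent calls out in parallel, and fetch the user's settings once up front instead of letting each service re-fetch them.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Settings stands in for per-user config that several services would
// otherwise each fetch for themselves.
type Settings struct{ Locale, Currency string }

// Hypothetical downstream "services"; the sleeps fake network latency.
func fetchSettings(ctx context.Context, userID string) (Settings, error) {
	time.Sleep(10 * time.Millisecond)
	return Settings{Locale: "en-GB", Currency: "GBP"}, nil
}
func feed(ctx context.Context, userID string, s Settings) error    { time.Sleep(30 * time.Millisecond); return nil }
func balance(ctx context.Context, userID string, s Settings) error { time.Sleep(20 * time.Millisecond); return nil }
func cards(ctx context.Context, userID string, s Settings) error   { time.Sleep(25 * time.Millisecond); return nil }

func main() {
	ctx := context.Background()
	start := time.Now()

	// Fetch settings once and pass them down as a parameter.
	settings, err := fetchSettings(ctx, "user-123")
	if err != nil {
		panic(err)
	}

	// Fan the independent calls out in parallel: total latency is roughly the
	// slowest call plus the settings fetch, not the sum of every call.
	calls := []func(context.Context, string, Settings) error{feed, balance, cards}
	errs := make(chan error, len(calls))
	for _, call := range calls {
		call := call // avoid capturing the loop variable (pre-Go 1.22)
		go func() { errs <- call(ctx, "user-123", settings) }()
	}
	for range calls {
		if err := <-errs; err != nil {
			panic(err)
		}
	}

	fmt.Println("served in", time.Since(start)) // ~40ms rather than ~85ms
}
```

In a real service graph the saving is smaller than this toy suggests, because the calls are rarely fully independent, which is part of why the redundant per-service work is worth refactoring away.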
If you look at something like K8s, it can be configured to route calls to an instance of the other service running on the same node, which removes most of the networking delay.
It's entirely feasible that all incoming requests will hit a single node and stay there.
Still, hitting multiple microservices in a single request adds overhead from marshalling all the HTTP requests and whatnot.
I worked at Monzo a while back, and while the 'Platform' (~= SRE) team were brilliant and did what you describe along with much, much more, the performance impact of thousands of microservices could still be characterised as approximately "what you would expect". Hundreds of thousands a month on AWS to service a few million customers, a whole team needing to work on a project for [I can't recall how many] months to write some bodge-y fixes so app load could be brought under 10 seconds, lots of pathological request paths efflorescing in the service graph, etc.
That being said, there genuinely was a need for microservices, to an extent. A bank's architecture is very different from a CRUD web app's. Most of the code running wasn't servicing synchronous HTTP requests; it was doing batch or asynchronous work related (usually at two or three degrees of separation) to card payments, transfers, various kinds of fraud prevention, onboarding (which was an absolutely colossal edifice, very very different from ordinary SaaS onboarding), etc.
So we'd have had lots of daemons and crons in any case. And, to be fair, we started on Kubernetes before it was super-trendy and easy to deploy - it very much wasn't the 'default' choice, and we had to do a lot of very fundamental work ourselves.
But yeah, in my view we took it too far, out of ideological fervour. Dozens of - or at most a hundred-ish - microservices would have been the sweet spot. Your architectural structure doesn't need to be isomorphic to your code or team structure.
This is a potential downside of this architecture. As others have already said, there are mitigations, but fundamentally each edge request is going to be more costly (time or money) to serve than having it served by a single machine/DB.
One of our engineering principles is to avoid premature optimisation, which is possibly one of the reasons our architecture has grown in this way. So far, whenever we've needed to fix a performance issue, we've been able to solve it locally rather than change the architecture.
At the business level, we've been optimising for growth rather than costs, but this could change in the future, at which point we may need to reconsider our architecture. But for now it's working for us.
Sign-off would take 24 hours to process, and the mysterious entity that signed off would have no context on our product or the changes, and no way to assess the risk.
Then, due to our lack of trusted regression testing, every damn change would take ~6 developers an entire work day to manually confirm that everything was working.
Why? We were measured on “number of tests.” The tech debt was too high to write quality tests before the next review, so we opted for quantity if we wanted promotion.
This was the hot path in a major (Fortune 10) financial company.
This is comical. Your company needs 3,000 deploys a month? You deliver 3,000 features a month? Your BAs/PMs identify 3,000 useful features a month? This isn't trolling - this is a call-out of the absurdity you are presenting as a positive.
Why would a deploy need to be a feature or something that PMs identify?
On a high-functioning team, you should have enough logging and monitoring that developers can identify many useful changes without PM involvement.
A deploy could be something as minor as "I noticed an error getting logged in production, here's a one-line change to fix it" or "This operation is running slowly, here's a tweak to the query so that it hits an index".
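Purely as an illustration of that second kind of deploy (the table, column, and index here are invented, not from the comment above), the whole change can be a one-line diff to a query constant, sketched in Go:

```go
package queries

// Hypothetical "tweak the query so it hits an index" change. Assumes a
// Postgres table `payments` with a plain btree index on created_at.

// Before: wrapping the indexed column in date() stops the planner from using
// the index, so the query scans the whole table.
const paymentsOnDaySlow = `SELECT id FROM payments WHERE date(created_at) = $1`

// After: a half-open range on the raw column lets the planner use the index.
const paymentsOnDayFast = `SELECT id FROM payments
WHERE created_at >= $1 AND created_at < $1 + interval '1 day'`
```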
Hmm, there were some absurdities when I was on the team, but I wouldn't have identified that as one of them. I'd characterise the service count as possibly slightly overboard, but not the deploy count: making changes super-incrementally was absolutely a positive thing compared with other places I've worked, although it would never work without an almost-zero-friction deployment process.
Done right, it avoids bugs - especially complex hard-to-roll-back state-related bugs - that can arise from discrete waterfall-style releases of big and complex features all at once.
Your comment seems to be making a hell of a lot of assumptions and associations which seem very specific to the environment you work in[0], but stated as though you think the entire world must be working in the same way. It feels a bit like a child insisting that they don't speak with an accent.
[0] "One deploy is one feature", "every company has 'BA's and 'PM's" [we had the latter but I'm scare-quoting both because they are far from universal], "each feature has to be 'identified' by said BA or PM", etc. Also, this seems to be written not only from a staid perspective but from a small-co perspective; for large companies, even 3000 features is not a huge number, and wouldn't be more than maybe 5 or 10 per team.
See comment on bugs. Our releases are "frictionless" too, but yes, I am hard-pressed to even imagine 3,000 improvements a month to any major brand that comes to mind.
Bugs? Roll back. Bugs are generally proportional to LOC, not to the number of deploys (deploy count is a DevOps variable, not a programming one), so the only relevant difference I can imagine is a lower chance of a multi-bug, multi-factorial, multi-dependency, nightmarish-to-fix incident.
Depends on the perspective. Adding one data field to a Kafka record, or Protobuf message, or Bigtable cell, or Postgres table is already an improvement.
There is already so much work to do consuming data from A, transforming it, and storing it into B.
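As a made-up sketch of that point (none of these names are from the comment above): even adding one field to a record that flows from A to B touches the consume, transform, and store steps, plus the schemas and tests around them, so it easily justifies a pull request and a deploy of its own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical event read from system A (say, a Kafka topic). The Currency
// field is the "one data field" being added.
type PaymentEvent struct {
	ID          string `json:"id"`
	AmountMinor int64  `json:"amount_minor"`
	Currency    string `json:"currency"` // new field
}

// Hypothetical row written to system B (say, a warehouse table).
type PaymentRow struct {
	ID       string
	Amount   float64
	Currency string // the new field has to be threaded through here too
}

// consume: decode the raw record coming from A.
func consume(raw []byte) (PaymentEvent, error) {
	var e PaymentEvent
	err := json.Unmarshal(raw, &e)
	return e, err
}

// transform: map the event into the shape B expects.
func transform(e PaymentEvent) PaymentRow {
	return PaymentRow{ID: e.ID, Amount: float64(e.AmountMinor) / 100, Currency: e.Currency}
}

// store: stand-in for the write to B.
func store(r PaymentRow) { fmt.Printf("upsert %+v\n", r) }

func main() {
	raw := []byte(`{"id":"tx_1","amount_minor":1250,"currency":"GBP"}`)
	e, err := consume(raw)
	if err != nil {
		panic(err)
	}
	store(transform(e))
}
```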
I work on a project with microservices. I just looked at our repository: we merged around 1,100 pull requests in the last 30 days. That's at least that many services redeployed to our development cluster, with a release train on to the staging and production clusters.
I'm not sure how many features or stories that is. Maybe we need two to four pull requests per story.
I think the idea is more about "avoid bundled changes that are harder to debug and roll back". A change doesn't have to be a new feature; it might be a bug or typo fix.