Yes, performance is better on bare metal, and it's cheaper too. In fact, even a cheap VPS will perform better than the cloud for much less money, and can scale vertically to incredible heights.
You do have to do a lot of things manually, and even doing that, you won't get the high availability and elasticity you have the potential to get from a cloud offer. Potential being the key word though.
But honestly, most projects don't need it.
Crashing used to terrify me. Then one day I started working for a client whose service went down about once a month.
Guess what? Nothing happened. No customers ever complained. No money was lost; in fact, the cash flow kept growing.
Most services are not so important that they can't be switched off once in a while.
Not to mention, monoliths are more robust than they are given credit for.
I've seen a fair amount of outages caused by the extra complexity brought on by making a system distributed for the purposes of high availability.
Hardware is actually quite reliable nowadays, and I'll trust a hardware single point of failure running a monolithic application more than a distributed microservice-based system with lots of moving parts.
Sure, in theory, the distributed system should win, but in practice, it fails more often (due to operator error or unforeseen bugs) than hardware.
> Sure, in theory, the distributed system should win, but in practice, it fails more often (due to operator error or unforeseen bugs) than hardware.
Isn't this because of the rampant introduction of accidental complexity whenever you attempt to make a system horizontally scalable - e.g. for whatever reason the developers or the people in charge suddenly try to cram in as many technological solutions as possible because apparently that's what the large companies are doing?
There's no reason why you couldn't think about which data can or cannot be shared, and develop your system as one that's scalable, possibly modular, but with the codebase still being largely monolithic in nature. I'd argue that there's a large gray area between the opposite ends of that spectrum - monoliths vs microservices.
That's not to say some systems don't introduce accidental complexity; they do. However, getting distributed systems right is necessarily complicated. Design for failure and you will get a good picture of what tools you need to employ.
I think that you can keep systems simple and HA if you forget about multi-master components and design active/passive solutions instead, whose behaviour in failure scenarios is much easier to understand and much less likely to go wrong, especially if you also forgo attempting to autorecover failed components after a failover event.
If no HA is fine 90% of the time, simple HA is fine 99% of the time.
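To make the active/passive idea concrete, here is a minimal sketch of the standby side, assuming a made-up health URL and a promote_self() placeholder for whatever actually takes over (floating IP, service start, DNS update):

    import time
    import urllib.request

    ACTIVE_HEALTH_URL = "http://10.0.0.10:8080/health"  # hypothetical active node
    FAILURES_BEFORE_FAILOVER = 3

    def active_is_healthy() -> bool:
        try:
            with urllib.request.urlopen(ACTIVE_HEALTH_URL, timeout=2) as resp:
                return resp.status == 200
        except OSError:
            return False

    def promote_self() -> None:
        # Stand-in for taking over a floating IP, starting services, updating DNS, etc.
        print("active node looks dead; promoting the passive node")

    failures = 0
    while True:
        failures = 0 if active_is_healthy() else failures + 1
        if failures >= FAILURES_BEFORE_FAILOVER:
            promote_self()
            break  # deliberately no auto-recovery after failover, as argued above
        time.sleep(5)

The failure behaviour is trivial to reason about: either the standby promotes itself after three missed checks, or it does nothing.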
I think another issue is that there are plenty of robust, battle-tested building blocks such as databases for monolithic, vertically-scalable applications, but much fewer for horizontally-scalable ones, meaning you'll need to roll your own (at the application layer) and most likely screw it up.
I do agree with you in that regard, however, that's also a dangerous line of thinking.
There are attempts to provide horizontal scalability for RDBMSes in a transparent way, like TiDB https://pingcap.com/ (which is compatible with the MySQL 5.7 drivers), however, the list of functionality that's sacrificed to achieve easily extensible clusters is a long one: https://docs.pingcap.com/tidb/stable/mysql-compatibility
There are other technologies, like MongoDB, which sometimes are more successful at a clustered configuration, however most of the traditional RDBMSes work best in a leader-follower type of replication scenario, because even those aforementioned systems oftentimes have data consistency issues that may eventually pop up.
Essentially, my argument is that the lack of good horizontally scalable databases or other data storage solutions is easily explainable by the fact that the problem itself isn't solvable in any easy way, apart from adopting eventual consistency, which is probably going to create more problems than it will solve in case of any pre-existing code that makes assumptions about what ways it'll be able to access data and operate on it: https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...
To that end, I'd perhaps like to suggest an alternative: use a single vertically scalable RDBMS instance when possible, with a hot standby if you have the resources for that. Let the architecture around it be horizontally scalable instead, and let it deal with the complexities of balancing the load and dealing with backpressure - introduce a message queue if you must, maybe even an in-memory one for simplicity's sake, or consider an event-based architecture where "what needs to be done" is encapsulated within a data structure that can be passed around and applied whenever possible. In my eyes, such solutions can in many cases be better than losing the many benefits of having a single source of truth.
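Something like this minimal sketch is what I have in mind, with SQLite standing in for the single RDBMS and every name being made up:

    import queue, sqlite3, threading
    from dataclasses import dataclass

    @dataclass
    class CreateOrder:            # "what needs to be done", passed around as plain data
        customer_id: int
        amount_cents: int

    work: "queue.Queue[CreateOrder]" = queue.Queue(maxsize=10_000)  # maxsize gives you backpressure

    def worker() -> None:
        db = sqlite3.connect("orders.db")   # stand-in for the single source of truth
        db.execute("CREATE TABLE IF NOT EXISTS orders (customer_id INT, amount_cents INT)")
        while True:
            cmd = work.get()
            db.execute("INSERT INTO orders VALUES (?, ?)", (cmd.customer_id, cmd.amount_cents))
            db.commit()
            work.task_done()

    threading.Thread(target=worker, daemon=True).start()

    # Any number of horizontally scaled frontends just enqueue commands:
    work.put(CreateOrder(customer_id=42, amount_cents=1999))
    work.join()

The frontends scale out, the queue absorbs bursts, and the database stays a single, boring, consistent instance.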
Alternatively, when you actually hit issues with the above approach (and only then), consider sharding as a possibility, or do some domain-driven design, figure out where to draw some boundaries and split your service into multiple ones that cover the domains you need to work with. Then you have one DB for sales, one for account management, one for reports and so on, all with services that are separated by something as simple as REST interfaces, with rate limits or any of the other usual mechanisms.
If, however, neither of those two groups of approaches seems suitable for the loads you're dealing with, then you probably have a team of very smart people and a large amount of resources to figure out what will work best.
To sum up, if there are no good solutions in the space, perhaps that's because the problems themselves haven't been solved yet. Thus, sooner or later, they'll need to be sidestepped and their impact mitigated in whatever capacity is possible. Not all components can or should scale horizontally.
Have you ever seen CockroachDB or Yugabyte? Neither is eventually consistent, yet both are horizontally scalable. It looks to me like a solved problem.
The biggest benefit of HA architectures is IMO not resilience from crashing or overloaded systems, but more that it is often a prerequisite for doing zero-downtime updates and partial rollout of updates.
No more emergency reverts to the last version because the new version didn't start up, only to realize there was a schema update, so you also have to restore your data from a snapshot, which nobody knows how to do. All this under the stress of the service being completely down.
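As a rough sketch of what that buys you, with hypothetical instance names and placeholder deploy()/health_ok() helpers standing in for whatever your orchestration actually provides:

    import time

    INSTANCES = ["app-1", "app-2", "app-3"]   # hypothetical instances behind a load balancer

    def deploy(instance: str, version: str) -> None:
        print(f"deploying {version} to {instance}")   # stand-in for the real deploy step

    def health_ok(instance: str) -> bool:
        return True                                   # stand-in for an HTTP health check

    def rolling_update(version: str) -> None:
        for instance in INSTANCES:
            deploy(instance, version)     # update one instance at a time
            time.sleep(5)                 # give it time to start
            if not health_ok(instance):
                deploy(instance, "previous")          # roll back just this instance
                raise RuntimeError(f"{instance} failed health check; rollout stopped")
            # the remaining instances keep serving traffic the whole time

    rolling_update("v2.3.1")

The key property is that a bad version never takes the whole service down; it fails on one instance while the others keep serving.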
In most cases, it's only cheaper if you value engineering time at $0. Fine for a personal project, but the math changes when you're paying a fully-loaded cost of $100-200/hr for engineers and you have a long backlog of more valuable things for them to work on.
That's the real reason companies use cloud services: Not because it's cheaper or faster, but because it lets the engineers focus their efforts on the things that differentiate the company.
From another perspective: I can change the oil in my car cheaper and faster than driving to a shop and having someone else do it. However, if my time is better spent doing an hour of freelancing work then I'll gladly pay the shop to change my oil while I work.
> In most cases, it's only cheaper if you value engineering time at $0.
Clearly not at $0 since the cloud is much more expensive and you need more of it because it's slower. If you could have someone do setup and maintenance for free, obviously you'd be way ahead.
So the question is really how much does it cost and whether it's cheaper than the cloud tax.
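As a back-of-the-envelope sketch (every number below is an illustrative assumption, not a quote from anyone's actual bill):

    cloud_monthly = 20_000        # hypothetical monthly cloud bill
    metal_monthly = 4_000         # hypothetical colo rent + hardware amortization
    engineer_hourly = 150         # fully-loaded engineer cost per hour
    extra_ops_hours = 60          # assumed extra ops hours per month on bare metal

    bare_metal_total = metal_monthly + extra_ops_hours * engineer_hourly
    print(f"cloud: ${cloud_monthly:,}/mo vs bare metal: ${bare_metal_total:,}/mo")
    # With these inputs bare metal wins by ~$7k/mo; change the inputs and it flips.
    # That sensitivity is exactly the "how big is the cloud tax" question.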
At a previous startup our AWS bill would've been enough to hire about 3 to 4 full-time sysadmins at Silicon Valley rates. Our workload wasn't huge. I estimated at the time we could've taken one month of AWS cost to buy more than enough equipment to run everything in-house with redundancy, then hired two people to run it all and banked the rest.
At my current startup, we're still very small and have no real customer traffic, but the AWS bill is getting close (not there yet) to paying for one full-time person. For a workload that could be hosted on a few ancient laptops over a home DSL line if we wanted.
Yes, there's quite the convenience in being able to just click a new instance into existence, but the true cost is also very high. Sure it's just VC money we're all lighting on fire, but I often wonder how much more could be accomplished if we didn't collectively just hand over most of the funding to AWS.
Eh, everything can be automated just as well on bare metal. "The cloud" tends to add complexity not remove it, or at best replace one kind of complexity with another. Bare metal tooling is less familiar and could use some advancement, but basically anything that a cloud compute provider can do can be done on bare metal.
A lot of orgs just never bothered to learn how to automate bare metal and ended up doing a lot of manual work.
> Eh, everything can be automated just as well on bare metal.
Right, but that's not free.
Anything can be automated to match the level of commercial cloud providers given sufficient time and money, but it's far from free.
It's easy to underestimate engineering costs when they're hidden behind a fixed annual salary, but once you're looking at the cost/benefit analysis and actually accounting for all the time spent managing bare metal, it's not a good tradeoff for small or even medium size companies.
People universally build big complicated systems around cloud providers which are equally not free.
Unless you are doing something extremely basic, this is what gets done. With a good deal of experience, it seems the complexity of what people do to manage and automate cloud providers is more or less equal to what it takes to manage bare metal hardware well.
Well, not really. If you are using a cloud solution, you usually need an engineer who knows that particular solution. Outside of the HN bubble, that's a rare breed, and it costs a lot more than your traditional Linux admin, whom you probably already have anyway.
Then you need to design, maintain and debug the distributed cloud system, which is more complex.
So you'll have a dedicated person or team for that in both cases.
On the other hand, setting up a Linux box for the common tasks (web server, cache, DB, etc.) never takes me more than a day.
> Outside of the HN bubble, that's a rare breed, and it costs a lot more than your traditional Linux admin, whom you probably already have anyway.
Strongly disagree. It's much easier to find people with experience deploying to particular cloud services than it is to find someone who knows how to set up, manage, and maintain a bare metal Linux system from scratch. It's also much easier to teach a cloud solution than to teach someone everything they need to know about bootstrapping and maintaining Linux servers in a safe and secure manner.
> On the other hand, setting up a Linux box for the common tasks (web server, cache, DB, etc.) never takes me more than a day.
That's ignoring all of the future overhead of maintaining it and keeping up with updates. Not to mention the required documentation and knowledge sharing to help others understand your custom system so the bus factor isn't 1.
It's always easy to get started with one-off systems, but that's missing the bulk of the work.
Kinda playing the devil's advocate, but cloud solutions aren't inherently more complex than administering a traditional Linux box.
Think of securing the OS, configuring a deploy mechanism, managing updates, dealing with library compatibilities once you have more than one app running, etc. That's a pretty vast and deep set of skills.
You assume there is already such an admin in the company, but going cloud-first reduces all of the above to Docker/container knowledge and understanding cloud configurations, which can range from a Heroku setup to a set of GCE instances up to a small GKE cluster. You can introduce as much complexity as you are comfortable with.
Oh, like that one system I saw once with an uptime of 10 years that was happily chugging away at data (not web facing, though).
Bare metal servers with a proper fallback and a working monitoring/notification system can be incredibly reliable, and for most purposes definitely enough.
>"You do have to do a lot of things manually, and even doing that, you won't get the high availability and elasticity you get from a cloud offer."
I run my things on bare metal. In case of hardware failure it takes less than an hour to restore the servers from backup, and there have been exactly zero hardware failures over their entire lifetime anyway. I also have configurations with standby servers, but I'm questioning that now, as again there has not been a single time (bar testing) when the standby was needed.
As for performance - I use native C++ backends and a modern multicore CPU with lots of RAM. Those babies can process thousands of requests per second sustainably without ever breaking a sweat. That is more than enough for any reasonable business. All while costing a fraction of the cloudy stuff.
With ZFS or even RAID, there should ideally never be a need to "restore from backup" because of a conventional hardware failure; storage drive malfunctions nowadays can and (IMO) should be resolved online.
This is of course not a reason to avoid backups, but nowadays "restoring from backups" should be because of operator error or large-scale disaster (fire, etc), not because of storage drive failure.
Nowadays I'd be more worried about compute hardware failure - think CPU, RAM or the system board. Storage redundancy is IMO a long-solved problem provided you don't cheap out.
Either ZFS or some form of RAID with redundancy cranked up to the max, so that it can tolerate a high number of drives failing and still continue operating. ZFS configured in RAIDZ3 mode for example can tolerate 3 drives failing and still be able to operate (and rebuild itself once the failed drives are replaced).
You are very unlikely to have 3 drives fail at the same time, so you're pretty much immune to storage hardware failures. Again, this is not an argument against backups (they're useful for other reasons), but with this amount of redundancy on the storage level, I'd expect something else in the system to die before the storage layer fails catastrophically.
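A rough illustration of the odds, assuming independent failures (the big caveat) and made-up numbers for the failure rate, vdev width and rebuild window:

    from math import comb

    afr = 0.02          # assumed 2% annualized failure rate per drive
    drives = 8          # hypothetical RAIDZ3 vdev width
    window = 7 / 365    # assumed one-week window to replace/rebuild a failed drive

    p = afr * window    # chance a given drive dies inside that window
    # Probability that 4 or more drives (one more than the parity count) die in the same window:
    p_loss = sum(comb(drives, k) * p**k * (1 - p)**(drives - k) for k in range(4, drives + 1))
    print(f"P(losing the pool in one window) ~ {p_loss:.2e}")   # on the order of 1e-12 with these inputs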
> You are very unlikely to have 3 drives fail at the same time, so you're pretty much immune to storage hardware failures.
You have to be careful. If you take three storage devices off the production line and use them as a mirrored array and one fails, there's a good chance they all fail, because you're not just dealing with storage hardware, but storage software too.
HP had two batches of SSDs that failed when the power-on time hit some rollover point. And I don't think they're the only one. Fabrication issues are also likely to turn up simultaneously if given equal write loads.
If failures were independent events, you're spot on, but they may not be.
I would add that certain combinations of modern SSDs (like NVMe) can put disk performance equal to or in excess of RAM performance, so there are other considerations besides resilience.
I've seen people do this for DB-intensive things like large search server deployments and map rendering. It doesn't take many NVMe mirrors to get past DDR4 speed.
Feel free to send me an email (address in my profile) and we can discuss. I don't think I'm the right person to teach system administration, but we can always have an informal conversation and I can hopefully point you to better resources.
I have a service at work that only needs to be up during a couple of 4 hour intervals every week. It's one of the least stressful elements of my job.
There's a cost to always-on systems, and I don't think we're accounting those costs properly (externalities). It's likely that in many cases the benefits do not outweigh the costs.
I think it comes down to your ability to route around a problem. If the budgeting or ordering system is down, there are any number of other tasks most of your people can be doing. They can make a note, and stay productive for hours or days.
If you put all of your software on one server or one provider, and that goes down, then your people are pretty much locked out of everything. Partial availability needs to be prioritized over total availability, because of the Precautionary Principle. The cost of everything being down is orders of magnitude higher than the cost of having some things down.
One thing I don't get is why not have an on-prem solution with cloud fallback? It's hard, but not super hard. You would just need a cloud data store. And depending on your app, you can have an on-prem data store that periodically backs up (with no egress under normal operation) if you can design with an eventual consistency model.
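Something like this minimal sketch is what I mean; the bucket name and paths are made up, and boto3 is just one possible client for an S3-compatible store:

    import time
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-fallback-bucket"      # hypothetical bucket for the cloud copy

    def backup_once(snapshot_path: str) -> None:
        key = f"snapshots/{int(time.time())}.dump"
        s3.upload_file(snapshot_path, BUCKET, key)   # upload is ingress; egress only happens on restore/failover

    while True:
        backup_once("/var/backups/app.dump")   # assumes a local snapshot produced elsewhere
        time.sleep(3600)                       # hourly, i.e. the cloud copy is eventually consistent

Reads and writes hit the on-prem store in normal operation; the cloud copy only costs you egress the day you actually need it.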
I recently moved a smallish business application from two bare-metal servers onto Azure VMs. It's a standard PHP 7.4 application, MySQL, Redis, nginx. Despite the VMs costing more and having twice the spec of the bare-metal servers, it has consistently performed slower and less reliably throughout the stack. The client's budget is too small to spend much time looking into it. Instead, they upped the spec even further, and what used to cost £600 per month now costs £2000.
(Disclaimer) As a bare metal provider, I hope more people become aware of what I've been saying for years: cloud is great for scaling down, but not that great for scaling up. If you want lots of VMs that don't warrant their own hardware and that you can conveniently turn up and down, then cloud is fantastic. If you have a big application that needs to scale, you can get further vertically with bare metal, and if you need to scale horizontally, you need to optimize higher up in the stack anyway; with its much lower cost for equivalent resources (without even taking any virtualization performance hit into account), greater flexibility, and thus better-fitted performance, bare metal should have the clear advantage.
Sorry, the website is pretty outdated. We're almost exclusively rolling out AMD EPYC3's these days, and we'd price any of those older configurations much lower than what the website lists them at. Nobody, other than spammers, ever orders through our website (although to be fair, our website may be to blame for that also). We get all of our business through word of mouth, and keep busy enough on that alone, so the website hasn't been a priority.
You really ought to take your price lists down if they're that out of date. As it stands, they're probably driving some potential customers away -- even ones who heard of you through word of mouth, but decided to do a bit of research before proceeding.
Look at Hetzner / OVH. I got incredibly good deals from them on dedicated servers. I think I am paying around $150 Canadian for AMD 16 cores, 128GB ECC RAM, do not remember storage.
Update: I also run some things right from my home office, since I have symmetric 1Gbps fiber. For that I paid around $4000 Canadian for an off-lease 20-core 3.4GHz 512GB RAM server from HP.
From our experience, if we price too low we get people who expect the world for bottom dollar. $199 is more the minimum price point at which we're generally willing to take someone on as a customer than a reflection of the price of a base configuration server. Anyone e-mailing us for a quote, if they seem like they're on the up and up and we like what they're about, will usually get a pretty good discount. Most business nowadays comes from people ordering several servers at a time, and they will always request a custom quotation anyhow; we're pretty aggressive with larger volume orders.
Before there was a "cloud" in the early 2000s, we were paying anywhere from $100 (cheap, low spec machines) to $199 (new hardware, more ram, faster disks) per month for rented bare metal from places like Servermatrix, Softlayer, etc.
The going rate also typically included anywhere from 1TB to 2TB of egress, as well.
Another way of looking at it for the low end: How many things do you run on a single bare metal host versus an equivalent amount of discrete "services"?
> what I've been saying for years: cloud is great for scaling down, but not that great for scaling up.
Yes and no. The cloud isn't cheap for running any lift-and-shift type project. Where the cloud comes into its own is serverless and SaaS. If you have an application that's a typical VM farm and you want to host it in the cloud, then you should at least first pause and identify how, if at all, your application can be re-architected to be more "cloudy". Not doing this is usually the first mistake people make when deploying to the cloud.
I think “the cloud” never claimed to be cheaper, though. Its promise is mainly that you’ll offset a lot of risks with it against a higher price. And of course the workflow with the cloud is very different, with virtual machine images and whatnot.
Whether that's worth the price is up for debate, though. I hope we'll get more mature bare metal management software, perhaps even standardized in some remote management hardware, so you can flash entire bare metal systems.
Right now I’m mostly running Debian stable and everything inside Docker on the host machine, making it effectively a Docker “hypervisor”. It can get you quite far, without leaking any application state to the host machine.
Oh I agree, it is easier to mitigate some risks in a cloud solution. But this client - and they’re not unusual in their thinking - believes the cloud is somehow automatically mitigating those risks, when in fact it’s doing nothing because they’re not paying for it.
In this specific case, they chose Azure. They had a consultant in a fancy suit tell them Azure would be safer, and proposed an architecture that ended up not working at all. But they still went with Azure, and it’s difficult to point to any gains they’ve got for the 200% price increase.
Who actually runs CentOS 7 (kernel 3.10) for benchmarks in 2021? Run something recent like Ubuntu 20 + KVM and you will see a big difference. I don't believe modern virtualization has ~20% overhead (it should be less than 5%).
Can you find sources for those "less than 5%" numbers from people who aren't selling Kubernetes or cloud-related services?
It's generally pretty easy to construct benchmarks that make whatever you're selling look favorable. It's why there are constantly blog posts to the effect of "$INTERPRETED_HIGH_LEVEL_LANGUAGE is faster than C++".
I mean one could say kubernetes has virtualized components, and requires VT-d extensions to operate at an accelerated speed, but I don't think containers are truly virtualized. So you can probably get away with a less than 5% benchmark if the stars aligned.
With a hypervisor you're looking at 10-15% overhead, typically, maybe getting down to 7-12% using tricks (paravirtualization, PCI passthrough, etc.). In my environment I am around 12% overhead on a good day.
I've seen way less than 10% overhead on modern hardware with SR-IOV compatible devices. It really depends on your network card/RAID controller. Some servers (and some CPUs) are just not optimal for running virtualized stuff.
Indeed, standard container runtimes require neither VT-x (virtualization instructions) nor VT-d (IOMMU for PCIe devices). They're "just" process isolation (cgroups), network stack isolation (netns), and filesystem isolation (overlayfs). Workloads on Kubernetes are not using virtual hardware unless you choose a managed provider or container runtime that states they are.
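An easy way to see this for yourself on a Linux host (nothing here is container-specific, it's just standard procfs):

    import os

    def namespace_ids() -> dict:
        ns_dir = "/proc/self/ns"
        return {name: os.readlink(f"{ns_dir}/{name}") for name in os.listdir(ns_dir)}

    for name, ident in sorted(namespace_ids().items()):
        print(f"{name:10s} {ident}")

Run it on the host and again inside a container on the same host: the pid, net and mnt namespace IDs differ, but /proc/cpuinfo shows the same physical CPU, because there is no hypervisor or virtual hardware involved.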
I saw quite a significant performance and resource usage benefit from migrating away from a virtualized Kubernetes environment to bare metal Debian, so their findings align well with my anecdata.
Another thing to check is how nginx was compiled. Using generic optimizations vs. x86_64 can do interesting things on VM's vs bare metal. nginx and haproxy specifically should be compiled generic for VM's. I don't have any links, just my own performance testing in the past.
A binary running in a VM is still executing native machine code, so compiler optimizations should have the same effect whether running on bare metal or a VM.
Should being the key word. In truth the implementations of each hypervisor vary. Try it on each hypervisor that you use. I found KVM to have the most parity to bare metal performance.
I'm struggling to think of a situation when running virtualized vs. bare metal where compiler optimizations would matter.
Certain hypervisors have the ability to disable features on the virtual CPU to enable live migration between different generations of physical CPUs, in which case a binary that depends on a disabled virtual CPU feature (e.g., AVX-512) will simply crash (or otherwise fail) when it executes an unsupported instruction.
Other than that, I'm drawing a blank.
Hypervisor performance will vary, but I can't envision any scenario where a binary optimized for the processor's architecture would perform worse than one without any optimizations when running on a VM vs bare metal.
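For what it's worth, a quick way to check what the (possibly virtual) CPU actually advertises before shipping a -march=native build; this just reads the standard Linux /proc/cpuinfo flags:

    def cpu_flags(path: str = "/proc/cpuinfo") -> set:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
        return set()

    flags = cpu_flags()
    for feature in ("aes", "avx2", "avx512f"):
        print(feature, "available" if feature in flags else "missing (masked by the hypervisor?)")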
That's the point. It shouldn't matter but in fact it does. You can see this for yourself if you use the x86_64 optimizations on VM's in benchmark tests. And you will see various results depending on hypervisor used and what application is used. This will even change with time as updates are made to each hypervisor. What I am describing is exactly what is not supposed to happen which is why you are struggling to think of a situation where this should matter. You are being entirely logical.
IIRC, hypervisors have to preserve CPU registers and other processor-related state when switching between "worlds". This is why mitigations for CPU vulnerabilities affect hypervisors too.
Most compilers assume that emitting code in certain modes (SSE/AVX etc.) has a particular cost. That cost may drastically change depending on how the hypervisor's implementation handles the registers in question.
The point of a hypervisor is that any instruction can potentially be trapped and either emulated or substituted with others. If your application uses a lot of instructions that get trapped and end up using slower emulation it will hurt performance.
Yeah, but most of the instructions that get emulated are not used in applications. They are instructions for things that operating systems do, like sending interrupts to physical cores (which of course need the hypervisor).
Am I reading correctly that there is a huge difference in http versus SSL requests per second? e.g. in the Bare Metal 1 CPU case it's 48k http to 800 SSL? I had no idea the performance impact of SSL was that huge, if this is correct.
>The Ixia client sent a series of HTTPS requests, each on a new connection. The Ixia client and NGINX performed a TLS handshake to establish a secure connection, then NGINX proxied the request to the backend. The connection was closed after the request was satisfied.
I honestly struggle to understand why they didn't incorporate keepalives in the testing. Reusing an existing TLS connection, something done far more often than not in the wild, will have a dramatic positive effect on throughput.
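A crude way to see the effect, with a placeholder URL and the requests library doing connection pooling in the Session case:

    import time
    import requests

    URL = "https://example.com/"   # placeholder endpoint
    N = 50

    start = time.time()
    for _ in range(N):
        requests.get(URL)              # new TCP connection and TLS handshake every time
    no_keepalive = time.time() - start

    start = time.time()
    with requests.Session() as session:    # pooled connection: the handshake happens once
        for _ in range(N):
            session.get(URL)
    keepalive = time.time() - start

    print(f"no keep-alive: {no_keepalive:.2f}s, keep-alive: {keepalive:.2f}s")

Benchmarking cold connections only, as they did, mostly measures handshake cost rather than proxy throughput.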
Sure, for a first order approximation that works well. I still think there’s something worth exploring here relative to bare metal vs hypervisors around AES-NI passthrough and cipher suites.
My reaction came from the fact that someone who doesn't understand how much cheaper symmetric encryption is might look at this and just think: wow, there's no way I'm using SSL on my back end.
I wonder how many of the extra cpu cycles are in PV network interfaces. It would be interesting to see how this works out with SR-IOV capable NICs with a VF in each VM.