Kubernetes handles most of this seamlessly for the cluster infrastructure.
The central master handles node failures by removing nodes that aren't heartbeating.
On the node, we require a process monitor for the kubelet (by default we use supervisord); the kubelet in turn monitors Docker (and also does garbage collection and resource limiting), and all of the other node daemons (e.g. the kubernetes proxy) are run, monitored, and restarted by the kubelet.
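To make the "removing nodes that aren't heartbeating" part concrete, here is a minimal sketch of the idea (not the actual node-controller code; the timeout value and data structure are made up for illustration):

```python
import time

# Illustrative only: seconds without a heartbeat before a node is treated as dead.
HEARTBEAT_TIMEOUT = 40

def find_dead_nodes(last_heartbeat, now=None):
    """last_heartbeat maps node name -> timestamp of its most recent status update.

    Returns the nodes whose last heartbeat is older than the timeout; the master
    would stop scheduling onto these and reschedule their pods elsewhere.
    """
    now = time.time() if now is None else now
    return [node for node, ts in last_heartbeat.items()
            if now - ts > HEARTBEAT_TIMEOUT]
```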
Fully agree - K8s has lots of self-healing capabilities as long as it is functioning correctly. Our goal was to extend this with a lower layer that detects failures in things that are mission-critical for k8s deployments to run properly, like etcd, Docker state, and SkyDNS.
This was originally sent out on Twitter by a bunch of people (including myself) who are working on containers (and have a container-oriented set of followers), so I'm pretty certain there is a forward-looking bias in the data.
I tried to describe this bias at the top of the blog post.
It's great to see this launch. We've had flexible resources for container workloads for a while; it's great to see them become available for more traditional VM workloads too!
Or of course the "as a service" versions like Google Container Engine, CoreOS Tectonic, etc.
We're definitely also working on simplifying the install instructions.
I'd also like to expand a little on some of the features that differentiate Kubernetes from Swarm (a rough sketch of a couple of them follows the list), namely:
* Secrets
* Replicated sets of containers
* Rolling update from one version of code to the next
* AutoScaling (Kubernetes 1.1, in Release Candidate now)
* HTTP load balancing for autoscaling
* Load balancing for sets of objects (e.g. Frontend)
* Service discovery of those replicated sets (Swarm has discovery for individual containers, but no concept of Service)
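To make the replicated-sets and service items concrete, here's a rough sketch using the official Python client (the names, image, and ports are invented for the example; normally you'd write the same objects as YAML and create them with kubectl):

```python
from kubernetes import client, config

config.load_kube_config()          # assumes a working kubeconfig
v1 = client.CoreV1Api()

# A replication controller keeps 3 copies of a "frontend" pod running.
rc = {
    "apiVersion": "v1",
    "kind": "ReplicationController",
    "metadata": {"name": "frontend"},
    "spec": {
        "replicas": 3,
        "selector": {"app": "frontend"},
        "template": {
            "metadata": {"labels": {"app": "frontend"}},
            "spec": {"containers": [{
                "name": "frontend",
                "image": "example/frontend:1.0",   # hypothetical image
                "ports": [{"containerPort": 80}],
            }]},
        },
    },
}
v1.create_namespaced_replication_controller(namespace="default", body=rc)

# A service gives the whole set one stable name/IP, load-balances across the
# replicas, and is what gets registered in DNS for discovery.
svc = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "frontend"},
    "spec": {
        "selector": {"app": "frontend"},
        "ports": [{"port": 80, "targetPort": 80}],
    },
}
v1.create_namespaced_service(namespace="default", body=svc)
```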
In general, our goal is to build a system that makes distributed system construction easier.
To expand on what Brendan wrote, Kubernetes does a lot more than just run containers. It provides container-centric infrastructure and a platform for building robust automation.
In a VM-centric world, one wouldn't just use raw VMs in production, but would also use managed groups, load balancing, autoscaling, DNS, Spinnaker, etc. If one manually pins specific containers to specific hosts, IaaS APIs and tools can still be used directly. When dynamically scheduling containers, that doesn't work.
(acknowledgement, I'm a contributor to Kubernetes)
It is well supported on AWS as well, and on a variety of bare-metal solutions (e.g. Red Hat Atomic, CoreOS).
However, concretely, it is a challenge to maintain good support for N different platforms without an owner who is willing to stand up and ensure that it works, and continues to work, on that platform.
We have gotten a number of drive-by contributions of "how to" guides that (sadly) bit-rot over time. As always, we're working on improving the situation, but it is complicated and requires a great deal of time and access to infrastructure (e.g. Rackspace) that the core team simply doesn't have.
Users of the Google Cloud run Docker in VMs, since VMs are what the Google Cloud Platform sells.
(as does every public cloud provider [e.g. AWS])
For now, VMs are required to ensure a security barrier between different users' containers on the same physical machine. See some of Dan Walsh's posts on the subject (e.g. https://opensource.com/business/14/9/security-for-docker)
for more context.
It's most likely that even the "CM"s from both providers are actually Virtual Machines running on a hypervisor running on bare metal. You just can't tell and don't need to care (for most workloads).
Because you're the one setting them up. Basically, you run an Amazon-provided agent on an EC2 instance, and ECS will see that instance as an ECS host.
Also, Amazon bills you for that EC2 instance like any other instance.
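If you want to see that mapping from your side, here's a rough boto3 sketch (the cluster name "default" is just an assumption) that lists a cluster's container instances and the EC2 instances backing them:

```python
import boto3

ecs = boto3.client("ecs")

# Container instances are the EC2 instances whose ECS agent has registered
# with the cluster; each one is billed as a normal EC2 instance.
arns = ecs.list_container_instances(cluster="default")["containerInstanceArns"]
if arns:
    details = ecs.describe_container_instances(cluster="default",
                                               containerInstances=arns)
    for ci in details["containerInstances"]:
        print(ci["ec2InstanceId"], ci["status"])
```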
Personally, I have a hard time understanding the benefits of running Docker in a public cloud: you still run a VM, and you still pay for that VM. It's just one extra abstraction layer, which increases the complexity of your infrastructure and also reduces performance.
I do understand the benefits of using containers in your own data center, when you run them on bare hosts. There's simplicity and lower cost (because you don't have a VM layer), and you have more resources, which lets you run more containers on a host than you could run VMs.
> Personally, I have a hard time understanding the benefits of running Docker in a public cloud: you still run a VM, and you still pay for that VM. It's just one extra abstraction layer, which increases the complexity of your infrastructure and also reduces performance.
Simpler deployment and basically forcing "12-factor", as well as easier development environment setups. Nothing you can't achieve with other tooling, but it's nice to be able to guarantee that your dev environment is identical to your prod.
My problem is that I don't believe you can use Docker without using containers. And if you want to simplify the pipeline, why not just use rpm-maven-plugin[1]? You can easily deploy, including dependencies; it is fast; and you can easily upgrade or downgrade. And there's no need to figure out the complexities imposed by involving LXC.
An SDN isn't required for k8s; what is required is that each Pod (group of containers) gets its own IP address, and that the IP address is routable within the cluster. In many cases, the easiest way to achieve this is via an SDN, but it is also achievable by programming traditional routers.
The reason for wanting an IP address per pod is that it eliminates the need for port mangling, which dramatically simplifies wiring applications together.
The problem with port mangling is that your application ends up running on random ports, so in addition to requiring discovery for IP addresses, you now also have to do discovery for ports, which pretty much requires custom code and infrastructure linked into your binaries (how do you convince nginx/redis/... to use your lookup service for ports?).
And ports are different between different replicas of your service, since they're chosen at random during scheduling.
It also makes ACLs and QoS harder to define for the network, since you don't have a clean network identity (e.g. IP address) for each application.
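A toy contrast of the two models (illustrative Python, not Kubernetes code; `lookup` stands in for whatever custom discovery service you would have to build and link into every client):

```python
import socket

def connect_ip_per_pod(service_ip):
    # IP-per-pod model: discovery only has to answer "what IP?", and off-the-shelf
    # software (nginx, redis, ...) keeps using its normal, well-known port.
    return socket.create_connection((service_ip, 6379))  # redis' usual port

def connect_with_port_lookup(lookup):
    # Port-mangling model: the host port differs per replica, so discovery has to
    # return (ip, port) pairs and every client must be taught to consume them.
    ip, port = lookup("redis")
    return socket.create_connection((ip, port))
```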
Agreed. I am deeply impressed by how "approachable" kubernetes actually is, considering what it does. The overall design concepts are quite simple and the reasoning behind them is clear. It's a small set of self-contained components (api, controller, scheduler, kubelet, proxy sitting on coreos' etcd), so the complexity is fairly manageable. Peeking into the source code of the components won't give you the creeps, and the build system (cross-compiling) could not be any easier.
I have not yet tried any other docker orchestration framework (there seem to be a few popping up right now), but concerning clustering: in comparison, Mesos appears intimidating to me (there is certainly not the 2-minute "I get this" experience I've had with tools like etcd & kubernetes), and I remember building clusters with technology like heartbeat, corosync, openais & drbd not so long ago - compared to that, distributed computing has become incredibly easy.
My advice for starters would be to pick some ready2go vagrant-coreos-setup and get it running on your workstation; this should be pretty straightforward. (We are running k8s on openstack/rackspace and there were too many moving parts involved to get the included starter scripts to reliably bootstrap a kubernetes installation.)
Then look at the user-data/cloud-init of that project and try to rebuild things on your preferred stack from the bottom up, step by step - I feel a lot more in control when doing that. The components' logfiles are actually helpful when you assemble things. It also helps to look at the generated (and documented, thx for this) iptables NAT rules when you have problems with service discovery/communication.
Omega is a separate system from both Borg and Kubernetes.
Kubernetes is heavily inspired by both Borg and Omega, and incorporates many of the ideas from both, as well as lessons learned along the way. And many of the engineers who work on Kubernetes at Google also worked on Omega and Borg.