Kubernetes handles most of this seamlessly for the cluster infrastructure.
The central master handles node failures by removing nodes that aren't heartbeating.
On the node, we require a process monitor for the kubelet (by default we use supervisord). The kubelet in turn monitors Docker (and also handles garbage collection and resource limiting), and all of the other node daemons (e.g. the kubernetes proxy) are run/monitored/restarted by the kubelet.
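For anyone wanting to replicate that setup, a minimal supervisord stanza along these lines is enough to keep the kubelet restarted; this is just a sketch, and the binary path, flags, and log locations are placeholders rather than the project's actual defaults:

```ini
; Hypothetical supervisord entry for the kubelet.
; Path and flags are illustrative -- use whatever your install needs.
[program:kubelet]
command=/usr/local/bin/kubelet
autostart=true
; restart the kubelet whenever it exits, for any reason
autorestart=true
; only count it as "up" after it has stayed alive for 10 seconds
startsecs=10
startretries=100
stdout_logfile=/var/log/kubelet.log
redirect_stderr=true
```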
Fully agree - K8s has lots of self-healing capabilities as long as it is functioning correctly. Our goal was to extend this with a lower layer that detects failures in the components a k8s deployment needs to run properly, like etcd, Docker state, and SkyDNS.
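As a rough illustration of what such a lower layer could look like, the Go sketch below periodically probes etcd's health endpoint, the Docker daemon's ping endpoint, and cluster DNS, and logs anything that stops responding. The ports, socket path, hostname, and interval are assumptions for the example, not the actual implementation:

```go
package main

import (
	"context"
	"log"
	"net"
	"net/http"
	"time"
)

// check issues an HTTP GET with a short timeout and logs whether the
// endpoint answered. The client is passed in so the Docker check can
// use a unix-socket transport.
func check(client *http.Client, name, url string) {
	resp, err := client.Get(url)
	if err != nil {
		log.Printf("%s: UNHEALTHY (%v)", name, err)
		return
	}
	resp.Body.Close()
	log.Printf("%s: ok (HTTP %d)", name, resp.StatusCode)
}

func main() {
	httpClient := &http.Client{Timeout: 3 * time.Second}

	// Client that tunnels HTTP over the Docker daemon's unix socket.
	dockerClient := &http.Client{
		Timeout: 3 * time.Second,
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				return (&net.Dialer{}).DialContext(ctx, "unix", "/var/run/docker.sock")
			},
		},
	}

	resolver := &net.Resolver{}

	for range time.Tick(10 * time.Second) {
		// etcd exposes a /health endpoint; the port is an assumption.
		check(httpClient, "etcd", "http://127.0.0.1:2379/health")

		// Docker answers GET /_ping on its API socket when healthy.
		// The "docker" host is ignored; the unix-socket dialer above is used.
		check(dockerClient, "docker", "http://docker/_ping")

		// Cluster DNS: resolve a name the cluster DNS is expected to serve.
		// Assumes the node resolves through the cluster DNS (e.g. SkyDNS).
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		if _, err := resolver.LookupHost(ctx, "kubernetes.default.svc.cluster.local"); err != nil {
			log.Printf("skydns: UNHEALTHY (%v)", err)
		} else {
			log.Printf("skydns: ok")
		}
		cancel()
	}
}
```

In a real deployment the "log it" step would be replaced with whatever remediation fits (restarting the unit, alerting, fencing the node), but the probe loop itself is the essence of the lower layer described above.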