How do you handle updating the machine that Incus itself runs on? I imagine you have to be super careful not to introduce any breakage, because then all the VMs/containers go down.
What about kernel updates that require reboots? I have heard of ksplice/kexec, but I have never seen them used anywhere.
I'm not quite sure what your question is here? It's very much like any other system that needs to reboot: you prepare for those reboots in advance.
Of course, to some extent things like vSphere/Virtuozzo, LXD/Incus, and even plain Qemu/Virsh setups can do live migration of VMs, so you can worry a bit less about making the workloads inside the VMs fault tolerant - but only to some extent.
I.e. if your team runs PostgreSQL, you run it as a cluster with Patroni and VIPs and all that lovely industry-standard magic, and tell the dev teams to use that VIP as the entry point (in reality things are a bit more complicated, with Haproxy/Pgbouncer on top, but that's enough to express the idea).
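To make that concrete, a rough sketch (the VIP address and config path here are made up for illustration):

    # check cluster state from any node with Patroni's own tooling
    patronictl -c /etc/patroni/patroni.yml list

    # applications connect to the VIP, not to an individual host;
    # 10.0.0.100 is a hypothetical address that follows the current entry point
    psql "host=10.0.0.100 port=5432 dbname=app user=app"

Failover happens behind that one address, so a host reboot turns into a short reconnect for clients rather than an outage.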
I'm not sure that clustering goes beyond "multiple hosts with a single API to rule them all" - thus I assume that when a physical node needs maintenance, it won't magically migrate/restart VEs on other cluster members. I may be wrong here.
P.S. Microcloud tries to achieve this, AFAIR, but it's from Canonical, so it's built on LXD.
Okay, then my question still stands. You are saying "similar to any other system which needs to reboot", but this is nowhere near similar to something like k8s, which has first-class support for this. You cordon and drain the node you are about to take down for maintenance, Kubernetes automatically redistributes all the workloads to the other nodes, and after you are done you uncordon the node.
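Roughly this workflow, to be concrete (the node name is just an example):

    # stop new pods from landing on the node and evict the existing ones
    kubectl cordon node-1
    kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

    # ... kernel update, reboot, etc. ...

    # let the node take workloads again
    kubectl uncordon node-1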
How does this look with Incus? Obviously if the workload you are running has some kind of multi-node support you can use that, but I'm wondering if Incus has a way to do this in some kind of generalized way like k8s does?
But I did some more reading, and there seems to be support for live migration of VMs, and limited live migration for containers. Moving stopped instances is supported for both VMs and containers.
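From what I can tell from the docs, live-migrating a VM looks roughly like this (instance and member names are placeholders, and I believe the VM needs stateful migration enabled first):

    # allow stateful operations on the VM, then move it while it's running
    incus config set vm1 migration.stateful=true
    incus move vm1 --target server2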
Indeed, container live migration is limited, and the docs are a bit unclear on "network devices" - is a bridged interface a network device or not?
A bit ironic, given that CRIU, AFAIK, was created by the same Virtuozzo guys who provided OpenVZ back then, and those VEs could live migrate - I was personally testing it in 2007-2008. Granted, there was no systemd in those days, if that's what complicates things. And of course it required their patched kernel.
> Live migration for containers
For containers, there is limited support for live migration using CRIU. However, because of extensive kernel dependencies, only very basic containers (non-systemd containers without a network device) can be migrated reliably. In most real-world scenarios, you should stop the container, move it over and then start it again.
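In practice that recommendation amounts to something like this (names are placeholders):

    # cold-migrate a container to another cluster member
    incus stop c1
    incus move c1 --target server2
    incus start c1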
It is built-in functionality [1] and requires no extra orchestration. In a cluster setup, you would be using virtualized storage (Ceph-based) and a virtualized network (OVN). You can replace a container/VM on one host with another on a different host with the same storage volumes, network and address. This is what k8s does with pod migrations too (edit: except the address).
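For planned maintenance specifically, the closest thing to cordon/drain that I'm aware of is cluster evacuation (the member name is a placeholder):

    # migrate or stop this member's instances before taking it down
    incus cluster evacuate server1

    # after the reboot, bring its instances back
    incus cluster restore server1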
There are a couple of differences, though. The first is the pets-vs-cattle treatment of containers by Incus and k8s respectively. Incus tries to resurrect dead containers as faithfully as possible. This means that Incus treats container crashes like system crashes, and its recovery involves a systemd bootup inside the container (the kernel too, in the case of VMs). This is what accounts for the delay. K8s, on the other hand, doesn't care about dead containers/pods at all. It just creates another pod, likely with a different address, and expects it to handle the interruption.
Another difference is the orchestration mechanism behind this. K8s, as you may be aware, uses control loops on controller nodes to detect the crash and initiate the recovery. The recovery is mediated by the kubelets on the worker nodes. Incus seems to have the orchestrator on all nodes. They take decisions based on consensus and manage the recovery process themselves.
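And for the unplanned case, if I remember the config key correctly, the automatic healing is opt-in, something like:

    # evacuate instances from a cluster member that has been offline for 30 seconds
    incus config set cluster.healing_threshold 30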
> and address. This is what k8s does with pod migrations too.
That's not true of Pods; each Pod has its own distinct network identity. You're correct about the network, though, since AFAIK the Service and Pod CIDRs are fixed for the lifespan of the k8s cluster.
You spoke to it further down, but guarded it with "likely", and I can say with certainty that it's not likely, it unconditionally does. That's not to say address re-use isn't possible over a long enough time horizon, but that bookkeeping is delegated to the CNI.
---
Your "dead container" one also has some nuance, in that kubelet will for sure restart a failed container, in place, with the same network identity. When fresh identity comes into play is if the Node fails, or the control loop determines something in the Pod's configuration has changed (env-vars, resources, scheduling constraints, etc) in which case it will be recreated, even if by coincidence on the same Node
> I can say with certainty that it's not likely, it unconditionally does. That's not to say address re-use isn't possible over a long enough time horizon, but that bookkeeping is delegated to the CNI
You are 100% wrong then. The kube-ovn CNI enables static address assignment and "sticky" IPAM on both pods and KubeVirt VMs.
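If I recall the annotation key correctly, pinning an address is as simple as this (names and the IP are made up, and the subnet has to be one kube-ovn manages):

    # request a fixed address from kube-ovn at pod creation time
    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Pod
    metadata:
      name: static-ip-demo
      annotations:
        ovn.kubernetes.io/ip_address: "10.16.0.15"
    spec:
      containers:
      - name: app
        image: nginx
    EOF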
Heh, I knew I was going to get in trouble since the CNI could do whatever it likes, but I felt safe due to Pods having mostly random identities. In that moment I had forgotten about StatefulSets, which, I agree with your linked CNI's opinion, would actually be a great candidate for static address assignment.
Sorry for the lapse, and I'll try to be more careful when using "unconditional" to describe pluggable software.
I agree with everything you pointed out. Those points were on my mind too, but I avoided them on purpose for the sake of brevity. It was getting too long-winded and convoluted for my liking. Thanks for adding a separate clarification, though.