As I understand it, SCTP is still a TCP-like stream protocol with sender-driven congestion control (using packet drops or ECN as the signal) and multihoming to deal with in-network path problems, while Homa is aimed at fast RPC (one-packet request, one-packet response) for short messages and has receiver-driven congestion control to deal with incast congestion at the receiver.
SCTP also still has TCP-like slow start and other things that Homa seeks to avoid.
SCTP is message-oriented and allows explicit control over how reliably a message is delivered. You can run a TCP-like stream over it, but that is not required. Slow start is unfortunately still a thing. SCTP was designed with lossy networks in mind, just like other long-range network protocols, not an overprovisioned pseudo-Ethernet fabric with DCB.
At least looking from the outside, SCTP is also kind of stupid in places due to its telecom origins: e.g. you get only as many data streams as you negotiated at the start of the connection[1], because we just can’t let circuit switching go.
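For reference, this is roughly how that stream count gets fixed with the Linux SCTP sockets API (lksctp-tools); the numbers here are just for illustration:

    /* Rough sketch using the Linux SCTP sockets API (lksctp-tools). */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/sctp.h>

    int main(void)
    {
        /* One-to-one style SCTP socket. */
        int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
        if (fd < 0) { perror("socket"); return 1; }

        /* The stream count is fixed when the association is set up:
         * whatever you ask for here (and the peer agrees to) is all you
         * get for the life of the association. */
        struct sctp_initmsg init;
        memset(&init, 0, sizeof(init));
        init.sinit_num_ostreams  = 8;   /* outbound streams we request  */
        init.sinit_max_instreams = 8;   /* inbound streams we'll accept */

        if (setsockopt(fd, IPPROTO_SCTP, SCTP_INITMSG, &init, sizeof(init)) < 0)
            perror("setsockopt(SCTP_INITMSG)");

        /* ... connect()/sctp_sendmsg() as usual; every message carries a
         * stream number in [0, negotiated count). */
        return 0;
    }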
Setting aside the detailed background, design, or analysis, the approach of plucking a concern out of the Network/Data Link layer (prioritization/QoS) and moving it up to the Transport layer is a remarkably simple, clever starting point.
At least that's how I picture the start of the study of such a design.
TCP congestion control is a disaster. Switching to DCTCP improves things slightly, but for big, slow pipes it can be impossible to get near line rate. Congestion control's intent also differs from QoS/CoS. It's also a 'reactive' design rather than a proactive one, usually a bad sign. Not to mention that TCP is itself really complex and susceptible to receiver resource exhaustion.
Still getting into the reading here, but to make my intuition a bit more explicit: it seems that at least the design, and possibly the implementation, of Homa hinges on datacenter switches' QoS queues being configured with a policy to explicitly respect the receiver-assigned priorities... cool!
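Just to illustrate the mechanism (not claiming this is what Homa's implementation actually does): one common way a host can stamp a per-packet priority that switches then map to egress queues is the DSCP field, e.g.:

    /* Illustration only: set a per-socket DSCP value so that switches
     * configured with priority queues can classify the packets.  Whether
     * Homa actually uses DSCP, 802.1p, or something else is not something
     * I'm asserting here. */
    #include <sys/socket.h>
    #include <netinet/in.h>

    /* `prio` would be the value assigned by the receiver
     * (higher = more latency-sensitive, in this sketch). */
    static int set_packet_priority(int fd, int prio)
    {
        int tos = (prio & 0x3f) << 2;   /* DSCP is the top 6 bits of the TOS byte */
        return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
    }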
The Homa approach seems to lift explicit flow control from reliable networks up into the IP layer: the classic flow-control credits of IEEE 1355/SpaceWire, also used in InfiniBand, except Homa calls them "grants" and permits a certain default amount of data to be sent before any grant arrives, from what I see.
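A hypothetical sketch of what a receiver-driven grant scheme in that spirit looks like; the names, constants, and layout are invented, this is not the actual Homa code:

    /* Hypothetical receiver-side grant logic, for illustration only. */
    #include <stdint.h>
    #include <stdbool.h>

    #define RTT_BYTES  60000   /* data a sender may send "blind" (unscheduled) */
    #define GRANT_INCR 10000   /* how far each grant extends the send limit    */

    struct inbound_msg {
        uint64_t id;          /* RPC id chosen by the sender              */
        uint32_t total_len;   /* total message length, from the first pkt */
        uint32_t received;    /* bytes received so far                    */
        uint32_t granted;     /* highest byte offset the sender may send  */
    };

    /* Called when data for `m` arrives.  Returns true if a GRANT packet
     * (carrying the new m->granted) should go back to the sender. */
    static bool on_data_arrival(struct inbound_msg *m, uint32_t bytes)
    {
        m->received += bytes;

        /* The first ~RTT_BYTES arrive unscheduled, without any grant. */
        if (m->granted < RTT_BYTES)
            m->granted = (RTT_BYTES < m->total_len) ? RTT_BYTES : m->total_len;

        /* Keep roughly one RTT of data in flight: as data drains in, extend
         * the sender's limit, but never past the end of the message. */
        if (m->granted < m->total_len &&
            m->granted - m->received < RTT_BYTES) {
            m->granted += GRANT_INCR;
            if (m->granted > m->total_len)
                m->granted = m->total_len;
            return true;   /* emit a grant */
        }
        return false;
    }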
What is going on in a datacenter that motivates a protocol like this? I admit I am ignorant. Is it envisioned for internal traffic, say between Kubernetes nodes?
AFAICT Homa is intended to let machines in the same datacenter (perhaps the same rack, or even VMs on the same hardware) talk with very low latency. It lets you start a communication with zero ceremony and keeps the server code as stateless as possible: the server doesn't even keep a connection, it just marks the response with the same ID the request had and lets the receiver sort it out.
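Something like this, conceptually; the header layout and helper names are made up for illustration, this is not the real Homa API:

    /* Hypothetical wire format and matching logic, just to show the
     * "no connection, match on RPC id" idea. */
    #include <stdint.h>
    #include <stddef.h>

    struct rpc_hdr {
        uint64_t rpc_id;      /* chosen by the client, unique per outstanding RPC */
        uint32_t payload_len;
    };

    /* Server side: completely stateless -- the reply just copies the
     * request's id so the client can pair it with the right call. */
    static void make_reply_hdr(const struct rpc_hdr *req, struct rpc_hdr *resp,
                               uint32_t resp_len)
    {
        resp->rpc_id = req->rpc_id;   /* same id as the request */
        resp->payload_len = resp_len;
    }

    /* Client side: find the outstanding request this response belongs to. */
    struct pending { uint64_t rpc_id; int in_use; /* result buffer, etc. */ };

    static struct pending *match_response(struct pending *table, size_t n,
                                          const struct rpc_hdr *resp)
    {
        for (size_t i = 0; i < n; i++)
            if (table[i].in_use && table[i].rpc_id == resp->rpc_id)
                return &table[i];
        return NULL;   /* stale or unknown id: drop it */
    }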
Is anyone actually using Homa? I have heard that it has a few fundamental issues as described in the paper, and many people who want what Homa offers are using their own thing.
"It's Time to Replace TCP in the Datacenter" position paper: https://news.ycombinator.com/item?id=33401480 https://news.ycombinator.com/item?id=42168997
Review of Homa protocol https://news.ycombinator.com/item?id=28204808
Review of Linux implementation of Homa https://news.ycombinator.com/item?id=28440542
TCP vs. RPC: part 1 https://news.ycombinator.com/item?id=34871670 part 2 https://news.ycombinator.com/item?id=34871710 part 3 https://news.ycombinator.com/item?id=35228716