Go structs are copied on assignment (and other things about Go I'd missed) (jvns.ca)
207 points by misonic on Aug 11, 2024 | hide | past | favorite | 167 comments


The semantics of when stuff is copied, moved, or passed by reference are all over the place in language design.

C started with the idea that functions returned one int-sized value in a register. This led to classic bugs where the function returns a pointer to a local value. Compilers now usually catch this. C eventually got structure return by copy. Then C++ added return value by move, and automatic optimization for that. It's complicated.[1]

Most hard-compiled languages only let you return values of fixed length, because the caller has to allocate space. Dynamic languages where most things are boxed just return the box. Rust makes you declare boxed types explicitly. Vec and String are already boxed, which handles the common cases.

More dynamic languages tend to let you return anything, although there can be questions over whether you have your own mutable copy, a copy-on-write copy, a read-only copy, or a mutable reference to the original. That's what got the OP here, at

    thing := findThing(things, "record")
    thing.Name = "gramaphone"
They thought they had a mutable reference to the original, but they had a mutable copy.

There's a good argument for immutability by default, but many programmers dislike all the extra declarations required.

[1] https://stackoverflow.com/questions/17473753/c11-return-valu...


> There's a good argument for immutability by default, but many programmers dislike all the extra declarations required.

That's one little reason why Rust is loved by many: immutability by default. Meanwhile, it's not even possible in Go to declare immutable variables!


I hope it gets added; it adds so much safety to the language. Mind you, Go is not a very safe language in general; its type system is pretty loose compared to e.g. Java or TypeScript. I don't believe it wants to be, though.


It is possible to add more immutability in Go. There are many proposals for this: https://github.com/go101/go101/wiki/Go-immutable-value-propo....

The main reason nothing happened in this direction is the core team think it is not important enough.


It's very clear that the reason these kinds of proposals haven't been accepted has nothing to do with the core team not believing they're important enough, but instead because of the impact that they have on the rest of the language.


What impact in your opinion?


Immutability is a fundamental property of a language, it's not something that can be bolted on post-facto in a version update. Go as a language doesn't provide immutability.


If the "for ;;" loop semantics can be bolted on post-facto in a version update, nothing else couldn't.

Go does provide certain forms of immutability.


It does not. Believe what you will.


You are not familiar with Go.


It's tough to retrofit new restrictions onto old code. C/C++ code still doesn't have "const" everywhere it could. You have to make the default immutable - write "var" or "mut", not "const" - or it doesn't get used everywhere it should. But that breaks old code.


Immutable by default is only really possible now that we have gobs of memory though. I'm not even sure it's likely to stay popular: the demands of data processing at scale mean we're all likely to be routinely handling gigantic datasets which we don't want to copy all over the place.

The real problem is just visibility: am I editing a copy of the original? Who else can edit the original? Who's going to?

I'd argue those two questions are what we actually want to know the answer to, and immutability criteria are just an awkward compromise solution.


Immutable by default doesn't mean you have to write your program as a giant copy-on-write or log-structured architecture. You can still stash all your shared global program state behind a mutex/rwlock when that makes sense. But you can contain write access to that data to the places where it's needed.

And at the function-local level move elision should turn copy-and-modify to in-place modification if the original is no longer used.


Immutability isn’t a new idea, and you don’t need “gobs of memory”: Erlang was developed in the 1980s.


"Immutable by default is only really possible now that we have gobs of memory though"

I don't think so, actually. It's just changing the default. A choice to have types like Java's String which are always immutable might drive up memory usage, but just realising that the defaults are wrong and altering the language doesn't impact this at all, instead it makes your programs more explicit about what's actually happening.


LLVM will transform your mutable program into an immutable one (SSA form) anyway, because otherwise a lot of optimizations and validations would be much harder, if not impossible, to write.

You can collapse a lot of the additional copies at compile time, even for C.


That's not really the same thing though - that's a choice of representation for optimization purposes. LLVM won't turn a pointer access into a value copy - it can't, because that would change program behavior.

The distinction here is that in a lot of cases, you really do want to edit data in one location and there's no good reason for it to be copied except that it makes the program easier to reason about (exceptions apply for when things like cache locality considerations come into play).


I seem to recall that in many cases it is in fact permitted for a C compiler (probably C++ as well) to turn a pointer access into a value copy, so long as the rather flexible semantics are still kept.

The main saving grace is that usually you're only dealing with register copies...


The killer IMO is the combination of mutability and implicit references. With either immutability or explicit references, the hazards are greatly lessened. In the absence of mutability, there's no observable difference between passing a reference and passing a value. With explicit references, there's always an indication when objects are liable to nonlocally mutate.

Go and C are quite wise to use pass by value and opt-in references. It avoids the programmer's burden of immutability by default while making it more obvious* when to be aware of potential mutation. A programming style incorporating local mutation also comes very naturally, which is simple, practical, and remains easy to reason about.

*Go and Rust would be wiser if they retained -> from C so that this obviousness would remain without having to check type declarations which may even have been inferred.

Python gets hit the worst by this combination. On top of having mutable references everywhere, it also doesn't make a big deal about object identity (unlike e.g. Java) and has a lot of declarative constructs. Which means you get surprising stuff like this.

  >>> foo = [[]] * 10
  >>> foo[0].append("bar")
  >>> foo
  [['bar'], ['bar'], ['bar'], ['bar'], ['bar'], ['bar'], ['bar'], ['bar'], ['bar'], ['bar']]

  >>> def baz(arg = []):
  ...   print(arg)
  ...   arg.append("qux")
  ... 
  >>> baz()
  []
  >>> baz()
  ['qux']


The one thing I’ve personally wanted in Go is something like final in Java, for structs.


Well, that's another nice thing with Rust. Maybe it's not the saviour language but at least it brings clarity to ownership of values.


I think the same thing could happen in Rust. foo.x = bar will compile if foo is a copy or if it's a mutable reference. As in Go, you'd have to explicitly type 'foo' in order for any compiler error to show up.


Let's work through the example

1. There's a user-defined type named Thing with a String inside it.

2. We've got a function named findThing, which takes an array of Things and a string name, performs a linear search of the array, and returns a mutable reference to the Thing with that name, or nothing if it isn't found. In Go, this function accidentally gives back a mutable reference to a copy of the Thing.

3. We make an array of things, use findThing to find one of them and try to mutate it.

We can translate 1 into Rust easily, it's just struct Thing { Name: String } and the compiler will moan because this is deeply unconventional naming in Rust and that warns by default.

We can translate 3 into Rust easily too, the compiler moans about our naming again.

But when we try to translate 2 by writing findThing we struggle. Rust wants to understand what lifetime to associate with this mutable reference we're returning. If we try to make a copy, where does the copy live? Rust doesn't have garbage collection, so if it just goes out of scope the lifetime ends and Rust will reject this function as nonsense - you can't return references which have expired, silly programmer, try again.

If we don't make a copy we can successfully tie the lifetime to the array, but then we don't have the Go bug, so the thing you say "could happen in Rust" doesn't happen.

Here's a Godbolt link for a working example: https://rust.godbolt.org/z/shKj6rEz7

Now, try to adjust it to have the same bug, I expect it will be much harder to do this wrong.


Ah, I was working on a mistaken assumption about the Go example code (I thought the function returned Thing, whereas actually it returns *Thing). That will teach me to RTFA...


Right, I think Julia would spot that if we get a Thing back, it's clearly not the Thing inside our array, because those are necessarily different Things.

If the function does return Thing, in Rust we can implement the mistake, returning a copy, but there's no plausible way to implement the intended functionality, we can get the matching thing out of an array we're allowed to mutate -- by swapping it, but now some different Thing is in the array of Things instead, that's what swapping means. If we're given the actual array, not a (mutable) reference as a parameter, we can destroy it and keep just the matching Thing to return, but now our caller hasn't got a Things array, it was destructively moved to give us a parameter.


Actually Go is always pass by value. Even when you pass the pointer you’ll get a copy of it.


What matters is that a function can modify the value pointed at, without necessarily returning it.


> Actually Go is always pass by value.

False, this depends upon the underlying type. The Go 'map' type is always pass (and assign) by reference.

https://go.dev/play/p/ovfuNBNtiza


No, it's definitely passed by value:

https://go.dev/play/p/mIehOwUWz95

A map is implicitly a pointer to the underlying map structure. When you pass a map, you're passing a copy of a pointer.


Technically no, but I get why it feels like this. :-)

I think the confusion stems from the fact that Go hides the distinction between "dot" access and "pointer dot" access (e.g., the difference between `foo.Bar` and `foo->Bar` in C++). Go knows if an object is a pointer and needs to be de-referenced, and hides the de-reference operation from you syntactically. So you can use a pointer without knowing it's a pointer.

This is the confusion: You never have a "map object that is passed by reference", you instead have a "pointer to a map object that is always passed by value".

This is less confusing for structs the programmer defines themselves, since they can check the type of "foo" to see if "foo.Bar" is accessing Bar directly or de-referencing "foo" and then accessing "Bar".

But it's more confusing when applied to builtin type for which you never see the implementation. How would you know if `map[string]bool{}` returns a value or a pointer? You don't!

The map type is passed by value but its implementation is that it is a thin wrapper around a pointer to a struct. Ergo in practice it feels like passing by reference. But Go is technically always pass by value. Map does not get any special implementation in the language.

You can do this yourself. `type MyStruct struct{ data *MyType }`. Now if you pass MyStruct by value, any operations on it are effectively by reference since all operations have to dereference `data` to get at the actual content.

(Same goes for strings, except they are immutable so no one notices.)

This might feel like semantics, but it's important to remember that Go doesn't treat any of those built-ins with special "pass by reference" rules. Instead, it behaves consistently the way it would for any types you defined yourself, and one of the learning curves of the language is to think in Go's terms of pointers vs non-pointers and learn which native types are pointers. If you think of it that way, learning Go's built-ins is no different from learning the API methods of a custom struct; Go's built-ins just happen to be, well, built-in, and thus get syntactic sugar that your custom structs do not.

"Pass by reference" is usually used to refer to the case where you use a value at level N of the call stack, but the code at level N+1 of the callstack decides to make it a pointer. In Go this never happens, instead you technically always had a pointer the whole time!


From Google's Go project blog:

> Map types are reference types, like pointers or slices.

https://go.dev/blog/maps


Actually, here's an example of precisely why maps are not passed as references:

https://go.dev/play/p/-mEeg2Ud68V

This example cannot re-allocate the original map! If the map were truly passed by reference, the called function could assign a new map to it. But it can't; only values within the map can be modified when a map is passed.

This is in contrast with pass by reference in, say, C++, in which case you can assign directly to and modify the reference. ex: https://www.ibm.com/docs/en/zos/2.4.0?topic=calls-pass-by-re...


Did you read my comment?

References are nothing special; they are simply pointers that hide the pointer syntax, and usually the receiving definition makes the decision about whether a pointer or a copy is passed.

In Go the type is just a wrapper around a pointer, making it function like a reference in all cases.

In Go some types are simply defined as pointers, so they are technically always passed by value and there is never any magic about whether they are passed by value or by reference. It is always a value consisting of a reference.

A map is a pointer that is passed by value, along with slices and channels. 100% of the time.


That is no different than C or C++. A pointer is just another type of value.


> This led to classic bugs where the function returns a pointer to a local value. Compilers now usually catch this. C eventually got structure return by copy. Then C++ added return value by move, and automatic optimization for that.

Jesus Christ. Can we now get “return fruit by vegetable?”

Some speedups may not be worth having to memorize standards documents.


programmers who get the job done without needing to memorise standards or dig into hardware specifics are surely happy enough with general optimisations applied by the compiler like tail call optimisation or copy elision ...


not complicated stuff to understand, and doesn’t even need to be understood to use correctly. Just don’t return pointers to locals.


One of the many things I find inspiring about Julia is how quick she is to admit to mistakes she has made or things that she hasn't understood.

If she didn't understand it, I can 100% guarantee that there are large numbers of people out there who also didn't understand it - many of whom were probably too embarrassed to ever admit it.

I think this is a useful trait for senior software engineers generally. If you're a senior engineer you should have earned enough of a reputation that the risk involved in admitting "I didn't know that" can be offset by everything you provably DO know already. As such, you can help everyone else out by admitting to those gaps in your knowledge and helping emphasize that nobody knows everything and it's OK to take pride in filling those knowledge gaps when you come across them.


Personally, I think that the idea that programmers should know everything is kinda bizarre.

Programming is about finding the answer. If you already knew everything, you could just sit down and type any program from your knowledge.

We all know that you can't know everything; otherwise, why even write documentation or use git?

Unfortunately, it's difficult to move past this idea, and it's so pervasive that stating "I don't know" can negatively affect your standing for some.


Right! The best thing about our chosen career is that it's completely impossible to know everything - there's always a new corner of software engineering to dig into, be it how the Linux kernel works, or programming with Haskell, understanding Transformer LLM architectures, or how the Svelte compiler works or whatever.


... and be able to code in React


Many people have the same expectation of their doctor: he must know everything or he is completely useless and I should find a new doctor.

It is also insanely difficult for a doctor/surgeon to admit a mistake without being punished severely.


This is what jumped out to me reading this article too - unashamedly admitting to having gaps in knowledge that some others might take for granted.

Bucketing some of those gaps as “probably useful but not to me right now” is also great. It shows a purpose to focus on what matters right now, with a hint to return to when it does pop up again later.

Lots of folks I work with and respect sometimes get trapped in the weeds having to understand every little thing. It’s good to be curious, but sometimes filing it away for later and shipping is more important now.


One of the many things I find inspiring about Julia is how quick she is to admit to mistakes she has made or things that she hasn't understood.

Agreed. I think 90%+ of the complaints people have about technical blog posts would go away if authors were more willing to admit what their blog post is really for and what their limitations are.

Writing an "Ultimate Guide to X" as a learning experience is fine if you admit that (and, ideally, use a more humble title) but pretending to be a long time expert in framing when someone isn't leads to problems and a poor reputation. You, too, are fantastic at framing your content properly, and have a great reputation amongst people I know for it.


I recently included a big goof in a blog post. I accidentally consumed all my errors and didn't log them anywhere, so I was just flying mostly blind.

It was also in a language I don't have much experience in (Elixir), doing 3D rendering in OpenGL (which I also don't have much experience in), on a Mac (so no tools to debug OpenGL stuff). Definitely wasted more time than I'd like to admit, but I won't be making that mistake again. :)


most of the time when people don't ask questions in industry, it isn't because they already know the answers, it's because they don't care what the answers are

blogging is culturally different because there's way more exhibitionism involved


Don't you think, as others in this thread are saying, that one reason people don't ask questions is because they're afraid to admit their lack of knowledge?

We've probably all been in one of those meetings where something is being proposed, and everyone is nodding vaguely along but you're not sure the proposal is correct. Do you ask a clarifying question (and risk seeming foolish for not knowing something obvious) or just let it pass because everyone else seems happy with it?


> Do you ask a clarifying question (and risk seeming foolish for not knowing something obvious) or just let it pass because everyone else seems happy with it?

Two different kinds of foolish here: etiquette or technical

In some meetings if it's management and senior leadership grilling a design, if some unrelated IC butts in with a bunch of questions that are 5 steps behind everyone else it's sort of a faux pas


Right -- there isn't a single answer, it's very context dependent. I do think many ICs lean too far in the direction of keeping quiet, though.

(Managers too, come to think of it. You can feel overawed by skilled ICs and chicken out of asking naive but crucial questions.)


I thought you were talking about Julia the language and was so confused.


This is the kind of mistake that I doubt even a junior Go developer would make. If one is a senior engineer and one has Go on their CV, then one must know the memory management top to bottom.

This kind of downplaying of hard skills and trying to argue that one can help by admitting gaps in their knowledge is at best weird.


Donovan and Kernighan's "The Go Programming Language" is one of the best pieces of technical writing I've ever read. Buy it and read it cover to cover.

Then read the [Go Language Specification][1] cover to cover. It's dry but refreshingly not legalese.

[1]: https://go.dev/ref/spec


Learning Go by Jon Bodner, particularly the newest edition, is also excellent.

An aside: are there any other Go books, particularly ones that explore more specific topics, that are recommended? I've read a few that I didn't find very impressive.


I liked Cloud Native Go by Matthew A. Titmus and Concurrency in Go by Katherine Cox-Buday.


The language spec was so good I was able to make tangible contributions to an open source project just by using that and I don’t consider myself a go programmer at all. I want to buy that book but it’s technical and I feel like there might be a second edition around the corner?

/edit I bought it after reading this thread: https://groups.google.com/g/golang-nuts/c/U99js3UYz-U


Not understanding structs vs pointers is a pretty basic misconception in Go.

Does this trip anyone else up? I found it unenlightening / unsurprising, and the linked "100 mistakes" piece also very basic and in some cases just plain wrong.


"Very basic" is the entire point of this exercise. Just because things are basic doesn't mean people won't misunderstand them, and won't benefit from clarification.

Which of those 100 mistakes were "plain wrong"? That would be useful feedback for the author.


As she points out though, a lot of dynamic languages don’t behave this way. A string, after all, points to a heap allocation, so it’s not unreasonable to think of a string as a pointer.


I've several times answered questions from people coming from dynamic languages that ask lots of questions about Go pointers. And the answer is, actually, since Go lacks pointer arithmetic, Go pointers work the way you're used to things working. It's the Go non-pointers that are the new bizarre thing you're not used to!

So there is definitely a common language heritage that will find the behavior of value copies in Go surprising. I came into Go with compiled-language experience, but I'd been exclusively in dynamic scripting languages for over a decade. I had to remind myself about this as well.


In those languages, nearly everything is a pointer, they just don't call it that, which causes unnecessary confusion when you need to understand what's really happening. (E.g., in Java, a String is a pointer to a string, not a string; but an int is just an int, not a pointer to an int.)


It’s how structs work in C though, and Go is spiritually very close to C, including explicit pointer types, address-taking and dereferencing.


Not necessarily. A string in C is usually a char*. If you have a struct with a char* and you copy it, you copy the pointer to the backing memory. This is analogous to Go, which also has a string be a pointer to the heap, but the behavior is different.

Go’s String is a char* that behaves like a char[] when copied.


Is there a typo in this? C strings are not usually structs at all. Structs also do not behave differently if they contain a char.


I think the two *s merged into an italic there


> Go’s String is a char* that behaves like a char[] when copied.

Uhh, no. A go string is (effectively) a pointer to an immutable string of characters. When you do a = b, both a and b point to the same string (i.e. each is its own string struct, containing a pointer to the exact same array of character data).

But if you try to get a []byte from it, THEN it makes a copy.


That was my reaction too, but it does point the way to a more subtle chronic ache in coding Go: the efficiency<->simplicity tradeoff between passing some large struct to a function by reference, or by copying.

The former, "by reference" is guaranteed to impose no increase in calling overhead, irrespective of the compiler's ability to optimize the object code, however fat the struct eventually grows.

The latter, "by copying" guarantees the calling function will upon return find all fields of the struct just as before -- a great aid to understanding during code review.


Yeah I'm fairly new to Go, but have done plenty of work in languages with "no pointers" (Python, Java), with pointers (Objective-C), and with "reference/value types" (Swift). I thought I was pretty proficient with all of this, but Go's implementation was really unergonomic. It feels like it's trying to be C, but treating it more like Swift has been more effective I've found. How this all interacts with nils and missing fields in structs is also not very ergonomic, and it feels like the language is obviously missing an optional type, yet I've not seen one used commonly yet.


Please let me know what's plain wrong; I'm always happy to receive constructive feedback.


The beginning item about if-else returns isn't go-specific and is low impact, and immediately turned me off / led me to dismiss the article as a waste of time.

After trudging through more of the list, you've collected a lot of good nuggets! Maybe move the if-else to the end, as it's not particularly insightful compared to the rest.

Also the shadowing, it's fine but much less interesting and unlikely to hook in your target audience up front.

Good luck!


100go.co isn't an article; it's an online summary of my book listing 100 Go mistakes. There's a sense of progression where we navigate through topics and go (generally) towards more complicated topics at the end.


> though apparently structs will be automatically copied on assignment if the struct implements the Copy trait

What's actually going on is that the Rust compiler is always allowed to just copy bits during assignment, but if your type implements Copy then the value isn't gone after it has been assigned, as it would be under the ordinary destructive move semantics -- so any code can continue to use the value, whereas otherwise it would no longer exist, having been moved.

Some languages make the built-in types magic and you can't have that magic in your own types. Rust mostly resists this: a few things in the stdlib are magic, and you wouldn't be allowed to do that magic in stable Rust (it's available to you in nightly), but mostly, as with Copy, your types are just as good as the built-in types.

This actually feels really natural after not long in my experience.


This sometimes catches people out in C#/.NET too; it's the big difference between class and struct: a class is a reference type and a struct is a value type (see fiddle below). But in practice people very rarely reach for structs, so they don't tend to build up the muscle memory of using them, even if they intuitively understand the difference between reference types and value types from general use of other types.

(Fiddle demonstration for non-.Net peeps: https://dotnetfiddle.net/apDZP5 ).


Non-C# developer question: what use-case/situation would a `struct` make sense to use instead of a `class`? Just out of curiosity.

[Edit] Well, there's a nice, special article for this very question: https://learn.microsoft.com/en-us/dotnet/standard/design-gui...


In principle, the answer should be (if we ignored the language community and the model of the stdlib, which we shouldn't do), that structs should be used for most things, and classes only when they are needed. There's nothing a class can do that a struct can't, and structs are not automatically allocated in the heap, so they take some pressure off the GC. Go showed that you can have a fully managed GC language where all types are value types, and get pretty good ergonomics. Java is adding support for value types specifically for performance reasons, and so may have a similar attitude in the far future.

Note that Go does have one feature that makes value types more ergonomic - the ability to explicitly take a reference to one and use it as any other variable, using the syntax of C pointers (but without pointer arithmetic). In C#, if you need a reference to a structs type, your options are more limited. There is `ref` for function parameters, but if you want to store it in a field, you need to use some class type as a box, or perhaps an array of 1 element, since there are no "ref fields" in this sense (there are ref fields, but they are a completely different thing, related to the "ref structs").

Now, in working with real C# code and real C# programmers, all this is false. The community has always preferred using structs almost exclusively for small immutable values, like Color. The standard library and most C# code are not designed with wide use of structs in mind, so it's possible various methods will copy structs unnecessarily. People aren't used to the semantics of structs, and there is no syntactic difference between structs and classes, so mutable structs will often cause confusion where people won't realize they're modifying a temporary copy instead of modifying the original.


There is no free lunch in software :)

> using the syntax of C pointers (but without pointer arithmetic)

Go structs, when they have their address taken, act the same way object references do in C# or Java (that is, *MyStruct). Taking an address of a struct in Go and assigning it to a location likely turns it into a heap allocation. Even when not, it is important to understand what a stack in Go is and how it is implemented:

Go makes a different set of tradeoffs by using a virtual/managed stack. When more memory on stack is required - the request goes through an allocator and the new memory range is attached to a previous one in a linked-list fashion (if this has changed in recent versions - please correct me). This is rather effective but notably can suffer from locality issues where, for example, two adjacently accessed structs in a hot loop are placed in different memory locations. Modern CPUs can deal with such non-locality of data well, but it is a concern still.

It also, in a way, makes it somewhat similar to the role Gen0 / Nursery GC heaps play in .NET and OpenJDK GC implementations - heap allocations just bump a pointer within a thread-local allocation context, and if a specific heap has run out of free memory - more memory is requested from GC (or a collection is triggered, or both). By nature of both having generational heaps and a moving garbage collector, the data in the heap has high locality and consecutively allocated objects are placed in a linear fashion. One of the main drawbacks of such an approach is much higher implementation complexity.

Additionally, the Go choice comes at a tradeoff of making FFI (comparatively) expensive - you pay about 0.5-2ns for a call across interop into C in .NET (when statically linked, it's just a direct call + branch), whereas the price of FFI in Go is around 50ns (and it also interacts worse with the scheduler if you block goroutines).

Overall, the design behind memory management in Go is interesting and makes a lot of sense for the scenarios Go was intended for (lightweight networked microservices, CLI tooling, background daemons, etc.).

However, it trades off smaller memory footprint for a significantly lower allocation throughput. Because GC in Go is, at least partially, write-barrier driven, it makes WBs much more expensive - something you pay for when you assign a Go struct pointer to a heap or not provably local location (I don't know if Go performs WB elision/cheap WB selection the way OpenJDK and .NET do).

To explore this further, I put together a small demonstration[0] based on BenchmarksGame's BinaryTrees suite which stresses this exact scenario. Both Go and C# there can be additionally inspected with e.g. dtrace on macOS or most other native profilers like Samply[1].

> Now, in working with real C# code and real C# programmers, all this is false...

> but if you want to store it in a field, you need to use some class type as a box,

This does not correspond to the language specification, nor to the kind of code that is being written outside of enterprise.

First of all, `ref T` syntax in C# is a much more powerful concept than a simple reference to a local variable. `ref T` in .NET is a 'byref' aka managed pointer. Byrefs can point to arbitrary memory and are GC-aware. This means they can point to the stack, to unmanaged memory (you can even mmap it directly), to interiors of objects on the GC or non-GC heap, etc. If they point to GC heap object interiors, they are appropriately updated by the GC when it moves the objects, and ignored when they don't. They cannot be stored on the heap, which does cause some tension, but if you have a pointer-rich deep data structure, then classes are a better choice almost every time.

Byrefs can be held by ref structs and the most common example used everywhere today is Span<T> - internally it is (ref T _reference, int _length) and is used for interacting with and slicing of arbitrary contiguous memory. You can find additional details on low-level memory management techniques here: https://news.ycombinator.com/item?id=40963672

Byrefs, alongside regular C pointers in C#, support pointer arithmetic as well. Of course it is just as unsafe, but for targeted hot paths this is indispensable. You can use it for fancy things like vectorized byte-pair counting within a sequence without having to pin the memory: https://github.com/U8String/U8String/blob/split-refactor/Sou...

Last but not least, the average line-of-business code indeed rarely uses structs - in the past it was mostly classes; today it is a mix of classes, records, and sometimes record structs for single-field wrappers (still rarely). However, C# is a multi-paradigm language with strong low-level capabilities, and there is a sea of projects beyond enterprise that make use of all the new and old low-level features. It's something that both C# and .NET were designed around from the very beginning. In this regard, they bring you much closer to the metal than Go, unless significant changes are introduced to it.

If you're interested, feel free to explore Ryujinx[2] and Garnet[3] which are more recent examples of projects demonstrating suitability of C# in domains historically reserved for C, C++ and Rust.

[0]: https://gist.github.com/neon-sunset/72e6aa57c6a4c5eb0e2711e1...

[1]: https://github.com/mstange/samply

[2]: https://github.com/search?q=repo%3ARyujinx%2FRyujinx+ref+str...

[3]: https://github.com/search?q=repo%3Amicrosoft%2Fgarnet+struct...*


I believe the Go design of having all types be structs and supporting explicit pointers to them is separate from their managed stack design, and both are separate still from the GC design.

You can have value types as the base type, and support (managed) pointers to the value types freely in a runtime like .NET or the JVM. I tend to think the data type design is better in Go, but the coroutine implementation and simplistic mark-and-sweep GC are inferior.

I'll also note that I should have checked how things had changed in C#. I hadn't used it since C# 4 or so, when the ref keyword was limited to method parameters. Thanks for explaining the wealth of improvements since. This indeed makes it a core pointer type, instead of a mere parameter-passing scheme.

And yes, you're right that there exist many C# communities with different standards of use, and some are much more low level than Go can even... Go. I was talking mostly about the kind of things you'd find as recommendations in official MS docs or Jon Skeet answers on SO when I said "C# community", but you're absolutely right that this is a limited view and doesn't accurately encompass even some major forces in the ecosystem.


If they are small, then there can be a significant performance advantage to using a struct.

Imagine you have a Dictionary<TKey,TValue>. (That is, a dictionary (hash lookup) from type TKey to type TValue ).

Imagine your choice of key is a composite of 4 bytes, and you want to preserve the meaning of each, so let's say you define a class (shorthand to avoid all the get/set cruft):

    class MyKey {
       byte A;
       byte B;
       byte C;
       byte D;
    }
When you profile your memory usage, to your horror you discover that each key actually costs 64 bits - the reference to the object's location on the heap.

If however you use a struct, then your key is actually the 4 bytes that defines your key.

In modern .Net, you'd probably be better off defaulting to use a record (correction: record struct) type for this purpose unless you have very specific reasons for wanting fine-grained control over memory layout with the StructLayout Attribute.

See this article for using StructLayout:

https://learn.microsoft.com/en-us/dotnet/api/system.runtime....
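For comparison, the same inline-key idea is natural in Go, where struct map keys are stored and compared by value; a minimal sketch (the `MyKey` name is just illustrative):

```go
package main

import "fmt"

// MyKey is four bytes stored inline in the map; keys compare
// by value, with no per-key heap reference to chase.
type MyKey struct {
	A, B, C, D byte
}

func main() {
	m := map[MyKey]string{}
	m[MyKey{1, 2, 3, 4}] = "hit"

	// A freshly constructed key with the same bytes finds the entry.
	fmt.Println(m[MyKey{1, 2, 3, 4}]) // "hit"
}
```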


It should be a record struct, as record defaults to class.


Thanks for that correction.


Please note that the article is quite old and does not encompass the wide variety of scenarios C# is effective at.

Structs are used for but not limited to: all kinds of "transient" data containers, pass by value semantics, precise control over layout of data in memory, low-level programming and interoperating with C/C++, zero-cost abstractions via struct generics (ala Rust or templates in C++), lightweight wrappers over existing types, etc.

Most C# codebases use them without even noticing in `foreach` loops - enumerators are quite often structs, and so are Span<T> and Memory<T> (which are .NET slice types for arrays, strings and pretty much every other type of contiguous memory, including unmanaged). Tuple syntax uses structs too - `(int x, int y)` is `ValueTuple<int, int>` behind the scenes.

.NET has gotten quite good at optimizing structs, so the more general response is "you use them for the same things you use structs in C, C++". Or the same reason you would pick plain T struct in Rust over Box/Arc<T>.

Intro to structs: https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...


usually performance reasons; don't want to allocate on the heap, contributing to GC pressure, or you want pass by copy semantics


Even if you exclude performance reasons, some things just make more sense as values rather than instances. A good example is Color. A Color might be an object that holds 3 integers (red, green, blue) but you want to treat it like you would a scalar value. For example, two color instances should be equal if they contain all the same values. Class instances each have their own identity -- two class instances are not equal even if they contain the same values.
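This maps directly onto Go structs, which compare field-by-field with `==`; a small sketch (the `Color` type here is just the example above, not from any library):

```go
package main

import "fmt"

// Color is a value type: equality compares the fields.
type Color struct {
	R, G, B uint8
}

func main() {
	a := Color{255, 0, 0}
	b := Color{255, 0, 0}
	fmt.Println(a == b) // true: structs compare by value

	// Pointers (the closest Go analog to class instances) compare
	// by identity, even when the pointed-to contents match.
	p, q := &Color{255, 0, 0}, &Color{255, 0, 0}
	fmt.Println(p == q) // false: two distinct allocations
}
```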


This isn't something that unusual though; from some oldies to newer ones: Delphi, Oberon variants, Modula-3, Common Lisp, Eiffel, D, Swift, and eventually Java (when value class and value record finally land; an EA build is available).

It is the focus on managed scripting languages that trips people up when they finally switch to one of those compiled ones.


    > and eventually Java
Wait, it's been a looong time since I did Java. What am I missing here; what does Java do differently from C#?

(C# is my daily driver so I'm quite familiar with its handling of value and reference types)


Currently it does better escape analysis than the CLR is capable of, especially in specific JVMs like GraalVM or Azul's - Java enjoys a choice of JVM implementations. Apparently .NET 9 will have better escape analysis.

Additionally, there is Project Valhalla, whose goal is to add value types to Java; the first set of early-access releases is now available:

https://jdk.java.net/valhalla/

It has taken a while (about 10 years now) to get right, because they want to keep the entire existing ecosystem running transparently on top of a Valhalla-improved JVM; adding value types without changing the Java ABI for everything on Maven Central and in private repos is a herculean task.

Ideally Java should have taken into account the systems languages with GC that predated it, given the quoted influences on its design; better late than never, I guess.


It's sad that most of the items are clear language design mistakes stemming from the creators not learning from other languages. Go is a missed opportunity. Item after item of "yeah that wouldn't happen in rust".

The list feels like it's meant to blame the programmer, but that ain't right.


This drives me crazy. Go essentially doubles down on all the programming language mistakes we spent decades discovering and finding solutions for.

When people ask all those questions about why software is still so terrible in all sorts of ways, a large part of the answer is languages like C, C++, JavaScript, Python, and now Go.


    func findThing(things []Thing, name string) *Thing {
      for i := range things {
        if things[i].Name == name {
          return &things[i]
        }
      }
      return nil
    }
Also you could just return i or -1, and the consuming code would be clear about what it was doing. Find the index. Update the item at the index.

    if location := findThing(things, name); location != -1 {
         things[location].Name = "updated"
    }


Well, if you don't mind that it doesn't work correctly with slices: [0], then sure, you may return indices.

[0] https://go.dev/play/p/Q2ntuaugbGQ


That didn't work because you didn't pass the same slice in. You subsliced your things slice, which outputs a new slice.


It does work with slices as long as you use the slice to update as well
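A sketch of both the working case and the subslice pitfall the playground link is getting at (`findIndex` is a hypothetical index-returning variant of the article's `findThing`):

```go
package main

import "fmt"

type Thing struct{ Name string }

// findIndex returns the index of the first matching Thing, or -1.
func findIndex(things []Thing, name string) int {
	for i := range things {
		if things[i].Name == name {
			return i
		}
	}
	return -1
}

func main() {
	things := []Thing{{"record"}, {"tape"}}

	// Works: update through the same slice the index came from.
	if i := findIndex(things, "record"); i != -1 {
		things[i].Name = "updated"
	}
	fmt.Println(things[0].Name) // "updated"

	// Pitfall: an index into a subslice is relative to the subslice,
	// so reusing it against the original slice hits the wrong element.
	j := findIndex(things[1:], "tape") // j == 0
	fmt.Println(things[j].Name)        // "updated", not "tape"
}
```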


To generalize the title into a rule: it is good to remember that in Go, everything is passed by value (copy).
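A tiny sketch of that rule (with a hypothetical `Thing` like the article's): assignment and argument passing both copy the struct, so mutating the copy leaves the original alone:

```go
package main

import "fmt"

type Thing struct{ Name string }

// rename receives a copy of its argument; mutating it has no
// effect on the caller's value.
func rename(t Thing) { t.Name = "changed" }

func main() {
	orig := Thing{Name: "record"}

	cpy := orig // assignment copies the whole struct
	cpy.Name = "gramophone"

	rename(orig) // the argument is copied too

	fmt.Println(orig.Name) // still "record"
}
```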


That is the case for almost every modern language. C++ is one of the few languages that has "references", and at least last I looked those are a language-level accommodation over pointers being passed by value in the generated assembly, at least until compiler optimizations take over (and that's not limited to references, either).

If you're in 2024 and you're in some programming class making a big deal about pass-by-value versus pass-by-reference, ask for your money back and find a course based in this century. Almost literally any topic is a better use of valuable class time. From what I've seen, the few unfortunate souls suffering through such a curriculum in recent times are literally anti-educated by it.


For a last-century example of actual pass-by-reference, assign-by-copy, the PL/I program

  foo: proc options(main);
    dcl sysprint print;
    dcl a(3) fixed bin(31) init(1,2,3);
    put skip data (a);
    call bar(a);
    put skip data (a);
  
  bar: proc(x);
    dcl x(3) fixed bin(31);
    dcl b(3) fixed bin(31) init(3,2,1);
    x = b;
    b(1) = 42;
    x(2) = 42;
    put skip data (b);
    put skip data (x);
  end bar;
  
  end foo;
outputs

  A(1)=   1   A(2)=   2   A(3)=   3 ;
  B(1)=  42   B(2)=   2   B(3)=   1 ;
  X(1)=   3   X(2)=  42   X(3)=   1 ;
  A(1)=   3   A(2)=  42   A(3)=   1 ;
demonstrating that X refers to A in BAR and assigning B to X copies B into X (= A).


Sure, if they want to be bad developers in C#, F#, Swift, D, Rust, Ada, VB, Delphi - just to stay on the ones that are still kind of relevant for business in 2024, at various levels of relevance, versus the ones history has forgotten.


Rust doesn't have pass-by-reference in the sense that C# or C++ do. The `&` and `&mut` types are called references[1], but what is meant by "reference" there is the C++ guarantee that a reference always points to a valid value. This C++ code uses references without explicit pointer dereferencing:

    #include <iostream>
    #include <string>
    
    void s_ref_modify(std::string& s)
    {
        s = std::string("Best");
        return;
    }
    
    int main()
    {
        std::string str = "Test";
        s_ref_modify(str);
        std::cout << str << '\n';
    }
The equivalent Rust requires passing a mutable pointer to the callee, where it is explicitly dereferenced:

    fn s_ref_modify(s: &mut String) {
        *s = String::from("Best");
    }
    
    fn main() {
        let mut str = String::from("Test");
        s_ref_modify(&mut str);
        println!("{}", str);
    }
Swift has `inout` parameters, which superficially look similar to pass-by-reference, except that the semantics are actually copy-in copy-out[2]:

> In-out parameters are passed as follows:

> 1. When the function is called, the value of the argument is copied.

> 2. In the body of the function, the copy is modified.

> 3. When the function returns, the copy’s value is assigned to the original argument.

> This behavior is known as _copy-in copy-out_ or _call by value result_. For example, when a computed property or a property with observers is passed as an in-out parameter, its getter is called as part of the function call and its setter is called as part of the function return.

[1]: https://doc.rust-lang.org/book/ch04-02-references-and-borrow...

[2]: https://docs.swift.org/swift-book/documentation/the-swift-pr....


Is copying huge blocks of data free in 2024? My benchmarks suggest otherwise, and the world still needs assembly programmers.


The way almost all programming languages work is that they explicitly pass a copy of a pointer to a function. That is, in almost all languages used today, whether GC or not, assigning to a function parameter doesn't modify the original variable in the calling function. Assigning to a field of that parameter will often modify the field of the caller's local variable, though.

That is, in code like this:

  ReferenceType a = {myField: 1}
  foo(a)
  print(a.myField)
  
  void foo(ReferenceType a) {
    a.myField = 9
    a = null
  } 
Whether you translate this pseudocode to Python, Java, C# (with `class RefType`), C (`RefType = *StructType`), Go (same as C), C++ (same as C), Rust, Zig etc - the result is the same: the print will work and it will say 9.

The only exceptions I know of, where the print would fail with a null pointer issue, are C++'s references and C#'s ref parameters. Are there any others?
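For concreteness, here is the pseudocode rendered as runnable Go (names kept from the pseudocode above), showing that the print succeeds and says 9:

```go
package main

import "fmt"

type RefType struct{ MyField int }

// foo receives a copy of the pointer: mutating the pointee is
// visible to the caller, but reassigning the parameter is not.
func foo(a *RefType) {
	a.MyField = 9
	a = nil // clears only the local copy of the pointer
}

func main() {
	a := &RefType{MyField: 1}
	foo(a)
	fmt.Println(a.MyField) // prints 9; no nil dereference
}
```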


Right. Passing pointers is much cheaper than passing values of large structures. And then references are an abstraction over pointers that allow further compile-time optimization in languages that support it. Pass-by-value, pass-by-pointer, and pass-by-reference are three distinct operational concepts that should be taught to programmers.


I think the right mental model is pass-by-value for the first two. There is nothing different in the calling convention between sending a parameter of type int* vs a parameter of type int. They are both pass-by-value. The value of a pointer happens to be a reference to an object, while the value of an int is an int. In both cases, the semantics is that the value of the expression passed to the function is copied to a local variable in the function.

Depending on the language, that is very likely the whole picture of how function calls work. In a rare few modern languages, this is not true: in C# and C++, when you have a reference parameter, things get somewhat more complicated. When you pass an expression to a reference parameter, instead of copying the result of evaluating that expression into the parameter of the function, the parameter becomes an alias for that value itself. It's probably easier to explain this as passing a pointer to the result of the expression, plus some extra syntax to auto-dereference the pointer.


> I think the right mental model is pass-by-value for the first two. There is nothing different in the calling convention between sending a parameter of type int* vs a parameter of type int.

You're talking about parameters of type int; I'm talking about structs that are strictly larger than pointers. Structs which may be nested; for which deep copies are necessary to avoid memory leaks / corruption. And here, the distinction between these "mental models" exhibits a massive gap in real performance.

Here's a deliberately pathological case in C++; I've seen this error countless times from programmers in languages that make a distinction between references/pointers and values:

    #include <cstddef>
    #include <vector>
    using std::size_t;
    using std::vector;
    
    bool vector_compare(vector<int> vec, size_t i, size_t j) {
        return vec[i] < vec[j];
    }

    int vector_argmin(vector<int> vec) {
        if (vec.size()) {
            size_t arg = 0;
            for(size_t i = 1; i < vec.size(); i++) {
                if (vector_compare(vec, i, arg))
                    arg = i;
            }
            return arg;
        } else return -1;
    }
The vector_compare function makes a copy of the full vector before doing its thing; this ends up turning my linear-looking runtime into accidentally-quadratic. From the perspective of this solitary example, it would make sense to collapse reference/pointer into the same category and leave "value" on its own.

But actually these are three distinct concepts, with nuance and overlap, that should be taught to anybody with more than a passing interest in languages and compilers. I'm not here to weigh in on what constitutes a modern language, but the notion that we should just throw this crucial distinction away because some half-rate programmers don't understand it is patently offensive.


My point is the same for int as for vector<int>. There is zero difference in the C++ calling convention between passing a vector<int> and a vector<int>*: they both copy an object of the parameter type. Of course, copying a 1000-element vector is much slower than copying a single pointer, but the difference is strictly the size of the type. The copying occurs the same way regardless. This is also the reason foo(char) is less overhead than foo(char*).

Everything (except reference types) is pass-by-value, but of course values can have wildly different sizes.

Also, the problem of accidentally copying large structs is not limited to arguments, the same considerations are important for assignments. Another reason why "pass-by-pointer" shouldn't be presented as some special thing, it's just passing a pointer copy.


Your point rather misses the mark.

Your vector<int*> is a red herring. The distinction I'm making is between passing a (vector<int>)* and a vector<int>, because those two objects have radically different sizes, and the distinction can and does create severe performance issues. And yet, pointers are still different from references: with a reference, you don't even need your object to have a memory address.


HN markup ate my *... Yes, I'm also talking about vector<int> and vector<int>*. They are indeed of radically different sizes, and the consequences of copying one are very different from the consequences of copying the other.

But this doesn't change the fact that they are both passed-by-value when you call a function of that parameter type.


It’s semantics only. The compiler is free to optimize it in any way, e.g. if a function call gets inlined, there is nothing “returning” to begin with, it’s all just local values.


See cousin posts. That's not what the terms mean.


Both C# and Swift make that distinction explicit by having both structs and classes.


That is not what the pass-by-copy vs. pass-by-reference distinction is. Both are passing by value, one is just a pointer (under the hood) and the other is not. But the incoming parameter is a copy.

See my cousin post: https://news.ycombinator.com/item?id=41220384

This is a distinction so dead that people basically assume that this must be talking about whether things are passed by pointer, because in modern languages, what else would it be? But that is not what pass-by-reference versus pass-by-copy means. This is part of why it's a good idea to let the terminology just die.


So what should we call "foo(x)" and "foo(ref x)" in C# to distinguish them if not pass-by-value and pass-by-reference?


C# can call it that specifically if it likes, because the general computer-science term is dead, but under the hood you're passing a reference by value. Look at the generated assembly of a non-inlined function. You'll find a copy of a pointer. You did not in true pass-by-reference languages.

The fact that that is a sensible question to ask in a modern language is another sign the terminology is dead.


C#'s ref parameters, same as C++'s reference types, have true pass-by-reference semantics. Whether this gets compiled to pass-by-pointer or not is not observable from the language semantics.

That is, the following holds true:

  int a = 10;
  foo(ref a) ;
  Assert(a == 100);

  void foo(ref int a) 
  {
    a = 100;
  }
There's also a good chance that in this very simple case that the compiler will inline foo, so that it will not ever pass the address of a even at the assembly level. The same would be true in C++ with `void foo (int& a)`.


Right, and on the flip side, Forth on say x86 will inevitably involve passing a pointer to the stack, be it explicitly or implicitly (ie global variable).

So if one invokes low-level implementation details, Forth is also a pass-by-pointer-value in the same way as C# "ref" and others, at least on x86.

However I don't think appealing to implementation details is useful, what matters is what is observed through the language, with the results you point out.


You're talking about pointers but calling them references. I'm sorry, but no, the terminology is not "dead" you're just contributing to confusion.


Is python no longer a modern language? Objects are certainly not copied when passed to a function.


Python copies references by value.

    $ python3
    Python 3.12.3 (main, Jul 31 2024, 17:43:48) [GCC 13.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> def x():
    ...     v = 1
    ...     y(v)
    ...     print(v)
    ... 
    >>> def y(val):
    ...     val += 1
    ... 
    >>> x()
    1
A pass-by-reference language would print 2.

Everything in a modern language is passed by copy. Exactly what is copied varies and can easily be pointers/references. But there were languages once upon a time that didn't work that way. It's a dead distinction now, though, unless you go dig one of them up.

If you want a specific one to look at, look at Forth. Note how when you call a function ("invoke a word", closest equivalent concept), the function/word doesn't get a copy of anything. It directly gets the actual value. There is no new copy, no new memory location, it gets the actual same memory as the caller was using, and not as a "pointer"... directly. Nothing works like that any more.


C++ is a live language, C# has out parameters.... there's stuff out there.

The classic example of "pass by copy-reference is less expressive" is that you can't pass a reference to a number and have the callee modify it. You have to explicitly box it. I understand you understand this, but it's worth considering when thinking about whether the distinction means absolutely nothing at all.


> The classic example of "pass by copy-reference is less expressive" is that you can't pass a reference to a number and have the callee modify it.

This is really not true. Depending on how your language implements pass-by-reference, you can pass a reference to an int without boxing in one of two ways: either pass a pointer to the stack location where the int is stored (more common today), or simply arrange the stack in such a way that the local int in the caller is at the location of the corresponding parameter in the callee (or in a register).

The second option basically means that the calling convention for reference parameters is different from the calling convention for non-reference parameters, which makes it complicated. It also doesn't work if you're passing a heap variable by reference, you need extra logic to implement that. But, for local variables, it's extremely efficient, no need to do an extra copy or store a pointer at all.
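The first option, sketched in Go (which spells the pointer out explicitly rather than hiding it behind reference syntax):

```go
package main

import "fmt"

// increment emulates a by-reference int parameter by taking the
// address of the caller's variable; no boxing is involved.
func increment(n *int) {
	*n++
}

func main() {
	v := 1
	increment(&v) // pass a pointer to v's storage
	fmt.Println(v) // 2
}
```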


Hmmm... yeah that's a good point. Though I would contend that the fact that languages do not do this is indicative of... something.


I would guess that the main reason is that the on-stack way only works for local variables. If you want to pass anything else by reference, you need to use some kind of address to it, since it's not in the caller's stack anyway.


uh I'd not say it like that

Python passes primitive types by value, or rather "as if by value", because it copies them on write.

if you modify your experiment to pass around a dict or list and modify that in the 'y', you'll see y is happily modified.

so Python passes by reference, however it either blocks updates (tuple) or copies on write (int, str, float) or updates in place (dict, list, class)


> if you modify your experiment to pass around a dict or list and modify that in the 'y', you'll see y is happily modified.

No, you won't.

  def foo(z):
    z = {'b' : 2}
  
  x = {'a' : 1}
  foo(x)
  print(x)
You'll see that this prints `{'a' : 1}`, not `{'b' : 2}`. Python always uses pass-by-value. It passes a copy of the pointer to a dict/list/etc in this case. Of course, if you modify the fields of the z variable, as in `z['b'] = 2`, you do modify the original object that is referenced by z. But this is not pass-by-reference.


Is it not pass-by-reference by some technicality? In the mutation example you suggest, if a reference to x isn't being passed into foo, how could foo modify x?

I would sooner believe the example is showing you shadowing the z argument to foo, than foo being able to modify the in-parameter sometimes even if it's pass by value.


> In the mutation example you suggest, if a reference to x isn't being passed into foo, how could foo modify x

The important point is that it's not "a reference to x" that gets passed, it's a copy of x's value. x's value, like the value of all Python variables, is a reference to some object. The same thing applies to setting variables in Python in general:

  x = {1:2} # x is a new variable that references some dict 
  y = x # y is a new variable that references the same dict
  y[1] = 7 # the dict referenced by x and y was modified
  x = None # x no longer references the dict
  print(y) # y still references the dict, so this will print {1:7}
  y = None # now neither x nor y reference that dict; since y was the last reference to it, the dict's memory will be freed


This is what confuses me; it sounds like what you're saying is Python isn't pass-by-reference, it's pass-by-value, and that value is sometimes a reference?

Honestly, "x's value, like the value of all Python variables, is a reference to some object" makes me think it's more accurate to call Python pass-by-reference only.


Pass-by-reference means that your callee gets a reference to your local variables, and can modify them. This is impossible in Python. Pass by value means that your callee gets the values of your local variables and can't modify them. This is how Python functions work.

What those values represent and how they can be used is a completely different topic. Take the following code:

  def foo(z):
    with open(z, "w") as file:
      file.write("def")
  
  x = "/dirs/sub/file.txt"
  with open(x, "w") as file:
    file.write("abc")
  foo(x)
  with open(x, "r") as file:
    print(file.read())  # prints "def"
      
Here x is in essence a "reference to a file". When you pass x to foo, it gets a copy of that reference in z. But both x and z refer to the same file, so when you modify the file, both see the changes. The calling convention is passing a copy of the value to the function. It doesn't care what that value represents.


So to be very clear:

  def foo(x):
    x['a'] = 1
  
  y = {'b': 2}
  foo(y)
  print(y)
foo can modify the object y points to, but it can't make y point to a different object? Is that what "This is impossible in Python" is referring to?


Yes.


yes! we agree how this works.

so we disagree on terminology?

in my CS upbringing, sharing the memory location of a thing as parameter was tagged "call by reference". the hallmark was: you can in theory modify the referenced thing, and you just need to copy the address.

call by value, in contrast, would create an independent clone, such that the called function has no chance to modify the outside value.

now python does fancy things, as we both agree. the result of which is that primitives (int, float, str) behave as if they were passed by value, while dict and list and their derivatives show call-by-reference semantics.

I get how that _technically_ sounds like call by value. and indeed there is no assignment dunder. you can't capture reassignment of a name.

but other than that a class parameter _behaves_ like call by reference.


"In the mutation example you suggest, if a reference to x isn't being passed into foo, how could foo modify x?"

Because it is passing a pointer by value under the hood.

This is the part that messes everyone up. Passing pointers by value is not what passing by reference used to mean.

And it matters, precisely because that is extremely realistic Python code that absolutely will mess you up if you don't understand exactly what is going on. You were passed a reference by value. If you go under the hood, you will find it is quite literally being copied and a ref count is being incremented. It's a new reference to the same stuff as the passed-in reference. But if you assign directly to the variable holding that reference, that variable will then be holding the new reference. This is base level, "I'd use it on an interview to see if you really know Python", level stuff.

Everything in a modern language involves passing things by value. Sometimes the language will gloss over it for you, but it's still a gloss. There were languages where things fundamentally, at the deepest level, were not passed by value. They're gone. Passing references by copy is not the same thing, and that Python code is precisely why it's not the same thing.


Sure, but in the traditional sense of pass-by-reference you could say it's just always pass-by-value, and that value is always a reference. It's just not a helpful thing to say. (In those traditional pass by reference languages, was it impossible to pass a scalar to a function?)

Passing a pointer-by-value similarly seems to be avoiding the point; if you tell me Python is always pass-by-value I'll expect an object I pass to a function to be a copy & not a reference, thus not be able to be mutated, and that's not the case.


> Passing a pointer-by-value similarly seems to be avoiding the point; if you tell me Python is always pass-by-value I'll expect an object I pass to a function to be a copy & not a reference, thus not be able to be mutated, and that's not the case.

That would be a misunderstanding. It would only make sense if you think Python variables are Python objects. They are not: Python variables are pointers to objects. The fact that assignment to a variable never modifies the object pointed to by that variable is a consequence of that, and doesn't apply just to passing that variable to a function.


you replace the local binding z to the dict globally bound to x by a local dict in that z = ... assignment.

however if you do z['b'] = 2 in foo, then you'll see the global dict bound to x has been modified, as you have stated.

well, that's _exactly_ pass by reference.


There is no notion of variable binding in Python, that's a different thing. z, like any Python variable, is a reference to something. Initially, it's a reference to the same dictionary that x references. If we modify the object referenced by z (e.g. by adding a new item), we of course also modify the object referenced by x, as they are referencing the same object initially. However, when we assign something to z, we change the object that z is referencing. This has no effect on x, because x was passed-by-value to foo().

Pass-by-reference doesn't exist in Python. Here's what it looks like in C#, which does support it:

  var x = new Dictionary<string, int>();
  x.Add("a", 1);
  foo(ref x);
  Console.WriteLine(string.Join(", ", x)); // prints [b, 2]

  void foo(ref Dictionary<string, int> z) {
    var k = new Dictionary<string, int>();
    k.Add("b", 2);
    z = k;
  }
Here z is just a new name for x. Any change you make to z, including changing its value, applies directly to x itself, not just to the object referenced by x.



But the author already knew that.

The important lesson is that assignments are by value (copy).


Maps and channels and functions are passed by reference. Slices are passed and returned by value but sometimes share state invisibly, the worst of both worlds. It would make more sense if Go either made this stuff immutable, made defensive copies, or refused and required using explicit pointers for all these cases.


No, it's not the case and this terminology shouldn't be used as it's confusing and unhelpful.

There are reference types in Go even though this is also not a super popular term. They still follow the pass-by-value semantics, it's just that a pointer is copied. A map is effectively a pointer to hmap data structure.

In the early days of Go, there was an explicit pointer, but then it was changed.

Slices are a 3-word structure internally that includes a pointer to a backing array and this is why it's also a "reference type".

That said, everything is still passed by value and there are no references in Go.


You are splitting hairs; maps are effectively references.

That's like saying C++ doesn't have references since it's just a pointer being copied around


No, there is a real difference, this is not splitting hairs.

Go is always pass-by-value, even for maps [0]:

  x := map[int]int{1: 2}
  foo(x)
  fmt.Printf("%+v", x) //prints map[1:2]

  func foo(a map[int]int) {
    a = map[int]int{3: 4}
  }
In contrast, C++ references have different semantics [1]:

  std::map<int, int> x {{1, 2}};
  foo(x);
  std::printf("{%d:%d}", x.begin()->first, x.begin()->second);
  //prints {3:4}

  void foo(std::map<int, int>& a) {
    a = std::map<int, int> {{3, 4}}; 
  } 
[0] https://go.dev/play/p/6a6Mz9KdFUh

[1] https://onlinegdb.com/j0U2NYbjL


foo is receiving a mutable reference and it can't modify the map without those changes leaking out permanently to the caller: https://go.dev/play/p/DXchC5Hq8o8. Passing maps by value would have prevented this by copying the contents.

It's a quirk of C++ that reference args can't be replaced but pointer args can.


The point is that the local variable referencing the map is a different entity than the map itself. Foo gets a copy of that local variable, and the copy references the same map object.

And the fact that C++ references can be used for assignment is essentially their whole point, not "a quirk".


A (shameless) plug: I've been building a collection of Go bits like this. Hopefully it can be useful to someone other than me, too:

https://github.com/geokat/easygo


For me, coming from PHP, the way Go works seemed the most natural. PHP is also one of the very few (old) languages that makes everything pass-by-value (except for objects, which initially were also pass-by-value, but it was so confusing for people coming from other languages that they changed it).

Treating everything as a value IMO is quite nice _if you accept it_, because it eliminates a whole class of possible side effects from mutating the value inside the receiver, without requiring extra complexity like immutability in the language itself.


It can also be a performance issue, since range has to make a copy of whatever is in the slice. Slices of pure structs can be tantalizing for their simplicity, but you should think about how you want to range over them first, and double-check yourself every time you write:

    for _, obj := range ...
You're explicitly asking for the two-argument convenience, which can have a price.


Something that strikes me about Go's approach here, and the explanations in many of the posts on this page, is that they're all focused on what is happening under the hood: what memory are we pointing at, what's being copied, is it a pointer, etc.

Whereas if we start from a point of view of "What semantics and performance guarantees do we desire?", we might end up with a more coherent user-facing interface (even if internally that leads to something more complex).

Personally, my mental model is often influenced by Python - where a name is distinct from a variable, but this distinction doesn't seem to appear in many other languages.


I always get surprised in the opposite direction by languages that work like that, fun to see it from the other side.

As for the second mistake listed, this is practically the reverse confusion itself... I remember one time in an interview, I got this bit of arcana about Go slices right and the interviewer insisted it was wrong, and despite the evidence being on the screen in the program output at the time, I just backed down. Not sure why I or anyone ever submits to the indignity of job interviews, but it also soured me on Go itself a bit!


Nice post!

Also, not everyone knows that even the much-maligned old C does this.

It's a huge red flag/cringe when someone breaks out memcpy() just to copy a struct value (or return it, obviously).


    thing := Thing{...}
    other_thing := thing
    pointer_to_same_thing := &thing
ALL types in Go are copied by value. There is no such thing as a "reference" in this language. Even a slice or map is just a small struct with some syntactic sugar provided by the language, and when you assign a slice to another variable, you are, in fact, creating a copy of that struct.


My pet peeve with slices and maps is that they hide a reference to the actual structure, and you are never sure of what you are modifying, or the performance impact when moving around big structures.

Example with slices: https://go.dev/play/p/8arcUrGU4SU

Example with maps: https://go.dev/play/p/eq8i6z8a4jN


No, there is no "hiding" of a "reference". There is copying a struct:

    s2 := s1
This copies the struct in `s1` into a new name `s2`. This struct contains, among other things, a pointer to the backing array. Therefore, when you assign to the slice

    s2[0] = "bye"
You assign to the same backing array. Slices are not arrays. Copying a slice copies a struct containing a pointer to an array. A similar situation holds true for maps. The same logic that is universal throughout the language, aka. "Go only ever copies things by value" holds true for all of these types.

https://go.dev/ref/spec#Slice_types

"A slice, once initialized, is always associated with an underlying array that holds its elements. A slice therefore shares storage with its array and with other slices of the same array; by contrast, distinct arrays always represent distinct storage."


Since a slice/map, internally, contains a pointer to the data, it looks like slices/maps have reference semantics: after you do "m2 := m1", all changes done through m1 are visible through m2, even though the type of m1 and m2 has no visible asterisk anywhere in it.


> , it looks like slices/maps have reference semantics

No, they don't. Pointers and references are fundamentally different concepts. A reference is a name, handled by the runtime, that is bound to an entity. A pointer is just a value holding a memory address.

When I "copy" a reference, I simply instruct the runtime to bind another name to the entity.

When I copy a struct containing a pointer, I actually allocate new memory to hold a new copy of that pointer value. And since that copy is a true copy of a pointer value, I can change it.

That's why this:

    func main() {
        s1 := []int{1, 2}
        s2 := s1
        s2 = append(s2, 3)
        s1[0] = 42
        fmt.Println(s1)
        fmt.Println(s2)
    }
Will give you

    [42 2]
    [1 2 3]
as an output. s2 is not a "Reference" to the same entity as s1, it is a struct holding a pointer, and when we grow the slice this struct represents beyond the capacity of the backing array that pointer points to, by appending to it, we replace that pointer in s2.

Comparing that to a language that actually does have reference semantics (python):

    s1 = [1, 2]
    s2 = s1
    s2.append(3)
    s1[0] = 42
    print(s1)
    print(s2)
Gives me

    [42, 2, 3]
    [42, 2, 3]


Case of where the language affects (and can clarify/obscure) your model of the system and its behaviour: C++'s copy-assignment operator [0], which makes these semantics explicit.

[0] https://en.cppreference.com/w/cpp/language/copy_assignment


jvns highlights some of the easier-to-forget or -overlook mistakes, but their source article https://100go.co is a great refresher and introduction as well.


Golang is sometimes considered a simple language, but it's not really beginner-proof like Java was designed to be. It's a good idea to spend time learning it thoroughly.


Looks as though the range loop isn't an issue from Go 1.22 anymore.

https://go.dev/blog/loopvar-preview


That's a different language design mistake, which several languages have had to fix including C# back in C# 5.

The question there is, does our for-each style loop make a new variable each iteration, with the appropriate value, or does it have a single variable and it's just re-assigned for each iteration. People who haven't designed a language before might think the second option sounds optimal and won't make a practical difference, but it's actually very annoying and that's why Go changed to the former.

This time though it's not about the variable staying the same, the problem is that we got a copy of the data we cared about, not a mutable reference to that data.


It is terrifying to read these comments. I don’t think I realized how confused so many programmers are about how programming works. Maybe the safety police are right after all.


The one about named returns, err always being nil, why is err even in scope, seems like it should be a compile error to me? (I rarely write Go).


In the function signature, the return variable err is type error. Since it is named it is also defined and initialized as nil.

In the code, an error is detected, but nothing is ever assigned to err before it is returned as the error value.

So the function returns nil as the error when it meant to return a real error.

The code should be something like:

    if ctx.Err() != nil {
      return 0, 0, ctx.Err()
    }


Another thing to watch out for is that "defer" in Go is executed at the end of the function, not at the end of the current scope. This makes it not only more difficult to reason about but also much less useful.


Read The Fine Manual, and read some books, so underappreciated these days...


Misconceptions probably come from Java or Python, where a bunch of things are implicitly done for you. I much prefer Golang's explicitness. The stuff with slices is confusing though.


What a reach to blame Java.


Agreed on this one, the "fix" involving the full slice expression with a capacity index, e.g. s[2:3:3], is unintuitive compared to Python where there is no such concept.

Still, as far as sharp edges go these are nothing compared to Java.

See the discussion from:

Common I/O Tasks in Modern Java

https://news.ycombinator.com/item?id=41142737

House of horrors, especially the URL equality triggering DNS requests.


> URL equality triggering DNS requests

I agree it can be confusing, as the URL class can also act as a client that performs actual connections. The documentation actually mentions that URI is a better choice when you want only a representation.

In general I don't think the other examples are that bad. The reason why there are so many different ways of performing I/O, is because Java is evolving and adding better solutions, but can't really remove older stuff like "URL" because it is widely used.

I see similar issues with other languages as they evolve, and I think Java has managed it well. The IDEs are also often good at making suggestions on how to replace outdated code.


[flagged]


> Maybe you meant Zig or C

This seems hubris for someone who depends on types, looping constructs, and compilers.

Maybe you meant writing machine code or 6502, then I'd likely agree, but C is quite literally (& excellently) designed for quiche eaters.

/s



