The very last paragraph sums it up pretty nicely imo:
> You'll have to be more creative when coming up with variables that don't shadow existing ones (which, for me, generally means using more obscure names).
The last adjective I want used in relation to my variable names is “obscure.”
I really like in Rust how you can reinit a variable with a different type e.g. “let rect: Rect<f32> = rect.into();”
It’s just so damn useful and I’m not sure what the downside is. It sucks when you have to keep coming up with different names just so you can keep around an identifier you don’t even need anymore.
This is idiomatic Rust, and it works very nicely there; however, most languages aren't Rust.
Rust's Into::into() consumes the object in the old (now shadowed) rect variable. So, conveniently, the old rect variable which we can't access also no longer has a value†. In many languages a method can't consume an object like that, so the old object still exists, but we can't access it because it is shadowed.
For example in C++ they have move semantics, but their move isn't destructive, so the object is typically hollowed out, but still exists until the end of the scope at least.
Rust's type strictness matters here too. If you later modify some code to use rect with its pre-conversion meaning, after the statement morphing it into a Rect<f32>, chances are it no longer type checks and is rejected. For example, in many languages if (rect) { ... } would be legal code and might silently change meaning as a result of the transformation, but in Rust only booleans can be used as conditions.
† Unless this previous variable's type implemented the Copy trait and therefore it has Copy semantics and consuming it doesn't do anything.
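A minimal sketch of the same pattern, using String::into_bytes as the consuming conversion instead of the rect example:

```rust
fn main() {
    let s = String::from("hello");
    // into_bytes() consumes the String; the new `s` shadows the old one,
    // so the moved-from value is both gone and unnameable.
    let s = s.into_bytes();
    assert_eq!(s, vec![104, 101, 108, 108, 111]);
    // Any code still expecting the old String-typed `s` here would now
    // fail to type check, e.g. `s.push('!')` no longer compiles.
}
```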
This is all true in this instance, but it doesn't have to be at all. You could write something like the following:
let name = "Arthur".to_owned();
//... Do something with &Arthur
let name = "Bethan".to_owned();
//... Do something with &Bethan
In this situation, ownership of the first string was never passed on, and its value will be dropped (deallocated, destructed, etc.) at the end of the scope. So in Rust, too, the first string value is shadowed and becomes inaccessible, much like in your description of C++. In addition, because both variables have the same type here, you can use the second variable thinking that you're using the first one, and the compiler will not help you; you'll just end up using the wrong value somewhere.
Fwiw, I find this feature very useful, and it's helpful more often than it is a nuisance. But there are no guarantees that you're consuming or transforming the object you're shadowing, and the compiler won't necessarily help you out if you simply accidentally use the same name twice.
As another commenter says the moved-from object should have "Valid but unspecified state" (types provided by the standard library will do that, custom types merely should do that)
Since you don't know what valid state it has, calls with pre-requisites are nonsense (e.g. if you have a Bird and the method land requires that the Bird should be flying, you can't call land() on a moved from Bird, because you don't know if it's flying) but all calls without pre-requisites are fine e.g. asking how long a string you moved is would work - it's probably zero length now, but maybe not.
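Rust's closest analogue to that hollowed-out state is the explicit std::mem::take (or mem::replace), which leaves a known default behind rather than an unspecified value; a sketch:

```rust
fn main() {
    let mut s = String::from("hello");
    // Unlike a plain move, mem::take leaves a valid value (here, "") behind,
    // so the source stays usable, much like a C++ moved-from string usually is.
    let taken = std::mem::take(&mut s);
    assert_eq!(taken, "hello");
    assert_eq!(s.len(), 0); // in Rust this is guaranteed; in C++ it's merely likely
}
```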
> Presumably it wouldn’t be hard for the compiler to yell at you.
In the general case this is Undecidable, so, the opposite of not hard.
> I would assume you could change this without affecting backwards compatibility.
C++ code which relies on this exists today. The most likely path to actually landing destructive move in C++ would be to add a whole new set of construction and assignment operators for destructive move, forcing people to opt in, adding to the many sets C++ already has, and likely angering C++ developers a great deal in the process.
Howard Hinnant, who designed today's non-destructive move, did argue that in principle it's possible to add destructive move to the language later if desired, but his description rather undersells the benefits of destructive move, presumably because he couldn't deliver it. Maybe he'd watched enough Mad Men (yup, Mad Men's early seasons pre-date C++ having move semantics) to know you shouldn't tell the customer what they can't have, or they'll want it.
Common things to actually do with a C++ variable after moving from it are:
* Nothing, but in the knowledge it won't be cleaned up until the scope ends
* Re-assign it, destroying the hollowed out object immediately
* Re-use the hollowed out object, e.g. call a clear() method on it and then use as normal
The only requirement placed on the “moved out” variable is that you should be able to call its destructor, which means it has to be in a valid but unspecified state. So it's fine to access such a variable, so long as you don't rely on its exact state. You can still assign to it, for instance.
Definitely super useful, especially in a language where such conversions are rather common.
Also useful because Rust doesn't let you give a local binding an abstract type. So let's say you're building an iterator: in a language with interfaces you could do something like
let it: Iterator = some().thing();
// intermediate stuff
it = it.some().transform();
// more intermediate stuff
it = it.final().transform();
But in Rust that won’t work: every adapter yields a different concrete type, and you’d have to box every layer to make them compatible. Intra-scope shadowing solves that issue.
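A sketch of what that looks like with shadowing: each `let` binds a new concrete adapter type, no boxing needed.

```rust
fn main() {
    let it = (0..10).map(|x| x * 2);    // Map<Range<i32>, _>
    // intermediate stuff
    let it = it.filter(|x| x % 3 == 0); // Filter<Map<...>, _>: a different concrete type
    // more intermediate stuff
    let v: Vec<i32> = it.collect();
    assert_eq!(v, vec![0, 6, 12, 18]);
}
```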
The biggest downside is that it’s possible to reuse names for completely unrelated purposes, which can make code much harder to understand. Clippy has a shadow_unrelated lint but it’s allowed by default because it’s a bit limited.
That’s the point, you can because rust supports intra-scope shadowing.
If it didn’t you’d have to type-erase, or create a new independently-named binding for every step, as you do in e.g. Erlang (can’t say this is / was my favourite feature of the language).
Yes, the fact that "V = expression" means "if variable V doesn't exist, assign expression's value to it; otherwise compare the expression's value with the value of V and raise exception if they're not equal" is one of my least favourite parts of Erlang.
I semi-regularly introduce local variables named exactly like one of the function's parameters and then spend several minutes trying to understand why the expression on the right-hand side of the assignment throws badmatch: of course, it doesn't, it's the assignment itself that throws it.
Yes that’s also an interesting facet of the language. IIRC it makes sense because of the Prolog ancestry, so it kinda-sorta looks like unification if you squint, but boy is it annoying.
The downside is that you may get a weird bug and only after a while see that you accidentally overwrote a function parameter and the Rust compiler didn't even warn you about it.
For this reason I always add the following line to my projects to enable warnings:
You can also use "deny" instead of "warn" to make it an error.
I also like "#![deny(unreachable_patterns)]", which detects bugs in enum pattern matching if you accidentally match "Foo" instead of "Type::Foo" - I honestly don't know why this isn't set by default.
> The downside is that you may get a weird bug and only after a while see that you accidentally overwrote a function parameter and the Rust compiler didn't even warn you about it.
If you “overwrite” a function parameter without using it, the compiler will warn you of an unused variable.
If you “overwrite” a function parameter because you’re converting it, it’s a major use case of the feature.
> I honestly don't know why this isn't set by default.
Because the author of the match can’t necessarily have that info e.g. if you match on `Result<A, B>` but `B` is an uninhabited type (e.g. Infallible), should the code fail to compile? That would make 95% of the Result API not work in those cases. Any enum manipulating generic types could face that issue.
IIRC it was originally a hard error, and was downgraded because there were several edge cases where compilation failed either on valid code, or on code which was not fixable (for reasons like the above).
> If you “overwrite” a function parameter because you’re converting it, it’s a major use case of the feature.
Or it's unintended and thus a bug. I personally almost never intentionally shadow variables so I turned it into warnings.
> e.g. if you match on `Result<A, B>` but `B` is an uninhabited type (e.g. Infallible), should the code fail to compile?
This specific example you chose is probably the least relevant here, as the Result type doesn't require you to write "Result::Err(_)" instead of just "Err(_)", both will correctly match. Which can of course also be done for custom enums by "importing" their variants ("use EnumName::*;"). But in my experience it's easy to accidentally omit the type in the match pattern and then suddenly it matches everything. I personally can't imagine a situation where this is intentional and have spent way too much time debugging this specific issue, hence I choose to turn it into an error.
> The downside is that you may get a weird bug and only after a while see that you accidentally overwrote a function parameter and the Rust compiler didn't even warn you about it.
It will absolutely warn about this:
fn foo(i: u32) -> u32 {
    let i = 42;
    i
}

fn main() {
    dbg!(foo(42));
}
results in
warning: unused variable: `i`
--> src/main.rs:1:8
|
1 | fn foo(i: u32) -> u32 {
| ^ help: if this is intentional, prefix it with an underscore: `_i`
|
= note: `#[warn(unused_variables)]` on by default
I don't know what the difference was, but 1-2 years ago it definitely did not warn me. Perhaps it doesn't show a warning when you assign a different datatype?
There's only one case where shadowing has bitten me in the past: long methods with loops dealing almost exclusively with usize, where shadowing external bindings inside the loop might make sense, but any mistake would be silent. This was in the context of terminal layout code. The solution there was extensive testing, but what I should have done is split the megafunction into multiple smaller ones.
Exactly! I was thinking of shadowing in Rust when I wrote my original comment.
My day job is predominantly in Typescript and a lot of code winds up reading significantly worse than it needs to. A common pattern for me is unique-ifying some sort of array—“const dataUnique = new Set(data);” is horrible, and if there’s no reason to keep the original “data” variable in scope then it’s doubly bad; I want to keep as little context in my head as possible.
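For comparison, the Rust version of that unique-ifying pattern under shadowing, so no dataUnique-style second name is needed (a sketch with made-up data):

```rust
use std::collections::HashSet;

fn main() {
    let data = vec![1, 2, 2, 3, 3, 3];
    // Shadow `data` with its deduplicated form; the Vec is consumed,
    // so there's no stale original binding to keep in your head.
    let data: HashSet<i32> = data.into_iter().collect();
    assert_eq!(data.len(), 3);
}
```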
The downside is when reading code you’re keeping in your head information about the type of each variable. If you skim through the code and miss one of these redefinitions then you may be mistaken about the variable’s type.
That said, I still think sparing use of this is justified, especially with an editor which can show types on mouseover.
That’s true, but this has never been a problem for me when looking through large codebases and doing code reviews. In other languages I was constantly annoyed by not being able to shadow.
At the opposite end of the spectrum, there are languages with case insensitivity and even style insensitivity. I personally avoid them, but it's interesting how the users of these languages have a very different philosophy.
With shadowing, you can use or mutate a variable, thinking you are using/mutating the outer instance because you’re unaware of the inner (shadowing) instance, which is the one you are really using/mutating. An IDE doesn’t help catching such an inadvertent error (unless it warns about shadowing variables, but then you’d want to rename it anyway, to get rid of the warning).
I’ve tripped over unexpected shadowing often enough that I wish more languages would forbid it. I rarely have trouble choosing appropriate variable names to avoid shadowing.
It's a footgun indeed, and an IDE per se doesn't solve all the problems. But since Rust was mentioned: other Rust features make that less of a problem. Most Rust code uses immutable variables; only rarely do you use mut variables and mut references, and those can be put under bigger scrutiny by reviews and linters.
I focused on IDEs in my comment because I find shadowing to be a problem even with immutable variables, because it's hard to tell what the type of a variable is if it keeps changing throughout the function body.
There is no such restriction, it's just much less common to want that.
let x: u32 = 5;
let x: u32 = 10; // You can write this, but why?
let x: u32 = 20; // I really feel like you should re-consider
If you end up shadowing this way in a long function it more likely means the function got too long. On the other hand, I certainly have had cause to shadow variables in inner scopes e.g.
let x = some_complicated_stuff();
for dx in [-1, 0, 1] {
    let x = x + dx;
    // Do stuff with x very naturally here, rather than keep saying "x + dx" everywhere
}
// But outside the loop x is just x, it's not x + dx
Why would you do that with the same type instead of just making the variable mutable? And you can do it, I just don't think it's a good idea as you now effectively have a mutable variable without it being marked as such.
No, it's still better than a mutable variable. Because it's not a mutable variable, just a series of variables that happen to have the same name.
Mutable state is 'evil' and makes your program harder to reason about on a semantic level. Shadowing is merely a syntactic choice with pros and cons.
I like shadowing in Rust, it works well there. In eg Python or Haskell, it works less well, but for different reasons. (In Haskell it's because of laziness and definitions being co-recursive by default. In Python it's because the language doesn't give you any tools to tell apart assignment to an existing variable from creation of a new variable.)
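A small sketch of "a series of variables that happen to have the same name": each binding below is immutable, only the name is reused.

```rust
fn main() {
    let input = "  42  ";
    let input = input.trim();                // still &str, but a fresh immutable binding
    let input: i32 = input.parse().unwrap(); // new binding, new type; no `mut` anywhere
    assert_eq!(input, 42);
}
```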
> it's not a mutable variable, just a series of variables that happen to have the same name.
Fair point, though in that case I'd be more comfortable separating those variables into scopes.
> Mutable state is 'evil' and makes your program harder to reason about on a semantic level. Shadowing is merely a syntactic choice with pros and cons.
Both result in multiple states of the same identifier, so I don't quite see the big difference here. In Rust I already have the clearly visible "mut" keyword telling me that it'll be overwritten.
I added shadowing protection to D a couple decades ago. It has saved me from numerous difficult-to-find bugs, especially if the function gets a bit large. It's a big win.
Don't forget that zig files are essentially a struct using the same name as the file. That means you can differentiate between identifiers that are using the same name by supplying the struct name as a prefix.
You make it sound so difficult, but in my experience it hardly ever comes up in practice. When it does, rather than naming things obscurely I can just tack a temp_ or local_ or something else descriptive on the front.
When I used Zig I ran into this all the time in practice when using iterators. It's the combination of not being able to nicely create a locally scoped iterator and Zig forbidding shadowing.
Specifically this kind of code (and I do not feel calling them e.g. iter1 and iter2 or foo_iter and bar_iter make the code nice):
var iter1 = foo_dir.iterate();
while (try iter1.next()) |entry| {
    // Do something
}

var iter2 = bar_dir.iterate();
while (try iter2.next()) |entry| {
    // Do something else
}
At the moment, the recommended solution for that is to use a block scope.
{
    var iter = foo_dir.iterate();
    while (try iter.next()) |entry| {
        // Do something
    }
}
{
    var iter = bar_dir.iterate();
    while (try iter.next()) |entry| {
        // Do something
    }
}
Being clear about things is one of Zig's central philosophies, if naming things `foo_iter` and `bar_iter` doesn't feel like the right solution then maybe the language isn't a good fit for you.
But also:
var iter = foo_dir.iterate();
while (try iter.next()) |entry| {
    // Do something
}

iter = bar_dir.iterate();
while (try iter.next()) |entry| {
    // Do something else
}
You can totally just reassign `iter` here since you're done with the first iterator (assuming they are both the same type, which appears to be the case).
I do not think long variable names are very clear. What is clear is locally scoped ones, and there I feel Zig has some deficiencies which make it unnecessarily cumbersome to have locally scoped iterators.
It's accidental complexity. There's an artificial limitation which you need to work around. What you get back is very questionable; I certainly don't remember the last time I messed up the scoping, if ever.
they'll never remove it, because not shadowing variables really encourages you to break up your code into well-organized files. IMO, it's only painful if you:
1) have bad coding habits (playing with Zig broke me of some of these)
> Speaking of structure fields, they're always public. Structures and functions are private by default with an option to make them public. But struct fields can only be public.
This feels like a mistake to me. I own and maintain a C library, and lack of private struct members is one of the things that causes the most problems. My users frequently reach into struct members even though I really do not want them to. This causes problems when those members have subtly different semantics than what they expect. It's even worse when I want to change the internal representation, and I have to track down all my users' code and change it accordingly.
In C, my only options are:
(1) Define the structure in a .c file instead of .h. This works great in cases where I can take the performance hit of not being able to inline accesses of those struct members. But for performance-critical structures where I need to be able to inline, this is not an option.
(2) Make the member names something like private_foo, internal_foo, etc. and hope my users take the hint. But this also makes my own code more ugly and creates longer lines that are more likely to wrap.
Unfortunately Zig is even less capable than C in this regard, because Zig takes away option (1). Since there is no header/source split in Zig, I cannot make a struct opaque by defining it in a source file only. Though I do see that there is "opaque {}", perhaps that plus some casting could accomplish a similar thing?
> The idea of private fields and getter/setter methods was popularized by Java, but it is an anti-pattern. Fields are there; they exist. They are the data that underpins any abstraction. My recommendation is to name fields carefully and leave them as part of the public API, carefully documenting what they do. [...] In my subjective experience, public fields generally lead to better abstractions by eliminating the temptation to attempt full encapsulation, when the more effective strategy is to provide composable abstraction layers.
Java does indeed represent an anti-pattern of verbose and frequently trivial getters and setters. But newer languages like C#, Swift, and Dart have more elegant and low-overhead syntax for properties, even allowing a property to move between being an actual field member and being derived, without breaking users.
As a library maintainer, full encapsulation is very important for evolution of a system over time. The only way for layers to truly be layers is if the contract at each layer boundary is clear. Otherwise the layers gel together and you cannot safely change any one layer independently.
Zig compiles everything as one large compilation unit, so inlining isn't much of an issue there. If you want to have private fields, there's a couple ways to go about it. You can make use of opaque types, of course, or you could use @fieldParentPtr to hide your private implementation from the API.
Something like so:
/// The internal implementation of this type.
const PrivateType = struct {
    // Private fields.
    field_a: u32,
    field_b: bool,
    // Public object exposed.
    public: PublicType,
    // Private functions here.
};

/// This gets exposed to the public API.
pub const PublicType = struct {
    // Public fields.
    field_a: u32,

    // Public functions here.
    pub fn create(alloc: Allocator) !*PublicType {
        var priv = try alloc.create(PrivateType);
        return &priv.public;
    }

    /// Flips internal field b.
    pub fn flip(p: *PublicType) void {
        // Grabs the PrivateType that contains this PublicType as a field.
        const self = @fieldParentPtr(PrivateType, "public", p);
        self.field_b = !self.field_b;
    }
};
D has a more unusual take on private fields. Private fields prevent access by other modules, but do not prevent access from within the same module. The module is really the unit of encapsulation, not the struct.
The purpose of this was to eliminate the need for "friend" classes.
This sometimes conflicts with what people coming from C and C++ are used to with everything being in one file (after headers are added by the preprocessor). But having modules gives opportunities for better ways to organize the code.
As for trivial property accessors and setters, those get inlined, so are not overhead.
> D has a more unusual take on private fields. Private fields prevent access by other modules, but do not prevent access from within the same module. The module is really the unit of encapsulation, not the struct.
Other languages would call these "internal", not "private".
Rust uses "private" to express the same notion, that a field can be accessed from the defining module and its submodules, but not from any other modules, unless you explicitly add a visibility modifier like "pub" or "pub(crate)".
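A sketch of Rust's module-scoped privacy, with a hypothetical Point type:

```rust
mod geometry {
    pub struct Point {
        pub x: i32,
        y: i32, // private: accessible in `geometry` and its submodules only
    }

    pub fn origin() -> Point {
        Point { x: 0, y: 0 }
    }

    // Same module, so touching the private field is fine.
    pub fn y_of(p: &Point) -> i32 {
        p.y
    }
}

fn main() {
    let p = geometry::origin();
    assert_eq!(p.x, 0); // `x` is pub, so this compiles
    // `p.y` here would be rejected: field `y` is private to the module.
    assert_eq!(geometry::y_of(&p), 0);
}
```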
Well, OOP was used, and its naming conventions settled, decades before Rust or D came along. See https://en.wikipedia.org/wiki/Simula for example. They decided to go against established conventions (as far as I understand - I don't know what the relationship between Rust modules and classes is). That's fine, but let's not pretend otherwise.
A Rust module is similar to a Java package or a C# namespace.
Java calls this notion "package-private" [0], at least in some of its documentation. Delphi calls this notion "private" [1], and has a separate "strict private" for members only visible from the containing class.
Also, C# uses "internal" to refer to a different notion, that the type or member can be accessed from anywhere in the containing assembly, not just the containing namespace [2]. (In D this is "package", and in Rust this is "pub(crate)".)
I'm just trying to push back on treating "other languages" as a unified whole here.
> Java does indeed represent an anti-pattern of verbose and frequently trivial getters and setters.
It's not an anti-pattern.
This is how Java provides the exact functionality that you describe as very important (field encapsulation). Because Java doesn't support properties, it needs to go through getters and setters.
As you point out, more modern languages support this functionality directly in the language with properties.
To me the anti-pattern is lots of redundant, trivial, verbose getters and setters. If I see a source file with tens or hundreds of lines of trivial method definitions, I think something has gone wrong.
Properties help avoid this anti-pattern by making it possible to transparently swap in a computed property for what used to be a plain field. This removes the incentive to proactively create trivial getters and setters "just in case."
Let's take something like C++ std::vector / Rust's Vec / Zig ArrayList
Although there's some serious meat in a few places (e.g the growth amortization and associated mechanisms) many of the methods defined for these types are trivial.
In C++ they don't look trivial because of layers of pre-processor cruft, terrible lack of language hygiene, and the perverse insistence that today's code should compile correctly with a compiler that's as old as Shrek (the movie that is).
But they are, ultimately, still trivial, even if writing a *= 2; takes sixteen lines of macros and a dozen structure tricks to ensure that compiler doesn't think we meant the global variable a, or the structural type a, or the function named a, or a dozen other things we couldn't possibly know about because this language is very, very stupid.
Here's Vec's implementation of a commonly used method:
But you wrote it in reply to a description of 10s or 100s of lines of trivial members.
Maybe D doesn't have such a thing, as I said I don't plan to go look, but all the other languages I used as examples certainly do. Rust's Vec has almost 50 public associated functions, a few are non-trivial, but many are, and of course they take up hundreds of lines.
> Properties help avoid this anti-pattern by making it possible to transparently swap in a computed property for what used to be a plain field.
Java's getters and setters *are* properties. Just with more syntactic sugar than languages that natively support properties, but the semantic is exactly the same.
> My users frequently reach into struct members even though I really do not want them to.
If, as a library writer, you don't want me to access your private struct layout and as a user I want to, I promise there is absolutely nothing I won't try: casting to bytes and working there, passing things from/to DLLs written in C, running your program in a VM, etc. So why try to fight this fight? Just tell your users who go outside the sanctioned route that you won't help them, just like a car manufacturer won't help you hack into their engine, but you still can and should be able to if you want to.
I'm completely content with that deal. But the deal only works if the warranty is clear. Unless it's unmistakable to users that they have crossed the line, they will complain if you break them. A comment next to the member is not enough: IDEs will happily auto-complete the names of internal-only members.
With private fields, there's a clear signal that you will be crossing a line by doing all those shenanigans -- like a 'no running' sign at the pool, you violate the rule and you accept responsibility for what happens next. With only public fields, there is no way to signal a rule. The pool no running sign is taken down.
And, there are valid reasons for a library writer to want to warn people off using certain fields, but still have them in-structure. Work-in-progress, internal machinery that the writer may wish to refactor at their leisure, whatever.
You could write that warning up in the documentation, but it's probably better as a structural affordance in the code, because then it's not just enforceable, but also lends itself to automation.
For my interests (DPDK, NICs, low-level storage I/O, etc.) I have high hopes for Zig. And for reasons others have explicated over the last year on HN, I think it'll work much better than Rust.
Readers should have realistic expectations. It's not substantially downhill to write C-like code. I ran into this bug right off the bat. TL;DR: Zig's linker does not pull in dynamic libs. It finds static libs, but alas the .tsk doesn't give expected behavior anyway:
What are the downsides of keeping the public fields in a struct which is in turn embedded in a private struct created by the library? Inferring the "outer offset" could get irritating but it depends on your tolerance of additional complexity (in every public function...).
> What are the downsides of keeping the public fields in a struct which is in turn embedded in a private struct created by the library?
That seems similar to using "opaque {}", if I am understanding you correctly?
I think either of those solutions could potentially work, at the cost of one explicit cast in each function that actually needs to access the data. I would love to know if this works well enough in practice.
The answer in C would be that it constrains the ordering/packing of fields, so you'd end up with N public/private wrappers and generally a lot of line noise. But I have a vague recollection that Zig reorders fields as it sees fit anyway, making that a non-issue.
> Speaking of structure fields, they're always public. Structures and functions are private by default with an option to make them public. But struct fields can only be public. The recommendation is to document allowed/proper usage of each field.
> I don't want to editorialize this post too much, but it's already caused the type of issues that you'd expect, and I think it'll only cause more difficulties in a 1.x world.
I wish the author elaborated a bit more here, because I've always been interested in what problem exactly `private` (and all the language complexity/overhead that comes with it) solves. For something like C# it kind of makes sense, because it's a language built around chaining method calls and IntelliSense: the user types `foo.` and IntelliSense shows literally everything you could want to do with that `Foo` instance, excluding `private` members/methods. I can see the use case for this in a corporate-type setting. In the absence of this feature (which I believe is the case for Zig?), what possible difficulties can be caused, in practice?
Regarding naming conventions, there is an issue about switching to snake case for functions as well, the main rationale being that there is no difference between snake_case and camelCase for one-word identifiers:
Shout out to LLVM for sticking with camelCase while the C++ standard library is all snake_case. Turns out eventually one gets used to reading mixtures of them and only occasionally writes dense_map or unorderedMap, but I still wish it wasn't like this.
edit: I write large patches in the one true style and then begrudgingly fix them up for review using ad hoc emacs macros, but it looks like clang-tidy can automate that. Wonder if it's robust enough to bidirectionally convert between review-acceptable and legible.
I don't usually like participating in bikeshedding, but the @ annotation feels like PHP's $ to me in that I don't see why it needs to exist. The language design could've easily just left that out and I don't think anything would be lost.
Other than that, I'm definitely excited for Zig as a potential C++ replacement.
Built-in names are essentially reserved words, and there are dozens of them. The @ prefix ensures you don't step on users' variable names, and that you can add new built-ins without making breaking changes.
Not really: those are two modules that are always available to you, but you still have to import them like any other Zig module, e.g. `const builtin = @import("builtin");`
My belief is that it is about builtin functions that are provided by the compiler versus part of the standard library. They are documented in the language reference [0] versus in the standard library documentation [1].
Although namespacing them keeps them out from under a programmer's feet, which is a significant benefit, it does seem like this would make it harder to find stuff.
@cmpxchgStrong, @wasmMemorySize and @embedFile are completely unrelated, but since they're all builtins they're neighbours.
This is sort of an issue we already deal with in other languages, and imo it's not a huge deal in those. Personally I find @ more reasonable than __builtin_
One side benefit is that a lot of @ code is "dangerous shit", so it draws your eye during code review. You will want to code review the "dangerous shit" that GPT-5 gives you.
This example seemed to stop just short of the really interesting bit: what if other.zig called Tea.drink()? Would it set Tea.full to false? Maybe this is obvious to Zig users, but coming from C++, that would be a violation of const correctness.
In C++ (thinking of classes rather than files), you wouldn't be able to call the drink method of a const object because it's not marked as a const method. Or, if it was marked as a const method, you wouldn't be able to modify full from within drink. You could get around this by marking full as mutable, which means you're going to deliberately violate const correctness, but at least you have to be explicit about it. (In theory mutable is meant for things like caches that don't affect the visible behaviour of the class.)
To be clear: you would first need to create an instance of `Tea`.
var t: Tea = .{};
t.drink(); // will work and set t.full to false
If instead we declared `t` as `const`, then yes the call to `drink` would not have been possible.
`const Self = @This();` just binds the top-level struct type definition to a name, which then allows you to refer to it in other places. Nothing more than that.
In this example, `foo` and `bar` cannot modify `self`, while `baz` can. In the case of `foo`, `self` is passed in by value, but since function arguments are immutable in Zig, it cannot be modified (the compiler might still opt for pass-by-reference under the hood, but the semantics don't change).
In the case of `bar` we are explicitly asking for a constant pointer, as you mentioned.
I spent a lot of this weekend learning Zig, and this was the most surprising thing for me. foo, bar, and baz are all called the same way (as thing.foo() or thing.baz()), but depending on the type signature, the compiler can figure out whether you want a mutable or immutable reference to the object.
It’s pretty common for languages in the field, e.g. C++ (const versus non-const methods), Rust (value, unique reference, shared reference, deref), and even Go (value vs pointer, delegation).
The “method syntax” is pretty much just a convenience, so why not also handle that after all?
Interesting, thanks. In that case, what if `Tea` was not const? Could I assign a different type to it? What would the static type of an instance of Tea be, if its runtime value could be two incompatible types?
1. You can use a type variable as a normal one, as long as you do it at compile time (a variable of type type is required to be comptime). You can use a TypeInfo at runtime if you like though.
2. The type system is deep enough to describe the type of values, but not the type of types. That's what higher kinded types are for, and for better or for worse that's not part of zig. Also, type variables can only exist at compile time, so there is no runtime value.
No, it won't work. It'll say "actual" is undeclared.
https://github.com/ziglang/zig/issues/4437 is tracking the issue. I think std.testing is going to see some major changes, so it's possible this gets fixed. But going from the responses in that issue, I think that fix might be more of a side effect, since there doesn't seem to be too much sympathy for the issue as-is.
Coming from C, I always see these "Self" variables/types as counter-intuitive, and they contribute to those cases where a function requires a parameter which I don't have to supply (just for these functions accepting "Self"), which could be an exception instead of the rule. I would rather prefer the C++ "this" keyword usage.
Also, do "Self" kind-of variables occupy memory?
How can I declare a (packed) struct describing a payload in a way that the struct can be then sent "as is" as a network packet?
Let's say this struct has a "calculateChecksum()" method. Will I need to declare "Self"? If so, will be "Self" part of the struct's memory layout?
There's nothing special about the 'Self' (it's just a type).
In general, consts in Zig are purely compile time things and don't take up memory.
And putting a function inside a struct which takes a pointer to its struct type as first argument just allows method-call-syntax-sugar, but you can also write it as regular function (which must then be namespaced with the struct type though):
  const Bla = struct {
      const Self = @This();

      val: i32 = 0,

      fn add(self: *Self, val: i32) void {
          self.val += val;
      }
  };

  pub fn main() void {
      var bla = Bla{};
      // with method-call syntax sugar
      bla.add(2);
      // without method-call syntax sugar
      Bla.add(&bla, 3);
  }
...but in this case it's probably better to not use Self and @This() (it mostly makes sense with generics).
> Will I need to declare "Self"? If so, will be "Self" part of the struct's memory layout?
I'm only familiar with C++, not Zig, but it doesn't seem that different to me. If a C++ class includes a typedef, then that doesn't contribute to the class's footprint.
Even the explicit self parameter is not so different. In C++, a method can be marked as const or even && (rvalue reference) and that's applied to the implicit this parameter. If anything, it would be clearer if this was an explicit parameter; as of C++23, it actually is allowed to be [1], called "deducing this".
I think Rust does this quite well. There are a few handy rules.
1. If the first argument of a function is `self`, then that function is treated as a method and can be called like `obj.meth(args)`. Functions without `self` must be called statically, which in the case of an object would look like `Type::meth(obj, args)`. The second form is occasionally used by common wrapper types like Box and Rc to avoid auto-dereferencing "piercing the veil" and calling a method on the wrapped object (Rust has no separate `obj->meth` syntax, which IMO is its own issue).
2. Having access to `Self` lets you specify the types of non-this arguments as well as return types, like `fn add(self, other: Self) -> Self {...}`. You can also use it as the name of the type when (de)structuring objects, e.g., `let Self { fields } = other` or `return Self { fields }`. Writing out the name of the current type each time could get unwieldy.
3. `Self` as a namespace is handy when dealing with enums, e.g., `match self { Self::First => ..., Self::Second => ...}` or static functions like `Self::func(args)`. It's crucial in trait implementations when referring to associated types, e.g., in Iterator there's `type Item; fn next(&mut self) -> Option<Self::Item>;`
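A minimal sketch of points 1 and 2, using a made-up Point type (names are illustrative, not from any particular library):

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
struct Point { x: i32, y: i32 }

impl Point {
    // Point 1: a `self` first argument makes this a method.
    // Point 2: `Self` names the current type in signatures and constructors.
    fn add(self, other: Self) -> Self {
        Self { x: self.x + other.x, y: self.y + other.y }
    }

    // No `self`: an associated function, called as Point::origin().
    fn origin() -> Self {
        Self { x: 0, y: 0 }
    }
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = Point { x: 3, y: 4 };
    assert_eq!(a.add(b), Point { x: 4, y: 6 });         // method-call syntax
    assert_eq!(Point::add(a, b), Point { x: 4, y: 6 }); // equivalent static form
    assert_eq!(Point::origin(), Point { x: 0, y: 0 });
}
```
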
I don't see why a Self variable would be placed on the stack differently from non-Self variables; method calls have always just been syntactic sugar + namespacing for a free function, no?
> method calls have always just been syntactic sugar + namespacing for a free function, no?
Not quite; they're more than that for languages that have dynamic dispatch, like virtual methods in C++ or dyn impl in Rust. In that case, that argument affects which function is called. In the case of multiple inheritance (in C++), the implicit this parameter is also potentially subject to a pointer offset; if it's virtual multiple inheritance then that offset may even be computed at runtime.
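A quick sketch of the dynamic-dispatch point, in Rust terms (the trait and types are hypothetical): the same call syntax lands in different functions depending on the runtime type behind the pointer, so the call is resolved through a vtable, not just desugared to one free function.

```rust
trait Speak {
    fn speak(&self) -> &'static str;
}

struct Dog;
struct Cat;

impl Speak for Dog { fn speak(&self) -> &'static str { "woof" } }
impl Speak for Cat { fn speak(&self) -> &'static str { "meow" } }

fn main() {
    // Both elements have the same static type (Box<dyn Speak>);
    // the vtable stored alongside the pointer picks the impl at runtime.
    let animals: Vec<Box<dyn Speak>> = vec![Box::new(Dog), Box::new(Cat)];
    let sounds: Vec<_> = animals.iter().map(|a| a.speak()).collect();
    assert_eq!(sounds, ["woof", "meow"]);
}
```
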
Some parts of the Zig syntax reminds me of the old K&R style function declaration, that seemed absurd and was replaced at some point, but it was released, used in production and defended for a while.
There is often a strong rationale behind clumsy syntax decisions, I hope they will fix those quirks.
Just why? It does not seem like the same language that used to treat tabs as a compiler error. Looks like the worst of all worlds to me. Why not just stick to one style?
I don't see what the problem is here. That style has the advantage of letting you look at any identifier and understand whether it is an fn, a type or a variable.
I actually use this approach in my style guides for my teams writing js too.
Once you get used to it, it's quite pleasant.
In js you run into some grey areas, like function type callable variables that look weird, but still make sense when named well: click_handler() read_complete_callback()
The notation is very "compressed", that's for sure.
On the other hand, manipulating sequences of data and pointers of various kinds is one of the main things you do when programming, and concise notation (especially in a low level programming language) is akin to how mathematicians have concise notation for the most common objects they manipulate in their formulas.
Arrays vs slices seem to be pretty much the same as in Go, and the notation is completely straightforward: "ARRAY 100 OF INTEGER", except instead of "ARRAY" and "OF", the "[" and "]" are used — type constructor goes first, before its arguments, as it should be.
As for pointer types, they look reasonable but can't say much more.
One correction: Go slices also own their memory, so their equivalent would be Zig's `std.ArrayList`. Slices in Zig are just ptr+len, so you will have to manage the underlying memory separately. This makes sense for Zig since it's a lower-level language than Go.
Oh yes, I've had some quite hilarious (read: infuriating to debug) bugs because of that behaviour.
I imagine that's why almost everyone just uses slices exclusively: you rarely if ever see "[SomeConstant]whatever { ... }" or even "[...]whatever { ... }" in Go codebases, it's almost always just "[]whatever { ... }": such literal slices have copy-on-append behaviour. And the syntax really nudges you into it, which is nice.
> such literal slices have copy-on-append behaviour.
Only because literal slices default to cap == len (and a buffer of the same size). But if you create a sub-slice then all bets are off. That’s why some folks recommend always using a “full slice” if the result escapes:
A[x:y:y]
This way the cap() is set to the same value as the len(), and append will always trigger a copy. Essentially a cheaper (but riskier) version of a defensive copy.
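A small self-contained sketch of the difference (the helper names are made up for illustration):

```go
package main

import "fmt"

// appendPlain appends through a plain sub-slice: cap extends to the
// end of the backing array, so append writes in place.
func appendPlain() int {
	base := []int{1, 2, 3, 4}
	s := base[0:2] // len == 2, cap == 4: room to grow in place
	s = append(s, 99)
	return base[2] // 99: the backing array was overwritten
}

// appendFull uses a full slice expression: cap == len, so append has
// no room and must copy to a fresh backing array.
func appendFull() int {
	base := []int{1, 2, 3, 4}
	s := base[0:2:2] // len == 2, cap == 2
	s = append(s, 99)
	return base[2] // still 3: base is untouched
}

func main() {
	fmt.Println(appendPlain(), appendFull())
}
```
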
it's difficult, but they are there for a reason. Working through them helped me appreciate all of the different options that exist (and yes sentinel termination is sometimes better: https://lemire.me/blog/2020/09/03/sentinels-can-be-faster/)
If there's one thing that's a nightmare about how Zig handles these, it's that coercion between array types is sometimes a bit hidden and the rules are not obvious. Their application in code is usually straightforward, though, and you don't have to think about it too much (AFAICT it doesn't let you do the wrong thing)
Languages tend to avoid having parsing ambiguities and syntaxes that require unbounded lookahead, because things like that make parsers slower and/or more complex. This is a pain not just for the compiler, but also syntax highlighting, IDEs and other tooling. Languages also need to consider future syntax extensions and ability to give good error messages for syntax errors. This usually requires some redundancy and extra sigils or keywords.
Here `full = true` is a valid expression syntax, and it'd be problematic if `{` could start a block of code here. This case may be parseable unambiguously thanks to `Tea` ident in the front, but e.g. if Zig ever wanted to add something like Swift's final closure syntax, then `expr { expr }` could become valid, and this would be ambiguous.
In Rust `if struct {}` is a parsing edge case. JS has ambiguous `{field: value}` objects and `{label: code}` blocks. CSS struggles to add nested rules, because syntaxes of `selector { property:value` and `selector { selector:selector` overlap, and that requires slower and/or more complicated parsers.
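The JS case is easy to demonstrate: at statement position `{ ... }` parses as a block, so `label:` becomes a statement label rather than an object key. A sketch using `eval` to observe the difference via the completion value:

```javascript
// At statement position, `{ label: 1 }` is a block containing the
// labeled statement `label: 1`; the block's completion value is 1.
const asBlock = eval("{ label: 1 }");

// Wrapping it in parentheses forces expression position, so the same
// characters parse as an object literal instead.
const asObject = eval("({ label: 1 })");

console.log(asBlock);        // 1
console.log(asObject.label); // 1, but now read from an object property
```
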
> Here `full = true` is a valid expression syntax, and it'd be problematic if `{` could start a block of code here.
Golang has almost this exact problem in its grammar, but in practice they manage just fine, even though there are indeed some edge cases where the parser gets confused:
> A parsing ambiguity arises when a composite literal using the TypeName form of the LiteralType appears as an operand between the keyword and the opening brace of the block of an "if", "for", or "switch" statement, and the composite literal is not enclosed in parentheses, square brackets, or curly braces. In this rare case, the opening brace of the literal is erroneously parsed as the one introducing the block of statements. To resolve the ambiguity, the composite literal must appear within parentheses.
if x == (T{a,b,c}[i]) { … }
if (x == T{a,b,c}[i]) { … }
Yep, I also noticed that Zig just does what C does... coming from other C-like languages, like Java, JS, Go, though, which don't do that, I was also surprised.
In my compiler I loosely coerce comptime_ints to ints if needed by subsequent APIs.
You also need to expand or truncate its size on demand, e.g. C compilers do that.
Correct me if I'm wrong, but it really seems like bad const inference that `x` from the 8th example is typed as a `comptime_int`. Is all comptime inference only done at the point of declaration?
Self is used in the standard library for non generic types every now and again.
As for public fields, the 0.9 change to allocator comes to mind. This was written off as a minor change, because Zig is pre 1.0. But what I haven't heard (and 100% upfront, I haven't looked hard), is how Zig plans on dealing with it post 1.0. Like, where are we in the gradient of: "1.0 will never break the public API (struct fields included)" to "deal with it".
1.0 for Zig is going to mean not only never break the API, but also that we'll be pretty much done with the language. The current idea is to release v1 only after a few releases that only contain bug fixes.
> As for public fields, the 0.9 change to allocator comes to mind.
That's a good example of why it makes sense for Zig to have all fields be public: the change that we did to allocators has important implications that in Zig we don't want to hide behind a `private` descriptor. This is obviously not a universal truth, but it makes sense for a low-level programming language that cares about the details.
> Self is used in the standard library for non generic types every now and again.
I agree with the parent poster, those should probably be removed, as they don't really help make the code more readable or provide any other advantage. An overhaul of the stdlib is planned, but we're not there yet, as we're still busy working on the package manager and incremental compilation.
I’d like to chime in on the all-fields-are-public debate.
If I’m a library author and release something which exports a struct… I can never change any of the struct’s internal data structures? I can never rename a field. Or delete a field. Or change what should be an internal field’s type. And that makes non-breaking changes as a library author much more challenging, which may be a very serious problem for Zig, especially if the stdlib isn’t Go-style “batteries included”. I would seemingly need to version the structs to work around this: ArrayListV1, ArrayListV2…
Part of the beauty of great code is in its API design. And forcing my libs to document:
/// Don’t touch this field.
Without any way for me to hide the field itself from users or prevent them from using it is very strange. Users can't intuit what to use and not use without studying documentation, which a well-thought-out API might improve.
Go has been extremely successful in part due to its nicely thought out APIs in the stdlib. They have certain fields which anyone would care to use, and all the rest are—for your purpose as a user—not there. And this same design has made refactors of libraries/packages (without breaking changes for users) very easy.
When I first played with Zig (and I haven't done more than play), I had a similar view. I suspect that the adoption of Zig in certain current C niches may be hampered by the inability to express things like the opaque pointer idiom, mandated by coding standards like CERT C and MISRA[ble] C.
I now think that an external tool is the way to go. C++ has field access control, but for some projects, that's not enough, so they have an external layer of access control, like Chromium DEPS (sample: https://source.chromium.org/chromium/chromium/src/+/main:ui/... )
And use cases of libraries are different. Perhaps most uses of library X use it as a black box and should never directly access structure fields, but project X′ integrates closely with X. That's hard to express in the language since it's hard to express purely within X at all. Maybe you need a `foo.@i_know_this_is_private_but_let_me_use_it_anyway(bar)`, like Python's underscore convention.
Or you have an external static analyzer using some sort of allow and deny rules. Maybe library X declares `Deny *` but project X′ overrides it with an `Allow X.foo.bar`. The rules can be explicit, automatically enforced, and subject to code review, so it's OK.
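For reference, the Python convention mentioned above is purely advisory; a sketch (the class and field names here are made up):

```python
class Cache:
    """A leading underscore marks _entries as an implementation detail."""

    def __init__(self):
        self._entries = {}  # "private" by convention only

    def put(self, key, value):
        self._entries[key] = value

c = Cache()
c.put("a", 1)
# Nothing stops a caller from reaching inside; linters, reviewers, and
# convention are the only enforcement.
print(c._entries)  # {'a': 1}
```
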
It's impossible to state universally what should or should not be considered part of the public api of a struct when it comes to low-level programming.
In some cases the exact layout of the struct needs to stay stable. In others the contents don't matter, only that the size doesn't change. In others still, none of those properties matter and it's just a matter of keeping names and types stable, and that every new field that gets added has a default value so that old initialization expressions keep working.
This is just to name a few cases among many others. In the end you will need to specify in the docs what it is that you, the library author, want to consider part of the public API and what is considered an implementation detail.
> If I’m a library author and release something which exports a struct… I can never change any of the struct’s internal data structures?
Structs in most modern languages are NOT guaranteed to have any particular layout unless you explicitly ask for it--so you don't have that guarantee to begin with. This enables them to rearrange the struct for better packing or AoS/SoA transformations.
If someone grovels in your structs and gets burned, well, that's their own fault. I'll go further, if someone grovels in a library struct, they have failed. I'll go even further, if someone has to grovel in a library struct, the library has failed.
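To illustrate the layout point in Rust terms (the type names are made up): the default representation lets the compiler reorder fields, while `#[repr(C)]` pins declaration order and C padding rules, so only the latter's size and layout are something you can rely on.

```rust
use std::mem::size_of;

// Default (Rust) representation: the compiler may reorder fields freely.
#[allow(dead_code)]
struct Loose { a: u8, b: u32, c: u8 }

// repr(C): declaration order and C-style padding are guaranteed.
#[allow(dead_code)]
#[repr(C)]
struct Pinned { a: u8, b: u32, c: u8 }

fn main() {
    // Pinned is guaranteed 12 bytes: a (1) + pad (3) + b (4) + c (1) + pad (3).
    assert_eq!(size_of::<Pinned>(), 12);
    // Loose is typically packed smaller (the compiler can move `b` first),
    // but its exact layout is unspecified, so we only print it.
    println!("Loose: {} bytes, Pinned: {} bytes",
             size_of::<Loose>(), size_of::<Pinned>());
}
```
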
In addition, in embedded contexts I regularly have to grovel in structs to work out some bug. "static" is often the bane of my existence in C because compilers often throw away or hide references to those variables. However, they may be crucial to running down state.
I work in a language where all fields are public and it is indispensable in debugging things. You DO NOT want your datatypes hiding things under the hood.
At best, if zig does want to make fields private, they should still be visible, but the compiler should disallow writing to them.
I believe you, but I haven't encountered anything like this in my work in a higher-level language, and this is my first push into manual memory management since I studied C in college many years ago.
Can you offer some examples of times when it has been indispensable, so that I can understand when this is useful?
I don't remember any off the top of my head. I use Elixir, and I do often dive into the data structures with IO.inspect(..., structs: false), which exposes all of the "hidden" fields. Enough so that I know how to do this without having to look it up.
I suppose you could also use the debugger, but I value my time.
I see this desire to get rid of shadowing all the time, but in practice it's such a disruptive restriction.