The problem I see is not so much in how you generate the code. It is about how to maintain the code. If you check in the AI-generated code unchanged, do you then start changing that code by hand later? Do you trust that in the future AI can fix bugs in your code? Or do you clean up the AI-generated code first?
I've never worked in web development, where it seems to me the majority of LLM coding assistants are deployed.
I work on safety critical and life sustaining software and hardware. That's the perspective I have on the world. One question that comes up is "why does it take so long to design and build these systems?" For me, the answer is: that's how long it takes humans to reach a sufficient level of understanding of what they're doing. That's when we ship: when we can provide objective evidence that the systems we've built are safe and effective. These systems we build, which are complex, have to interact with the real world, which is messy and far more complicated.
Writing more code means that's more complexity for humans (note the plurality) to understand. Hiring more people means that's more people who need to understand how the systems work. Want to pull in the schedule? That means humans have to understand in less time. Want to use Agile or this coding tool or that editor or this framework? Fine, these tools might make certain tasks a little easier, but none of that is going to remove the requirement that humans need to understand complex systems before they will work in the real world.
So then we come to LLMs. It's another episode of "finally, we can get these pesky engineers and their time wasting out of the loop". Maybe one day. But we are far from that today. What matters today is still how well human engineers understand what they're doing. Are you using LLMs to help engineers better understand what they are building? Good. If that's the case you'll probably build more robust systems, and you _might_ even ship faster.
Are you trying to use LLMs to fool yourself into thinking this still isn't the game of humans needing to understand what's going on? "Let's offload some of the understanding of how these systems work onto the AI so we can save time and money". Then I think we're in trouble.
> Are you trying to use LLMs to fool yourself into thinking this still isn't the game of humans needing to understand what's going on?
This is a key question. If you look at all the anti-AI stuff around software engineering, the pervading sentiment is “this will never be a senior engineer”. Setting aside the possibility of future models actually bridging this gap (this would be AGI), let’s accept this as true.
You don’t need an LLM to be a senior engineer to be an effective tool, though. If an LLM can turn your design into concrete code more quickly than you could, that gives you more time to reason over the design, the potential side effects, etc. If you use the LLM well, it allows you to give more time to the things the LLM can’t do well.
Fully agree. In my own usage of AI (which I came to a bit late but have tried to fully embrace so I know what it can and can't do) I've noticed a very unusual side effect: I spend way more of my time documenting and reviewing designs than I used to, and that has been a big positive. I've always been very (maybe too) thoughtful about design and architecture, but I usually focused on high-level design and then would get to some coding as a way of evaluating/testing my designs. I could then throw away v0 using lessons learned and start a v1 on a solid track. Now however, I find myself able to get a lot further in nailing down the design to the point I don't have to build and throw away v0. The prototype is often highly salvageable with the help of the LLM doing the refactoring/iterating that used to make "starting over" a more optimal path. That in turn allows me to maintain the context and velocity of the design much better since there aren't days, or weeks, or even months between the "lessons learned" that then have to go back and revise the design.
The caveat here though, is if I didn't have the decades of experience writing/designing software by hand, I don't think I'd have the skills needed to reap the above benefit.
" They make it easier to explore ideas, to set things up, to translate intent into code across many specialized languages. But the real capability—our ability to respond to change—comes not from how fast we can produce code, but from how deeply we understand the system we are shaping. Tools keep getting smarter. The nature of learning loop stays the same."
Learning happens when your ideas break, when code fails, when unexpected things happen. To get that with a coding agent you need to provide a sensitive skin, made of tests: they provide the pain feedback to the agent. Inside a good test harness the agent can't break things; it moves in a safe space with greater efficiency than before. It was the environment providing us with understanding all along, and we should make an environment where AI can understand what the effects of its actions are.
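As a minimal sketch of what that "skin" can look like (the `invoice` module, `calculate_total` function, and the business rules here are all hypothetical), a couple of behavior-pinning tests give the agent immediate, legible pain the moment an edit changes observable behavior:

```python
# test_invoice.py -- hypothetical sketch of behavior-pinning tests that
# give a coding agent immediate "pain feedback" during refactoring.
from decimal import Decimal

import pytest

from invoice import calculate_total  # assumed module under test


def test_total_applies_bulk_discount():
    # Assumed business rule: orders of 10+ units get 5% off.
    items = [{"unit_price": Decimal("4.00"), "quantity": 10}]
    assert calculate_total(items) == Decimal("38.00")


def test_total_rejects_negative_quantities():
    # The agent must not "simplify away" input validation.
    items = [{"unit_price": Decimal("4.00"), "quantity": -1}]
    with pytest.raises(ValueError):
        calculate_total(items)
```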
maybe. I think we're really just starting this, and I suspect that trying to fuse neural networks with symbolic logic is a really interesting direction to try to explore.
that's kind of not what we're talking about. a pretty large fraction of the community thinks programming is stone cold over because we can talk to an LLM and have it spit out some code that eventually compiles.
personally I think there will be a huge shift in the way things are done. it just won't look like Claude.
I don't know why you're being downvoted, I think you're right.
I think LLMs need different coding languages, ones that emphasise correctness and formal methods. I think we'll develop specific languages for using LLMs with that work better for this task.
Of course, training an LLM to use it then becomes a chicken/egg problem, but I don't think that's insurmountable.
I don't think "understanding" should be the criterion; you can't commit your eyes in the PR. What you can commit is a test that enforces that understanding programmatically. And we can do many, many more tests now than before. You just need to ensure testing is deep and well designed.
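For example (a sketch only, assuming a hypothetical `normalize_phone` helper and the hypothesis library), property-based tests are one way to commit an understanding that goes deeper than a few hand-picked cases:

```python
# Hypothetical sketch: encode understanding as properties rather than a
# handful of examples. Assumes normalize_phone() should be idempotent
# and only ever emit digits and a leading '+'.
from hypothesis import given, strategies as st

from phone import normalize_phone  # assumed helper under test


@given(st.text(max_size=30))
def test_normalize_is_idempotent(raw):
    once = normalize_phone(raw)
    assert normalize_phone(once) == once


@given(st.text(max_size=30))
def test_normalize_emits_only_expected_characters(raw):
    assert set(normalize_phone(raw)) <= set("+0123456789")
```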
I suspect that we are going to have a wave of gurus who show up soon to teach us how to code with LLMs. There’s so much doom and gloom in these sorts of threads about the death of quality code that someone is going to make money telling people how to avoid that problem.
The scenario you describe is a legitimate concern if you’re checking in AI generated code with minimal oversight. In fact I’d say it’s inevitable if you don’t maintain strict quality control. But that’s always the case, which is why code review is a thing. Likewise you can use LLMs without just checking in garbage.
The way I’ve used LLMs for coding so far is to give instructions and then iterate on the result (manually or with further instructions) until it meets my quality standards. It’s definitely slower than just checking in the first working thing the LLM churns out, but it’s still been faster than doing it myself, and I understand it just as well, because I have to in order to give instructions (design) and iterate.
My favorite definition of “legacy code” is “code that is not tested” because no matter who writes code, it turns into a minefield quickly if it doesn’t have tests.
How do you know that it's actually faster than if you'd just written it yourself? I think the review and iteration part _is_ the work, and the fact that you started from something generated by an LLM doesn't actually speed things up. The research that I've seen also generally backs this idea up -- LLMs _feel_ very fast because code is being generated quickly, but they haven't actually done any of the work.
Because I’ve been a software engineer for over 20 years. If I look at a feature and feel like it will take me a day and an LLM churns it out in a hour including the iterating, I’m confident that using the LLM was meaningfully faster. Especially since engineers (including me) are notoriously bad at accurate estimation and things usually take at least twice as long as they estimate.
I have tested throwing several features at an LLM lately and I have no doubt that I’m significantly faster when using an LLM. My experience matches what Antirez describes. This doesn’t make me 10x faster, mostly because so much of my job is not coding. But in terms of raw coding, I can believe it’s close to 10x.
> I know exactly what the result should be, the LLM is just typing it for me.
This is the mental model people should be working with. The LLM is there to tighten the loop from thought to code. You don’t need to treat it like an engineer; you just need to use it to make you more efficient.
It so happens that you *can* give an LLM half-baked thoughts and it will sometimes still do a good job because the right thing is so straightforward. But in general the more vague and unclear your own thoughts, the lower quality the results, necessitating more iterations to refine.
> My favorite definition of “legacy code” is “code that is not tested” because no matter who writes code, it turns into a minefield quickly if it doesn’t have tests.
Unfortunately, "tests" don't do it, they have to be "good tests". I know, because I work on a codebase that has a lot of tests and some modules have good tests and some might as well not have tests because the tests just tell you that you changed something.
> My favorite definition of “legacy code” is “code that is not tested” because no matter who writes code, it turns into a minefield quickly if it doesn’t have tests.
On the contrary, legacy code has, by definition, been battle tested in production. I would amend the definition slightly to:
“Legacy code is code that is difficult to change.”
Lacking tests is one common reason why this could be, but not the only possible reason.
It’s from Working Effectively with Legacy Code. I don’t recall the exact definition but it’s something to that effect. Legacy = lack of automated tests.
The biggest barrier to changing code is usually insufficient automated testing. People are terrified of changing code when they can’t verify the results before breaking production.
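One cheap way out of that terror, in the spirit of the book, is a characterization test: pin down what the legacy code does today before touching it. A sketch, with the `pricing` module, `price_quote` function, and recorded values all made up:

```python
# Hypothetical characterization test: record the legacy code's current
# behavior so any refactor that changes it fails loudly.
from pricing import price_quote  # assumed legacy function

# Captured once from the current implementation and committed with the
# test; regenerating these values is a deliberate, reviewed act.
GOLDEN = {"basic": 99.0, "pro": 249.0, "enterprise": 1999.0}


def test_price_quote_matches_recorded_behavior():
    for tier, expected in GOLDEN.items():
        assert price_quote(tier, seats=10) == expected
```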
More glibly legacy code is “any code I don’t want to deal with”. I’ve seen code written 1 year prior officially declared “legacy” because new coding standards were being put in place and no one wanted to update the old code to match.
I think it was Cory Doctorow who compared AI-generated code to asbestos.
Back in its day, asbestos was in everything, because of how useful it seemed. Fast forward decades and now asbestos abatement is a hugely expensive and time-consuming requirement for any remodeling or teardown project.
Lead paint has some of the same history.
I see where you're coming from, and I agree with the implication that this is more of an issue for inexperienced devs. Having said that, I'd push back a bit on the "legacy" characterization.
For me, if I check in LLM-generated code, it means I've signed off on the final revision and feel comfortable maintaining it to a similar degree as though it were fully hand-written. I may not know every character as intimately as that of code I'd finished writing by hand a day ago, but it shouldn't be any more "legacy" to me than code I wrote by hand a year ago.
It's a bit of a meme that AI code is somehow an incomprehensible black box, but if that is ever the case, it's a failure of the user, not the tool. At the end of the day, a human needs to take responsibility for any code that ends up in a product. You can't just ship something that people will depend on not to harm them without any human ever having had the slightest idea of what it does under the hood.
Some of those words appear in my comment, but not in the way you're implying I used them.
My argument was that 1) LLM output isn't inherently "legacy" unless vibe coded, and 2) one should not vibe code software that others depend on to remain stable and secure. Your response about "abandonware" is a non sequitur.
I presume that through some process one can exorcise the legacy/vibe-codiness away. Perhaps code review of every line? (This would imply that the bottleneck to LLM output is human code review.) Or would having the LLM demonstrate correctness via generated tests be sufficient?
Just to clarify, you're inferring several things that I didn't say:
* I was agreeing with you that all vibe code is effectively legacy, but obviously not all legacy code is vibe code. Part of my point is also that not all LLM code is vibe code.
* I didn't comment on the dependability of legacy code, but I don't believe that strict vibe code should ever be depended on in principle.
As far as non-vibe coding with LLMs, I'd definitely suggest some level of human review and participation in the overall structure/organization. Even if the developer hasn't pored through it line by line, they should have signed off on the tech stack/dependencies/architecture and have some idea of what the file layout and internal modules/interfaces look like. If a major bug is ever discovered, the developer should know enough to confidently code review the fix or implement it by hand if necessary.
Take responsibility by leaving good documentation of your code and a beefy set of tests; future agents and humans will then have something to bootstrap from, not just plain code.
Depends on what you do. When I'm using LLMs to generate code for projects I need to maintain (basically, everything non-throw-away-once-used), I treat it as any other code I'd write, tightly controlled with a focus on simplicity and well-thought out abstractions, and automated testing that verify what needs to be working. Nothing gets "merged" into the code without extensive review, and me understanding the full scope of the change.
So with that, I can change the code by hand afterwards or continue with LLMs, it makes no difference, because it's essentially the same process as if I had someone follow the ideas I describe, and then later they come back with a PR. I think probably this comes naturally to senior programmers and those who had a taste of management and similar positions, but if you haven't reviewed other's code before, I'm not sure how well this process can actually work.
At least for me, I manage to produce code I can maintain, and seemingly others do too, and it doesn't devolve into hairballs/spaghetti. But again, it requires reviewing absolutely every line and constantly editing/improving.
We recently got a PR from somebody adding a new feature and the person said he doesn't know $LANG but used AI.
The problem is, that code would require a massive amount of cleanup. I took a brief look and some code was in the wrong place. There were coding style issues, etc.
In my experience, the easy part is getting something that works for 99%. The hard part is getting the architecture right, all of the interfaces and making sure there are no corner cases that get the wrong results.
I'm sure AI can easily get to the 99%, but does it help with the rest?
> I'm sure AI can easily get to the 99%, but does it help with the rest?
Yes, the AI can help with 100% of it. But the operator of the AI needs to be able to articulate this to the AI.
I've been in this position, where I had no choice but to use AI to write code to fix bugs in another party's codebase, then PR the changes back to the codebase owners. In this case it was vendor software that we rely on, in which the vendor hadn't yet fixed critical bugs. And exactly as you described, my PR ultimately got rejected: even though it fixed the bugs in the immediate sense, it presented other issues because it didn't integrate with the external frameworks the vendor used for their dev processes. At that point it was just easier for the vendor to fix the software their way instead of accepting my PR. But the point is that I could have made the PR correct in the first place, if I as the AI operator had had the knowledge needed to articulate these more detailed and nuanced requirements to the AI. Since I didn't have this information, the AI generated code that worked but didn't meet the vendor's spec. This type of situation is incredibly easy to fall into, and it's a good example of why you still need a human at the wheel to set the guidance, even if you don't necessarily need the human to be writing every line of code.
I don't like the situation much but this is the reality of it. We're basically just code reviewers for AI now
Yeah, so what I'm mostly doing, and advocate for others to do, is basically the pure opposite of that.
Focus on architecture, interfaces, corner-cases, edge-cases and tradeoffs first, and then the details within that won't matter so much anymore. The design/architecture is the hard part, so focus on that first and foremost, and review + throw away bad ideas mercilessly.
Yes it does... but only in the hands of an expert who knows what they are doing.
I'd treat PRs like that as proofs of concept that the thing can be done, but I'd be surprised if they often produced code that should be directly landed.
In the hands of an expert… right. So is it not incredibly irresponsible to release these tools into the wild and expose them to those who are not experts? They will actually end up far worse off. Ironically this does not ‘democratise’ intelligence at all - the gap widens between experts and the rest.
I sometimes wonder what would have happened if OpenAI had built GPT3 and then GPT-4 and NOT released them to the world, on the basis that they were too dangerous for regular people to use.
That nearly happened - it's why OpenAI didn't release open weight models past GPT2, and it's why Google didn't release anything useful built on Transformers despite having invented the architecture.
If we lived in that world today, LLMs would be available only to a small, elite and impossibly well funded class of people. Google and OpenAI would solely get to decide who could explore this new world with them.
With all due respect I don’t care about an acceleration in writing code - I’m more interested in incremental positive economic impact. To date I haven’t seen anything convince me that this technology will yield this.
Producing more code doesn’t overcome the lack of imagination, creativity and so on to figure out what projects resources should be invested in. This has always been an issue that will compound at firms like Google who have an expansive graveyard of projects laid to rest.
In fact, in a perverse way, all this ‘intelligence’ can exist while, at the same time, humans get worse in their ability to make judgments about investment decisions.
You mean the net benefit in widespread access to LLMs?
I get the impression there's no answer here that would satisfy you, but personally I'm excited about regular people being able to automate tedious things in their lives without having to spend 6+ months learning to program first.
And being able to enrich their lives with access to as much world knowledge as possible via a system that can translate that knowledge into whatever language and terminology makes the most sense to them.
The average person already automates a lot of things in their day to day lives. They spend far less time doing the dishes, laundry, and cleaning because parts of those tasks have been mechanized and automated. I think LLMs probably automate the wrong thing for the average person (i.e., I still have to load the laundry machine and fold the laundry after) but automation has saved the average person a lot of time
For example, my friend doesn’t know programming but his job involves some tedious spreadsheet operations. He was able to use an LLM to generate a Python script to automate part of this work. Saving about 30 min/day. He didn’t review the code at all, but he did review the output to the spreadsheet and that’s all that matters.
His workplace has no one with programming skills, this is automation that would never have happened. Of course it’s not exactly replacing a human or anything. I suppose he could have hired someone to write the script but he never really thought to do that.
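For a sense of scale, the script in question is often just a few lines of pandas; everything below (file names, column names, the dedup rule) is made up for illustration, and as noted above, the verification happens on the output spreadsheet rather than on the code:

```python
# Hypothetical sketch of the kind of spreadsheet chore an LLM can script:
# merge a daily export into a running report and flag overdue rows.
import pandas as pd

daily = pd.read_excel("daily_export.xlsx")      # assumed input file
report = pd.read_excel("running_report.xlsx")   # assumed running report

merged = pd.concat([report, daily], ignore_index=True)
merged = merged.drop_duplicates(subset=["order_id"], keep="last")
merged["overdue"] = pd.to_datetime(merged["due_date"]) < pd.Timestamp.today()

merged.to_excel("running_report.xlsx", index=False)
print(f"merged {len(daily)} new rows, {int(merged['overdue'].sum())} overdue")
```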
A work colleague had a tedious operation involving manually joining a bunch of video segments together in a predictable pattern. Took them a full working day.
They used "just" ChatGPT on the web to write an automation. Now the same process takes ~5 minutes of work. Select the correct video segments, click one button to run script.
The actual processing still takes time, but they don't need to stand there watching it progress so they can start the second job.
And this was a 100% non-technical marketing person with no programming skills past Excel formulas.
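That kind of one-button automation usually boils down to something like the following sketch, which assumes ffmpeg is installed and that the segments follow a predictable naming pattern (all file names here are invented):

```python
# Hypothetical sketch: join predictably named video segments with
# ffmpeg's concat demuxer, copying streams instead of re-encoding.
import subprocess
from pathlib import Path

parts = sorted(p.name for p in Path(".").glob("part_*.mp4"))
segments = ["intro.mp4", *parts, "outro.mp4"]

# The concat demuxer reads its inputs from a plain text list.
Path("segments.txt").write_text(
    "".join(f"file '{name}'\n" for name in segments)
)

subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0",
     "-i", "segments.txt", "-c", "copy", "joined.mp4"],
    check=True,
)
```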
My favorite anecdotal story here is that a couple of years ago I was attending a training session at a fire station and the fire chief happened to mention that he had spent the past two days manually migrating contact details from one CRM to another.
I do not want the chief of a fire station losing two days of work to something that could be scripted!
I don't want my doctor to vibe script some conversion only to realize weeks or months later it made a subtle error in my prescription.
I want both of them to have enough funds to hire someone to do it properly.
But wanting is not enough unfortunately...
Humans make subtle errors all the time too though. AI results still need to be checked over for anything important, but it's on a vector toward being much more reliable than a human for any kind of repetitive task.
Currently, if you ask an LLM to do something small and self-contained like solve leetcode problems or implement specific algorithms, they will have a much lower rate of mistakes, in terms of implementing the actual code, than an experienced human engineer. The things it does badly are more about architecture, organization, style, and taste.
But with a software bug, the error rapidly becomes widespread and systematic, whereas human errors often are not. Getting a couple of prescriptions wrong because the doc worked a 12+ hour shift is different from systematically getting a significant number of prescriptions wrong until someone double-checks the results.
I agree with the excel thing.
Not with thinking it can't happen with vibecoded python.
I think handling sensitive data should be done by a professional.
A lawyer handles contracts, a doctor handles health issue and a programmer handles data manipulation through programs.
This doesn't remove risk of errors completely, but it reduces it significantly.
In my home, it's me who's impacted if I screw up a fix in my plumbing, but I won't try to do it at work or in my child's school.
I don't care if my doctor vibe codes an app to manipulate their holidays pictures, I care if they do it to manipulate my health or personal data.
Of course issues CAN happen with Python, but at least with Python we have tools to check for the issues.
A bunch of your personal data is most likely going through some Excel sheet made by a now-retired office worker somewhere 15 years ago. Nobody understands how the sheet works, but it works, so they keep using it :) A replacement system (a massive SaaS application) has been "coming soon" for 8 years and cost millions, but it still doesn't work as well as the Excel sheet.
You can apply the same logic to all technologies, including programming languages, HTTP, cryptography, cameras, etc. Who should decide what's a responsible use?
I'm curious about the economic aspects of this. If only experts can use such tools effectively, how big will the total market be and does that warrant the investments?
For companies, if these tools make experts even more special, then experts may get more power certainly when it comes to salary.
So the productivity benefits of AI have to be pretty high to overcome this. Does AI make an expert twice as productive?
Coding style can be deterministically checked for, and should be checked, automatically during linting. And no PR should get a single pair of human eyes, other than the author's, looking at it until all CI checks have passed.
Many, many other stylistic choices and code complexity measures can be automatically checked, so why aren't you doing it?
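As a hedged sketch of what that can look like in practice (ruff, black, and mypy here are assumptions, not a recommendation for any particular stack), the gate can be a small script that CI runs and fails before a reviewer ever gets pinged:

```python
#!/usr/bin/env python3
# Hypothetical CI gate: run deterministic style and type checks and
# fail the build before any human reviews the PR.
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],     # lint rules, unused imports, etc.
    ["black", "--check", "."],  # formatting
    ["mypy", "."],              # static type checks
]

failed = False
for cmd in CHECKS:
    print("$", " ".join(cmd))
    if subprocess.run(cmd).returncode != 0:
        failed = True

sys.exit(1 if failed else 0)
```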
> We recently got a PR from somebody adding a new feature and the person said he doesn't know $LANG but used AI.
"Oh, and check it out: I'm a bloody genius now! Estás usando este software de traducción in forma incorrecta. Por favor, consulta el manual. I don't even know what I just said, but I can find out!"
I think we will find out that certain languages, frameworks and libraries are easier for AI to get all the way correct. We may even have to design new languages, frameworks and libraries to realize the full promise of AI. But as the ecosystem around AI evolves I think these issues will be solved.
I think you're making a mistake if your reviews are just that you trust that your co-workers never make a mistake. I make mistakes. My co-workers make mistakes. Everybody makes mistakes, that's why we have code reviews.
The solution for low performers is very close oversight. If you imagine an LLM as a very junior engineer who needs an inordinate amount of hand holding (but who can also read and write about 1000x faster than you and who gets paid approximately nothing), you can get a lot of useful work out of it.
A lot of the criticisms of AI coding seem to come from people who think that the only way to use AI is to treat it as a peer. “Code this up and commit to main” is probably a workable model for throwaway projects. It’s not workable for long term projects, at least not currently.
A Junior programmer is a total waste of time if they don't learn. I don't help Juniors because it is an effective use of my time, but because there is hope that they'll learn and become Seniors. It is a long term investment. LLMs are not.
It’s a metaphor. With enough oversight, a qualified engineer can get good results out of an underperforming (or extremely junior) engineer. With a junior engineer, you give the oversight to help them grow. With an underperforming engineer you hope they grow quickly or you eventually terminate their employment because it’s a poor time trade off.
The trade off with an LLM is different. It’s not actually a junior or underperforming engineer. It’s far faster at churning out code than even the best engineers. It can read code far faster. It writes tests more consistently than most engineers (in my experience). It is surprisingly good at catching edge cases. With a junior engineer, you drag down your own performance to improve theirs and you’re often trading off short term benefits vs long term. With an LLM, your net performance goes up because it’s augmenting you with its own strengths.
As an engineer, it will never reach senior level (though future models might). But as a tool, it can enable you to do more.
> It writes tests more consistently than most engineers (in my experience)
I'm going to nit on this specifically. I firmly believe anyone who genuinely believes this either never writes tests that actually matter, or doesn't review the tests that an LLM throws out there. I've seen so many cases of people saying 'look at all these valid tests our LLM of choice wrote' only for half of them to do nothing and the other half to be misleading as to what they actually test.
It’s like anything else, you’ve got to check the results and potentially push it to fix stuff.
I recently had AI code up a feature that was essentially text manipulation. There were existing tests to show it how to write effective tests and it did a great job of covering the new functionality. My feedback to the AI was mostly around some inaccurate comments it made in the code but the coverage was solid. Would have actually been faster for me to fix but I’m experimenting with how much I can make the AI do.
On the other hand I had AI code up another feature in a different code base and it produced a bunch of tests with little actual validation. It basically invoked the new functionality with a good spectrum of arguments but then just validated that the code didn’t throw. And in one case it tested something that diverged slightly from how the code would actually be invoked. In that case I told it how to validate what the functionality was actually doing and how to make the one test more representative. In the end it was good coverage with a small amount of work.
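To make the contrast concrete, here is a hedged sketch (the `summarize_orders` function and its expected output are invented): the first test is the "didn't throw" style described above, the second is roughly what it looks like once told to validate what the functionality actually does:

```python
# Hypothetical contrast between a smoke test and a test that validates
# actual behavior.
from reports import summarize_orders  # assumed function under test


def test_summarize_orders_runs():
    # Weak: only proves the code doesn't raise for these arguments.
    summarize_orders([{"sku": "A1", "qty": 2, "price": 5.0}])


def test_summarize_orders_totals_by_sku():
    # Better: pins down the observable result, not just "no exception".
    orders = [
        {"sku": "A1", "qty": 2, "price": 5.0},
        {"sku": "A1", "qty": 1, "price": 5.0},
        {"sku": "B2", "qty": 1, "price": 3.5},
    ]
    assert summarize_orders(orders) == {"A1": 15.0, "B2": 3.5}
```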
For people who don’t usually test or care much about testing, yeah, they probably let the AI create garbage tests.
I don't see anything here that corroborates your claim that it outputs more consistent test code than most engineers. In fact your second case would indicate otherwise.
And this also goes back to my first point about writing tests that matters. Coverage can matter, but coverage is not codifying business logic in your test suite. I've seen many engineers focus only on coverage only for their code to blow up in production because they didn't bother to test the actual real world scenarios it would be used in, which requires deep understanding of the full system.
I still feel like in most of these discussions the criticism of LLMs is that they are poor replacements for great engineers. Yeah. They are. LLMs are great tools for great engineers. They won’t replace good engineers and they won’t make shitty engineers good.
You can’t ask an LLM to autonomously write complex test suites. You have to guide it. But when AI creates a solid test suite with 20 minutes of prodding instead of 4 hours of hand coding, that’s a win. It doesn’t need to do everything alone to be useful.
> writing tests that matters
Yeah. So make sure it writes them. My experience so far is that it writes a decent set of tests with little prompting, honestly exceeding what I see a lot of engineers put together (lots of engineers suck at writing tests). With additional prompting it can make them great.
That seems like the kind of feature where the LLM would already have the domain knowledge needed to write reasonable tests, though. Similar to how it can vibe code a surprisingly complicated website or video game without much help, but probably not create a single component of a complex distributed system that will fit into an existing architecture, with exactly the correct behaviour based on some obscure domain knowledge that pretty much exists only in your company.
> probably not create a single component of a complex distributed system that will fit into an existing architecture, with exactly the correct behaviour based on some obscure domain knowledge that pretty much exists only in your company.
An LLM is not a principal engineer. It is a tool. If you try to use it to autonomously create complex systems, you are going to have a bad time. All of the respectable people hyping AI for coding are pretty clear that they have to direct it to get good results in custom domains or complex projects.
A principal engineer would also fail if you asked them to develop a component for your proprietary system with no information, but a principal engineer would be able to do their own deep discovery and design if they have the time and resources to do so. An AI needs you to do some of that.
I also find it hard to agree with that part. Perhaps it depends on what type of software you write, but in my experience finding good test cases is one of those things that often requires a deep level of domain knowledge. I haven’t had much luck making LLMs write interesting, non-trivial tests.
This has been my experience as well. So far, whenever I’ve been initially satisfied with the one shotted tests, when I had to go back to them I realized they needed to be reworked.
I guess everyone dealing with legacy software sees code as a cost factor. Being able to delete code is harder, but often more important than writing code.
Owning code requires you to maintain it. Finding out what parts of the code actually implement features and what parts are not needed anymore (or were never needed in the first place) is really hard, since most of the time the requirements have never been documented and the authors have left or cannot remember. But not understanding what the code does removes all possibility of improving or modifying it. This is how software dies.
Churning out code fast is a huge future liability. Management wants solutions fast and doesn't understand these long term costs. It is the same with all code generators: Short term gains, but long term maintainability issues.
Do you not write code? Is your code base frozen, or do you write code for new features and bug fixes?
The fact that AI can churn out code 1000x faster does not mean you should have it churn out 1000x more code. You might have a list of 20 critical features and only have time to implement 10. AI could let you get all 20, but that shouldn’t mean you check in code for 1000 features you don’t even need.
I write code. On a good day perhaps 800-1000 "hand written" lines.
I have never actually thought about how much typing time this actually is. Perhaps an hour? In that case 7/8th of my day are filled with other stuff. Like analysis, planning, gathering requirements, talking to people.
So even if an AI removed almost all the time I spend typing away: This is only a 10% improvement in speed. Even if you ignore that I still have to review the code, understand everything and correct possible problems.
A bigger speedup is only possible if you decide not to understand everything the AI does and just trust it to do the right thing.
Maybe you code so fast that the thought-to-code transition is not a bottleneck for you. In which case, awesome for you. I suspect this makes you a significant outlier since respected and productive engineers like Antirez seem to find benefits.
Sure if you just leave all the code there. But if it's churning out iterations, incrementally improving stuff, it seems ok? That's pretty much what we do as humans, at least IME.
I feel like this is a forest for the trees kind of thing.
It is implied that the code being created is for “capabilities”. If your AI is churning out needless code, then sure, that’s a bad thing. Why would you be asking the AI for code you don’t need, though? You should be asking it for critical features, bug fixes, the things you would be coding up regardless.
You can use a hammer to break your own toes or you can use it to put a roof on your house. Using a tool poorly reflects on the craftsman, not the tool.
Just like LLMs are a total waste of time if you never update the system/developer prompts with additional information as you learn what's important to communicate vs not.
That is a completely different level. I expect a Junior Developer to be able to completely replace me long term, to be able to decide when existing rules are outdated and when they should be replaced, and to challenge my decisions without me asking for it. They should be able to adapt what they have learned to new types of projects or new programming languages. Being Senior is setting the rules.
An LLM only follows rules/prompts. They can never become Senior.
Yes. Firstly, AI forgets why it wrote certain code, and with humans you can at least ask them when reviewing. Secondly, current-gen AI (at least Claude) kind of wants to finish the thing instead of thinking of the bigger picture. Human programmers code a little differently, in that they hate a single-line fix in a random file to fix something else in a different part of the code.
I think the second is part of RL training optimizing for self-contained tasks like SWE-bench.
I seriously don't remember why I wrote certain code two months ago. I have to read my code that I wrote two months ago to understand what I was doing and why. I don't remember every single line of code that I wrote and why. I guess I'm a stateless developer that way.
So you live in a world where code history must only be maintained orally? Have you ever thought to ask AI to write documentation on the what and the why, and not just write the code? Asking it to document as well as code works well when the AI needs to go back and change either.
I don't see how asking AI to write some description of why it wrote this or that code would actually result in an explanation of why it wrote that code? It's not like it's thinking about it in that way, it's just generating both things. I guess they'd be in the same context so it might be somewhat correct.
If you ask it to document why it did something, then when it goes back later to update the code it has the why in its context. Otherwise, the AI just sees some code later and has no idea why it was written or what it does without reverse engineering it at the moment.
I'm not sure you understood the GP comment. LLMs don't know and can't tell you why they write certain things. You can't fix that by editing your prompt so it writes it on a comment instead of telling you. It will not put the "why" in the comment, and therefore the "why" won't be in the future LLM's context, because there is no way to make it output the "why".
It can output something that looks like the "why" and that's probably good enough in a large percentage of cases.
LLMs know why they are writing things in the moment, and they can justify decisions. Asking it to write those things down when it writes code works, or even asking them to design the code first and then generate/update code from the design also works. But yes, if things aren’t written down, “the LLM don’t know and can’t tell.” Don’t do that.
I'm going to second seanmcdirmid here, a quick trick is to have Claude write a "remaining.md" if you know you have to do something that will end the session.
Example from this morning, I have to recreate the EFI disk of one of my dev vm's, it means killing the session and rebooting the vm. I had Claude write itself a remaining.md to complement the overall build_guide.vm I'm using so I can pick up where I left off. It's surprisingly effective.
No, humans probably have tens of millions of tokens of memory per PR. It includes not only what's in the code, but everything they searched, everything they tested and in which way, the order they worked in, the edge cases they faced, etc. Claude just can't document all of that, or it will run out of its working context pretty soon.
Ya, LLMs are not human level, they have smaller focus windows, but you can "remember" things with documentation, just like humans usually resort to when they realize that their tens of millions of tokens of memory per PR aren't reliable either.
The nice thing about LLMs, however, is that they don't grumble about writing extra documentation and tests like humans do. You just tell them to write lots of docs and they do it, they don't just do the fun coding part. I can empathize why human programmers feel threatened.
> It can output something that looks like the "why"
This feels like a distinction without difference. This is an extension of the common refrain that LLMs cannot “think”.
Rather than get overly philosophical, I would ask what the difference is in practical terms. If an LLM can write out a “why” and it is sufficient explanation for a human or a future LLM, how is that not a “why”?
If you're planning on throwing the code away, fine, but if you're not, eventually you're going to have to revisit it.
Say I'm chasing down some critical bug or a security issue. I run into something that looks overly complicated or unnecessary. Is it something a human did for a reason or did the LLM just randomly plop something in there?
I don't want a made up plausible answer, I need to know if this was a deliberate choice, for example "this is to work around a bug in XY library" or "this is here to guard against [security issue]", or if it's there because some dude on Stackoverflow wrote sample code in 2008.
If your concern is philosophical, and you are defining LLMs as not having a “why”, then of course they cannot write down “why” because it doesn’t exist. This is the philosophical discussion I am trying to avoid because I don’t think it’s fruitful.
If your concern is practical and you are worried that the “why” an LLM might produce is arbitrary, then my experience so far says this isn’t a problem. What I’m seeing LLMs record in commit messages and summaries of work is very much the concrete reasons they did things. I’ve yet to see a “why” that seemed like nonsense or arbitrary.
If you have engineers checking in overly complex blobs of code with no “why”, that’s a problem whether they use AI or not. AI tools do not replace engineers, and I would not work in any code base where engineers were checking in vibe coded features without understanding them and vetting the results properly.
I don't care what text the LLM generates. If you wanna read robotext, knock yourself out. It's useless for what I'm talking about, which is "something is broken and I'm trying to figure out what"
In that context, I'm trying to do two things:
1. Fix the problem
2. Don't break anything else
If there's something weird in the code, I need to know if it's necessary. "Will I break something I don't know about if I change this" is something I can ask a person. Or a whole chain of people if I need to.
I can't ask the LLM, because "yes $BIG_CLIENT needs that behavior for stupid reasons" is not gonna be a part of its prompt or training data, and I need that information to fix it properly and not cause any regressions.
It may sound contrived but that sort of thing happens allllll the time.
> If there's something weird in the code, I need to know if it's necessary.
What does this have to do with LLMs?
I agree this sort of thing happens all the time. Today. With code written by humans. If you’re lucky you can go ask the human author, but in my experience if they didn’t bother to comment they usually can’t remember either. And very often the author has moved on anyway.
The fix for this is to write why this weird code is necessary in a comment or at least a commit message or PR summary. This is also the fix for LLM code. In the moment, when in the context for why this weird code was needed, record it.
You also should shame any engineer who checks in code they don’t understand, regardless of whether it came from an LLM or not. That’s just poor engineering and low standards.
Yeah. I know. The point is there is no Chesterton's Fence when it comes to LLMs. I can't even start from the assumption that this code is here for a reason.
And yes, of course people should understand the code. People should do a lot of things in theory. In practice, every codebase has bits that are duct taped together with a bunch of #FIXME comments lol. You deal with what you got.
The problem is that your starting point seems to be that LLMs can check in garbage to your code base with no human oversight.
If your engineering culture is such that an engineer could prompt an LLM to produce a bunch of code that contains a bunch of weird nonsense, and they can check that weird nonsense in with no comments and no one will say “what the hell are you doing?”, then the LLM is not the problem. Your engineering culture is. There is no reason anyone should be checking in some obtuse code that solves BIG_CORP_PROBLEM without a comment to that effect, regardless of whether they used AI to generate the code or not.
Are you just arguing that LLMs should not be allowed to check in code without human oversight? Because yeah, I one hundred percent agree, and I think most people in favor of AI use for coding would also agree.
Yeah, and I'm explaining that the gap between theory and practice is greater in practice than it is in theory, and why LLMs make it worse.
It's easy to just say "just make the code better", but in reality I'm dealing with something that's an amalgam of the work of several hundred people, all the way back to the founders and whatever questionable choices they made lol.
The map is the territory here. Code is the result of our business processes and decisions and history.
You're treating this as a philosophical question like a LLM can't have actual reasons because it's not conscious. That's not the problem. No, the problem is mechanical. The processing path that would be needed to output actual reasons just doesn't exist.
LLMs only have one data path and that path basically computes what a human is most likely to write next. There's no way to make them not do this. If you ask it for a cake recipe it outputs what it thinks a human would say when asked for a cake recipe. If you ask it for a reason it called for 3 eggs, it outputs what it thinks a human would say when asked why they called for 3 eggs. It doesn't go backwards to the last checkpoint and do a variational analysis to see what factors actually caused it to write down 3 eggs. It just writes down some things that sound like reasons you'd use 3 eggs.
If you want to know the actual reasons it wrote 3 eggs, you can do that, but you need to write some special research software that metaphorically sticks the AI's brain full of electrodes. You can't do it by just asking the model because the model doesn't have access to that data.
Humans do the same thing by the way. We're terrible at knowing why we do things. Researchers stuck electrodes in our brains and discovered a signal that consistently appears about half a second before we're consciously aware we want to do something!
But this is exactly why it is philosophical. We’re having a discussion about why an LLM cannot really ever explain “why”. And then we turn around and say, but actually humans have the exact same problem. So it’s not an LLM problem at all. It’s a philosophical problem about whether it’s possible to identify a real “why”. In general it is not possible to distinguish between a “real why” and a post hoc rationalization so the distinction is meaningless for practical purposes.
It's absolutely not meaningless if you work on code that matters. It matters a lot.
I don't care about philosophical "knowing", I wanna make sure I'm not gonna cause an incident by ripping out or changing something or get paged because $BIG_CLIENT is furious that we broke their processes.
If I show you two "why" comments in a codebase, can you tell which one was written by an LLM and which was not?
Just like humans leave comments like this
// don't try to optimise this, it can't be done
// If you try, increment this number: 42
You can do the same for LLMs
// This is here because <reason> it cannot be optimised using <method>
It works, I've done it. (On the surface that code looks like you could use a specific type of caching to speed it up, but it actually fails for reasons - LLMs kept trying, so I added a comment that stopped them.)
Of course I can't tell the difference. That's not the point. And yes, humans can leave stupid comments too.
The difference is I can ping humans on Slack and get clarification.
I don't want reasons because I think comments are neat. If I'm tracking this sort of thing down, something is broken and I'm trying to fix it without breaking anything else.
It only takes screwing this up a couple times before you learn what a Chesterton's Fence is lol.
You are framing this as an AI problem, but from what I’m hearing, this is just an engineering culture problem.
You should not bet on the ability to ping humans on Slack long-term. Not because AI is going to replace human engineers, but because humans have fallible memories and leave jobs. To the extent that your processes require the ability to regularly ask other engineers “why the hell did you do this“, your processes are holding you back.
If anything, AI potentially makes this easier. Because it’s really easy to prompt the AI to record why the hell things are done the way they are, whether recording its own “thoughts” or recording the “why” it was given by an engineer.
It's not an engineering culture problem lol, I promise. I have over a decade in this career and I've worked at places with fantastic and rigorous processes and at places with awful ones. The better places slacked each other a lot.
I don't understand what's so hard to understand about "I need to understand the actual ramifications of my changes before I make them and no generated robotext is gonna tell me that"
StackOverflow is a tool. You could use it to look for a solution to a bug you're investigating. You could use it to learn new techniques. You could use it to guide you through tradeoffs in different options. You can also use it to copy/paste code you don't understand and break your production service. That's not a problem with StackOverflow.
> "I need to understand the actual ramifications of my changes before I make them and no generated robotext is gonna tell me that"
Who's checking in this robotext?
* Is it some rogue AI agent? Who gave it unfettered access to your codebase, and why?
* Is it you, using an LLM to try to fix a bug? Yeah, don't check it in if you don't understand what you got back or why.
* Is it your peers, checking in code they don't understand? Then you do have a culture problem.
An LLM gives you code. It doesn't free you of the responsibility to understand the code you check in. If the only way you can use an LLM is to blindly accept what it gives you, then yeah, I guess don't use an LLM. But then you also probably shouldn't use StackOverflow. Or anything else that might give you code you'd be tempted to check in blindly.
It does actually work incredibly well. It's even remarkably good at looking through existing stuff (written by AI or not) and reasoning about why it is the way it is. I agree it's not "thinking" in the same way a human might, but it gets to a more plausible explanation than many humans can a lot more often than I ever would have thought.
> So you live in a world where code history must only be maintained orally?
There are many companies and scenarios where this is completely legitimate.
For example, a startup that's iterating quickly with a small, skilled dev team. A bunch of documentation is a liability, it'll be stale before anyone ever reads it.
Just grabbing someone and collaborating with them on what they wrote is much more effective in that situation.
> For example, a startup that's iterating quickly with a small, skilled dev team. A bunch of documentation is a liability, it'll be stale before anyone ever reads it.
This is a huge advantage for AI though, they don't complain about writing docs, and will actively keep the docs in sync if you pipeline your requests to do something like "I want to change the code to do X, update the design docs, and then update the code". Human beings would just grumble a lot, an AI doesn't complain...it just does the work.
> Just grabbing someone and collaborating with them on what they wrote is much more effective in that situation.
Again, it just sounds to me that you are arguing why AIs are superior, not in how they are inferior.
Documentation isn't there to have and admire, you write it for a purpose.
There are like eight bajillion systems out there that can generate low-level javadoc-ish docs. Those are trivial.
The other types of internal developer documentation are "how do I set this up", "why was this code written" and "why is this code the way it is" and usually those are much more efficiently conveyed person to person. At least until you get to be a big company.
For a small team, I would 100% agree those kinds of documentation are usually a liability. The problem is "I can't trust that the documentation is accurate or complete" and with AI, I still can't trust that it wrote accurate or complete documentation, or that anyone checked what it generated. So it's kind of worse than useless?
The LLM writes it with the purpose you gave it: to remember why it did things when it goes to change things later. The difference between humans and AI is that humans skip the documentation step because they think they can just remember everything; AI doesn’t have that luxury.
Just say the model uses the files to seed token state. Anthropomorphizing the thing is silly.
And no, you don't skip the documentation because you "think you can just remember everything". It's a tradeoff.
Documentation is not free to maintain (no, not even the AI version) and bad or inaccurate documentation is worse than none, because it wastes everyone's time.
You build a mental map of how the code is structured and where to find what you need, and you build a mental model of how the system works. Understanding, not memorization.
When prod goes down you really don't wanna be faffing about going "hey Alexa, what's a database index".
Have you never had a situation where a question arose a year (or several) later that wasn’t addressed in the original documentation?
In particular IME the LLM generates a lot of documentation that explains what and not a lot of the why (or at least if it does it’s not reflecting underlying business decisions that prompted the change).
You can ask it to generate the why, even if it the agent isn’t doing that by default. At least you can ask it to encode how it is mapping your request to code, and to make sure that the original request is documented, so you can record why it did something at least, even if it can’t have insight into why you made the request in the first place. The same applies to successive changes.
1. A detailed spec, the result of your discussions with the agent about the work; when it gets it, you ask the agent to formalize it into docs
2. An extensive suite of tests to cover every angle; the tests are generated, but you have to ensure their quality, coverage and depth
I think, to make a metaphor, that specs are like the skeleton of the agent, tests are like the skin, while the agent itself is the muscle and cerebellum, and you are the PFC. Skeleton provides structure and decides how the joints fit, tests provide pain and feedback. The muscle is made more efficient between the two.
In short the new coding loop looks like: "spec -> code -> test, rinse and repeat"
Are you just generating code with the LLM? Ya, you are screwed. Are you generating documentation and tests and everything else that helps the code live? Then your options for maintenance go up. Now just replace “generate” with “maintain”, and you are basically asking AI to make changes to a description at the top that then percolate into multiple artifacts being updated, only one of which happens to be the code itself, and the code updates multiple times as the AI checks tests and so on.
I wish there were good guides on how to get the best out of LLMs. All of these tips about adding documentation etc seem very useful but I’ve never seen good guides on how to do this effectively or sustainably.
It is still the early days; everyone has their process, and a lot of the process is still ad hoc. It is an exciting time to be in the field though, before turn key solutions come we all get to be explorers.
Fair, but it would be interesting to see how people are implementing this “write the docs you need to do a better job” logic and putting it into use. I’m playing with this but would love to see someone’s success story. “I did X and now the code is better/it's more token efficient/reviewers understand the changes/whatever.”
I just let the LLM write the docs it will read, and I don't pay attention to them very much unless I need to debug a problem that it can't solve on its own. I just tell it what areas to focus on; it writes stuff that gets checked in but not really read by humans, it updates the docs when things change before it changes the code, and it can also review all the design stuff when making code changes.
Sometimes I run into a problem that the LLM can't really handle yet, but I just break the problem up into more docs, tests, and code. So... that usually works, but I admit I move more slowly on those problems, and I'm not asking the LLM how to break the problem up yet (although I think we will get there soon).
Do you prompt for anything specific to record or does your prompt just contain something general like “read .aidump if present for potentially useful context and update or create .aidump with any useful information”?
Mostly the latter! You can ask it to look at things conditionally (like, if the test fails, look at this doc before deciding what to do next), but usually I just load it all up at the start before asking it to make change. The LLM is good enough about picking out what it needs. The one problem is that if you have a change you are propagating through the workflow, you need to highlight that change to the LLM or it might not notice it.
I'm working on workflow processing to make this easier ATM (because I can't help my coworkers do what I'm doing, and what I'm doing is so ad hoc), which is why I'm talking about it so much. So the idea is that you request a change at the top, and the LLM updates everything to accommodate the change, keeping track of what changed in each artifact. When it goes to generate code...it has a change for the artifacts that input into code (which are just read in along with a prompt saying "generate the code!"). You just don't ask the LLM to change the code directly (because if you do that, none of the docs get updated for the change, and things can go bad after that...).
When things go wrong, I add extra context if I can spot the problem ("focus on X, X looks wrong because...") and that just merges with the other docs as the context. Sometimes if I can't figure out why a test is failing, I ask it to create a simpler version of the test and see if that fails (if it does, it will be easier to eye the problem). Manual intervention is still necessary (and ugh, sometimes the LLM is just having a bad day and I need to /clear and try again).
I need to play with this more. I’ve had AI generate a bunch of small summaries that it could theoretically use to optimize future work. I haven’t asked it specifically to just dump info as it’s doing other work yet.
The files I had it generate were interesting but I’m not convinced looking at them that they contain the real info the AI needs to be more efficient. I should look into what kind of context analysis agents are passing back because that seems like what I want to save for later.
You can’t just ask the AI to dump, you need to vaguely describe what design elements you think are important. For SQL, say, you might want to plan out your CTEs first, then come up with a strategy for implementing each one, before getting to the SQL file itself (and of course tests, but that is a separate line of artifacts; you don’t want the AI to look at the tests when updating code, because you want to avoid letting the AI code to the tests). You can also look at where the AI is having trouble doing something, or not doing it very well, and ask it to write documentation that will help it do that more successfully.
I can’t imagine asking the AI to change some code without having a description of what the code does. You could maybe reverse engineer that, but that would basically be generating the documents after the fact. Likewise changing code without tests, where failing tests are actionable signals that help the AI make sure it doesn’t break things on update. Some people here think you can just ask it to write code without any other artifacts; that's nuts (maybe agentic tooling will develop in the direction where the AI writes persistent artifacts on its own without being told to, actually I’m sure that will happen eventually).
> You can’t just ask AI to dump, you need to vaguely describe what design elements you think are important
Right. And that’s what I’ve tried to do but I am not confident it’s captured the most critical info in an efficient way.
> I can’t imagine asking AI to change some code without having a description of what the code does. You could maybe reverse engineer that, but that would basically be generating the documents after the fact.
This is exactly how I’ve been using AI so far. I tell it to deeply analyze the code before starting and it burns huge amounts of tokens relearning the same things it learned last time. I want to get some docs in place to minimize this. That’s why I’m interested in what a subagent would respond with because that’s what it’s operating with usually. Or maybe the compressed context might be an interesting reference.
You can save the analysis and those are your docs. But your workflow has to maintain them in sync with the code.
I have no idea about token cost, working for a FAANG; it’s a blind spot for me. One of these days I’m going to try to get Qwen Coder going for some personal projects on my M3 Max (I can run 30B or even 80B heavily quantized), and see if I can get something going that’s thrifty with the resources provided by a local LLM.
I’m not actually paying for tokens. Just trying to be a good citizen. And also trying to figure out how to set everyone in my organization up to do the same.
Interestingly, while playing with Claude Code, I just learned that /init actually does analyze the codebase and record its findings.
Would it not be a new paradigm, where the generated code from AI is segregated and treated like a binary blob? You don't change it (beyond perhaps some cosmetic, or superficial changes that the AI missed). You keep the prompt(s), and maintain that instead. And for new changes you want added, the prompts are either modified, or appended to.
indeed - https://www.dbreunig.com/2026/01/08/a-software-library-with-... appears to be exactly that - the idea that the only leverage you have for fixing bugs is updating prompts (and, to be fair, test cases, which you should be doing for every bug anyway) is kind of upsetting as someone who thinks software can actually work :-)
There is a related issue of ownership. When human programmers make errors that cost revenue or worse, there is (in theory) a clear chain of accountability. Who do you blame if errors generated by LLMs end up in mission critical software?
> Who do you blame if errors generated by LLMs end up in mission critical software?
I don't think many companies/codebases allow LLMs to autonomously edit code and deploy it, there is still a human in the loop that "prompt > generates > reviews > commits", so it really isn't hard to find someone to blame for those errors, if you happen to work in that kind of blame-filled environment.
Same goes for contractors, I suppose: if you outsource work to a contractor and they do a shitty job but it got shipped anyway, who do you blame? Replace "contractor" with "LLM" and I think the answer remains the same.
I have AI agents write, perform code review, improve and iterate upon the code. I trust that an agent with capabilities to write working code can also improve it. I use Claude skills for this and keep improving the skills based on both AI and human code reviews for the same type of code.
It is a bit of a bummer if you spend quite a bit of time tracking down a very weird DNS bug and it turns out to be systemd-resolved.
And I don't want to go into all of the time spent getting systemd unit files correct. There is a very active community suggesting things you can add, which then of course break your release for users in unexpected ways. An enormous waste of time.
Depends a bit on how you define systemd. Just found out that the systemd developers don't understand DNS (or IPv6). Interesting problems result from that.
> Just found out that the systemd developers don't understand DNS (or IPv6).
Just according to Github, systemd has over 2,300 contributors. Which ones are you referring to?
And more to the point, what is this supposed to mean? Did you encounter a bug or something? DNS on Linux is sort of famously a tire fire, see for example https://tailscale.com/blog/sisyphean-dns-client-linux ... IPv6 networking is also famously difficult on Linux, with many users still refusing to even leave it enabled, frustratingly for those of us who care about IPv6.
Systemd-resolved invents DNS records (not really something you want to see; it makes debugging DNS issues a nightmare). But worse, it populates those DNS records with IPv6 link-local addresses, which really have no place in DNS.
Then, after a nice debugging session into why your application behaves so strangely (all the data in DNS is correct, so why doesn't it work?), you find that this issue has been reported before and was rejected as won't fix, works as intended.
Hm, but systemd-resolved mainly doesn't provide DNS services, it provides _name resolution_. Names can be resolved using more sources than just DNS, some of which do support link-locals properly, so it's normal for getaddrinfo() or the other standard name resolution functions to return addresses that aren't in DNS.
i.e. it's not inventing DNS records, because the things returned by getaddrinfo() aren't (exclusively) DNS records.
The debug tool for this is `getent ahosts`. `dig` is certainly useful, but it makes direct DNS queries rather than going via the system's name resolution setup, so it can't tell you what your programs are seeing.
systemd-resolved responds on port 53. It inserts itself in /etc/resolv.conf as the DNS resolver that is to be used by DNS stub resolvers.
It can do whatever it likes as long as it follows the DNS RFCs when replying to DNS requests.
Redefining recursive DNS resolution as general 'name resolution' is indeed exactly the kind of horror I expect from the systemd project. If systemd-resolved wants to do general name resolution, then just take a different transport protocol (dbus for example) and leave DNS alone.
It's not from systemd though. glibc's NSS stuff has been around since... 1996?, and it had support for lookups over NIS in the same year, so getaddrinfo() (or rather gethostbyname(), since this predates getaddrinfo()!) has never been just DNS.
systemd-resolved normally does use a separate protocol, specifically an NSS plugin (see /etc/nsswitch.conf). The DNS server part is mostly only there as a fallback/compatibility hack for software that tries to implement its own name resolution by reading /etc/hosts and /etc/resolv.conf and doing DNS queries.
I suppose "the DNS compatibility hack should follow DNS RFCs" is a reasonable argument... but applications normally go via the NSS plugin anyway, not via that fallback, so it probably wouldn't have helped you much.
I'm not sure what you are talking about. Our software has a stub resolver that is not the one in glibc. It directly issues DNS requests without going through /etc/nsswitch.conf.
It would have been fine if it was getaddrinfo (and it was done properly), because getaddrinfo gives back a socket address structure, which can carry the scope ID for the IPv6 link-local address. In DNS there is no scope ID, so it will never work on Linux (it would work on Windows, but that's a different story).
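To illustrate the contrast, a minimal Rust sketch (assuming the standard library's resolver; `router.local` is just a placeholder host name): `ToSocketAddrs` goes through the system's name resolution (getaddrinfo on Linux, i.e. nsswitch.conf and its NSS plugins), and a link-local result can carry a scope ID, which a plain AAAA record has no way to express.

```rust
// A minimal sketch: resolution via the platform's getaddrinfo path can
// return an IPv6 link-local address together with its scope ID; a raw
// DNS answer cannot carry that scope ID.
use std::net::{SocketAddr, ToSocketAddrs};

fn main() -> std::io::Result<()> {
    // Port 0 because the API wants a (host, port) pair for the lookup.
    for addr in ("router.local", 0).to_socket_addrs()? {
        match addr {
            SocketAddr::V4(v4) => println!("v4: {}", v4.ip()),
            SocketAddr::V6(v6) => println!("v6: {} (scope id {})", v6.ip(), v6.scope_id()),
        }
    }
    Ok(())
}
```

A stub resolver that speaks DNS directly, like the one described above, never sees a scope ID at all, which is why a link-local address coming back in a DNS answer cannot work.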
If you don't like those additional name resolution methods, then turn them off. Resolved gives you full control over that, usually on a per-interface basis.
If you don't like that systemd is broken, then you can turn it off. Yes, that's why people are avoiding systemd. Not so much that the software has bugs, but the attitude of the community.
It's not broken - it's a tradeoff. systemd-resolved is an optional component of systemd. It's not a part of the core. If you don't like the choices it took, you can use another resolver - there are plenty.
I don't think many people are avoiding systemd now - but those who do tend to do it because it non-optionally replaces so much of the system. OP is pointing out that's not the case of systemd-resolved.
It's not a trade-off. Use of /etc/resolv.conf and port 53 is defined by historical use and by a large number of IETF RFCs.
When you violate those, it is broken.
That's why systemd has such a bad reputation. Systemd almost always breaks existing use in unexpected ways. And in the case of DNS, it is a clearly defined protocol, which systemd-resolved breaks. Which you claim is a 'tradeoff'.
When a project ships an optional component that is broken, it is still a broken component.
The sad thing about systemd (including systemd-resolved) is that it is default on Linux distributions. So if you write software then you are forced to deal with it, because quite a few users will have it without being aware of the issues.
Yes, violating historical precedent is part of the tradeoff - I see no contradiction. Are you able to identify the positive benefits offered by this approach? If not, we're not really "engineering" so to speak. Just picking favorites.
> The sad thing about systemd (including systemd-resolved) is that it is default on Linux distributions. So if you write software then you are forced to deal with it, because quite a few users will have it without being aware of the issues.
I'm well aware - my day job is writing networking software.
That's the main problem with systemd: replacing services that don't need replacing and doing a bad job of it. Its DNS resolver is particularly infamous for its problems.
It's not that simple. For example, in the Netherlands, the use of electricity was stable for a long time, mostly because all kinds of equipment (light bulbs, etc.) got more efficient.
Grid operators predicted that demand would rise with the energy transition, but politicians wanted to keep prices low and limited investment.
So now there is a big problem across the entire country connecting companies or new residential areas to the grid, independent of how the electricity is generated.
At the same time, the government is extremely forward looking and is building massive interconnection points in the North Sea. Not a bad idea in the long run, but in the short run it does make electricity from offshore wind more expensive.
That said, the biggest hit to EU countries is that cheap natural gas disappeared. Coal is not cheap and extremely polluting. Natural gas was cheap for a while. Until it wasn't.
I recently had this problem in some Rust code. I was implementing A and had some code that would decide which of several 'B's to use. I then wanted to call an internal method on A (one that takes a mutable reference to A) with a mutable reference to the B that I selected. That was obviously rejected by the compiler, and I had to find a way around it.
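A minimal sketch of the conflict and one common workaround (all names here are made up, not my actual code): instead of a method that takes `&mut self` plus a `&mut B`, pass only the fields the method actually needs, so the borrows are disjoint.

```rust
// A minimal sketch of the borrow-checker conflict: selecting a &mut B
// out of self and then calling a &mut self method is rejected, because
// self is already mutably borrowed through the selected B.

struct B { value: i32 }

struct A {
    primary: B,
    fallback: B,
    counter: u32,
}

impl A {
    // The original shape, rejected at the call site below:
    //     fn process(&mut self, b: &mut B) { ... }
    //
    // Workaround: take only the fields the method really needs, so the
    // borrows are visibly disjoint.
    fn process(counter: &mut u32, b: &mut B) {
        b.value += 1;
        *counter += 1;
    }

    fn run(&mut self, use_primary: bool) {
        // Decide which of several Bs to use.
        let b = if use_primary { &mut self.primary } else { &mut self.fallback };

        // self.process(b); // roughly: error[E0499], cannot borrow `*self`
        //                  // as mutable more than once at a time

        Self::process(&mut self.counter, b); // disjoint field borrows: fine
    }
}

fn main() {
    let mut a = A {
        primary: B { value: 0 },
        fallback: B { value: 0 },
        counter: 0,
    };
    a.run(true);
    println!("primary = {}, counter = {}", a.primary.value, a.counter);
}
```

Other ways around it include splitting A into smaller structs so the method lives on a sub-struct, or having the method take an index and re-borrow the chosen B internally.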
I agree. The main advantage of Fil-C is compatibility with C, in a secure way. The disadvantages are speed, and garbage collection. (Though I read that garbage collection might not be needed in some cases; I would be very interested in knowing more details.)
For new code, I would not use Fil-C. For the kernel and low-level tools, other languages seem better. Right now, Rust is the only popular language in this space that doesn't have these disadvantages. But in my view, Rust also has issues, especially the borrow checker and code verbosity. Maybe in the future there will be a language that resolves these issues as well (as a hobby, I'm trying to build such a language). But right now, Rust seems to be the best choice for the kernel (for code that needs to be fast and secure).
How does that compare with Rust? You don't happen to have an example of a binary currently being moved to Rust in Ubuntu-land as well? Curious to see, as I honestly don't know whether Rust is as nimble as C or not.
My impression is: Rust fares a bit better on RAM footprint, and about as badly on on-disk binary size. It's darn hard to compare apples to apples, though, given it's a different language, so everything is a rewrite. One example:
Ubuntu 25.10's rust "coreutils" multicall binary: 10828088 bytes on disk, 7396 KB in RAM while doing "sleep".
Alpine 3.22's GNU "coreutils" multicall binary: 1057280 bytes on disk, 2320 KB in RAM while doing "sleep".
I don't have numbers, but Rust is also terrible for binary size. Large Rust binaries can be shrunk with various efforts, but it's not friendly by default. Rust focuses on runtime performance, high-level programming, and compile-time guarantees; compile times and binary sizes are the drawbacks. Notably, Rust prefers static linking.
For DNS as a service to work, it has to be accessible and give the right answers. It doesn't matter why it is not accessible or why it doesn't give the right answers. If it doesn't, then the service is broken.
DNS is in the unique position that it is relatively high up in the network stack, so lots of (network) failures affect DNS as a service. It is a big distributed database, which gives many possibilities for wrong data, it is used by almost all applications, so a failure of DNS as a service is highly noticeable.
Finally, DNS by nature has (somewhat) centralized choke points. If you have a domain like company.com, then just about everything that company does has to go through the DNS servers for company.com. Any small failure there can have a huge effect.
As an open source software vendor I can say two things:
1) The CVE system allows vendors to deny CVEs that relate to their product. I don't know the exact rules, so I don't know if it applies in this case. We take anything that can crash our software seriously.
2) For users without a support contract, your priority does not automatically become our priority. If you want your issues fixed, then make sure we have the money to do so. Just because you got a free download doesn't mean you have any right to support.
What started this is a case where you have to put weird stuff in a config file to trigger the CVE. If the people behind dnsmasq don't get paid, or not paid enough, then it is perfectly fine for this not to be a priority.
We have a very popular product, lots of use in what is really the foundation of the internet and almost no support contracts.
So you can turn the argument around: if you are not paying for software, consider it a hobby project. Feel free to report an issue and create a ticket. But don't expect anything to happen. And don't complain on mailing lists about how your issue is not taken seriously. Just fix the issue yourself or switch to a different product.
I think you're missing my point. Your code is your resume. It's also an advertisement for whether your product is worth donating to, helping with, buying, and whether you are an excellent coder and project maintainer or not.
A CVE, bogus or not, needs to be handled. If you don't, it reflects upon you. Hands down. No amount of "but it's for free" works to negate this. Ever. No one can demand anything of you, but your reputation will 100% be graded upon how you deal with such things.
This is the way the world works. This is how reputation works. Get over it. Deal with it. Understand it. No, you're not going to ever change this, unless you genetically engineer new humans. This is how humans, and human society, have existed for millennia. You will never, ever, ever change this. You will never explain an alternative to anyone. Ever.
Even if the CVE is bogus, you need to set the record straight, and it's almost akin to libel against your project and you. My suggestions about having a page listing all CVEs are fairly clear and to the point.
These suggestions help people assess your project, your reliability, and your competency. Yet at the same time? They reduce your effort and work!
Instead of debating endlessly on a mailing list, and instead of fielding repeated bug reports, a well-placed security page will handle the lion's share of such things, answer them, and leave the project team free not to deal with questions on each CVE.
Such a list gives you an authoritative reason why each CVE is triaged as it is; you can point mailing list inquiries at it, point WONTFIX bug reports at it, and you can even put your project's stance at the top of the page!
What I've been saying in these posts is that organization overrules chaos. And that even if some weirdos disagree with you, or have silly expectations, you're crystal clear on things.
I think this is what you want. Your concerns about what people should expect are dealt with via this method. I actually think we're aligned here, except (perhaps?) you think doing this is work.
It's not. It's the opposite of work. It's saving time.
Why?
Because you will never, ever, ever change human behaviour. Ever. Literally nothing has ever changed in, for example, how commercial transactions occur. This exact complaint could happen today over a used car.
Every problem you've had with humans has happened endlessly, billions of trillions of times. Just because it's a software project doesn't mean it's any different from any other project. There has been volunteer, for-free work since the inception of humanity. There have been people with unrealistic expectations, and the tug and pull therein.
I'll reiterate my original stance: just make it clear. Make it clear that you're dealing with CVEs. Part of this makes it eminently clear that the fly in the ointment is the persistent person with crazy expectations, not your project.
At the level of dnsmasq, I doubt they care about resumes.
CVEs are obviously important to you. I'm sure CVEs would be important to dnsmasq too, if they got paid to handle them. My guess is that they don't.
If they don't have the resources to deal with those CVEs (and I would certainly try to fix config errors that lead to crashes), despite being a hugely popular piece of software, then they are just not going to deal with those CVEs, or report on them, etc.
The next step, given that Dnsmasq is used by big companies as well, might be to leave those CVEs out there on purpose. No money, no work.
If you expect that people are just not going to give you enough money then leaving out certain aspects of professionally maintained software is reasonable.
I recently started re-reading "Programming in Ada" by J.G.P. Barnes about the original Ada. In my opinion, it was not that good of a language. Plenty of ways to trigger undefined behavior.
Whereas C was clearly designed to be a practical language, with feedback from implementing an operating system in C. Ada lacked that kind of practical experience. And it shows.
I don't know anything about modern day Ada, but I can see why it didn't catch on in the Unix world.
I recall watching a presentation about C++20. During the presentation, the presenter said there were about 163 undefined behaviors in the C language (note: I think it was C99), which implied there were many more in C++ since it’s a much more complex language. Unfortunately, I don’t have a link to that presentation.
You might have heard of the SPARK variant of Ada. I recall reading in an article many years ago that the original version of SPARK was based on Ada83 because it is a very safe language with far fewer undefined behaviors, which is key to trying to statically prove the correctness of a program.
I'm curious about this list, because it definitely doesn't seem that way these days. It'd be interesting to see how many of these are still possible now.
I didn't make a list, but let me give an example. Page 22, where variable declarations are introduced:
> If a variable is declared and not given an initial value then great care must be taken not to use the undefined value of the variable until one has been properly given to it. If a program does use the undefined value in an uninitialised variable, its behaviour will be unpredictable; the program is said to be erroneous.
My conclusion is that C is not a good basis for what Rust is trying to do. The kind of reliability Rust is trying to provide with almost no runtime overhead requires a much more complex language than C.
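For contrast, a minimal sketch of how Rust treats the uninitialised-variable case quoted above: the read is simply rejected at compile time rather than being "erroneous" at run time (the error code in the comment is roughly what current compilers report).

```rust
// A minimal contrast with the Ada83 passage quoted above: in Rust,
// reading an uninitialised variable is a compile-time error, not
// unpredictable behaviour at run time.
fn main() {
    let x: i32;
    // println!("{x}"); // roughly: error[E0381], used binding `x` isn't initialized
    x = 42;             // delayed initialisation is fine
    println!("{x}");    // and the read is now allowed
}
```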