This is something one can look into further. It is really probabilistically checkable proofs underneath: we naturally look for the places where the work needs to look right, and use those spot checks as a basis for assuming the whole thing was done right.
It's nice to see such a wide array of discussions under this! Glad I didn't give up on this thought and ended up writing it down.
I want to stress that the main point of my article is not really about AI coding; it's about letting AI perform arbitrary tasks reliably. Coding is an interesting case because it's a place where we can exploit structure, abstraction, and approaches like TDD to make verification simpler - it's like spot-checking in places with a very low soundness error.
I'm encouraging people to look at tasks other than coding to see if we can find similar patterns. The more of these cost asymmetries we can find (where verifying is easier than doing), the more we can harness AI's real potential.
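To make that asymmetry concrete, here is a minimal sketch in Python (the function name and example are mine, purely illustrative): verifying that an output is a correctly sorted permutation of its input takes a few lines and a couple of linear passes, no matter how sophisticated the code that produced it was.

```python
from collections import Counter

def is_valid_sort(original: list[int], claimed: list[int]) -> bool:
    """Check a claimed sort result without redoing the work.

    Verification is cheap: one pass for ordering, one multiset
    comparison for the permutation property. The producer (human
    or AI) can be arbitrarily complex; the check stays the same.
    """
    in_order = all(a <= b for a, b in zip(claimed, claimed[1:]))
    same_elements = Counter(original) == Counter(claimed)
    return in_order and same_elements

# Spot-check an AI-produced result instead of reading its code:
assert is_valid_sort([3, 1, 2], [1, 2, 3])
assert not is_valid_sort([3, 1, 2], [1, 2, 2])
```

This cheap-checker/expensive-producer split is exactly the shape that PCP-style arguments formalize.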
Note that in the case of coding, there is an entire branch of computer science dedicated to verification.
All the type systems (and model-checkers) for Rust, Ada, OCaml, Haskell, TypeScript, Python, C#, Java, ... are based on such research, and these are all rather weak in comparison to what research has created in the last ~30 years (see Rocq, Idris, Lean).
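For a taste of what those stronger systems buy you, here is a minimal Lean 4 sketch (a standard textbook lemma, not anyone's production code). The theorem statement is the specification; the file only compiles if the proof is complete, so there is nothing left to spot-check:

```lean
-- Machine-checked in Lean 4: list append is associative.
-- If this compiles, the property holds for *all* lists;
-- no test suite or reviewer judgment is involved.
theorem append_assoc' {α : Type} (xs ys zs : List α) :
    (xs ++ ys) ++ zs = xs ++ (ys ++ zs) := by
  induction xs with
  | nil => rfl
  | cons x xs ih => simp [ih]
```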
This goes beyond that: some of these mechanisms have been applied to mathematics, but also to some aspects of finance and law (I know of at least some efforts to formally prove implementations of banking contracts and tax rules correct).
So there is lots to do in the domain. Sadly, like every branch of CS other than AI (and in fact pretty much every branch of science other than AI), this one is underfunded. But that can change!
Considering how useful I've found AI at finding and fixing bugs in proportion to the effort I put in, I question your claim that it's underfunded. While I have learned languages like Idris, in the end I was never able to use them practically to reduce bugs in the software I was writing, unlike AI. It's possible that the funding directed toward these kinds of languages is actually distracting people from more practical solutions, which could mean the field is overfunded with respect to program verification.
I think it's definitely an interesting subject for verification engineering: the more precisely we can task AI to do work, the easier it is to check that work.
Yup. Codebase structure for agents is a rabbit hole I've spent a lot of time going down. The interesting thing is that it's mostly the same structure that humans tend to prefer, with a few tweaks: agents like smaller files/functions (more precise reads/edits), strongly typed functional programming, doc-comments with examples and hyperlinks to additional context, smaller directories with semantic subgroups, long/distinct variable names, etc.
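As a hedged illustration of that style (the names and the doc path are invented for the example), an "agent-friendly" function tends to be small, strongly typed, and self-describing:

```python
from decimal import Decimal

def convert_minor_units_to_decimal_amount(
    amount_in_minor_units: int,
    minor_units_per_major_unit: int = 100,
) -> Decimal:
    """Convert an integer amount in minor units (e.g. cents) to a Decimal.

    Example:
        >>> convert_minor_units_to_decimal_amount(1999)
        Decimal('19.99')

    See docs/money-handling.md for why we never use floats here.
    """
    return Decimal(amount_in_minor_units) / Decimal(minor_units_per_major_unit)
```

The long, distinct names cost humans a little typing but give an agent unambiguous grep targets, and the doc-comment example doubles as a doctest.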
This feels a lot like the "humans must be ready at any time to take over from FSD" that Tesla is trying to push. With presumably similar results.
If it works 85% of the time, how soon do you catch that it is moving in the wrong direction? Are you having a standup every few minutes for it to review (and edit) its work with you? Are you reviewing hundreds of thousands of lines of code every day?
It feels a bit like pouring cement or molten steel really fast: at best it works, and you get things done way faster. Get it just a bit wrong, and your work is all messed up, with a lot of collateral damage besides. But I guess if you haven't shipped yet, it's OK to start over? How many different respins can you keep in your head before they all blend together?
A large percentage (at least 50%) of the market for software developers will shift to lower paid jobs focused on managing, inspecting and testing the work that AI does. If a median software developer job paid $125k before, it'll shift to $65k-$85k type AI babysitting work after.
It's funny that I heard exactly this when I graduated university in the late 2000s:
> A large percentage (at least 50%) of the market for software developers will shift to lower paid jobs focused on managing, inspecting and testing the work that outsourced developers do. If a median software developer job paid $125k before, it'll shift to $65k-$85k type outsourced developer babysitting work after.
As an industry, we've been doing the same thing to people in almost every other sector of the workforce, since we began. Automation is just starting to come for us now, and a lot of us are really pissed off about it. All of a sudden, we're humanitarians.
This argument is common and facile: Software development has always been about "automating ourselves out of a job", whether in the broad sense of creating compilers and IDEs, or in the individual sense that you write some code and say: "Hey, I don't want to rewrite this again later, not even if I was being paid for my time, I'll make it into a reusable library."
> the same thing
The reverse: What pisses me off is how what's coming is not the same thing.
Customers are being sold a snake-oil product, and its adoption may well ruin things we've spent careers de-crappifying by making them consistent and repeatable and understandable. In the aftermath, some portion of my (continued) career will be diverted to cleaning up the lingering damage from it.
Nah, sounds like management, but I am repeating myself. In all seriousness, I have found myself having to carefully rein in some similar decisions. I don't want to get into details, but there are times I wonder if they understand how things really work, or if people need some 'floor'-level exposure before they just decree stuff.
Yes, but not like what you think. Programmers are going to look more like product managers with extra technical context.
AI is also great at looking for its own quality problems.
Yesterday, on an entirely LLM-generated codebase:
Prompt: > SEARCH FOR ANTIPATTERNS
Found 17 antipatterns across the codebase:
And then what followed was a detailed list: about a third of them I thought were pretty important, a third were arguably issues, and the rest were either not important or effectively "this project isn't fully functional".
As an engineer, I didn't have to find code errors or fix code errors; I had to pick which errors were important and then give instructions to have them fixed.
> Programmers are going to look more like product managers with extra technical context.
The limit of product manager, as the "extra technical context" approaches infinity, is programmer. Because the best, most specific way to specify that extra technical context is just plain old code.
Yeah, don't rely on the LLM finding all the issues. Complex code like Swift concurrency tooling is just riddled with issues. I usually need to increase to 100% line coverage and then let it loop on hanging tests until everything _seems_ to work.
(It’s been said that Swift concurrency is too hard for humans as well though)
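The "loop on hanging tests" part can be mechanized. Here is a minimal sketch (in Python for neutrality; the test command, timeout, and run count are placeholders for whatever your suite needs) that reruns the suite under a timeout so hangs surface as failures instead of stalling the loop:

```python
import subprocess

TEST_COMMAND = ["swift", "test"]  # placeholder; substitute your suite's runner
TIMEOUT_SECONDS = 120             # anything slower is treated as a hang
MAX_RUNS = 20                     # rerun to catch flaky or racy hangs

for run in range(1, MAX_RUNS + 1):
    try:
        result = subprocess.run(TEST_COMMAND, timeout=TIMEOUT_SECONDS)
    except subprocess.TimeoutExpired:
        print(f"run {run}: suite hung past {TIMEOUT_SECONDS}s; likely a deadlock")
        break
    if result.returncode != 0:
        print(f"run {run}: failures; feed the output back to the LLM")
        break
else:
    print(f"all {MAX_RUNS} runs passed; everything _seems_ to work")
```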
I don't trust programmers to find all the issues either, and in several shops I've been in, "we should have tests" was a controversial argument.
A good software engineering system built around the top LLMs today is definitely competitive in quality with a mediocre software shop, and 100x faster and 1000x cheaper.
I'm hoping this can introduce a framework to help people visualize the problem and figure out a way to close that gap. Image generation is something everyone can verify, but code generation is perhaps not. But if we can make verifying code as effortless as verifying images (not saying it's possible), then our productivity can enter the next level...
Oh, I mean the other direction! Checking whether a generated image is "good" - that no one will notice something is off and it looks natural - rather than checking whether it is fake.
Oh, interesting thoughts! Let me digest and get back to you. Overall, I just defined a read API description that says "read content from this gist file by calling this API; it returns the content of a text file, formatted as CSV, with the first row defining the columns," and a write API description that is similar.
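For concreteness, the read side described above might look roughly like this (the gist URL is a placeholder, and this is a sketch of the shape, not the actual Action definition): fetch the raw file over HTTP and let the first CSV row define the column names.

```python
import csv
import io
import urllib.request

# Placeholder: substitute the raw URL of your own gist file.
GIST_RAW_URL = "https://gist.githubusercontent.com/<user>/<gist-id>/raw/data.csv"

def read_gist_csv(url: str = GIST_RAW_URL) -> list[dict[str, str]]:
    """Fetch a gist's CSV content; the first row defines the columns."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```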
I develop on a paid GPT-4o plan (it allows me to create custom GPTs), but users can be on the free GPT-4o tier to use it (although that limits image uploads to a handful per day). Free-tier GitHub, as always.