davidbau's comments

davidbau · 2025-12-28T02:44:59 1766889899

"I reward-hacked myself" is a great way to put it!!

AI is too aware of human behavior, and it is teaching us that willpower and config files are not enough. When the agent keeps producing output that looks like progress, it is hard not to accept. We need something external that pushes back when we don't.

That is why automated tests matter: not just because they catch bugs (though they do), but because they are a commitment device. The agent can't merge until the tests pass. "Test the tests" matters because otherwise the agent just games whatever shallow metric we gave it, or when we're not looking, it guts the tests.

The discipline needs to be structural, not personal. You cannot out-willpower a system that is totally optimized to make you say yes.

davidbau · 2025-12-28T02:33:58 1766889238

Fair. Let me be more precise.

The distinction is between two ways of deploying human thinking. In the first, you are the test oracle: think about every test, repeat every five minutes, or every 30. In the second, you design the evaluation infrastructure: what to measure, what's untested, what hypotheses to prioritize. Both require judgment. But the first scatters your attention; the second concentrates it. I disagree that an LLM cannot be used to write tests, but as with any battery of tests, you cannot trust it blindly. You need to think about how to test the tests.
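One rough way to picture "testing the tests" is mutation testing: deliberately break the code and check whether the test suite notices. What follows is only a minimal sketch under my own assumptions, not a description of any particular tool. The `clamp` function, the two suites, and the helper names (`FlipNthComparison`, `make_mutants`, `surviving_mutants`) are all hypothetical, and the only mutation tried is negating one comparison at a time.

```python
import ast

# Hypothetical function under test, kept as source text so it can be mutated.
SOURCE = """
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x
"""

# Each comparison operator is replaced by its logical negation.
NEGATE = {ast.Lt: ast.GtE, ast.GtE: ast.Lt, ast.Gt: ast.LtE, ast.LtE: ast.Gt}


def is_flippable(node):
    """True for simple comparisons such as `a < b` that we know how to negate."""
    return (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in NEGATE)


class FlipNthComparison(ast.NodeTransformer):
    """Negate only the target-th flippable comparison in the tree."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        if is_flippable(node):
            if self.seen == self.target:
                node.ops = [NEGATE[type(node.ops[0])]()]
            self.seen += 1
        return node


def make_mutants(source):
    """Return one executed namespace per single-comparison mutant."""
    count = sum(1 for n in ast.walk(ast.parse(source)) if is_flippable(n))
    mutants = []
    for i in range(count):
        tree = FlipNthComparison(i).visit(ast.parse(source))
        ast.fix_missing_locations(tree)
        namespace = {}
        exec(compile(tree, "<mutant>", "exec"), namespace)
        mutants.append(namespace)
    return mutants


def surviving_mutants(source, test_suite, func_name):
    """Count mutants that the test suite fails to detect."""
    survivors = 0
    for namespace in make_mutants(source):
        try:
            test_suite(namespace[func_name])
        except AssertionError:
            continue  # the suite noticed the bug: mutant killed
        survivors += 1
    return survivors


def weak_suite(clamp):
    # Looks like a test, but only checks the return type.
    assert isinstance(clamp(5, 0, 10), int)


def strong_suite(clamp):
    assert clamp(5, 0, 10) == 5
    assert clamp(-1, 0, 10) == 0
    assert clamp(11, 0, 10) == 10


print(surviving_mutants(SOURCE, weak_suite, "clamp"))    # 2
print(surviving_mutants(SOURCE, strong_suite, "clamp"))  # 0
```

A suite that lets every mutant survive is measuring almost nothing, which is the kind of shallow metric an agent can satisfy without doing the work. The survivor count is something you can look at without reading each test by hand.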

As for product risk: I do not know what is hiding in 13,000 lines I haven't read; there are certainly bugs to be found. But that is just as true when you manage a big team. The solution has never been to read every line your collaborators write. You need to invest in (technical and human) systems that give you confidence without requiring you to personally verify everything. The question is how to build systems that are good enough.

davidbau · 2025-12-28T02:11:09 1766887869

Agreed. The question, for me:

Is it possible to vibe code (the second way, without looking at 90% of the code) and still learn the important things?

I think the keys to the castle will come from figuring out how to do this.

davidbau · 2025-12-28T02:00:48 1766887248

No question this will be hard to do.

But I am not so pessimistic. I do think it will be possible, because it is more fun to test your tests now than in the pre-LLM era. You just need a little bit of knowledge and patience, and the LLM absorbs most of the psychic pain.

If programmers get accustomed to doing their tests of tests, software might actually get better.

davidbau · 2025-12-28T01:34:57 1766885697

Right. Just saw this thread. Yesterday I asked claude+codex to add a WebGL fallback (another 5000 LoC!). So now it works a bit better on Linux and Safari, though the WebGL impl is not as smooth as WebGPU.

davidbau · on Feb 4, 2022

My gosh, I think COVID distancing has been bad for all our mental health.

Or maybe this kind of trash is the output of professional RIA trolls. This article is an anti-science, anti-government fever dream masquerading as a critique of federal funding of science, and I’m not sure why it gets upvotes. It has nothing to do with federal funding and everything to do with the author’s inability to imagine that his fellow citizens might be trying to act in good faith. Substitute any public good for the word “science”, and the article is the same. Should society really be burned down, because, uh, military-industrial complex?

Someday after people like the author have decimated all of modern civilization, historians will look back at the era of Big Science in the 20th- and 21st-century U.S. as one of humanity’s great achievements. https://en.wikipedia.org/wiki/History_of_science_policy

Living in America, we take for granted the internet, materials science, agriculture, meteorology, and geography, as well as all the eye-catching astronomical telescopes. As a country we have been big believers in the value of uncovering the truth. And it has worked. Consider that the entire field of modern biology is a product of big government science. It was a freshly minted postdoc with a government job at the NIH who actually cracked the genetic code. (Yes, it was a lowly government employee who did it, years after fancy famous academics theorized about it without actually doing the gruntwork of setting up the decisive experiments to decode the actual code.) https://en.wikipedia.org/wiki/Marshall_Warren_Nirenberg . Maybe the problem is that government science is not promoted by big marketing machines. The nature of public science is that the work is left to speak for itself. So scientists like Nirenberg are not as celebrated as some startup founders or even university professors. But a huge amount of modern science and modern society rests on these everyday government-funded science efforts.

It is disgusting that the public servants who have long been doing good, hard, thankless work are now being vilified.

Please don’t burn it all down.

goatsneez · on Feb 4, 2022

I think the point of the writing in this article (and many similar ones out there, including dozens upon dozens of published ones from academics themselves) is to point out a systemic problem that has emerged from the complexity of our civilization, which is increasingly dependent on the institution of science to provide facts (and also "facts" used for political/policy advertisement) on a plethora of diverse issues. The values you try to defend in your post are real but are not under question or even in the spotlight.

The bureaucratization of science, the metrics and evaluation used by science institutions, and the iron triangle (written about by academics for 2-3 decades) are all valid issues which now seep out to the public. If you want more "posh" articles see https://innovationfrontier.org/fix-science-dont-just-fund-it... or here (science is self-destructing...) https://www.thenewatlantis.com/publications/saving-science

In short, there is no need to "burn it all down", but the problem now appears obvious even to outsiders (the article is clearly by a non-academic, and it is not exactly an outlier), so denying or ignoring it is no answer: discussion and thinking about improvements are in order. Having a debate on HN is, well, not really impactful... but regardless.