LinkedIn - it takes you to the allow/deny page but doesn't automate things. It used to be that the LinkedIn login would get stuck in a cycle around this, but now it just dumps you on to the consent page.
Indeed, that's what I kind of hinted at in https://news.ycombinator.com/item?id=46442195 and, coincidentally, in https://news.ycombinator.com/item?id=46437688 shortly after: OK, one can "generate" a "solution", which is much easier than before... but until we can somehow verify that it actually does what it says it does (and we know about hallucinations and have no reason to believe that has changed), testing itself, especially of well-known "problems", becomes more and more important.
That being said, it doesn't answer the "why" in the first place, which is an even more important question. At least it does help somewhat when comparing with existing alternatives.
Folks think, they write code, they do their own localized evaluation and testing, then they commit and then the rest of the (down|up)stream process begins.
LLMs skip over the "actually verify that the code I just wrote does what I intended it to" step. Granted, most humans don't do this step as thoroughly and carefully as would be desirable (sometimes through laziness, sometimes because of a belief in (down|up)stream testing processes). But LLMs don't do it at all.
They absolutely can do that if you give them the tools. Seeing Claude (I use it with opencode agents) run curl and playwright to verify and then fix its implementation was a real 'wow' moment for me.
We have different experiences. Often I'll see Claude et al. find creative ways to fulfill the task without satisfying my intent, e.g., changing the implementation plan I specifically asked for, changing tolerances or even tests, and frequently disabling tests.
Yeah, I feel that. When it happens, your only way out is to write down a more extensive implementation plan first. For me that's the point where I start regretting having tried to implement something using AI... But admittedly, most of the time revising the implementation plan and running the agent again is still faster than I could have done on my own (I try to make implementation tasks explicit in the form of a markdown file, which has worked pretty well so far).
I see these "you had a different experience than me" comments around AI coding agents a lot and can concur; I'll have a different experience with Copilot even from day to day: sometimes it's great, and other days it's so bad I give up on using it at all.
Honestly makes me wonder: will AGI just give us agents that get into bad moods and don't want to work for the day because they're tired or just don't feel like it?
Don’t downvote because you don’t like the question.
It obviously adds to the discussion: paid and non-paid accounts are being conflated daily in threads like these!
They’re not the same tier account!
Free users, especially ones deemed less interesting to learn from for the future, are given table-scraps when they feel it’s necessary for load reasons.
Exactly. There's an impedance mismatch between those using the free/cheap tiers and those paying a premium, so the discussion gets squirrely because one side is talking about apples and the other oranges.
> LLM's skip over the "actually verify that the code I just wrote does what I intended it to" step.
I'm not sure where this idea comes from. Just instruct it to write and run unit tests and document as it goes. All of the ones I've used will happily do so.
You still have to verify that the unit tests are valid, but that's still far less work than skipping them or writing the code/tests yourself.
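To make "verify the tests are valid" concrete, here's a minimal sketch of the kind of generated test you'd still review yourself (node:test syntax; parseAmount, its module path, and the expected value are all hypothetical). The test can run and pass while still encoding the wrong intent, and only a human review catches that.

```typescript
// Hypothetical generated test: it runs and passes, but a reviewer still has to
// confirm the expected value matches the actual requirement, not just the code.
import { test } from "node:test";
import assert from "node:assert/strict";

import { parseAmount } from "./amounts"; // hypothetical module under test

test("parses a formatted dollar amount", () => {
  // Is 1234.5 really the intended result for "$1,234.50"? That's the human's call.
  assert.equal(parseAmount("$1,234.50"), 1234.5);
});
```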
I disagree that it's less work. It just rewrites tests carte blanche. I've seen it rewrite and rewrite tests to the point of undermining the original test intention. So now, instead of intentionally writing code and a new unit test, I need to intentionally go and review EVERY unit test it touched. Every. Time.
It also doesn't necessarily rewrite documentation as the implementation changes. I've seen documentation rot happen within the same coding session.
One commercial equivalent to the project I work on, called ProTools (a DAW), has a test "harness" that took 6 people more than a year to write and takes more than a week to execute.
Last month, I made a minor change to our own code and verified that it worked (it did!). Earlier this week, I was notified of an entirely different workflow that had been broken by the change I had made. The only sort of automated testing that would have detected this would have been similar in scope and scale to the ProTools test harness, and neither an individual human nor an LLM is going to run that.
Moreover, that workflow was entirely graphically based, so unless Claude Opus 4.5 or whatever today's flavor of vibe coding LLM agent is has access to a testing system that allows it to inject mouse events into a running instance of our application (hint: it does not), there's no way it could run an effective test for this sort of code change.
I have no doubt that Claude et al. can verify that their carefully defined module does the very limited task it is supposed to do, for cases where "carefully defined" and "very limited" are appropriate. If that's the only sort of coding you do, I am sorry for your loss.
> access to a testing system that allows it to inject mouse events into a running instance of our application
FWIW that's precisely what https://pptr.dev is all about. To your broader point, though, designing a good harness itself remains very challenging and requires actually understanding what the value to the user is, the software architecture (e.g. to bypass user interaction and test the API first), etc.
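To make that concrete, here's a minimal Puppeteer sketch (the URL, selectors, and expected status text are all hypothetical) that drives a page with synthetic mouse events and then asserts on the visible result:

```typescript
// Minimal sketch: driving a page's UI with synthetic mouse events via Puppeteer.
// The URL, selectors, coordinates, and expected text are hypothetical placeholders.
import puppeteer from "puppeteer";

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto("http://localhost:3000"); // hypothetical local dev server

  // Click a control by selector, or inject raw mouse events at coordinates (e.g. a drag).
  await page.click("#export-button"); // hypothetical element id
  await page.mouse.move(200, 150);
  await page.mouse.down();
  await page.mouse.move(400, 150);
  await page.mouse.up();

  // Assert on some visible result of the interaction.
  const status = await page.$eval("#status", el => el.textContent);
  if (status !== "Export complete") {
    throw new Error(`unexpected status: ${status}`);
  }

  await browser.close();
})();
```

The hard part, as noted above, isn't injecting the events; it's deciding which interactions and assertions actually capture value for the user.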
No I was sharing an example of a framework that does include "a testing system that allows it to inject mouse events".
That being said, injecting mouse events and the like isn't hard to do: e.g. start with a fixed resolution (using xrandr), then drive the UI with xdotool or similar (rough sketch below). Ideally, if the application exposes accessibility features, it won't be as finicky.
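For a native (non-browser) app on Linux, a rough sketch along those lines, assuming an X11 session with xrandr and xdotool installed (the output name, window title, and coordinates are placeholders):

```typescript
// Minimal sketch, assuming an X11 session with xrandr and xdotool installed.
// The output name, window title, and coordinates below are hypothetical.
import { execSync } from "node:child_process";

// Fix the display resolution so recorded coordinates stay stable between runs.
execSync("xrandr --output HDMI-1 --mode 1920x1080"); // output name is an assumption

// Focus the application window and inject a click at a known position.
execSync('xdotool search --name "MyDAW" windowactivate'); // hypothetical window title
execSync("xdotool mousemove 640 360 click 1");

// Type into whatever control now has focus.
execSync('xdotool type "hello"');
```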
My point, though, was just to show that GUI testing is not infeasible.
Apparently there is even a "UI Testing for devs & agents" product, https://www.chromatic.com, which I found via Visual TDD: https://www.chromatic.com/blog/visual-test-driven-developmen... I can't recommend it personally, but it does show that even though the person I was replying to can't use Puppeteer in their context, the tooling does exist and the principles still apply.
> My point, though, was just to show that GUI testing is not infeasible.
Indeed, which is why I mentioned the ProTools test harness and the fact that it took 6 people a year to write and takes a week to run (or took a week, at some point in the past; it might be more or less now).
With the NES there are all sorts of weird edge cases, some of which involve NMI flags and resets; the PPU in general is kinda tricky to get right. Claude has had *massive* issues with this, and I've had to take control and completely throw out code it's generated. I'm restarting it with a clean slate though, as there are still issues with some of the underlying abstractions. The PPU is still the bane of my existence, as is DMA; I don't like the instruction pipeline, and I haven't even gotten to the APU. It's getting an 80/130 on accuracy coin.
Though, when it came to creating a WASM target, Claude was largely able to do it with minimal input on my end. Actually, getting the WASM emulator running in the browser was the least painful part of this project.
You will run into three problems:

1) "The Wall": once any project becomes large enough, you need the context window to be *very* specific and scoped, with explicit details of what is expected, the success criteria, and the deliverables.

2) Ambiguity means Claude will choose the path of least resistance, and will pedantically avoid/add things which are not specced. Stubs for functions, "beyond scope", and "deferred" are some favorite excuses for not refactoring or fixing obvious issues (anything that would go beyond the context window will be punted work; Claude knows this but won't tell you).

3) Chatbots *loooove* to talk; it will vomit code for days. Removing code/documentation is anathema to Claude, with "backward compatibility", "deprecated", and "legacy" being its favorite excuses.
This sounds exhausting. Once the thrill of seeing code rapidly generated wears off, I wonder if it's even worth it. If someone is going to use code they didn't write, why not just pull down some open-source implementation from somewhere and build on top of it? It basically gets you the same thing, but without the LLM hassles, and you can start building on a saner foundation.
They've all started cracking down; in the past year the Barclays and Lloyds apps have broken on my phone.
TSB still works for now, but even for a bank they're technologically incompetent so I'm going to just assume they're behind the curve rather than willingly not using SafetyNet.
The only one I would bank on still working in the future is Monzo, since, like you say, they detect it, just give you a scary warning, and let you continue.
Barclays have always played silly games with this stuff, they used to fund a whole team whose job it was to waste time on security theatre (this was nearly ten years ago).
I have this set as my OS default and also forced for all webpages; I just find it so clear and easy to read. On the occasions when I have to browse the web without it, I don't struggle per se, but I definitely find that I have to read more slowly, and I find myself rereading words more often.
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?
I think people misunderstand this quote. Cleverness in this context refers to complexity, and generally stems from falling in love with some complex mechanism you dream up to solve a problem rather than challenging yourself to create something simpler and easier to maintain. Bolting together bits of LLM-created code is far more likely to be “clever” rather than good.
Juniors grow into mids, and eventually into seniors. OSS contributors eventually learn the codebase, you talk to them, you all get invested in the shared success of the project, and sometimes you even become friends.
For me, personally, I just don't see the point of putting that same effort into a machine. It won't learn or grow from the corrections I make in that PR, so why bother? I might as well have written it myself and saved the merge review headache.
Maybe one day it'll reach perfect parity of what I could've written myself, but today isn't that day.
I wonder if that difference in mentality is a large part of the pro- vs anti-AI debate.
To me the AI is a very smart tool, not a very dumb co-worker. When I use the tool, my goal is for _me_ to learn from _its_ mistakes, so I can get better at using the tool. Code I produce using an AI tool is my code. I don't produce it by directly writing it, but my techniques guide the tool through the generation process and I am responsible for the fitness and quality of the resulting code.
I accept that the tool doesn't learn like a human, just like I accept that my IDE or a screwdriver doesn't learn like a human. But I myself can improve the performance of the AI coding by developing my own skills through usage and then applying those skills.
> It won't learn or grow from the corrections I make in that PR, so why bother?
That does not match my experience. As the codebases I've worked on with LLMs become more opinionated and stylized, they seem to do a better job of following the existing work. And over time the models have absolutely improved in terms of their ability to understand issues and offer solutions. Each new release has solved problems for me that the previous ones struggled with.
Re: interpersonal interactions, I don't find that the LLM has pushed them out or away. My projects still have groups of interested folk who talk and joke and learn and have fun. What the LLMs have addressed for me in part is the relative scarcity of labor for such work. I'm not hacking on the Linux Kernel with 10,000 contributors. Even with a dozen contributors, the amount of contributed code is relatively low and only in areas they are interested in. The LLM doesn't mind if I ask it to do something super boring. And it's been surprisingly helpful in chasing down bugs.
> Maybe one day it'll reach perfect parity of what I could've written myself, but today isn't that day.
Regardless of whether or not that happens, they've already been useful for me for at least 9 months. Since O3, which is the first one that really started to understand Rust's borrow checker in my experience. My measure isn't whether or not it writes code as well as I do, but how productive I am when working with it compared to not. In my measurements with SLOCCount over the last 9 months, I'm about 8x more productive than the previous 15 years without (as long as I've been measuring). And that's allowed me to get to projects which have been on the shelf for years.