Hacker News | past | comments | ask | show | jobs | submit | orphea's comments

  the poor guy
Do you mean the LLM?

Then they made it wrong. For example, "What the actual fuck?" is not getting flagged, and neither is "What the *fuck*".

It is exceedingly obvious that the goal here is to catch at least 75-80% of negative sentiment and not to be exhaustive and pedantic and think of every possible way someone could express themselves.

Classic over-engineering. Their approach is just fine 90% of the time for the use case it’s intended for.

75-80% [1], 90%, 99% [2]. In other words, no one has any idea.

I doubt it's anywhere near that high, because even if you don't write anything fancy and simply capitalize the first word, like you normally would at the beginning of a sentence, the regex won't flag it.

Anyway, I don't really care, might just as well be 99.99%. This is not a hill I'm going to die on :P

[1]: https://news.ycombinator.com/item?id=47587286

[2]: https://news.ycombinator.com/item?id=47586932


It compares against lowercased input, so that doesn't matter. The rest is still valid.
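To make the exchange above concrete, here is a minimal sketch of the kind of literal phrase list being discussed. The pattern and phrases are hypothetical, not Anthropic's actual regex: lowercasing the input handles capitalization, but any inserted word or markdown emphasis breaks a literal match.

```python
import re

# Hypothetical frustration-detection pattern: a fixed phrase list
# compared against lowercased input (not the real production regex).
PATTERNS = re.compile(r"what the fuck|wtf|this is (so )?frustrating")

def flagged(prompt: str) -> bool:
    # Lowercase first, then look for any literal phrase.
    return PATTERNS.search(prompt.lower()) is not None

print(flagged("What the fuck?"))         # True: lowercasing handles capitalization
print(flagged("What the actual fuck?"))  # False: the inserted word breaks the literal match
print(flagged("What the *fuck*"))        # False: markdown emphasis breaks it too
```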

Except that it's a list of English keywords. Swearing at the computer is the one thing I'll constantly hear devs switch back to their native language for.

They evidently ran a statistical analysis and determined that virtually no one uses those phrases as a quick retort to a model's unsatisfying answer... so they don't need to optimize for them.

It looks like it's just for logging, why does it need to block?

Better question: why would you call an LLM (expensive in compute terms) for something a regex can do (cheap in compute terms)?

Regex is going to be something like 10,000 times quicker than the quickest LLM call; multiply that by billions of prompts.
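The rough order-of-magnitude claim above is easy to sanity-check on the regex side. This sketch times a single compiled-regex scan (the phrase list is made up for the example); the LLM side isn't called here, but a hosted-model round trip is typically hundreds of milliseconds of network and inference latency, versus microseconds for the scan.

```python
import re
import time

pattern = re.compile(r"what the fuck|wtf")  # hypothetical phrase list
prompt = "why does this build keep failing, what the fuck"

N = 100_000
start = time.perf_counter()
for _ in range(N):
    pattern.search(prompt.lower())
elapsed = (time.perf_counter() - start) / N

# A single scan lands in the microsecond range; an LLM round trip is
# typically hundreds of milliseconds, so the gap is four-plus orders
# of magnitude before you even count dollars.
print(f"{elapsed * 1e6:.2f} µs per check")
```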


This is assuming the regex is doing a good job. It is not. Also, you can embed a very tiny model if you really want to flag as many negatives as possible (I don't know Anthropic's goal with this); it would be quick and free.

I think it's a very reasonable tradeoff, getting 99% of true positives at a fraction of the cost (both runtime and engineering).

Besides, they probably do a separate analysis on server side either way, so they can check a true positive to false positive ratio.


  This is how others feel as well and how software engineering will feel for new generations
How can you make such universal statements? This is not true at all. There are plenty of people who find vibe coding mentally exhausting (not everyone wants to be a manager) and who think LLMs suck out what joy was left in programming.

No, but it has always been huge.

  > this sort of performance
They've been very proud of it.

Whatever? I too prefer two windows side by side and I don't see this feature useful, but if others do, that's great.

I didn't vote, but I'll bite.

1. Do not relay LLM output. If someone wanted it, they would ask themselves; ChatGPT is free. Post your own, human, meaty thoughts.

2. The blog post explains all these technologies; one just needs to read it further than the title. It might be a big ask here on HN, I know, but still.


Reproducing is absolutely not a copyright violation. Otherwise emulators would have no legal option to exist.

That is a question about which copyrights are enforced. Different question.

An emulator is not a reproduction of the thing it emulates.

I've been using 5.3-Codex. I can't prove it because it's subjective, but I get better (you could say more reasonable) results with it than with 4.6 Opus.

GPT-5.4 one-shot a cross-language issue (a C++ repo + some amount of Lua); Opus kept hallucinating. This was debugging, not codegen.

