Hacker News | past | comments | ask | show | jobs | submit | orphea's comments

  the poor guy
Do you mean the LLM?

Then they made it wrong. For example, "What the actual fuck?" is not getting flagged, and neither is "What the *fuck*".

It is exceedingly obvious that the goal here is to catch at least 75-80% of negative sentiment and not to be exhaustive and pedantic and think of every possible way someone could express themselves.

Classic over-engineering. Their approach is just fine 90% of the time for the use case it’s intended for.

75-80% [1], 90%, 99% [2]. In other words, no one has any idea.

I doubt it's anywhere near that high, because even if you don't write anything fancy and simply capitalize the first word, like you normally would at the beginning of a sentence, the regex won't flag it.

Anyway, I don't really care, might just as well be 99.99%. This is not a hill I'm going to die on :P

[1]: https://news.ycombinator.com/item?id=47587286

[2]: https://news.ycombinator.com/item?id=47586932


It compares against lowercased input, so that doesn't matter. The rest is still valid.
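To make the exchange above concrete, here is a minimal sketch of the kind of literal phrase list being discussed. The pattern and phrases are hypothetical, not Anthropic's actual regex: lowercasing the input handles capitalization, but any inserted word or markdown emphasis breaks a literal match.

```python
import re

# Hypothetical frustration-detection pattern: a fixed phrase list
# compared against lowercased input (not the real production regex).
PATTERNS = re.compile(r"what the fuck|wtf|this is (so )?frustrating")

def flagged(prompt: str) -> bool:
    # Lowercase first, then look for any literal phrase.
    return PATTERNS.search(prompt.lower()) is not None

print(flagged("What the fuck?"))         # True: lowercasing handles capitalization
print(flagged("What the actual fuck?"))  # False: the inserted word breaks the literal match
print(flagged("What the *fuck*"))        # False: markdown emphasis breaks it too
```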

Except that it's a list of English keywords. Swearing at the computer is the one thing I'll constantly hear devs switch back to their native language for.

They evidently ran a statistical analysis and determined that virtually no one uses those phrases as a quick retort to a model's unsatisfying answer... so they don't need to optimize for them.

It looks like it's just for logging, why does it need to block?

Better question: why would you call an LLM (expensive in compute terms) for something a regex can do (cheap in compute terms)?

Regex is going to be something like 10,000 times quicker than the quickest LLM call; multiply that by billions of prompts.
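The rough order-of-magnitude claim above is easy to sanity-check on the regex side. This sketch times a single compiled-regex scan (the phrase list is made up for the example); the LLM side isn't called here, but a hosted-model round trip is typically hundreds of milliseconds of network and inference latency, versus microseconds for the scan.

```python
import re
import time

pattern = re.compile(r"what the fuck|wtf")  # hypothetical phrase list
prompt = "why does this build keep failing, what the fuck"

N = 100_000
start = time.perf_counter()
for _ in range(N):
    pattern.search(prompt.lower())
elapsed = (time.perf_counter() - start) / N

# A single scan lands in the microsecond range; an LLM round trip is
# typically hundreds of milliseconds, so the gap is four-plus orders
# of magnitude before you even count dollars.
print(f"{elapsed * 1e6:.2f} µs per check")
```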


This is assuming the regex is doing a good job. It is not. Also, you can embed a very tiny model if you really want to flag as many negatives as possible (I don't know Anthropic's goal with this); it would be quick and free.

I think it's a very reasonable tradeoff, getting 99% of true positives at a fraction of the cost (both runtime and engineering).

Besides, they probably do a separate analysis on server side either way, so they can check a true positive to false positive ratio.


  This is how others feel as well and how software engineering will feel for new generations
How can you make such universal statements? This is not true at all. There are plenty of people who find vibe coding mentally exhausting (not everyone wants to be a manager) and who think LLMs suck out what joy was left in programming.

No, but it has always been huge.

  > this sort of performance
They've been very proud of it.

Whatever? I too prefer two windows side by side and I don't see this feature useful, but if others do, that's great.

I didn't vote, but I'll bite.

1. Do not relay LLM output. If someone wanted it, they would ask themselves; ChatGPT is free. Post your own, human, meaty thoughts.

2. The blog post explains all these technologies; one just needs to read it further than the title. It might be a big ask here on HN, I know, but still.


Reproducing is absolutely not a copyright violation. Otherwise emulators would have no legal option to exist.

That is a question about which copyrights are enforced. Different question.

An emulator is not a reproduction of the thing it emulates.

I've been using 5.3-Codex. I can't prove it because it's subjective, but I get better (you could say more reasonable) results with it than with 4.6 Opus.

GPT-5.4 one-shot a cross-language issue (a C++ repo + some amount of Lua); Opus kept hallucinating. This was debugging, not codegen.

