Your entire argument is derived from a pseudoscientific field without any peer-reviewed research. Mechanistic interpretability is a joke invented by AI firms to sell chatbots.
Lol that's a stupid ass response, especially when half the papers are from universities in China. You think Chinese universities are trying to sell ChatGPT subscriptions? Ridiculous. You're just falling behind in tech knowledge.
And apparently you think peer-reviewed papers presented at NeurIPS and other conferences are pseudoscience. (For the people not versed in ML: NeurIPS is where the 2017 paper "Attention Is All You Need", which started the modern ML revolution, was presented.)
(Not GP) There was a well-recognized reproducibility problem in the ML field before LLM-mania, and that's considering published papers with proper peer review. The current state of affairs is in some ways even less rigorous than that, and then some people in the field feel free to overextend their conclusions into other fields like neuroscience.
We're in the "mad science" regime because at the current speed of progress, adding rigor would sacrifice velocity. Preprints are the lifeblood of the field because they can be put out earlier and start contributing sooner.
Anthropic, much as you hate them, has some of the best mechanistic interpretability researchers and AI wranglers across the entire industry. When they find things, they find things. Your "not scientifically rigorous" is just a flimsy excuse to dismiss the findings that make you deeply uncomfortable.
Did you just invent a nonsense fallacy to use as a bludgeon here? "Stochastic parrot fallacy" does not exist, and there's actually quite a bit of evidence supporting the stochastic parrot hypothesis.
I imagine "stochastic parrot fallacy" could be their term for using the hypothesis to dismiss LLMs even where they can be useful; i.e., dismissing them for their weaknesses alone and ignoring their strengths. (Of course, we have no way to know for sure without their input.)
Oh, please. There’s always a way to blame the user; it’s a catch-22. The fact is that coding agents aren’t perfect, and it’s quite common for them to fail. Refer to the recent C-compiler nonsense Anthropic tried to pull for proof.
It fails far less often than I do at the cookie cutter parts of my job, and it’s much faster and cheaper than I am.
Being honest, I probably have to write some properly clever code or do some actual design as a dev lead like… 2% of my time? At most? At the rest of the code-related work I do, it’s outperforming me.
Now, maybe you’re somehow different to me, but I find it hard to believe that the majority of devs out there are balancing binary trees and coming up with shithot unique algorithms all day, rather than mangling some formatting, improving db performance, picking the right pattern for some backend, and so on, day to day.
That’s a devastating benchmark design flaw. Sick of these bullshit benchmarks designed solely to hype AI. AI boosters turn around and use them as ammo, despite not understanding them.
Relax. Anyone who's genuinely interested in the question will see with a few searches that LLMs can play chess fine, although the post-trained models mostly seem to have regressed. The problem is that people are more interested in validating their own assumptions than anything else.
This exact game has been played 60 thousand times on lichess. The piece sacrifice Grok performed on move 6 has been played 5 million times on lichess. Every single move Grok made is also the most-played move on lichess.
This reminds me of Stefan Zweig’s The Royal Game, where the protagonist survives Nazi captivity by memorizing every game in a chess book his captors dropped (excellent book btw; I am aware I just invoked Godwin’s law, and aware of the irony of doing so in this thread). The protagonist became “good” at chess simply by memorizing a lot of games.