
This matches my experience working with LLMs. I've built several applications that require an LLM to consider several factors or "zoom levels," and I have yet to work with a model that can do that in a single shot. Instead, you need to have multiple passes for each area of focus. Rather than "edit this manuscript," you want "find all the typos," then "find all the run-on sentences," etc.
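A minimal sketch of that multi-pass approach, assuming a generic call_llm(prompt) helper as a stand-in for whatever SDK you use (the helper name, the pass list, and the prompts are illustrative, not anyone's exact setup):

    # One focused pass per concern instead of a single "edit this manuscript" prompt.
    # `call_llm` is a hypothetical placeholder, not a real library call.

    PASSES = [
        "List every typo in the text below, one per line.",
        "List every run-on sentence in the text below, one per line.",
        "List every unclear or ambiguous passage in the text below, one per line.",
    ]

    def call_llm(prompt: str) -> str:
        # Swap in your provider's chat/completion call here.
        raise NotImplementedError

    def review(manuscript: str) -> dict[str, str]:
        """Run each narrow pass separately and collect the findings."""
        findings = {}
        for instruction in PASSES:
            findings[instruction] = call_llm(f"{instruction}\n\n---\n{manuscript}")
        return findings

The point is that each pass gives the model a single area of focus, which in my experience is what keeps the subtler issues from being skipped.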


Interesting. I wonder if this is related to the model architecture and attention mechanism.

The author seems to be implying it could be: "Even a single mention of ‘code enhancement suggestions’ in our instructions seemed to hijack the model’s attention"


The attention is probably just latching on to strong statistical patterns. Obvious errors create sharp spikes in the attention weights and drown out subtler signals that may actually matter more.
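A toy illustration of the "sharp spike" intuition, using nothing but a plain softmax; this says nothing about real model internals (which involve many heads and layers), it just shows how one large logit grabs nearly all of the mass:

    import math

    def softmax(xs):
        m = max(xs)
        exps = [math.exp(x - m) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    # First entry = a glaring error, the rest = subtle issues.
    logits = [9.0, 2.0, 1.5, 1.0]
    print([round(w, 4) for w in softmax(logits)])
    # -> approximately [0.9982, 0.0009, 0.0006, 0.0003]
    # The sharp spike takes ~99.8% of the weight; the subtle signals get almost nothing.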


The weirdness of LLMs is that they're so damn good at so many things, but then you see these glaring gaps that instantly make them seem dumb. We desperately need benchmarks and evals that test these kinds of hard-to-pin-down cognitive abilities.


Absolutely. This is not a new observation, but another thing they struggle with is self-reporting confidence. When I've asked LLMs to classify/tag things along with a confidence score, the number seems random and has no connection to the quality or difficulty of the classification.
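One way to quantify that is a simple calibration check: bucket the self-reported confidences and compare each bucket to its observed accuracy. A sketch assuming you've already collected (confidence, was_correct) pairs; the sample values at the bottom are synthetic, just to show the shape of the output:

    from collections import defaultdict

    def calibration_table(results, n_bins=5):
        """Group predictions by reported confidence and compare to observed accuracy."""
        bins = defaultdict(list)
        for confidence, correct in results:
            idx = min(int(confidence * n_bins), n_bins - 1)
            bins[idx].append(correct)
        table = []
        for idx in sorted(bins):
            lo, hi = idx / n_bins, (idx + 1) / n_bins
            accuracy = sum(bins[idx]) / len(bins[idx])
            table.append((f"{lo:.1f}-{hi:.1f}", len(bins[idx]), round(accuracy, 2)))
        return table

    # If the self-reported numbers meant anything, accuracy would rise with the
    # confidence bucket; per the comment above, in practice it often doesn't.
    sample = [(0.9, False), (0.9, True), (0.5, True), (0.3, True), (0.95, False)]
    for row in calibration_table(sample):
        print(row)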



