I feel his need to be right distracts from the fact that he is. It’s interesting to think about what a hybrid symbolic/transformer system could be. In a linked post he showed that effectively delegating math to Python is what made Grok 4 so successful at it. I’d personally like to see more of what a symbolic-first system would look like: hard math by default, with monads for where inference is needed.
Aloe's neurosymbolic system just beat OpenAI's deep research score on the GAIA benchmark by 20 points. While Gary is full of bluster, he does know a few things about the limitations of LLMs. :) (aloe.inc)
Yeah, there was an old paper that blew math/physics benchmarks out of the water by letting the LLM write code and having a physics engine execute it. I don't have a link to it off the top of my head, but that seems to be the right direction.
LLM + general tool use seems to be quite effective.
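The delegation pattern being described is simple at its core: rather than trusting the model's own arithmetic, you extract a code snippet from its reply and run it in an interpreter. Here's a minimal sketch; the hard-coded reply and the `<python>` tag convention are illustrative assumptions (a real system would call an LLM API and execute in a sandbox, not bare `exec()`):

```python
# Sketch of delegating math to the interpreter: extract a code snippet
# from the model's reply and execute it, instead of trusting the model's
# own arithmetic. The reply is hard-coded here for illustration; a real
# system would get it from an LLM API and run it in a sandbox.
import re

model_reply = (
    "The sum of the first 10,000 squares is easiest to compute exactly:\n"
    "<python>result = sum(i * i for i in range(1, 10_001))</python>"
)

def run_tool_call(reply: str) -> int:
    """Pull the <python>...</python> snippet out of the reply, execute it,
    and return the value bound to `result`."""
    snippet = re.search(r"<python>(.*?)</python>", reply, re.DOTALL).group(1)
    namespace = {}
    exec(snippet, namespace)  # unsandboxed exec: fine for a sketch only
    return namespace["result"]

print(run_tool_call(model_reply))  # 333383335000
```

The model only has to produce correct *code*, which is a much easier target than producing correct arithmetic token by token.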