
My favourite example of the underlying probabilistic nature of LLMs is related to a niche hobby of mine, English Change Ringing. Every time someone asks an LLM a question that requires more than a basic definition of what Change Ringing is, the result is hilarious. Not only do the answers suffer from factual hallucinations, they aren't even internally logically consistent. It's literally just probabilistic word soup, and glaringly obviously so.

Although there isn't a vast corpus on Method Ringing, there is a fair amount; the "rules" are online (https://framework.cccbr.org.uk/version2/index.html), Change Ringing is based on pure maths (group theory), and it has been linked with CS since CS first started: it's mentioned in Knuth, and the Steinhaus–Johnson–Trotter algorithm for generating permutations wasn't invented by Steinhaus, Johnson and Trotter in the 1960s; it was known to Change Ringers in the 1650s. Think of it as Towers of Hanoi with knobs on :-) So it would seem a good fit for automated reasoning, and indeed such things already exist - https://ropley.com/?page_id=25777.
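
For the curious, the Steinhaus–Johnson–Trotter idea is tiny: generate every permutation so that consecutive permutations differ by swapping one adjacent pair - exactly the constraint ringers work under, since a bell can move at most one place between rows. Here's a minimal sketch in Python (my own language choice, nothing canonical about it):

    def sjt_permutations(n):
        """Yield all permutations of 1..n, each differing from the
        previous one by a single adjacent swap (Steinhaus-Johnson-Trotter)."""
        perm = list(range(1, n + 1))
        dirs = [-1] * n               # every bell starts "looking" left
        yield tuple(perm)
        while True:
            # find the largest mobile element: one whose neighbour in
            # its direction of travel is smaller than itself
            mobile, idx = -1, -1
            for i, v in enumerate(perm):
                j = i + dirs[i]
                if 0 <= j < n and perm[j] < v and v > mobile:
                    mobile, idx = v, i
            if mobile == -1:
                return                # no mobile element: all n! rows done
            # swap the mobile element with the neighbour it faces,
            # carrying its direction with it
            j = idx + dirs[idx]
            perm[idx], perm[j] = perm[j], perm[idx]
            dirs[idx], dirs[j] = dirs[j], dirs[idx]
            # every element larger than the mobile one reverses direction
            for i, v in enumerate(perm):
                if v > mobile:
                    dirs[i] = -dirs[i]
            yield tuple(perm)

    # list(sjt_permutations(3)) -> the six rows on three bells:
    # (1,2,3), (1,3,2), (3,1,2), (3,2,1), (2,3,1), (2,1,3)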

If I asked a non-ringing human to explain how to ring Cambridge Major, they'd say "Sorry, I don't know", and an LLM with insufficient training data would probably say the same. The problem is when LLMs know just enough to be dangerous, but don't know what they don't know. The more abstruse a topic is, the worse LLMs do at it, and it's precisely those areas where people are most likely to turn to them for answers. They'll get an answer that's grammatically correct and sounds authoritative - but they almost certainly won't know if it's nonsense.

Adding a "reliability" score to LLM output seems eminently feasible, but due to the hype and commercial pressures around the current generation of LLMs, that's never going to happen as the pressure is on to produce plausible sounding output, even if it's bullshit.

https://www.lawgazette.co.uk/news/appalling-high-court-judge...


