What exactly did 2025 AI hallucinate for you? The last time I saw a hallucination from these things was a year ago. For questions that a kid or a student is going to answer, I'm not sure any reasonable person should be worried about this.
Just a couple of days ago, I submitted a few pages from the PDF of a PhD thesis written in French to ChatGPT and asked it to translate them into English. The first 2-3 pages were perfect; then the LLM started hallucinating, inserting new sentences and dropping parts. The interesting thing is that the added sentences were correct and generally on point: the resulting text sounded plausible, and only a careful sentence-by-sentence comparison revealed the truth. Near the end of the chapter, virtually nothing of what ChatGPT produced was directly related to the original text.
Transformer models are excellent at translation, but plain next-token prediction is not the right setup for it. You want something more like a seq2seq (encoder-decoder) model. Next-token prediction cares more about local consistency (i.e., going off on a tangent with a self-consistent but totally fabricated "translation") than about faithfulness to the source.
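For the curious, a minimal sketch of what a dedicated seq2seq translator looks like in practice (assuming the Hugging Face transformers library and its Helsinki-NLP/opus-mt-fr-en MarianMT checkpoint; the sample sentence is made up):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# MarianMT is an encoder-decoder: the decoder attends to an encoding of the
# whole French source at every step, instead of just continuing a text stream.
model_name = "Helsinki-NLP/opus-mt-fr-en"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

src = "Vers la fin du chapitre, le modèle a commencé à inventer des phrases."
inputs = tokenizer(src, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

That conditioning on the full source is what keeps long translations anchored to the input rather than drifting into plausible continuation.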
I use it every day for work and every day it gets stuff wrong of the "that doesn't even exist" variety. Because I'm working on things that are complex + highly verifiable, I notice.
Sure, Joe Average who's using it to look smart in Reddit or HN arguments, or to find out how to install a mod for their favorite game, isn't going to notice anymore, because it's much more plausible much more often than two years ago. But if you're asking it things that aren't trivially easy for you to verify, you have no way of telling how frequently it hallucinates.
OpenAI's o3/4o models completely spun out when I was trying to write a tiny little TUI with ratatui; they couldn't handle writing a render function. No idea why. I spent about 15 minutes trying to get it to work and ended up pulling up the docs.
I haven't spent any money on Claude for this project, and realistically it's not worth it, but I've run into little things like that a fair amount.
>Thanks all for the replies, we’re hardcoding fixes now
-LLM devcos
Jokes aside, get deep into the domains you know. Or ask it for movie titles based on specific parts of uncommon films. And definitely ask for instructions for using specific software tools (“no, actually, Opus/o3/2.5, that menu isn’t available in this context”, etc.).
Are you using them daily? I find that for maybe 3 or 4 of the programming questions I ask per day, they simply cannot provide a correct answer even with hand-holding. They often resort to extreme gymnastics to gaslight you, no matter how much proof you provide.
For example, today I was asking an LLM how to configure a GitHub Action to install an SDK version that had just recently gone out of support. It kept hallucinating about my config, claiming that when you provide multiple SDK versions, the action only installs the most recent one. That's false, and the documentation I linked says specifically that it installs every version you list. When I explained this to Copilot, it kept doubling down, ignoring the docs, and even went as far as asking me to have the action output the installed SDKs. Every version I requested showed up as installed, at which point it gaslit me by claiming that a `--list-sdks` command can print out the wrong SDKs.
ChatGPT hallucinates things all the time. I will feed it info on something and have a conversation. At first it's mostly fine, but eventually it starts just making stuff up.
Two days ago, when my boomer mother-in-law tried to justify her anti-cancer diet, the one that killed Steve Jobs. On the bright side, my partner will be inheriting soon by the looks of it.
Not defending your mother-in-law here (because I agree with you that it is a pretty silly and maybe even potentially harmful diet), but AFAIK it wasn’t the diet itself that killed Steve Jobs. It was his decision to follow that diet instead of getting actual cancer treatment until it was too late.
>>No, jumping off high buildings is perfectly safe as long as you land skillfully.
Not really, because no matter how you spin it, the person in your scenario dies.
However, doing Steve Jobs’ diet might actually be fine (or at least not deadly) for an average person, as long as they don’t have late-stage pancreatic cancer and don’t decide to forgo chemotherapy treatment.
Which is what killed Jobs, not the diet. For all we know, he might’ve been alive today even if he followed the same diet, as long as he also did the chemo treatment.
Indeed if you're a base jumper with a parachute, you might survive the landing.
Ackshually, this seems analogous to Jobs' diet and refusal of cancer treatment! And it was the cancer that put him at the top of the building in the first place.
The anti-cancer diet absolutely works if you want to reduce the odds of getting cancer. It probably even works to slow cancer compared to the average American diet.
Will it stop and reverse a cancer? Probably not.
I thought it was high-fiber diets that reduce the risk of cancer (ever so slightly), because of reduced inflammation, not fruit-heavy diets, which are high in carbohydrates.
Most (all?) of the AI models I work with are literally deterministic. If you give them the exact same input, you get the exact same output every single time.
What most people call “non-deterministic” in AI is that one of those inputs is a _seed_ that is sourced from a PRNG because getting a different answer every time is considered a feature for most use cases.
Edit: I’m trying to imagine how you could get a non-deterministic AI and I’m struggling because the entire thing is built on a series of deterministic steps. The only way you can make it look non-deterministic is to hide part of the input from the user.
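A toy sketch of that point, assuming PyTorch and using a single linear layer as a stand-in for an LLM's forward pass: the forward pass is repeatable bit for bit, and even the "random" sampling step is repeatable once you treat the seed as just another input:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 8)   # stand-in for a model's forward pass
x = torch.randn(1, 16)

# Same weights + same input -> identical logits, every single time.
assert torch.equal(model(x), model(x))

def sample(logits, seed):
    # The "randomness" is entirely determined by the seed we pass in.
    gen = torch.Generator().manual_seed(seed)
    return torch.multinomial(torch.softmax(logits, dim=-1), 1, generator=gen)

logits = model(x)
assert torch.equal(sample(logits, seed=42), sample(logits, seed=42))
```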
This is an incredibly pedantic argument. The common interfaces for LLMs set their temperature value to non-zero, so they are effectively non-deterministic.
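For example (a toy sketch over made-up logits, not a real LLM): with temperature > 0 and no pinned seed, repeated sampling from the same distribution picks different tokens, whereas the temperature → 0 limit always picks the argmax:

```python
import torch

logits = torch.tensor([2.0, 1.5, 1.0, 0.5])  # made-up scores for 4 candidate tokens

def sample(temperature: float) -> int:
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

print([sample(1.0) for _ in range(10)])           # varies from run to run
print([int(logits.argmax()) for _ in range(10)])  # the temperature -> 0 limit: always the argmax
```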
Unless something has fundamentally changed since then (which I've not heard about), all sparse models are only deterministic at the batch level, rather than at the sample level.
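A quick sketch of why the batch matters, using plain numpy rather than an actual MoE kernel: floating-point addition isn't associative, so reducing the same values in a different order (which is effectively what happens when your request is grouped with different neighbours in a batch) gives slightly different results:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000).astype(np.float32)

s1 = x.sum()                                  # reduce in one order
s2 = x.reshape(100, 1000).sum(axis=1).sum()   # reduce the same values in another order
print(float(s1), float(s2), s1 == s2)         # typically differ in the last few bits
```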