Right, but if these things are so rare that we all only know the one viral example, I feel like that lends credence to the idea that the models generally don't have this problem.
Researchers built the Winograd Schema Challenge more than a decade ago to assess common-sense reasoning, and LLMs beat that challenge around GPT-4.
They're not so rare. Hallucinations have been spotted everywhere, but the "driving a car to the car wash" one is an amusing example that's been recently publicised. Developers aren't going to point it out every time an LLM hallucinates an entire library.
I'd add to this, any moderately involved logical or numerical problem causes hallucinations for me on all frontier models.
If you ask them in isolation they may write a script to solve it "properly", but I guess this is because they added enough of these to the training set. But this workaround doesn't scale.
As soon as I give the LLM a proper problem and a small part of it requires numeric reasoning, it almost always hallucinates something and doesn't solve it with a script.
If the logic/math is part of a larger problem the miss rate is near 100%.
LLMs have massive amounts of knowledge, encoded in verbal intelligence, but their logic intelligence is well below even average human intelligence.
If you look at how they work (tokenization and embeddings) it's clear that transformers will not solve the issue. The escape hatches only work very unreliably.
If you ask this of any current-day AI, it will answer exactly how you would expect: telling you to drive, and acknowledging the comedic nature of the question.
That's because AI labs keep stamping out the widely known failures. I assume they do it without actually retraining the main model, using some small classifier that detects the known meme questions and injects the correct answer into the context.
But try asking your favorite LLM what happens if you're holding a pen with two hands (one at each end) and let go of one end.
It's not unlikely that you're talking to a lot of AI-based AI boosters. It's easier to create astroturfed comments with chatbots than to fix the inherent problems.
Nice. My test was always a blond bald guy. It always adds hair. If you ask for bald you get a dark-haired bald guy; if you add blond, you can't get bald, I guess because naming the hair color implies hair (on the head), when you may just want blond eyebrows and/or blond stubble.
I believe that that's the stuff you buy in the shop, the non-refillable containers. If you buy a proper refillable balloon gas cylinder it's the higher grade stuff. Source: bought the shop stuff, got disappointed, bought the cylinder, happy.
Octopus Energy in the UK has a beta Agile tariff based on 30-minute pricing. Prices have been so low at times, such as yesterday, that the price per unit has gone negative: for example, on Sunday and Monday I was paid 4.2 pence per unit to take electricity off the network.
Yeah, I have a bit. I'm on their Agile tariff with half-hour pricing intervals (this is what the smart meter standards give).
It's really cool. Sometimes the price goes below zero - "plunge pricing" - if there is too much production and too little use. They give you the pricing info for today and tomorrow ahead of time, not sure how they predict future pricing - probably based on weather forecasts, since they only use solar and wind?
I have WiFi smart sockets on things like electric heaters to turn them on when the electricity price is below my threshold level. It's a nice feeling being paid for using electricity. If you have an electric car you could also programmatically charge it only when the electricity price is below a certain level. Octopus also have an EV leasing company; I'm thinking of selling my current car and leasing one from them once Covid-19 has receded enough that I need a car again.
It's not an abomination. It has some flaws that don't affect me, and others that make me compromise a bit (drive more slowly, stop more often). But besides it being the only affordable electric car available that met all my requirements, I chose it for the BMW quality and attention to detail, which are hallmarks of the brand.