Hacker News | kerridge0's comments

The recent one was whether I should drive my car to the car wash if it's only 300 feet from my house, although the answer wasn't a slam dunk.


Right, but if these things are so rare that we all only know the one viral example, I feel like that lends credence to the idea that the models generally don't have this problem.

Researchers built the Winograd Schema Challenge more than a decade ago to assess common-sense reasoning (e.g. resolving what "it" refers to in "the trophy doesn't fit in the suitcase because it is too big"), and LLMs beat that challenge around GPT-4.


They're not so rare. Hallucinations have been spotted everywhere, but the "driving a car to the car wash" example is an amusing one that's recently been publicised. Developers aren't going to point out every time an LLM hallucinates an entire library.


I'd add to this: any moderately involved logical or numerical problem causes hallucinations for me on all frontier models.

If you give them such a problem in isolation they may write a script to solve it "properly", but I suspect that's because enough of these were added to the training set. This workaround doesn't scale.

As soon as I give the LLM a proper problem where a small part of it requires numeric reasoning, it almost always hallucinates something rather than solving that part with a script.

If the logic/math is part of a larger problem the miss rate is near 100%.

LLMs have massive amounts of knowledge, encoded as verbal intelligence, but their logical intelligence is well below even average human intelligence.

If you look at how they work (tokenization and embeddings) it's clear that transformers will not solve the issue. The escape hatches only work very unreliably.


What's a typical example?

I have been broadly quite happy with gpt 5.4 xhigh's reasoning on things like performance engineering tasks.


If you ask this of any current-day AI, it will answer exactly how you would expect: telling you to drive, and acknowledging the comedic nature of the question.


That's because AI labs keep stamping out the widely known failures. I assume they do this without actually retraining the main model, instead using some small classifier that detects the known meme questions and injects the correct answer into the context.

But try asking your favorite LLM what happens if you're holding a pen with two hands (one at each end) and let go of one end.



Are you also an LLM? Do objects often begin rotating when you're only holding them with one hand?


It's not unlikely that you're talking to a lot of AI-based AI boosters. It's easier to create astroturfed comments with chatbots than to fix the inherent problems.


I always like to ask AI to generate a middle-aged blond man with gray hair. Turns out that in every model, the gray hair has black roots.

https://chatgpt.com/share/69bcd01a-a750-800d-95f5-3b840b9ee2...

https://gemini.google.com/share/edc223bb6291 (the try again gave a woman, oops)

Even Midjourney couldn't do it.


Nice. My test was always a blond bald guy. It always adds hair. If you ask for bald, you get a dark-haired bald guy; if you add blond, you can't get bald, because I guess stating the hair colour implies hair (on the head), while you may just want blond eyebrows and/or blond stubble.


I believe that's the stuff you buy in the shop, the non-refillable containers. If you buy a proper refillable balloon gas cylinder, it's the higher-grade stuff. Source: bought the shop stuff, got disappointed, bought the cylinder, happy.


And we had good skeletons - in 1985, Aardman Animations created this advert for VHS cassettes https://youtu.be/ffa1E9k3H4k


I can hear the voice of Derek Gyller without even clicking the link.


FocusMe for Windows has been around a long time.


Thanet?


Like a dog chasing its tail. Or bundling them together like a dogpile.


Octopus Energy in the UK, with the beta Agile tariff, is based on 30-minute pricing, and prices have at times (such as yesterday) been so low that the price per unit has gone negative. For example, on Sunday and Monday I was paid 4.2 pence per unit to take electricity off the network.


Their web site advertises support for IFTTT. Has anyone here hacked on Octopus APIs? Looks very cool.


Yeah, I have a bit. I'm on their Agile tariff with half-hour pricing intervals (this is what the smart meter standards give you).

It's really cool. Sometimes the price goes below zero ("plunge pricing") when there is too much production and too little consumption. They give you the pricing for today and tomorrow ahead of time; I'm not sure how they predict future prices, probably from weather forecasts, since they only use solar and wind?

I have WiFi smart sockets on things like electric heaters to turn them on when the electricity price is below my threshold. It's a nice feeling being paid to use electricity. If you have an electric car, you could also programmatically charge it only when the price is below a certain level. Octopus also have an EV leasing company; I'm thinking of selling my current car and leasing one from them once covid19 has subsided enough that I need a car again.
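The threshold logic driving those smart sockets can be sketched in a few lines. This is a minimal illustration, not Octopus code: the `rates` sample data is made up, though its field names mimic the shape of the Agile API's half-hourly rate entries.

```python
# Hypothetical half-hourly unit rates in p/kWh, shaped like the
# entries the Octopus Agile API returns (field names assumed).
rates = [
    {"valid_from": "2020-04-05T00:00:00Z", "value_inc_vat": 8.1},
    {"valid_from": "2020-04-05T00:30:00Z", "value_inc_vat": -4.2},
    {"valid_from": "2020-04-05T01:00:00Z", "value_inc_vat": 2.5},
]

THRESHOLD = 0.0  # p/kWh: only run the heater when we're paid to consume


def slots_below_threshold(rates, threshold=THRESHOLD):
    """Return the start times of half-hour slots cheap enough to switch on."""
    return [r["valid_from"] for r in rates if r["value_inc_vat"] < threshold]


print(slots_below_threshold(rates))  # the 00:30 plunge-pricing slot
```

In practice you would fetch tomorrow's rates once a day, compute the cheap slots, and schedule the socket toggles accordingly.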

Here are their dev docs:

https://developer.octopus.energy/docs/api/

They even have an open GraphQL API and a Storybook. How many utility companies do you know that do this? Looks like Octopus hired the right people!

https://api.octopus.energy/v1/graphql/

https://octopus.energy/static/common/storybook/account-manag...
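For a flavour of the REST side, here is a sketch of building the half-hourly unit-rates URL. The product and tariff codes below are assumptions for illustration only; real codes come from listing the products endpoint first.

```python
BASE = "https://api.octopus.energy/v1"


def unit_rates_url(product_code, tariff_code):
    """Build the standard-unit-rates URL for an electricity tariff."""
    return (f"{BASE}/products/{product_code}"
            f"/electricity-tariffs/{tariff_code}/standard-unit-rates/")


# Example with assumed Agile codes (region "A"); fetch this URL
# with any HTTP client to get the half-hourly prices as JSON.
print(unit_rates_url("AGILE-18-02-21", "E-1R-AGILE-18-02-21-A"))
```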


There used to be one called the Global Ideas Bank, but it went away, presumably due to lack of funding. It was founded by Nicholas Albery: https://en.wikipedia.org/wiki/Nicholas_Albery


No, because the increased risk of death means you might get a worse reputation if people OD.


It's not an abomination. It has some flaws that don't affect me, and others that make me compromise a bit (drive more slowly, stop more often). But one reason I chose it, beyond it being the only affordable electric car available that met all my requirements, was the BMW quality and attention to detail, which are hallmarks of the brand.

