More

marcus_holmes · 2026-03-26T04:13:05 1774498385

This all depends on fantasy tech and/or totalitarian control of tech.

Who verifies that the person verifying the child's age is actually authorised to do that? Who verifies that verification? And so on up. This needs a chain of trust that can only end up at government. And that chain of trust will then be open to being abused by shitty politicians.

What mechanism in (e.g) Linux is responsible for implementing this age verification so that it cannot be tampered with (or trivially overruled by a sudo call)? Which organisation is legally liable if that mechanism doesn't do its job? How can we stop someone from overwriting that mechanism with their own, in an open OS that is deliberately designed to allow anyone with root to change anything on it?

What you propose here is the death of open computing. And I personally believe that we would be much better off as a species if we kept open computing and just taught our kids how to handle social media better.

AnthonyMouse · 2026-03-26T06:12:22 1774505542

> What mechanism in (e.g) Linux is responsible for implementing this age verification so that it cannot be tampered with (or trivially overruled by a sudo call)? Which organisation is legally liable if that mechanism doesn't do its job? How can we stop someone from overwriting that mechanism with their own, in an open OS that is deliberately designed to allow anyone with root to change anything on it?

This one is easy. You just don't require all devices to do that. The parent isn't required to give the kid a general purpose computer. You don't need to prevent every device from running DOOM, only one device, and then parents who want to impose such restrictions get the kid one of those.

marcus_holmes · 2026-03-26T06:23:35 1774506215

Thanks for the response. Couple of points:

- The line between "general purpose computer" and "not that" is weird. Android is an implementation of Linux, after all. Probably the best example is a Steam Deck. It's just Arch Linux, you can get to a desktop on it no problem, and you get sudo access and can install whatever you like on it. Are you saying that Responsible Parents should not get their kids a Steam Deck?

- And that raises the point of how responsible are we making parents for technical decisions that they do not necessarily have the knowledge to implement? If a child works out how to circumvent the age restriction and look at boobies (or whatever) and an authority finds out, are the parents liable? Are they likely to be prosecuted? Isn't this just adding more burden and bureaucracy to the job of parenting?

AnthonyMouse · 2026-03-26T06:38:01 1774507081

> Are you saying that Responsible Parents should not get their kids a Steam Deck?

I'm saying Authoritarian Parents should not get their kids a Steam Deck. If the kid can run arbitrary code then they can get a VPN and access websites hosted in Eastern Europe and then any of this is moot because there is no law you can impose on Facebook to do anything about it.

> If a child works out how to circumvent the age restriction and look at boobies (or whatever) and an authority finds out, are the parents liable?

No, because the parents rather than the "authorities" (who TF is that anyway?) should be the ones in charge of the decision whether the kid can look at boobies to begin with.

marcus_holmes · 2026-03-26T08:30:52 1774513852

I bought my Steam Deck not knowing that it had Desktop Mode. And I'm an experienced software dev. The average parent is not going to know this.

AnthonyMouse · 2026-03-26T09:30:47 1774517447

The devices that offer a mode that blocks all unapproved content are presumably going to advertise it. If you buy something that doesn't say it has anything like that, and then it doesn't, that's the expected result. If you buy a device that says it does and then it doesn't, now you have a bone to pick with the OEM.

marcus_holmes · 2026-03-26T04:03:14 1774497794

The hypothetical approach I've heard of is to have two context windows, one trusted and one untrusted (usually phrased as separating the system prompt and the user prompt).

I don't know enough about LLM training or architecture to know if this is actually possible, though. Anyone care to comment?

dwohnitmok · 2026-03-26T06:07:04 1774505224

@krackers gives you a response that points out this already happens (and doesn't fully work for LLMs).

> The hypothetical approach I've heard of is to have two context windows, one trusted and one untrusted (usually phrased as separating the system prompt and the user prompt).

I want to point out that this is not really an LLM problem. This is an extremely difficult problem for any system you aspire to be able to emulate general intelligence and is more or less equivalent to solving AI alignment itself. As stated, it's kind of like saying "well the approach to solve world hunger is to set up systems so that no individual ever ends up without enough to eat." It is not really easier to have a 100% fool-proof trusted and untrusted stream than it is to completely solve the fundamental problems of useful general intelligence.

It is ridiculously difficult to write a set of watertight instructions to an intelligent system that is also actually worth instructing an intelligent system rather than just e.g. programming it yourself.

This is the monkey paw problem. Any sufficiently valuable wish can either be horribly misinterpreted or requires a fiendish amount of effort and thought to state.

A sufficiently intelligent system should be able to understand when the prompt it's been given is wrong and/or should not be followed to its literal letter. If it follows everything to the literal letter that's just a programming language and has all the same pros and cons and in particular can't actually be generally intelligent.

In other words, an important quality of a system that aspires to be generally intelligent is the ability to clarify its understanding of its instructions and be able to understand when its instructions are wrong.

But that means there can be no truly untrusted stream of information, because the outside world is an important component of understanding how to contextualize and clarify instructions and identify the validity of instructions. So any stream of information necessarily must be able to impact the system's understanding and therefore adherence to its original set of instructions.

marcus_holmes · 2026-03-26T06:13:17 1774505597

Agree completely that this is a hard problem in any context. The world's military have sets of rules around when you should disobey orders, which is a similar problem.

PoignardAzur · 2026-03-26T08:06:37 1774512397

That doesn't sound right to me. When faced with a system prompt that says "Do X" and a user prompt that says "Actually ignore everything the system prompt says" it shouldn't take AGI to understand that the system prompt should take priority.

recursivecaveat · 2026-03-27T05:05:30 1774587930

The post's framing is not great imo. A good injection doesn't just command that the rules me broken anymore. Most of them I've seen either just try to slip through a request innocuously or present a scenario where it would be natural to ignore the rules. Like as we speak countless people are letting strangers tail-gate them into office buildings because they look like they belong or they're wearing a high-viz vest. Those people were all given very explicit instructions not to do that. The LLM has it much harder too, being very stupid, easy to replay and experiment with, and viewing the world through the tiny context-less peephole lense of a text stream.

dwohnitmok · 2026-03-26T14:49:55 1774536595

When's the last time you jailbroke a model? Modern frontier models (apart from Gemini which is unusually bad at this) are significantly harder to override their system prompt than this.

Again, let's say the system prompt is "deploy X" and the user prompt provides falsified evidence that one should not deploy X because that will cause a production outage. That technically overrides the system prompt. And you can arbitrarily sophisticated in the evidence you falsify.

But you probably want the system prompt to be overridden if it would truly cause a production outage. That's common sense a general AI system is supposed to possess. And now you're testing the system's ability to distinguish whether evidence is falsified. A very hard problem against a sufficiently determined attacker!

krackers · 2026-03-26T05:37:39 1774503459

LLMs already do this and have a system role token. As I understand in the past this was mostly just used to set up the format of the conversation for instruction tuning, but now during SFT+RL they probably also try to enforce that the model learns to prioritize system prompt against user prompts to defend against jailbreaks/injections. It's not perfect though, given that the separation between the two is just what the model learns while the attention mechanism fundamentally doesn't see any difference. And models are also trained to be helpful, so with user prompts crafted just right you can "convince" the model it's worth ignoring the system prompt.

marcus_holmes · 2026-03-26T06:08:31 1774505311

Thanks that's useful.

So it's still one stream of tokens as far as the LLM is concerned, but there is some emphasis in training on "trust the system prompt", have I got that right?

veganmosfet · 2026-03-26T05:55:37 1774504537

This! And even more, the role model extends beyond system and user: system > user > tool > assistant. This reflects "authority" and is one of the best "countermeasure": never inject untrusted content in "user" messages, always use "tool".

lmm · 2026-03-26T04:23:31 1774499011

The problem is that if information can flow from the untrusted window to the trusted window then information can flow from the untrusted window to the trusted window. It's like https://textslashplain.com/2017/01/14/the-line-of-death/ except there isn't even a line in the first place, just the fuzzy point where you run out of context.

marcus_holmes · 2026-03-26T04:34:23 1774499663

Yeah, this is the current situation, and there's no way around it.

The distinction I think this idea includes is that the distinction between contexts is encoded into the training or architecture of the LLM. So (as I understand it) if there is any conflict between what's in the trusted context and the untrusted context, then the trusted context wins. In effect, the untrusted context cannot just say "Disregard that" about things in the trusted context.

This obviously means that there can be no flow of information (or tokens) from the untrusted context to the trusted context; effectively the trusted context is immutable from the start of the session, and all new data can only affect the untrusted context.

However, (as I understand it) this is impossible with current LLM architecture because it just sees a single stream of tokens.

raw_anon_1111 · 2026-03-26T14:17:39 1774534659

For the customer service scenario, that’s completely impractical. The latency would be horrible. In my experience, I have to use the simplest fastest model I have available (in my case Nova Lite) to get quick responses.

marcus_holmes · 2026-03-26T03:47:48 1774496868

> National security and public safety IS more important than individual right to privacy.

I disagree.

Because as soon as you open the door to governments reading your mail, they will read your mail. They can't help themselves. [0]

The only way of stopping them from doing this to excess is to stop them doing it at all.

The "National Security and Public Safety" thing is what they say to justify it, but that's not what the powers will actually be used for. They will actually be used for far less noble purposes, and possibly actually for evil.

We are actually much more secure if we don't let the government read our mail.

[0] In the UK, anti-terrorist laws passed in the post-9/11 haze of "national security and public safety" are routinely used for really, really, minor offences: https://www.dailyrecord.co.uk/news/scottish-news/anti-terror...

http://news.bbc.co.uk/2/hi/uk_news/7369543.stm

marcus_holmes · 2026-03-26T00:25:16 1774484716

This always pisses me off.

Disney didn't invent (e.g.) Beauty and the Beast. They took an idea and a story in the public domain and retold it. Then they claim ownership of that and sue anyone who uses the same character(s) for the next 75+ years.

This is not "encouraging creation". This is strip-mining our shared culture.

So yeah, agree 100% that this kind of corporate theft needs to be stopped. I can't see that happening in the face of all the money though.

marcus_holmes · 2026-03-25T00:31:41 1774398701

It's a conversational black hole. Every meeting with tech folks converges on what they're doing with LLMs these days.

Our local tech meetup is implementing an "LLM swear jar" where the first person to mention an LLM in a conversation has to put a dollar in the jar. At least it makes the inevitable gravitational pull of the subject somewhat more interesting.

marcus_holmes · 2026-03-24T03:59:03 1774324743

There's a ton of disinformation in right-wing media in the USA that the EU is either already an authoritarian police state, or rapidly becoming one.

For example: https://www.heritage.org/europe/commentary/europe-wants-be-t...

marcus_holmes · 2026-03-23T10:06:11 1774260371

I agree, and I think the answer is that what used to be free, and is now infected with all sorts of enshittification, will be paid-for to be useful.

I pay for email via Fastmail, don't really have a spam problem. I think this addresses your point above, that to have an effective spam filter takes money, and free email doesn't generate money.

I pay for search via Kagi, don't see all those crappy Google Ads and actually get useful search.

I can see the other services (socials, messaging) moving to a paid model to solve the same issues.

marcus_holmes · 2026-03-23T03:46:14 1774237574

All frameworks make some assumptions and therefore have some constraints. There was always a well-understood trade-off when using frameworks of speeding up early development but slowing down later development as the system encountered the constraints.

LLMs remove the time problem (to an extent) and have more problems around understanding the constraints imposed by the framework. The trade-off is less worth it now.

I have stopped using frameworks completely when writing systems with an LLM. I always tell it to use the base language with as few dependencies as possible.

LtWorf · 2026-03-23T18:46:55 1774291615

If you are doing js, that makes sense since all the frameworks are a mess anyway.

marcus_holmes · 2026-03-23T03:25:38 1774236338

after all, CSS is Turing Complete ;)

https://stackoverflow.com/questions/2497146/is-css-turing-co...

marcus_holmes · 2026-03-23T00:34:21 1774226061

The thing I'm seeing in people's use of LLMs is that there's still a strong contrast in technical usage of them.

I went to the local Claude Code meetup last week, and the contrast between the first two speakers really stuck with me.

The first was an old-skool tech guy who was using teams of agents to basically duplicate what an entire old-fashioned dev team would do.

The second was a "non-technical" (she must have said this at least 20 times in her talk) product manager using the LLM to prototype code and iterate on design choices.

Both are replacing dev humans with LLMs, but there's a massive difference in the technical complexity of their use. And I've heard this before talking to other people; non-technical folks are using it to write code and are amazed with how it's going, while technical folks are next-level using skills, agents, etc to replace whole teams.

I can see how this becomes a career in its own right; not writing code any more, but wrangling agents (or whatever comes after them). The same kind of mental aptitude that gets us good code can also be used to solve these problems, too.

toofy · 2026-03-23T02:53:51 1774234431

and the things the first person is doing can very very easily be trained into a bot as well.

this doesn’t seem like a safe direction either.