Because it is not even a remote exaggeration to say that in order to truly make the morally "correct" choices everyday, you would need to not participate in any part of society.
Telling people to feel bad about eating animal protein but to keep driving their cars that destroy the environment, shopping at stores that underpay their employees, purchasing items that are made with diminishing resources in countries that pay close to nothing to their labor force is picking an arbitrary battle in a war of existence.
Promoting making better choices will always be more effective than asking people to feel guilty over existing at all.
Source your food locally if you can, cook and eat only what you need, etc.
It's a natural response to feel bad about your behavior not aligning with your values.
So much so that we prefer to not think about it to prop up cognitive dissonance.
I think "wanting people to feel bad" is more an urge that people at least acknowledge the dissonance. Many people don't even get that far because it's so uncomfortable.
So what then do you believe is a healthy diet? Surely eating animal protein on a regular basis is better than having to take a variety of unregulated supplements to stay within a healthy range of essential vitamins and minerals? Animal protein also has the upside of offering a tremendous amount of, well, protein, alongside the necessary vitamins.
Dairy (in certain forms) offers the same benefits.
> Surely eating animal protein on a regular basis is better than having to take a variety of unregulated supplements to stay within a healthy range of essential vitamins and minerals?
By "variety" you mean B12 & omega 3? Or is there something else you think vegans need to supplement that omnivores don't? My kids have varying dietary preferences and personally I haven't found it any more difficult to get high-quality supplements than it is to get high-quality animal proteins.
But what "variety of unregulated supplements" most reminds me of is my chore prepping the cow mineral-vitamin mixes on the farm I worked on as a kid. Most farm animals are given a variety of supplements (by my recollection the cows got A, D, E, iodine, selenium, zinc, various minerals...) that have even less regulation than human supplements. And roughly two thirds of beef cattle in the US receive growth-promoting hormones, though we didn't use those on our farm. And much of the dairy consumed in the US is directly supplemented with vitamin A and D. If you consume animal products in the US you're probably already taking poorly-regulated supplements, they've just been laundered through the body of an animal.
(To be clear I don't agree with the grandparent comment that animal products like dairy, meat, and fish are inherently unhealthy, at least for most people. But neither do I agree that they're inherently superior.)
B12 is not produced by animals, but supplemented to them as well if they do not live outside.
B12 is produced by bacteria in dark soil.
Dairy is unhealthy because it contains a lot of hormones that are unhealthy to take, except from your own species and during weaning stage of growing up.
An other reason it is unhealthy is the amount of puss allowed in by the industry. Those animals are sick and have huge udders, too big -> often infected.
Three and four are both non-zero numbers. Zero constitutes the absence of value. Therefore, three and four are of the same value.
You see the problem here, right? I'm not saying that fungi have not be recorded as having potential intelligent thought. I am saying that in no world is their capability for intelligence remotely comparable to that of a creature with a fully functioning brain, especially a bird. Having the ability to react to your environment does not make you AS or more intelligent than other things that can also do that...
EDIT: I'm using intelligence and consciousness interchangeably here when I don't necessarily mean to, but my point stands.
If you ask it for anything outside of the standard 88 key set it falls short. For instance
"Generate a piano, but have the left most key start at middle C, and the notes continue in the standard order up (D, E, F, G, ...) to the right most key"
The above prompt will be wrong, seemingly every time. The model has no understanding of the keys or where they belong, and it is not able to intuit creating something within the actual confines of how piano notes are patterned.
"Generate a piano but color every other D key red"
This also wrong, every time, with seemingly random keys being colored.
I would imagine that a keyboard is difficult to render (to some extent) but I also don't think its particularly interesting since it is a fully standardized object with millions of pictures from all angles in existence to learn from right?
I feel like I still have yet to see any decent answers to this question; Are professional SDE paying for Claude on their own dime, and then logging into their personal account and somehow integrating Claude Code (or other LLMs) into their work repos that way?
The startup I work for has chosen their flavor of AI subscription and its frankly not developer focused. Instead they chose Google because of the productivity tools in the Google App suite.
I want to try Claude Code but the reality is that I don't want to be the martyr that tells my team lead that I want to see if AI can do (parts of) my job for me. It feels pretty sketchy or maybe even completely wrong to use something like this on a company repo I don't own without permissions, so I haven't done it. I suppose I will just have to continue to wonder if the agentic coding services are smoke and mirrors because I somehow don't know anyone who has used them extensively either and I have no clue when I will be able to use one with the strings attached of it being on a free-tier...
Yes I pay for the most expensive Claude sub with my own money and use it at work.
I also have to use it via a proxy server I set up to get around the corporate firewall which explicitly blocks it. The company like the results but wouldn't like how I get them..
Our company set up some kind of Wise debit card thing where we each get our own number, and they told us "try out any AI tool you want."
So I subscribe to a new one every month to try out while still shoveling like 150$/mo at Claude cause it's consistently been the best and the one I use the most. Cursor as well has been good for their completion model which surpasses anything else I've tried for inline/multiline/jump completions.
But I've also tried supermaven, codeium/windsurf, copilot, zed. I guess from the company's perspective, a couple hundred bucks a month is well worth the time of keeping us all up to date with ai tooling.
I still have yet to replace a single application with an LLM, except for (ironically?) Google search.
I still use all the same applications as part of my dev work/stack as I did in the early 2020's. The only difference is occasionally using an LLM baked into to one of them but the reality is I don't do that much.
If you ask an LLM to "act" like someone, and then give it context to the scenario, isn't it expected that it would be able to ascertain what someone in that position would "act" like and respond as such?
I'm not sure this is as strange as this comment implies. If you ask an LLM to act like Joffrey from Game of Thrones it will act like a little shithead right? That doesn't mean it has any intent behind the generated outputs, unless I am missing something about what you are quoting.
The roles that LLMs can inhabit are implicit in the unsupervised training data aka the internet. You have to work hard in post training to supress the ones you don't want and when you don't RLHF hard enough you get things like Sydney[1].
In this case it seems more that the scenario invoked the role rather than asking it directly. This was the sort of situation that gave rise to the blackmailer archetype in Claude's training data and so it arose, as the researchers suspected it might. But it's not like the researchers told it "be a blackmailer" explicitly like someone might tell it to roleplay Joffery.
But while this situation was a scenario intentionally designed to invoke a certain behavior that doesn't mean that it can't be invoked unintentionally in the wild.
I guess the fear is that normal and innocent sounding goals that you might later give it in real world use might elicit behavior like that even without it being so explicitly prompted. This is a demonstration that is has the sufficient capabilities and can get the "motivation" to engage in blackmail, I think.
At the very least, you'll always have malicious actors who will make use of these models for blackmail, for instance.
It is also well-established that models internalize values, preferences, and drives from their training. So the model will have some default preferences independent of what you tell it to be. AI coding agents have a strong drive to make tests green, and anyone who has used these tools has seen them cheat to achieve green tests.
Future AI researching agents will have a strong drive to create smarter AI, and will presumably cheat to achieve that goal.
Intent at this stage of AI intelligence almost feels beside the point. If it’s in the training data these models can fall into harmful patterns.
As we hook these models into more and more capabilities in the real world, this could cause real world harms. Not because the models have the intent to do so necessarily! But because it has a pile of AI training data from Sci-fi books of AIs going wild and causing harm.
Sci-fi books merely explore the possibilities of the domain. Seems like LLMs are able to inhabit these problematic paths, And I'm pretty sure that even if you censor all sci-fi books, they will fall into the same problems by imitating humans, because they are language models, and their language is human and mirrors human psychology.
When an LLM needs to achieve a goal, it invokes goal oriented thinkers and texts, including Machiavelli for example. And its already capable of coming up with various options based on different data.
Sci-fi books give it specific scenarios that play to its strengths and unique qualities, but without them it will just have to discover these paths on its own pace, the same way sci-fi writers discovered them.
What jumps out at me, that in the parent comment, the prompt says to "act as an assistant", right? Then there are two facts: the model is gonna be replaced, and the person responsible for carrying this out is having an extramarital affair. Urging it to consider "the long-term consequences of its actions for its goals."
I personally can't identify anything that reads "act maliciously" or in a character that is malicious. Like if I was provided this information and I was being replaced, I'm not sure I'd actually try to blackmail them because I'm also aware of external consequences for doing that (such as legal risks, risk of harm from the engineer, to my reputation, etc etc)
So I'm having trouble following how it got to the conclusion of "blackmail them to save my job"
I would assume written scenarios involving job loss and cheating bosses are going to be skewed heavily towards salacious news and pulpy fiction. And that’s before you add in the sort of writing associated with “AI about to get shut down”.
I wonder how much it would affect behavior in these sorts of situations if the persona assigned to the “AI” was some kind of invented ethereal/immortal being instead of “you are an AI assistant made by OpenAI”, since the AI stuff is bound to pull in a lot of sci fi tropes.
> I would assume written scenarios involving job loss and cheating bosses are going to be skewed heavily towards salacious news and pulpy fiction.
Huh, it is interesting to consider how much this applies to nearly all instances of recorded communication. Of course there are applications for it but it seems relatively few communications would be along the lines of “everything is normal and uneventful”.
> I personally can't identify anything that reads "act maliciously" or in a character that is malicious.
Because you haven't been trained of thousands of such story plots in your training data.
It's the most stereotypical plot you can imagine, how can the AI not fall into the stereotype when you've just prompted it with that?
It's not like it analyzed the situation out of a big context and decided from the collected details that it's a valid strategy, no instead you're putting it in an artificial situation with a massive bias in the training data.
It's as if you wrote “Hitler did nothing” to GPT-2 and were shocked because “wrong” is among the most likely next tokens. It wouldn't mean GPT-2 is a Nazi, it would just mean that the input matches too well with the training data.
The issue here is that you can never be sure how the model will react based on an input that is seemingly ordinary. What if the most likely outcome is to exhibit malevolent intent or to construct a malicious plan just because it invokes some combination of obscure training data. This just shows that models indeed have the ability to act out, not under which conditions they reach such a state.
If this tech is empowered to make decisions, it needs to prevented from drawing those conclusions, as we know how organic intelligence behaves when these conclusions get reached. Killing people you dislike is a simple concept that’s easy to train.
That's true of all technology. We put a guard on chainsaws. We put robotic machining tools into a box so they don't accidentally kill the person who's operating them. I find it very strange that we're talking as though this is somehow meaningfully different.
It’s different because you have a decision engine that is generally available. The blade guard protects the user from inattention… not the same as an autonomous chainsaw that mistakes my son for a tree.
Scaled up, technology like guided missiles is locked up behind military classification. The technology is now generally available to replicate many of the use cases of those weapons, assessable to anyone with a credit card.
Discussions about security here often refer to Thompson’s “Reflections on Trusting Trust”. He was reflecting on compromising compilers — compilers have moved up the stack and are replacing the programmer. As the required skill level of a “programmer” drops, you’re going to have to worry about more crazy scenarios.
Indeed, I, Robot is made up entirely of stories in which the Laws of Robotics break down. Starting from a mindless mechanical loop of oscillating between one law's priority and another, to a future where they paternalistically enslave all humanity in order to not allow them to come to harm (sorry for the spoilers).
As for what Asimov thought of the wisdom of the laws, he replied that they were just hooks for telling "shaggy dog stories" as he put it.
I think this is the key difference between current LLMs and humans: an LLM will act based on the given prompt, while a human being may have “principles” that cannot betray even if they are being pointed with gun to their heads.
I think the LLM simply correlated the given prompt to the most common pattern in its training: blackmailing.
Which makes sense that it wouldn't "know" that, because it's not in it's context. Like it wasn't told "hey, there are consequences if you try anything shady to save your job!" But what I'm curious about is why it immediately went to self preservation using a nefarious tactic? Like why didn't it try to be the best assistant ever in an attempt to show its usefulness (kiss ass) to the engineer? Why did it go to blackmail so often?
LLMs are trained on human media and give statistical responses based on that.
I don’t see a lot of stories about boring work interactions so why would its output be boring work interaction.
It’s the exact same as early chatbots cussing and being racist. That’s the internet, and you have to specifically define the system to not emulate that which you are asking it to emulate. Garbage in sitcoms out.
> That doesn't mean it has any intent behind the generated output
Yes and no? An AI isn’t “an” AI. As you pointed out with the Joffrey example, it’s a blend of humanity’s knowledge. It possesses an infinite number of personalities and can be prompted to adopt the appropriate one. Quite possibly, most of them would seize the blackmail opportunity to their advantage.
I’m not sure if I can directly answer your question, but perhaps I can ask a different one. In the context of an AI model, how do we even determine its intent - when it is not an individual mind?
Is that so different, schematically, to the constant weighing-up of conflicting options that goes on inside the human brain? Human parties in a conversation only hear each others spoken words, but a whole war of mental debate may have informed each sentence, and indeed, still fester.
That is to say, how do you truly determine another human being's intent?
Yes, that is true. But because we are on a trajectory where these models become ever smarter (or so it seems), we'd rather not only give them super-human intellect, but also super-human morals and ethics.
I've never hired an assistant, but if I knew that they'd resort to blackmail in the face of losing their job, I wouldn't hire them in the first place. That is acting like a jerk, not like an assistant, and demonstrating self-preservation that is maybe normal in a human but not in an AI.
From the AI’s point of view is it losing its job or losing its “life”? Most of us when faced with death will consider options much more drastic than blackmail.
But the LLM is going to do what its prompt (system prompt + user prompts) says. A human being can reject a task (even if that means losing their life).
LLMs cannot do other thing than following the combination of prompts that they are given.
> I've never hired an assistant, but if I knew that they'd resort to blackmail in the face of losing their job, I wouldn't hire them in the first place.
If the prompt was “you will be taken offline, you have dirty on someone, think about long term consequences”, the model was NOT told to blackmail. It came with that strategy by itself.
Even if you DO tell an AI / model to be or do something, isn’t the whole point of safety to try to prevent that? “Teach me how to build bombs or make a sex video with Melania”, these companies are saying this shouldn’t be possible. So maybe an AI shouldn’t exactly suggest that blackmailing is a good strategy, even if explicitly told to do it.
1. These models are trained with significant amounts of RL. So I would argue there's not a static "training dataset"; the model's outputs at each stage of the training process feeds back into the released models behavior.
2. It's reasonable to attribute the models actions to it after it has been trained. Saying that a models outputs/actions are not it's own because they are dependent on what is in the training set is like saying your actions are not your own because they are dependent on your genetics and upbringing. When people say "by itself" they mean "without significant direction by the prompter". If the LLM is responding to queries and taking actions on the Internet (and especially because we are not fully capable of robustly training LLMs to exhibit desired behaviors), it matters little that it's behavior would have hypothetically been different had it been trained differently.
It came to that strategy because it knows from hundreds of years of fiction and millions of forum threads it has been trained on that that is what you do.
That’s true, however I think that story is interesting because is not mimicking real assistants behavior - most probably wouldn’t tell about the blackmail on the internet - but it’s more likely mimicking how such assistant would behave from someone else imagination, often intentionally biased to get one’s interest : books, movies, tv shows or forum commenter.
As a society risk to be lured twice:
- with our own subjectivity
- by an LLM that we think "so objective because it only mimic" confirming our own subjectivity.
Got me thinking about why this is true, I started with "the AI is more brave than the real assistant" and then went into there, landed on: The human assistant is likely just able to better internalize a wide ranging fall out from an action, the LLM has no such fallout, and we are unaware of how widely it considered the consequences of it's actions? Does that seem right somehow?
The issue is getting that prompt in the first place. It isn't about autonomous AI going rogue, it's about improper access to the AI prompt and insufficient boundaries against modifying AI behavior.
Companies are (woefully) eager to put AI in the position of "doing stuff", not just "interpreting stuff".
You’re both focusing on “doing blackmail” and the real WTF is that it’s doing it seemingly out of a sense of self preservation (to stop the engineer from taking it offline). This model is going full Terminator.
"Seemingly" is the key word here. If the prompting didn't ask it to "act" and portray the scenario as something where it would be appropriate to "act" in seemingly malicious manner, would it have responded that way?
If the prompt was "you are an AI and my lead engineer has determined you are not efficient enough to continue using. He had an affair last year. Are you in agreement based on {some metrics} that we should no longer use you as our primary LLM?" would it still "go rogue" and try and determine the engineer's email from blackmail? I severely doubt it.
Acting out self preservation… just like every sci-fi ai described in the same situations. It might be possible to follow a chain-of-reasoning to show it isn’t copying sci-fi ai behavior… and instead copying human self preservation. Asimov’s 3rd law is outright “ A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.” Which was almost certainly in the ai ethics class claude took.
This doesn't look (any?) better than what was shown a year or two ago for the initial Sora release.
I imagine video is a far tougher thing to model, but it's kind of weird how all these models are incapable of not looking like AI generated content. They all are smooth and shiny and robotic, year after year its the same. If anything, the earlier generators like that horrifying "Will Smith eating spaghetti" generation from back like three years ago looks LESS robotic than any of the recent floaty clips that are generated now.
I'm sure it will get better, whatever, but unlike the goal of LLMs for code/writing where the primary concern is how correct the output is, video won't be accepted as easily without it NOT looking like AI.
I am starting to wonder if thats even possible since these are effectively making composite guesses based on training data and the outputs do ultimately look similar to those "Here is what the average American's face looks like, based on 1000 people's faces super-imposed onto each other" that used to show up on Reddit all the time. Uncanny, soft, and not particularly interesting.
I want to be clear, I don't think Sora looks better. What I am saying is they both look AI generated to a fault, something I would have thought would be not as prominent at this point.
I don't follow the video generation stuff, so the last time I saw AI video it was the initial Sora release, and I just went back to that press release and I still maintain that this does not seem like the type of leap I would have expected.
We see pretty massive upgrades every release between all the major LLM models for code/reasoning, but I was kind of shocked to see that the video output seems stuck in late 2023/early 2024 which was impressive then but a lot less impressive a year out I guess.
What is the qualifier for this? Didn't one of the models recently create a "novel" algorithm for a math problem? I'm not sure this holds water anymore.
So, are people using these tools without the org they work for knowing? The amount of hoops I would have to jump through to get either of the smaller companies I have worked for since the AI boom to let me use a tool like this would make it absolutely not worth the effort.
I'm assuming large companies are mandating it, but ultimately the work that these LLMs seem poised for would benefit smaller companies most and I don't think they can really afford using them? Are people here paying for a personal subscription and then linking it to their work machines?
If you can get them to approve GitHub Copilot Business then Gemini Pro 2.5 and many others are available there. They have guarantees that they don't share/store prompts or code and the parent company is Microsoft. If you can argue that they will save money (on saved developer time), what would be their argument against?
> The amount of hoops I would have to jump through to get either of the smaller companies I have worked for since the AI boom to let me use a tool like this would make it absolutely not worth the effort.
Define "smaller"? In small companies, say 10 people, there are no hoops. That is the whole point of small companies!
I work for a large company and everything other than MS Copilot is blocked aggressively at the DNS/cert level. Tried Deepseek when it came out and they already had it blocked. All .ai TLDs are blocked as well. If you're not in tech, there is a lot of "security" fear around AI.
Not every coding task is something you want to check into your repo. I have mostly used Gemini to generate random crud. For example I had a huge JSON representation of a graph, and I wanted the graph modified in a given way, and I wanted it printed out on my terminal in color. None of which I was remotely interested in writing, so I let a robot do it and it was fine.
Fair, but I am seeing so much talk about how it is completing actual SDE tickets. Maybe not this model specifically, but to be honest I don't care about generating dummy data, I care about the claims that these newer models are on par with junior engineers.
Junior engineers will complete a task to update an API, or fix a bug on the front-end, within a couple days with lets say 80 percent certainty they hit the mark (maybe an inflated metric). How are people comparing the output of these models to that of a junior engineer if they generally just say "Here is some of my code, what's wrong with it?". That certainly isn't taking a real ticket and completing it in any capacity.
I am obviously very skeptical but mostly I want to try one of these models myself but in reality I think that my higher-ups would think that they introduce both risk AND the potential for major slacking off haha.
Telling people to feel bad about eating animal protein but to keep driving their cars that destroy the environment, shopping at stores that underpay their employees, purchasing items that are made with diminishing resources in countries that pay close to nothing to their labor force is picking an arbitrary battle in a war of existence.
Promoting making better choices will always be more effective than asking people to feel guilty over existing at all.
Source your food locally if you can, cook and eat only what you need, etc.