Codex has always been better at following agents.md and prompts, but I would say in the last 3 months Claude Code got worse (freestyling like we see here) while Codex got EVEN more strict.
80% of the time I ask Claude Code a question, it kinda assumes I am asking because I disagree with something it said, then acts on a supposition. I've resorted to appending things like "THIS IS JUST A QUESTION. DO NOT EDIT CODE. DO NOT RUN COMMANDS". Which is ridiculous.
Codex, on the other hand, will follow something I said pages and pages ago, and because it has a much larger context window (at least with the setup I have here at work), it's just better at following orders.
With this project I am doing, because I want to be more strict (it's a new programming language), Codex has been the perfect tool. I am mostly using Claude Code when I don't care so much about the end result, or it's a very, very small or very, very new project.
>I've resorted to appending things like "THIS IS JUST A QUESTION. DO NOT EDIT CODE. DO NOT RUN COMMANDS". Which is ridiculous.
Funny to read that, because for me it's not even new behavior. I have developed a tendency to add something like "(genuinely asking, do not take as a criticism)".
I'm from a more confrontational culture, so I just assumed this was corporate American tone framing criticism softly, and me compensating for it.
Same here. I quickly learned that if you merely ask questions about its understanding or plans, it starts looking for alternatives, because my questioning is interpreted as rejection or criticism rather than just being taken at face value. So I often (not always) have to caveat questions like that too. It's really been like that since before Claude Code or Codex even rolled around.
It's just strange because that's a very human behavior, and although it learns from humans, it isn't one, so it would be nice if it just acted more robotic in this sense.
Yeah, numerous times I've replied to a comment online, to add supporting context, and it's been interpreted as a retort. So now I prefix them with 'Yeah, '.
Splitting tasks like researching, coding, and reviewing across multiple LLMs and/or sessions can reduce or eliminate these issues while giving you more control over your context windows. This also provides the side benefit of a ‘second opinion’ and a more general perspective on a given topic.
You're absolutely right! No, really: I've never had this problem of unprompted changes when I'm just asking, but I always (I think even in real-life conversations with real people) start with feedback: "Works great. What happens if..."
I think people having different styles of prompting LLMs leads to different model preferences. It's like you can work better with some colleagues while with others it does not really "click".
Oh funny enough, I often add stuff like "genuinely asking, do not take as a criticism" when talking with humans so I do it naturally with LLMs.
People often use questions as an indirect form of telling someone to do something or criticizing something.
I definitely had people misunderstand questions for me trying to attack them.
There are a lot of times when people do expect the LLM to interpret their question as a command to do something, and they would get quite angry if the LLM just answered the question.
Not that I wouldn't prefer if LLMs took things more literally, but these models are trained for the average neurotypical user, so that quirk makes perfect sense to me.
Personally defined <dtf> as 'don't touch files' in the general claude.md, with the explanation that when this is present in the query, it means to not edit anything, just answer questions.
It has worked pretty well so far: when I include <dtf> in the query, the model never runs around modifying things.
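For anyone who wants to try the same trick, the CLAUDE.md entry might look something like this (the exact wording below is my guess at it, not the commenter's actual file):

```markdown
## Query markers

- `<dtf>` means "don't touch files". When this tag appears anywhere in my
  message, treat the message as a question only: answer it, but do not
  edit files, do not run commands, and do not create or delete anything.
```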
I've been using chat and copilot for many months but finally gave claude code a go, and I've been interested in how it does seem to have a bit more of an attitude to it. Like copilot is just endlessly patient for every little nitpick and whim you have, but I feel like Claude is constantly like "okay I'm committing and pushing now.... oh, oh wait, you're blocking me. What is it you want this time bro?"
I found that helpful for a question but the btw query seemed to go to a subagent that couldn't interrupt or direct the main one. So it really was just for informational questions, not "hey what if we did x instead of y?"
Yesterday it was showing a hint in the corner to use "/btw" but when I first tried it I got this same error. About ten minutes later (?) I noticed it was still showing the same hint in the corner, so I tried it again and it worked. Seemed to be treated as a one-off question which doesn't alter the course of whatever it was already working on.
It's not really the same use case. It's a smaller model, it doesn't have tools, it can't investigate, etc. The only thing it can do is answer questions about whatever is in the current context.
Then it will try to update the plan. Sometimes I have a plan that I'm ready to approve, but get an idea, "what if we use/do this instead of that", and all I want is a quick answer, with or without additional exploring. What I don't want is to adjust a plan I already like based on a thing I said that may not pan out.
Charitable reading. Culture; tone; throughout history these have been medium and message of the art of interpersonal negotiation in all its forms (not that many).
A machine that requires them in order to work better is not an imaginary para-person that you now get to boss around; the "anthropic" here is "as in the fallacy".
It's simply a machine that is teaching certain linguistic patterns to you. As part of an institution that imposes them. It does that, emphatically, not because the concepts implied by these linguistic patterns make sense. Not because they are particularly good for you, either.
I do not, however, see like a state. The code's purpose is to be the most correct representation of a given abstract matter as accessible to individual human minds - and like GP pointed out, these workflows make that stage matter less, or not at all. All engineers now get to be sales engineers, too! Primarily! Because it's more important! And the most powerful cognitive toolkit! (Well, after that other one, the one for suppressing others' cognition.)
Fitting: most software these days is either an ad or a storefront.
>80% of the time I ask Claude Code a question, it kinda assumes I am asking because I disagree with something it said, then acts on a supposition.
Humans do this too. Increasingly so over the past ~1y. Funny...
Some always did though. Matter of fact, I strongly suspect that the pre-existing pervasiveness of such patterns of communication and behavior in the human environment, is the decisive factor in how - mutely, after a point imperceptibly, yet persistently - it would be my lot in life to be fearing for my life throughout my childhood and the better part of the formative years which followed. (Some AI engineers are setting up their future progeny for similar ordeals at this very moment.)
I've always considered it significant how back then, the only thing which convincingly demonstrated to me that rationality, logic, conversations even existed, was a beat up old DOS PC left over from some past generation's modernization efforts - a young person's first link to the stream of human culture which produced said artifact. (There's that retrocomputing nostalgia kick for ya - heard somewhere that the future AGI will like being told of the times before it existed.)
But now I'm half a career into all this goddamned nonsense. And I'm seeing smart people celebrating the civilization-scale achievement of... teaching the computers how to pull ape shit! And also seeing a lot of ostensibly very serious people, who we are all very much looking up to, seem to be liking the industry better that way! And most everyone else is just standing by listless - because if there's a lot of money riding on it then it must be a Good Thing, right? - we should tell ourselves that and not meddle.
All of which, of course, does not disturb, wrong, or radicalize me in the slightest.
First time I used Claude I asked it to look at the current repo and just tell me where the database connection string was defined. It added 100 lines of code.
I asked it to undo that and it deleted 1000 lines and 2 files
Would `git reset --hard` have worked in your case? I guess you want to have each baby step in a git commit; in the end you could do a `git rebase -i` if needed.
Whatever setup I have in the office doesn't allow git without me approving the command. Or anything else, really: I often have to approve a grep because it redirects some output to /dev/null, which is a write operation.
One annoying thing about that flow is that when you change the world outside the model it breaks its assumptions and it loses its way faster (in my experience).
I feel like people are sleeping on Cursor, no idea why more devs don't talk about it. It has a great "Ask" mode, the debugging mode has recently gotten more powerful, and its plan mode has started to look more like Claude Code's plans, when I test them head to head.
Cursor implemented something a while back where it started acting like how ChatGPT does when it's in its auto mode.
Essentially, choosing when it was going to use what model/reasoning effort on its own regardless of my preferences. Basically moved to dumber models while writing code in between things, producing some really bad results for me.
Anecdotal, but the reason I will never talk about Cursor is because I will never use it again. I have barred the use of Cursor at my company. It just does some random stuff at times, which is more egregious than what I see from Codex or Claude.
ps. I know many other people who feel the same way about Cursor and others who love it. I'm just speaking for myself, though.
ps2. I hope they've fixed this behavior, but they lost my trust. And they're likely never winning it back.
You just described their “auto” behavior, which I’m guessing uses grok.
Using it with specific models is great, though you can tell that Anthropic is subsidizing Claude Code as you watch your API costs more directly. Some day the subsidy will end. Enjoy it now!
And cursor debugging is 10x better, oh my god.
I have switched to 70% Claude Code, 10% Copilot code reviews (non anthropic model), and 20% Cursor and switch the models a bit (sometimes have them compete — get four to implement the same thing at the same time, then review their choices, maybe choose one, or just get a better idea of what to ask for and try again).
You wouldn't do that for everything. I'd reserve it for work with higher uncertainty, where you're not sure which path is best. Different model families can make very different choices.
In the coworking space I am in, people are hitting limits on the $60 plan all the time. They are thinking about which models to use to be efficient, which context to include, etc.
I’m on claude code $100 plan and never worry about any of that stuff and I think I am using it much more than they use cursor.
Tell them to use the Composer 1.5 model. It's really good, better than Sonnet, and has much higher usage limits. I use it for almost all of my daily work, don't have to worry about hitting the limit of my $60 plan, and only occasionally switch to Opus 4.6 for planning a particularly complex task.
Cursor tends to bounce out of plan mode automatically and just start making changes (while still actually in plan mode). I also have to constantly remind it “YOU ARE IN PLAN MODE, do not write a plan yet, do not edit code”. It tends to write a full-on plan with one initial prompt instead of my preferred method of hashing out a full plan, details, etc… It definitely takes some heavy corralling and manual guardrails but I’ve had some success with it. Just keep very tight reins on your branches and be prepared to blow them away and start over on each one.
I've had some luck taming prompt introspection by spawning a critic agent that looks at the plan produced by the first agent and vetoes it if the plan doesn't match the user's intentions. LLMs are much better at identifying rule violations in a bit of external text than at regulating their own output. Same reason why they generate unnecessary comments no matter how many times you tell them not to.
I've also found it to be better to ask the LLM to come up with several ideas and then spawn additional agents to evaluate each approach individually.
I think the general problem is that context cuts both ways, and the LLM has no idea what is "important". It's easier to make sure your context doesn't contain pink elephants than it is to tell it to forget about the pink elephants.
> what "red/green" means: the red phase watches the tests fail, then the green phase confirms that they now pass.
> Every good model understands "red/green TDD" as a shorthand for the much longer "use test driven development, write the tests first, confirm that the tests fail before you implement the change that gets them to pass".
You can just say spawn an agent as the sibling says. I didn't find that reliable enough, so I have a slightly more complicated setup. The first agent has no permissions except spawning agents and reading from a single directory. It spawns the planner to generate the plan, then feeds it to the critic, and either spawns executors or re-runs the planner with critic feedback. The planner can read and write. The critic agent can only read the input and outputs accept/reject with a reason.
This is still sometimes flaky because of the infrastructure around it and ideally you'd replace the first agent with real code, but it's an improvement despite the cost.
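The orchestration described above can be sketched in a few lines. This is a minimal sketch, not the commenter's actual setup: `call_planner` and `call_critic` are hypothetical stand-ins for whatever agent-spawning mechanism the harness provides, and the orchestrator itself is the "real code" the commenter suggests replacing the first agent with.

```python
# Toy planner/critic loop. The orchestrator only routes text between
# agents; call_planner and call_critic are hypothetical LLM-call stubs.

def run_plan_loop(task, call_planner, call_critic, max_rounds=3):
    """Regenerate the plan until the critic accepts it or we give up."""
    feedback = None
    for _ in range(max_rounds):
        plan = call_planner(task, feedback)        # planner sees prior critique
        verdict, reason = call_critic(task, plan)  # critic only reads, never edits
        if verdict == "accept":
            return plan
        feedback = reason                          # re-run planner with the critique
    raise RuntimeError("critic rejected every plan; escalate to the user")
```

The key design point matches the comment: the critic never edits anything, it only emits accept/reject with a reason, and the planner is re-run from scratch with that reason attached.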
> Codex, on the other hand, will follow something I said pages and pages ago, and because it has a much larger context window (at least with the setup I have here at work), it's just better at following orders.
This is important, but as a warning. At least in theory your agent will follow everything that it has in context, but LLMs rely on 'context compacting' when things get close to the limit. This means an LLM can and will drop your explicit instructions not to do things, and then happily do them because they're not in the context any more. You need to repeat important instructions.
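One way to picture why repeating matters: if compaction simply drops the oldest turns, an instruction only survives if the harness treats it specially. A toy sketch of that idea follows; the `pinned` flag is my own invention for illustration, not any real agent's API.

```python
# Toy compaction: budget is a message count. Pinned messages always
# survive; unpinned ones are truncated oldest-first. An instruction
# that is NOT pinned silently falls out of context.

def compact(messages, budget):
    pinned = [m for m in messages if m.get("pinned")]
    rest = [m for m in messages if not m.get("pinned")]
    slots = budget - len(pinned)
    keep = rest[-slots:] if slots > 0 else []
    kept = set(map(id, pinned + keep))
    # preserve the original conversation order
    return [m for m in messages if id(m) in kept]
```

Real agents summarize rather than truncate, but the failure mode is the same: anything the summarizer doesn't consider important enough to carry forward is gone, which is why repeating critical instructions (or putting them in an agents file that gets re-injected) is the safer bet.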
This is mostly dependent on the agent because the agent sets the system prompt. All coding agents include in the system prompt the instruction to write code, so the model will, unless you tell it not to. But to what extent they do this depends on that specific agent's system prompt, your initial prompt, the conversation context, agent files, etc.
If you were just chatting with the same model (not in an agent), it doesn't write code by default, because it's not in the system prompt.
There’s an extension to this problem which I haven’t got past. More generally I’d like the agent to stop and ask questions when it encounters ambiguity that it can’t reasonably resolve itself. If someone can get agents doing this well it’d be a massive improvement (and also solve the above).
Hm, with my "plan everything before writing code, plus review at the end" workflow, this hasn't been a problem. A few times when a reviewer has surfaced a concern, the agent asks me, but in 99% of cases, all ambiguity is resolved explicitly up front.
This. Just asking it to ask you questions before proceeding has saved me so much time from it making assumptions I don’t want. It’s the single most important part of almost all my prompts.
The solution for this might be to add a ME.md in addition to AGENT.md so that it can learn and write down our character, to know if a question is implicitly a command for example.
This is not Claude Code.
And my experience is the opposite. For me, Codex is not working at all, to the point that it's no better than asking the chatbot in the browser.
This is extra rough because Codex defaults to letting the model be MUCH more autonomous than Claude Code. The first time I tried it out, it ended up running a test suite without permission which wiped out some data I was using for local testing during development. I still haven't been able to find a straight answer on how to get Codex to prompt for everything like Claude Code does - asking Codex gets me answers that don't actually work.
Maybe I should give Codex a go, because sometimes I just want to ask a question (Claude) and not have it scan my entire working directory and chew up 55k tokens.
For the last 12 months labs have been:
1. Check-pointing
2. Training until model collapse
3. Reverting to the checkpoint from 3 months ago
4. Waiting until people have gotten used to the shitty new model
Anthropic said they "don't do any programming by hand" the last 2 years. Anthropic's API has 2 nines.
I find this thread surprising honestly. Claude Code is my daily driver and I consider myself a real power user. If you have your commands/agents/skills set up correctly you should never be running into these issues
> Codex, on the other hand, will follow something I said pages and pages ago, and because it has a much larger context window (at least with the setup I have here at work), it's just better at following orders.
Claude Code goes through some internal systems that other tools (Cline / Codex / and I think Cursor) do not. Also, we have different models for each. I don't know what happens in practice, but I found that Codex compacts conversations way less often. It might well be that fewer tokens are used/added, rather than a larger raw context window. Sorry if I implied we have more context than whatever others have :)
Codex does something sorta magical where it auto compacts, partially maybe, when it has the chance. I don’t know how it works, and there is little UI indication for it.
But that's one of the first things you fix in your CLAUDE.md:
- "Only do what is asked."
- "Understand when being asked for information versus being asked to execute a task."
How do you do that???? Say the words but in the form of a question? I feel like that will go a lot worse than just telling (but nicely). I have a daughter too so I am genuinely willing to try anything
What about adding something like, "When asked a question, just answer it without assuming any implied criticism or instructions. Questions are just questions." to claude.md?
I'm back on Claude Code this month after a month on Codex and it's a serious downgrade.
Opus 4.6 is a jackass. It's got Dunning-Kruger and hallucinates all over the place. I had forgotten about the experience (as in the Gist above) of jamming on the escape key: "no no no, I never said to do that." But also I don't remember 4.5 being this bad.
But GPT 5.3 and 5.4 are a far more precise and diligent coding experience.