For what it's worth, I've been trying Opus 4.1 in VS Code through GitHub Copilot...

cpursley · 2025-08-08T17:52:05 1754675525

Use Claude Code, the rest aren't worth the bother.

addandsubtract · 2025-08-08T18:27:38 1754677658

What does Claude Code do differently to Copilot Agent? Shouldn't they produce the same(ish) result if they're using the same model?

DannyBee · 2025-08-08T19:08:24 1754680104

If they prompt the same and ..., they should.

But they definitely don't, taking into account whatever prompts the tools are really using (or MS is using a neutered version to reduce cost). So I would agree with the suggestion. Using Sonnet through Copilot seems very, very different from Cursor or Cline or Claude Code.

Using the same exact model, Copilot often fails to finish tasks or makes a mess. It is consistent at this across IDEs (i.e., the JetBrains plugin generates nearly identical bad results to VS Code Copilot). I then discard all it did and try the exact same (user) prompt in Cursor or Claude Code or Cline with the same model, and it does the same task perfectly.

gwd · 2025-08-09T09:20:46 1754731246

I've used both aider and opencode with both Opus and Sonnet. Opencode, at least initially, used Claude Code's exact prompt; and I found the results surprisingly different.

Perhaps it shouldn't be surprising; after all, we do want the LLMs to listen to the prompts and act differently. And the Claude team will presumably be tuning both Claude and Claude Code's prompts to each other to optimize their own experience, so it's perhaps not surprising that Claude + Claude Code's prompts work well together.

akmarinov · 2025-08-09T06:19:43 1754720383

Copilot sucks more at applying what the model is instructing it to do

bongodongobob · 2025-08-08T17:55:06 1754675706

To me it seems that Opus is really good at writing code if you give it a spec. The other day I had GPT come up with a spec for a DnD text game that uses the GPT API. Opus one-shotted a 1k line program.

However, if I'm not detailed with it, it does seem to make weird choices that end up being unmaintainable. It's like it has poor creative instincts but is really good at following the directions you give it.

stavros · 2025-08-08T19:18:13 1754680693

Wait, are you talking about Opus or GPT? Which GPT? You switched models mid-sentence.

bongodongobob · 2025-08-08T20:23:22 1754684602

GPT-4o came up with a design spec that I gave to Opus to implement.

muzani · 2025-08-08T18:01:04 1754676064

Opus seems to need more babysitting IME, which is great if you are going to actually pair program. Terrible if you like leaving it to do its own thing or try to do multiple things at once.

torginus · 2025-08-08T19:22:49 1754680969

I just want a model that feels like an extension of me. For example, if there's a task I can describe in one sentence - "add a REST API for user management in the DB, and make sure only users in the admin group are allowed to use it" - it would result in an API endpoint that's properly wired up to the right places, and the model does what I tell it, and nothing else, even if it would logically follow from what I told it.

And if it gets confused, needs clarification, or has its own initiative - I want it to stop and ask.

Oh, and it needs to be fast: its tokens per minute should be as fast as I can read what it generates (and I can read boilerplate-y code quite fast), and it shouldn't stop and think on every prompt, only when it needs to, and it should be much faster and more granular in backtracking.

The loop of waiting on the AI then having to fix and steer it constantly as it doggedly follows its own ideas has really taken the enjoyment out of vibe coding for me.

Uw7yTcf36gTc · 2025-08-11T22:21:59 1754950919

Have it break the problem into phases. Have it run unit tests after every phase. Only move forward after all the tests for the phase have passed. I’m using the free Qwen3-Coder, and with proper prompting it is fairly good.
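
To make the "only move forward once the tests pass" rule concrete, here is a minimal sketch of such a gate. Everything in it is an assumption for illustration: `Phase`, `implement`, `tests_pass`, `run_phases`, and `max_attempts` are hypothetical names, not part of Qwen3-Coder or any other tool, and in practice `implement` would prompt the model and `tests_pass` would run that phase's test suite.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    name: str
    implement: Callable[[], None]   # hypothetical: ask the model to do this phase's work
    tests_pass: Callable[[], bool]  # hypothetical: run this phase's unit tests


def run_phases(phases: List[Phase], max_attempts: int = 3) -> List[str]:
    """Run phases in order and return the names of the phases that completed.

    Stops at the first phase whose tests still fail after max_attempts tries.
    """
    completed: List[str] = []
    for phase in phases:
        for _ in range(max_attempts):
            phase.implement()
            if phase.tests_pass():
                completed.append(phase.name)
                break
        else:
            # The tests never passed, so later phases are not started.
            return completed
    return completed
```

The early return is the point of the sketch: a failing phase blocks everything after it, so the model cannot pile new work on top of broken code.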

epolanski · 2025-08-08T18:18:51 1754677131

That's insightful.

I spend a lot of time planning tasks, generating various documents per PR (requirements, questions, todo), having AI poke at my ideas (business/product/UX/code-wise), etc.

After 45 minutes of back and forth, we generally end up with a detailed plan.

This also has many benefits:
- writing tests becomes very simple (unit, integration, E2E)
- writing documentation becomes very simple
- writing meaningful PRs becomes very simple

It is quite boring though, not gonna lie. But that's a price I have accepted for quality.

Also, clarifying the ideas so much beforehand often leads me to come up with creative ideas later in the day, when I go for walks and mentally review what we've done and how.

muzani · 2025-08-08T18:27:27 1754677647

You might want to try Claude Code if you haven't. It's perfect for exactly this plan-then-build flow with a ton of documents. A colleague set up some strict code guidelines, right down to, say, put constructors at the top, constants at the bottom, use this name for this, snake case for that. Code quality just shoots up with these details. Can't just hack away with a blunt axe.

People tend to hate Claude Code because it's not vibe coding anymore, but it was never really meant to be.

epolanski · 2025-08-08T18:31:06 1754677866

Yes, I use Claude Code a lot, but I'm on the $20 tier so I've never seen Opus in action (I think it's Sonnet only?).