srush's comments | Hacker News

A lot of people use it! It scores very well on our benchmarks, significantly better than Composer-1.


There are lots of good models we like here. But we agree that getting to the right point on the smart+fast graph can make agentic coding feel really good.

(Cursor researcher)


We will do our best. Luckily I don't think there are major telecom companies called Composer-2.


There is also a very popular package manager called Composer. Do companies not search for name collisions? Or do they squat on community projects on purpose?


Unfortunately not, as we used our own internal code for the benchmark. We would also like to see more benchmarks that reflect day-to-day agentic coding use.


Is there any information at all available, anywhere, on what Cursor Bench is testing and how?

It's the most prominent part of the release post - but it's really hard to understand what exactly it's saying.


Roughly, we had Cursor software engineers record real questions they were asking models, and then record the PR they made that contained the result. We then cleaned these up. That is the benchmark.


Are you able to give a sense of how many questions, which domains they were split over, and how that split looked in % terms?

As a user, I want to know - when an improvement is claimed - whether it’s relevant to the work I do or not. And whether that claim was tested in a reasonable way.

These products aren’t just expensive - they require switching your whole workflow, which is becoming an increasingly big ask in this space.

It’s pretty important for me to be able to understand, and subsequently believe, a benchmark - I find it really hard not to read it as ad copy when this information isn’t present.


Which programming languages/tools/libraries did the team's questions/code involve?


We like the name Composer and were sad to see it go. Excited to bring it back. (Agree Cheetah is a cool name too.)


There is a footnote that should help with the models. Training is a harder thing to report on, but roughly our finding here is that RL scales.



I would have thought it's because you use Cursor...


Agree that Sonnet 4.5 is an excellent model. Would be curious to hear your experience using Composer though, it's quite good.


I'll try it out! I haven't yet - just generally conveying my opinion that I personally weigh "better model" as much more important than speed, assuming some level of "fast enough".

Also, didn't realize you worked at Cursor - I'm a fan of your work - they're lucky to have you!


Thanks! Yeah, been working here for 9 months now. Fascinated by agentic coding both as a researcher and a user.

Totally agree that a "smart model" is table stakes for usefulness these days.


> Composer though, it's quite good

Wow, no kidding. It is quite good!


We also are big Tab users here at Cursor. In the blog we talk about how the motivation for this project came from thinking about a Tab-like agent.


Hi everyone,

I am an ML researcher at Cursor and worked on this project. Would love to hear any feedback you may have on the model, and I can answer questions about the blog post.


Impressive systems write-up. A question: if Composer is an RL finetune on an open model, why keep the weights closed? The edge from a slightly better checkpoint erodes quickly in this market; it's not a durable advantage. Composer protects Cursor's margins from being squeezed by the big AI labs, but that is true whether the weights are open or closed, and I think Cursor would get more lasting benefit from generating developer goodwill than from a narrow, short-lived advantage. But that's just my opinion. I personally find it hard to get excited about yet another proprietary model. GPT-5 and Sonnet 4.5 are around when I need one of those, but I think the future is open.


It's stunning.

I don't use these tools that much (I tried Cursor a while ago and rejected it), but having played with GPT-5 Codex (as a paying customer) yesterday in regular VSCode, and having had Composer-1 do the exact same things just now, it's night and day.

Composer did everything better, didn't stumble where Codex failed, and most importantly, the speed makes a huge difference. It's extremely comfortable to use, congrats.

Edit: I will therefore reconsider my previous rejection


Awesome to hear, I will share with the team.


Why did you stop training shy of the frontier models? From the log plot it seems like you would only need ~50% more compute to reach frontier capability


We did a lot of internal testing and thought this model was already quite useful for release.


Makes sense! I like that you guys are more open about it. The other labs just drop stuff from the ivory tower. I think your style matches better with engineers who are used to datasheets etc. and usually don't like poking a black box


Thanks! I do like the labs' blog posts as well though; OpenAI and Anthropic have some classics.


Which model did you distill it from? Great work! PS: I'm getting a few scenarios where it doesn't follow rules as well as Sonnet 4.5.


The blog talks about the training process. Specifically we trained with RL post-training on coding examples.


Makes sense, but what model was used for the base? Is it some open-source model, and you're not at liberty to disclose?


Not a Cursor employee, but still a researcher: it's Zhipu/Z.ai GLM-4.6/4.5. There are traces of Chinese in the reasoning output, plus it's the only model it would make sense to do this kind of RL on, and it already delivers near-SOTA performance and is open-source/open-weight.

Cursor Composer and Windsurf SWE 1.5 are both finetuned versions of GLM.


interesting, thank you


that's cool thanks!


Do you have any graphs handy that roughly replicate the first one in the blog post but are a bit less ambiguous, maybe without model grouping? I feel like it would have been fairer to include proper names and show the models individually rather than group everything together, and then present your own model on its own.


Is the new model trained from scratch? What training data went into it?


Is it true that Cheetah is Grok Code Fast 2? Does this mean that the new Cursor model is also based on Grok?


Cheetah was an earlier (and dumber) version of this model that we used to test production speed. They are both developed in-house. If you liked Cheetah, give this model a try.


This is nice. I liked Cheetah for grunt work that I want to get out quickly and that is not too hard. The speed is really awesome. A model that would run at even higher speeds, like the OSS models at Groq/Cerebras, would really be workflow-changing, because the slowness of SOTA models really breaks the flow. I find myself taking a ton of breaks and getting distracted while I wait for a model to complete a task (e.g. just now).


Let us know how you like it.


Awesome, thanks for the clarification. So are the rumors around Cheetah being based on a Grok model just straight up untrue? I want to try Composer but have a pretty strict no X/Grok policy.


Straight up untrue.


There is a YouTube livestreamer building with it now, if you are looking for direct feedback: https://www.youtube.com/watch?v=1bDPMVq69ac


neat!


Congratulations on your work. I spent the day working with a mix of the Composer/Sonnet 4.5/Gemini 2.5 Pro models. In terms of quality, Composer seems to perform well compared to the others. I have no complaints so far. I'm still using Claude for planning/starting a task, but Composer performed very well in execution. What I've really enjoyed is the speed. I had already tested other fast models, but with poor quality. Composer is the first one that combines speed and quality, and the experience has been very enjoyable.


I prefer the approach of focusing on faster models despite their lower intelligence, because I want my IDE to fly when I can see the code. I find this useful when I need to manually debug something that no model is able to do, so I know the model is going to fail, but at least it will fail fast. On the other hand, if I need more intelligence I have my other CLI that doesn't let me see the code but gets the planning and difficult code done.


Our view is that there is now a minimal amount of intelligence that is necessary to be productive, and that if you can pair that with speed, that is awesome.


What's funny is there are many industries outside A.I. that pick their talent the same way. ;)


is Composer a fine tune of an existing open source base model?


Our primary focus is on RL post-training. We think that is the best way to get the model to be a strong interactive agent.


So, yes, but you won’t say what the base model is? :)


It seems like a sort of Sonnet model, as a lot of people are reporting on Twitter that it likes to spam documentation, like Sonnet 4.5 does.


Can you please tell us more about how you used Ray for setting up the RL infrastructure?


Oh, good question. I'm actually speaking at the Ray Summit next week in SF, so we will talk more about it there. We used Ray throughout the pipeline: for running evals, for the RL controller, for data collation, and for visualizations. One tool we found helpful was Ray Data, which let us easily scale over data and run logs.


Please share more about the Ray Data use case.


We use Ray Data for our map-style processing jobs. For example, one tool we have runs over all the rollouts from the RL system and collects qualitative statistics to understand which types of agent trajectories are being rewarded, and what types of searches and terminal commands are being made.
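
To make that concrete, here is a rough, hypothetical sketch of what such a Ray Data map-style job could look like (the bucket path, rollout schema, and field names below are made up for illustration, not Cursor's actual pipeline):

    import ray

    # Each record is assumed to be one serialized agent rollout (JSON lines).
    # The path and field names ("steps", "tool", "reward") are hypothetical.
    rollouts = ray.data.read_json("s3://example-bucket/rl-rollouts/")

    def summarize(rollout: dict) -> dict:
        """Collect simple qualitative statistics for a single rollout."""
        steps = rollout.get("steps", [])
        return {
            "rewarded": rollout.get("reward", 0.0) > 0.0,
            "num_searches": sum(1 for s in steps if s.get("tool") == "search"),
            "num_terminal_cmds": sum(1 for s in steps if s.get("tool") == "terminal"),
        }

    # Ray Data parallelizes the map across the whole dataset of rollouts.
    stats = rollouts.map(summarize)

    print(stats.mean(["num_searches", "num_terminal_cmds"]))
    print(stats.filter(lambda r: r["rewarded"]).count(), "rewarded rollouts")

The same read/map/aggregate pattern extends to things like tallying which terminal commands show up most often in rewarded trajectories.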


Amazing work! The UX is great.

GPT-5-Codex does more research before tackling a task; that is the biggest weakness keeping me from using Composer yet.

Could you provide any color on whether ACP (from Zed) will be supported?


How many times have you needed to reset the optimizer during the RL training cycles?


How do you work with multiple agents?


We train with a single agent. Is that the question?

