I've been running Claude Code in my Cursor IDE for a while now via an extension. I like the setup: I direct Claude on one task at a time while still having full access to my code (and nice completions via Cursor). I still spend time tweaking, etc. before committing. I have zero interest in these new "swarms of agents" they are trying to force on us from every direction. I can barely keep my code straight working on one feature at a time. AI has greatly helped me speed that up, but working serially has produced the best quality for me. I'll likely drop Cursor for good now and switch back to vanilla VS Code with CC.
I just wish Claude Code would also offer fast inline autocomplete. Sometimes I just want a function definition or some boilerplate spelled out without waiting for the slow Claude response, or actively switching models.
——-
Maybe I can set up a shortcut for that?
Is there a significant difference between Claude Code in VSCode and Copilot in VSCode? I’ve been using Copilot with the Claude models (including Sonnet/Opus 4.6) and it seems to work spectacularly.
My subscription is only $10 a month, and it has unlimited inline suggestions. I just wonder if I’m missing anything.
I tried Copilot in VSCode for a bit as well, with Opus, and felt something was off, somehow as if Copilot's harness around it just wasn't as good. But I can't offer solid proof.
> Is there a significant difference between Claude Code in VSCode and Copilot in VSCode? I’ve been using Copilot with the Claude models (including Sonnet/Opus 4.6) and it seems to work spectacularly.
Most models are limited to 200k context in GitHub Copilot. The Claude models are now 1M context elsewhere.
The $10/month plan offers quite a limited number of tokens for advanced models, and if you are not careful and set the model to Auto, it will quickly deplete them.
Not a real solution but you could try using AquaVoice for dictation. It can gather screen context so you just say the function name out loud and it capitalizes and spells everything correctly. (Even hard cases!)
This. I have effectively used multiple agents to do large refactors. I have not used them for greenfield development. How are folks leveraging the agentic swarm, and how are you managing code quality and governance? Does anyone know of a site that highlights code, features, or products produced by this type of development?
I think it would be fantastic to have a reference site for significant, complex projects either developed or substantially extended primarily via agent(s). Every time I look at someone's incredible example of a workflow for handling big context projects, it ends up being a greenfield static microblog example with vague, arm-wavey assertions that it will definitely scale.
You don't have to use swarms if you don't need them though, and in fact you can continue using the editor view with the side chat like before. Why switch away now just because this optional UI was announced?
Same setup here. Claude Code in the terminal, one task at a time. The swarm thing never clicked for me. When I'm building I need to hold the full context in my head, and watching the agent work is actually part of that. I catch things I missed in my own prompt while it's thinking. Parallelizing that would just mean reviewing code I have no mental model for. Serial is slower on paper but the code actually works at the end.
I think these products are trying to capture no-coders, which is a recipe for disaster. They're trying to create architectures so people can say "build me X" and the agents perform magic end-to-end, output a hot pile of garbage. The actual value here is taking the finger-to-keyboard burden off the user and abstracting up to architect level. That means you still need to be able to review the goddamn code and offer an opinion on it to end up with something good. AI slop comes from people who don't have the skills and context to offer any valuable opinion or pushback to the AI.
Vanilla CC is the best IMO.
>I have zero interest in these new "swarms of agents"
I think you misunderstand "swarms of agents", based on what you say above. An agent swarm, in my understanding (and checked via a quick Google search), does not imply working on multiple features at one time.
It is working on one feature with multiple agents taking different roles on that task: maybe a Python expert, a code simplifier, a UI/UX expert, a QA tester, and a devil's advocate working together to implement a feature.
"Expertise" is a completely different beast from "knowledge".
Expecting to gain it from a model only through prompting is similar to expecting to become capable of something only because you bought a book on the topic.
> does not imply working on multiple features at one time.
How can multiple parallel agents, some local and some in the cloud, be working on a single task?
How can:
> All local and cloud agents appear in the sidebar, including the ones you kick off from mobile, web, desktop, Slack, GitHub, and Linear.
(From the announcement, under “Run many agents in parallel”)
…be working on the same task?
Subagents are different, but the OP is not confused about what Cursor is pushing, and it is not what you describe.
Same way a developer and designer can work on the same feature during the same week? Or two developers working on the same feature during the same week. They can have a common api contract and then one builds the frontend and the other works on the backend.
Subagents are isolated context windows, which means they cannot get polluted as easily with garbage from the main thread. You can have multiple of them running in parallel, doing their own separate things in service of whatever your own "brain thread" is working on. It's handy because one might be exploring some aspect of what you are working on while another looks at it from a different perspective.
I think the people doing multiple brain threads at once are doing that because the damn tools are so fucking slow. Give it a little while and I'm sure these things will take significantly less time to generate tokens. So much so that brand new bottlenecks will open up…
They are confused in the word they use: the article on what Cursor is pushing does not, according to ^F, mention "swarm" at all. Since we have a word for multiple agents working on one task, it is probably best not to use that word if you are referring to multiple agents working on multiple tasks, right?
I bring it up not to be pedantic, but because if you think it implies multi-tasking and dismiss it, you are missing out on its ability to help in single-tasking.
I think Cursor doesn't make a distinction between single or multiple logical tasks for swarm-like workloads. "Subagents" is the word they use for the swarm workers.
Fwiw when I select multiple models for a prompt it just feeds the same prompt to them in parallel (isolated worktrees), this isn't the same as the swarm pattern in 2.4+ (default no worktrees).
I did, but having the buttons on the bottom vs the side is a deal breaker for me, esp. since they are VERY tiny on my 4K screen. I can barely even get my mouse over them, and it seems they aren't movable to the left side like VSC? Am I missing something? Hard to believe this shipped, it is unusable for me.
Because the AI apologists cannot deal with the much-studied and proven placebo effect of perceived increased productivity, so they have to try and make themselves feel better by claiming that others are lagging behind in a race no one else is really interested in running.
> have zero interest in these new "swarms of agents" they are trying to force on us from every direction.
Good for you! Personally, waiting for one agent to do something while I shove my thumb up my butt just waiting around for it to generate code that I'll have to fix anyway is the peak opposite of flow state, so I've eagerly adopted agents (how much free will I had in that decision is for philosophers to decide) so there's just more going on and I don't get bored. (Cue the inevitable accusations of me astroturfing or that this was written by AI. Ima delve into that one and tell you there was not. Not unless you count me having stonks in the US stock market as being paid off by Big AI.)
I have personally found that I cannot context switch between thinking deeply about two separate problems and workstreams without a significant cognitive context-switching cost. If it's context-switching between things that don't require super-deep thought, it's definitely doable, but I'm still way more mentally burnt-out after an hour or two of essentially speed-running review of small PRs from a bunch of different sources.
Curious to know more about your work:
Are your agents working on tangential problems? If so, how do you ensure you're still thinking at a sufficient level of depth and capacity about each problem each agent is working on?
Or are they working on different threads of the same problem? If so, how do you keep them from stepping on each other's toes? People mention git worktrees, but that doesn't solve the conflict problem for multiple agents touching the same areas of functionality (i.e. you just move the conflict problem to the PR merge stage)
It's easier when I have 10 simple problems as a part of one larger initiative/project. Think like "we had these 10 minor bugs/tweaks we wanted to make after a demo review". I can keep that straight. A bunch of agents working in parallel makes me notably faster there though actually reviewing all the output is still the bottleneck.
It's basically impossible when I'm working on multiple separate tasks that each require a lot of mental context. Two separate projects/products my team owns, two really hard technical problems, etc. This has been true before and after AI - big mental context switches are really expensive and people can't multitask despite how good we are at convincing ourselves we can.
I expect a lot of folks' experience here depends heavily on how much of their work is the former vs. the latter. I also expect that there's a lot of feeling busy while not actually moving much faster.
Yes, it also doesn't work for me. If the changes are simple, it is fine, but if the changes are complex and there isn't a clear guideline, then no AI is good enough or even close to it. It gives you a few days of feeling productive and then weeks of trying to tidy up the mess.
Also, I have noticed, strangely, that Claude is noticeably less compliant than GPT. If you ask a question, it will answer and then immediately try to make changes (which may not be related). If you say something isn't working, it will challenge you and insist it was tested (it wasn't). For a company that seems to focus so much on ethics, they have produced an LLM that displays a clear disregard for users (perhaps that isn't a surprise). Either way, it is a very bad model for "agent swarm" style coding. I have been through this extensively: it will write bad code that doesn't work in a subtle way, it will tell you that it works and that the issues relate to the way you are using the program, and then it will do the same thing five minutes later.
The tooling in this area is very good. The problem is that the AI cannot be trusted to write complex code. Imo, the future is something like Cerebras Code, which offers a speed-up for single-threaded work. In most cases, I am just being lazy... I know what I want to write, I don't need the AI to do it, and I am finding that I am faster if I just single-thread it.
Only counterpoint to this is that swarms are good for long-running admin, housekeeping, etc. Nowhere near what has been promised but not terrible.
How does one work with a team of developers to solve larger problems? You break down the problems into digestible chunks and have each teammate tackle a stack of those tasks.
It's far closer to being a project manager than to being a solo developer.
I tried swarms as well, but like you I came back. It's not worth it: even for small tasks, the double-checking and fine-tuning are not worth the effort, given what the worse code will cost me in the future, especially when I don't know about it.
It's not that difficult. You get one to work on one deep problem, while another does more trivial bug fixes/optimizations, etc. Maybe in another you're architecting the next complex feature, another fixes tests, etc.
Unrelated problems simultaneously in the same git tree. Worktrees are unnecessary overhead if the areas they're working in are disjoint. My Agents.md has instructions to commit early and often instead of one giant commit at the end; otherwise it wouldn't work.
> how do you ensure you're still thinking at a sufficient level of depth and capacity about each problem each agent is working on?
The context switching is hell and I have to force myself to dig deep into the MD file and understand things and not just rubber stamp the LLM output. It would be dishonest of me to say that I'm always 100% successful at that though.
I find it puzzling whenever someone claims to reach "flow" or "zen state" when using these tools. Reviewing and testing code, constantly switching contexts, juggling model contexts, coming up with prompt incantations to coax the model into the right direction, etc., is so mentally taxing and full of interruptions and micromanagement that it's practically impossible to achieve any sort of "flow" or "zen state".
This is in no way comparable to the "flow" state that programmers sometimes achieve, which is reached when the person has a clear mental model of the program, understands all relevant context and APIs, and is able to easily translate their thoughts and program requirements into functional code. The reason why interrupting someone in this state is so disruptive is because it can take quite a while to reach it again.
Working with LLMs is the complete opposite of this.
Thank you so much. These comments let me believe in my sanity in an over-hyped world.
I see how people think it's more productive, but honestly I iterate on my code like 10-15 times before it goes into production, to make sure it logs the right things, it communicates intent clearly, the types are shared and defined where they should be, it's stored in the right folder, and so on.
Whilst the laziness to just pass it to CC is there I feel more productive writing it on my own, because I go in small iterations. Especially when I need to test stuff.
Let's say I have to build an automated workflow, and for step 1 alone I need to test error handling, max concurrency, set up idempotency, and proper logging. Proper intent communication to my future self. Once I'm done I never have to worry about this specific code again (ok, some errors can be tricky, to be fair), but often this function is then practically just my thought, available whenever I need it. This only works with good variable naming and also good spacing of a function. Nobody really talks about it, but if a very unimportant part takes up a lot of space in a service, it should probably be refactored into a smaller service.
The goal is to have a function that I probably never have to look at again, and if I do, it answers as fast as possible all the questions my future self would ask when he's forgotten what decisions needed to be made or how the external parts work. When it breaks, I know what went wrong, and when I run it in an orchestration, I have the right amount of feedback.
Like others, I could go on very long about that, and I'm aware of the other side of the coin, overengineering, but I just feel that having solid composable units actually enables you to later build features and functionality that might be a moat.
Slow, flaky units are less likely to become an asset.
And even if I let AI draft the initial flow, honestly the review will never be as good as the step by step stuff I built.
I have to say AI is great for improving you as a developer: to double-check you, to answer broad questions before it gets too detailed and you need to experiment or read docs. It helps to cover all the basics.
So don't write slow, flaky unit tests? Or better yet, have the AI make them not slow and not flaky? Or, if you wanna be old school, figure out why they're flaky yourself and then fix it? If it's a time thing, then fix that, or if it's a database thing, then mock the hell out of that and integration test. But at this point, if your tests suck, you only have yourself to blame.
Sorry I don’t get your point and you didn’t seem to get mine.
I'm saying I would guess I'm faster building manually than letting AI write it; arguably it won't even achieve the level I feel best with in the future, i.e. the one having the best business impact on my project.
Also, the way I semantically define unit tests is that they are instant and non-flaky because they are deterministic; otherwise it would be a service test for me.
I can lose track of time watching a movie or playing a video game, but it's not what Mihály Csíkszentmihályi would call "flow state", but just immersion.
> Personally waiting for one agent to do something while I shove my thumb up my butt just waiting around for it to generate code that I'll have to fix anyway
I spend that time watching it think and then contemplating the problem further since often, as deep and elaborate as my prompts are, I've forgotten something. I suspect it might be different if you are building something like a CRUD app, but if you are building a very complicated piece of software, context switching to a new topic while it is working is pretty tough. It is pretty fast anyway and can write the amount of code I would normally write in half a day in like 15 minutes.
In my workflow, it's totally interactive: Give the LLM some instructions, wait very briefly, look at code diff #1, correct/fix it before approving it, look at code diff #2, correct/fix it before approving it, sometimes hitting ESC and stopping the show because the agent needs to be course corrected... It's an active fight. No way I'm going to just "pre-approve all" and walk away to get coffee. The LLMs are not ready for that yet.
I don't know how you'd manage a "swarm" of agents without pre-approving them all. When one has a diff, do you review it, and then another one comes in with an unrelated diff, and you context switch and approve that, then a third one comes in with a tool use it wants to do... That sounds absolutely exhausting.
It sounds like diff #2 depends on approval of diff #1? But with cursor it's a set of diffs that'll be retroactively approved or rejected one by one.
So you can get coffee during the thinking and still have interactive checks. Swarm changes nothing about this, except affecting the thinking time.
For my work I’ve never found myself sitting around with nothing to do because there’s always so much review of the generated code that needs to be done
The only way I can imagine needing to run multiple agents in parallel for code gen is if I’m just not reviewing the output. I’ve done some throwaway projects where I can work like that, but I’ve reviewed so much LLM generated code that there is no way I’m going to be having LLMs generate code and just merge it with a quick review on projects that matter. I treat it like pair programming where my pair programmer doesn’t care when I throw away their work
Why is this comment so pale I can't read it? What's the contrast on this? Is this accessible to anyone?
I'm guessing it was downvoted by the masses, but at the same time I'd like the choice to be able to read it; I'm not that into what the general public thinks about something.
I’m getting into downmaxxing at this point. I love that you have to earn being negative on this site. Give it to me.
Claude Code isn't really "all terminal" if you embed that terminal in your IDE. I still use Cursor (for now), but I embed a CC panel via extension. With this launch of Cursor 3, I'll probably get off Cursor for good. I have zero interest in this.
Probably momentum. It takes some effort to change tooling. This is why Cursor worked so well in the beginning. It just took over from VSCode seamlessly.
That is actually still React. React is React Core + the ReactDOM (web) renderer. React Native is React Core + a native renderer. They are both still React, and they both use JavaScript, which, while fast when JIT'd, is typically much slower than native code.
I'm not commenting on whether this is a good or bad thing, but the article strikes me as a bit misleading.
No automatic restarts! I understand that in our security patching world that patching and restarting automatically is the default, fine, but there absolutely should be a dead simple way of disabling auto restarts in settings. I'm fine if it pesters me to restart or whatever, perhaps with growing alarm the longer I wait, but it should always be optional in the end. There are just no words for how bad it can be for mission critical workloads when your computer restarts without your consent. Please make disabling this simple.
I disagree, at least on end-user devices as opposed to servers.
If you make it possible to defer updates indefinitely, users will. Guaranteed. Doesn't matter how urgent or critical the update is, how bad the bug or vulnerability it patches is, how disastrous the consequences may be: they'll never, ever voluntarily apply them.
If you're running a server, and willing to accept the risk of deferral because 1) you're in a better position to assess the risk and apply compensating controls than a regular user is, and 2) you're OK accepting the personal risk of having to explain to your boss why you kept deferring the urgent patch until after it blew up in your face, then yes, you should have a control to delay or disable it.
But end users? No. I used to believe otherwise, but now I've seen far, far too many cases where people train themselves to click "Delay 1 day" without even consciously seeing the dialog.
The real sin is combining security updates with feature updates. An argument can be made for enforced security updates(1). There is no good argument for forcing feature updates.
Most security-only updates have a low risk of interfering with the user or causing instability. Most feature updates have a high risk of doing so.
(1) Although I think there should be some way of disabling even those, even if that way is hard to find and/or cumbersome to keep the regular users away.
The problem is that there's dozens of security updates every month, so even if you can skip feature updates, you'll have to reboot every Patch Tuesday anyway.
Even the Server Core edition, which has a much smaller "surface area" needs reboots almost every month.
Alright, I can buy that. Although from a dev POV I can also appreciate the not-fun of testing a combinatorial explosion of security updates vs features.
Basically, if I trust you (the dev/software maker/whatever) not to change UIs and add in bullshit, I'm okay having auto-updates on. Unfortunately, you can't trust much now.
> I disagree, at least on end-user devices as opposed to servers.
And who determines what is an "end-user device" vs a "server"?
> If you're running a server, and willing to accept the risk of deferral because 1) you're in a better position to assess the risk and apply compensating controls than a regular user is, and 2) you're OK accepting the personal risk of having to explain to your boss why you kept deferring the urgent patch until after it blew up in your face, then yes, you should have a control to delay or disable it.
So you do want choice after all it seems. Who do you think should make this choice on risk vs. workload/criticality?
I would say you actually agree with me mostly based on your comments, but you have not clarified _who_ makes these choices. I'm saying as the consumer, _I_ should get to make that choice. In the enterprise, my admin will make that choice via group policy, but I do not want Microsoft determining what I'm allowed to do with my OS. They are of course free to keep doing that, but then I also have the right to keep not buying their products.
No thanks. I should be able to use any copy of Windows for whatever use case I want. MS is free to disagree, and I am therefore free to keep not buying their products.
I'm the wrong person to ask about that. I've gone ages between Debian reboots while applying regular updates, and I'm not sure what it is about the Windows model that requires a reboot after patching a few things.
Fedora also wants to reboot to install (dnf) updates offline, as I understand it's to prevent potential instability from running processes getting confused when their files get swapped out under their feet.
It's also good since you can't swap out the kernel without rebooting.
I assume Microsoft took the same approach, just replace everything offline then reboot into a fully up-to-date system without any chance of things in RAM still being outdated.
These automatic restarts are just the outcome of a bigger problem with how Windows Update was changed starting in W10. Namely, the removal of selective update installation and, indirectly, the lack of QA are the main sources of problems here.
Windows isn't macOS, which runs on a set of verified configurations; it runs on a variety of hardware with vendor drivers and other software. That combination may cause issues, but so does the lack of testing: we know that Microsoft in its wisdom dismantled QA and replaced it with this prosthetic of an enthusiast community that suggests "sfc /scannow" all the time. Now they've put Charlie Bell in an "engineering quality" role, but I have no hope that anything will change for the good of users.
And users should again be allowed to avoid updates which were proven to cause issues; that's the fundamental need here. Deferring a scheduled action isn't enough.
Considering Windows' behavior, and all the telemetry that was smuggled into W7 in poorly described updates, I see how appealing it is to Microsoft to use this big update package format and add features and components which would surely be avoided by experienced users. Since W10, and maybe even partially during W7, they've been fighting their users when it comes to control over the operating system.
I'm on CachyOS now, but I still get calls from friends who struggle with all this MS circus. Recently, a friend lost data on a BitLocker-encrypted machine because she didn't have her backup keys. She's the kind of user who doesn't know what happens on the screen besides the word processor and web browser; everything is a nuisance that has to be quickly dealt with by the "next, next, done" tactic. Should she be more patient and read what's being displayed on the screen? Sure, but I told her that years ago.
Anyway, on CachyOS: arch-update renders a popup in KDE about a recommended restart, and sometimes the update process requires restarting services; users can select the ones it needs or everything listed at once. There's snapshot support for updates: https://wiki.cachyos.org/configuration/btrfs_snapshots/ and I'm pretty sure other distributions have this as an option as well.
HN is the best tech site on the web for a reason. It has a generally intelligent audience, and while there are certainly inappropriate comments, compared to what you find on social media or even other sites, it is unique and far more respectful. Due to this, you can often have better and more meaningful discussions.
Sadly, probably not. I fear new languages will struggle from here on out. As a language guy, very few things in this new AI world make me more sad than this.
I don't get the feeling this will happen. LLMs are extremely good at learning new languages because that's basically their whole point. If your new language has a standard library and the LLM can see its source code, I am sure you can give it to any current-generation AI and it will happily spit out perfectly correct new code in it. If you give it access to the reference docs, then it can even ensure it never generates syntactically incorrect code. As long as your error messages are enough to understand what a problem's root cause is, the LLM will iterate and explore until it gets it right.
Not sure if this is a good example, but I used ChatGPT (not even Codex) to fix some Common Lisp code for me, and it absolutely nailed it. Sure, Common Lisp has been around for a long time, but there's not so much Common Lisp code around for LLMs to train on... but OTOH it has a hyperspec which defines the language and much of the standard libraries so I believe the LLM can produce perfect Common Lisp based on mostly that.
I'm getting ~30 tok/s on the A3B model with my 3070 Ti and 32k context.
> Do you feel you could replace the frontier models with it for everyday coding? Would/will you?
Probably not yet, but it's really good at composing shell commands. For scripting or one-liner generation, the A3B is really good. The web development skills are markedly better than Qwen's prior models in this parameter range, too.
Thinking about getting a new MBP M5 Max 128GB (assuming they are released next week). I know "future proofing" at this stage is near impossible, but for writing Rust code locally (likely using Qwen 3.5 for now on MLX), the AIs have convinced me this is probably my best choice for immediate with some level of longevity, while retaining portability (not strictly needed, but nice to have). Alternatively was considering RTX options or a mac studio, but was leaning towards apple for the unified memory. What does HN think?
I've been mulling the same, but decided against (for now)
Using Claude Code Max 20 so ROI would be maybe 2+ years.
CC gives me unlimited coding in 4-6 windows in parallel. Unsure if any model would beat (or even match) that, both in terms in quality and speed.
I wouldn't gamble on that now. With a subscription, I can change any time. With the machine, you risk that this great insane model comes out but you need 138GB and then you'll pay for both.
Thermals. Your workloads will be throttled hard once it inevitably runs hot. See comments elsewhere in thread about why LLMs on laptops like MBP is underwhelming. The same chips in even a studio form factor would perform much better.
Strix Halo machines are a good option too if you are at all price sensitive. AMD (with all the downsides of that for AI work) but people are getting decent performance from them.
I have a Mac Studio with 128GB and a M4 Max and I'd recommend it. The power usage is also pretty good, but you may not care if you live somewhere where energy is cheap.
Have you used this for Rust coding by chance? I'm curious how it compares to Opus 4.6. I realize it isn't going to think to the same level, but curious how code quality is for a more straight forward task.
> The computer science answer: a compiler is deterministic as a function of its full input state. Engineering answer: most real builds do not control the full input state, so outputs drift.
To me that implies the input isn't deterministic, not the compiler itself
You're not wrong but I think the point is to differentiate between the computer science "academic" answer and the engineering "pragmatic" answer. The former is concerned about correctly describing all possible behavior of the compiler, whereas the latter is concerned about what the actual experience is when using the compiler in practice.
You might argue that this is redefining the question in a way that changes the answer, but I'd argue that's also an academic objection; pragmatically, the important thing isn't the exact language but the intent behind the question, and for an engineer being asked this question, it's a lot more likely that the person asking has context for asking that cares about more than just the literal phrasing of "are compilers deterministic?"
> ... the important thing isn't the exact language but the intent behind the question ...
If we're not going to assume the input state is known then we definitely can't say what the intent behind the question is - for many engineering applications the compiler is deterministic. Debian has the whole reproducible builds thing going which has been a triumph of pragmatic engineering on a remarkable scale. And suggests that, pragmatically, compilers may be deterministic.
It matters a lot. For instance, many compilers will put time stamps in their output streams. This can mess up the downstream if your goal is a bit-by-bit identical piece of output across multiple environments.
And that's just one really low hanging fruit type of example, there are many more for instance selecting a different optimization path when memory pressure is high and so on.
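The timestamp case above is easy to see in miniature. Here's a toy sketch (the `toy_compile` "compiler" is made up for illustration) of why embedded build timestamps break bit-for-bit reproducibility, and how the reproducible-builds convention of honoring `SOURCE_DATE_EPOCH` fixes it:

```python
import hashlib
import os
import time

def toy_compile(source: str) -> bytes:
    # Many real toolchains embed a build timestamp in their output.
    # SOURCE_DATE_EPOCH is the reproducible-builds convention for pinning it.
    ts = os.environ.get("SOURCE_DATE_EPOCH", str(int(time.time())))
    return f"// built at {ts}\n{source}".encode()

src = "int main(void) { return 0; }"

# Unpinned: two "builds" run a moment apart can hash differently,
# even though the source is byte-identical.
unpinned = hashlib.sha256(toy_compile(src)).hexdigest()

# Pinned: identical inputs now yield bit-identical outputs.
os.environ["SOURCE_DATE_EPOCH"] = "0"
first = hashlib.sha256(toy_compile(src)).hexdigest()
second = hashlib.sha256(toy_compile(src)).hexdigest()
print(first == second)  # True
```

This is the same mechanism Debian's reproducible-builds effort leans on: treat the clock as an input, then pin it like any other input.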
Also, most real build systems build from a clean directory and checkout, so outside of a dev's machine they should be 100% reproducible, because the inputs should be reproducible. If builds aren't 100% reproducible, that's an issue!
> To me that implies the input isn't deterministic, not the compiler itself
or the system upon which the compiler is built (as well as the compiler itself) has made some practical trade offs.
the source file contents are usually deterministic. the order in which they're read and combined and build-time metadata injections often are not (and can be quite difficult to make so).
I mean, if you turn off incremental compilation and build in a container (or some other "clean room" environment), it should turn out the same each time. Local builds are very non-deterministic, but CI/CD shouldn't be.
Either way it's a nitpick though, a compiler hypothetically can be deterministic, an LLM just isn't? I don't think that's even a criticism of LLMs, it's just that comparing the output of a compiler to the output of an LLM is a bad analogy.
> I mean, if you turn off incremental compilation and build in a container (or some other "clean room" environment), it should turn out the same each time. Local builds are very non-deterministic, but CI/CD shouldn't be.
lol, should. i believe you have to control the clock as well and even then non-determinism can still be introduced by scheduler noise. maybe it's better now, but it used to be very painful.
> Either way it's a nitpick though, a compiler hypothetically can be deterministic, an LLM just isn't? I don't think that's even a criticism of LLMs, it's just that comparing the output of a compiler to the output of an LLM is a bad analogy.
llm inference is literally sampling a distribution. the core distinction is real though, llms are stochastic general computation where traditional programming is deterministic in spirit. llm inference can hypothetically be deterministic as well if you use a fixed seed, although, like non-trivial software builds on modern operating systems, squeezing out all the entropy is a non-trivial affair. (some research labs are focused on just that, deterministic llm inference.)
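to make the "sampling a distribution" point concrete, here's a minimal sketch (toy softmax over a four-token vocabulary, not a real inference stack): with a fixed seed the sampled "completion" is reproducible, without one it generally isn't. real deterministic inference also requires pinning kernels, batching, and floating-point reduction order, which is the hard part.

```python
import math
import random

def sample_tokens(logits, n, seed=None):
    # LLM decoding (when not greedy) samples token ids from a
    # softmax distribution over the vocabulary.
    rng = random.Random(seed)  # fixed seed -> reproducible stream
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [x / total for x in exps]
    vocab = list(range(len(logits)))
    return [rng.choices(vocab, weights=probs)[0] for _ in range(n)]

logits = [2.0, 1.0, 0.5, 0.1]

# Same seed, same "completion", every run.
print(sample_tokens(logits, 8, seed=42) == sample_tokens(logits, 8, seed=42))  # True
```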