
> If you would like to grieve, I invite you to grieve with me.

I think we should move past this quickly. Coding itself is fun, but it is also labour; building something is what is rewarding.


By that logic prompting an AI is also labour.

It's not even always a more efficient form of labour. I've experienced many scenarios with AI where prompting it to do the right thing takes longer and requires writing/reading more text compared to writing the code myself.


Then you are using it the wrong way.

Driving is a skill that needs to be learnt; the same goes for working with agents.


I give it a year; the realization will be brutal.

> With Codex (5.3), the framing is an interactive collaborator: you steer it mid-execution, stay in the loop, course-correct as it works.

> With Opus 4.6, the emphasis is the opposite: a more autonomous, agentic, thoughtful system that plans deeply, runs longer, and asks less of the human.

Isn't the UX the exact opposite? Codex thinks much longer before giving you back the answer.


I've also had the exact opposite experience with tone. Claude Code wants to build with me, and Codex wants to go off on its own for a while before returning with opinions.

It's likely that both are steering toward the middle from their current relative extremes and converging to nearly the same place.

Also my experience using these two models. Perhaps they are trying to recover from oversteer.

Well, with the recent delays I can easily find Claude Code going off on its own for 20 minutes, with no idea what it's going to come back with. One time it overflowed its context on a simple question and then used up the rest of my session window. In my experience, a lot of AI assistants have this awkward thing where they complicate something in a non-visible way and think about it for a long time, burning up context before coming up with a summary based on some misconception.

For complex tasks I ask ChatGPT or Grok to define the context, then take it to Claude for accurate execution. I also created a complete local pipeline enriched with skills, agents, RAG, and profiles. It is slower but very good. There is no magic: the richer the context window, the more precise and contained the execution.

The key is a well-defined task with strong guardrails. You can add these to your agents file over time, or you can probably find someone's online to copy the basics from. Any time you find it doing something you didn't expect or don't like, add a guardrail to prevent that in future. Claude hooks are also useful here, along with the hookify plugin, which creates them for you based on the current conversation.
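For example, guardrail entries in an agents file might look something like this (a hypothetical sketch; the exact rules depend on your project):

    ## Guardrails
    - Never run destructive git commands (reset --hard, force push)
      without asking first.
    - Do not modify files under vendor/ or anything auto-generated.
    - Run the test suite and report the results before declaring
      a task done.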

I have started using OpenSpec for this. I find it works far better to have a proposal and a list of tasks; the AI stays more focused.

https://openspec.dev/


In terms of 'tone', I have been very impressed with Qwen3-Coder-Next over the last two days, especially as I have it running locally on a single modest 4090.

Did you set that up following a guide or anything you could share?

The easiest way I know is to just use LM Studio: download and press play :). Optional but recommended: increase the context length to 262144 if you have the DRAM available. It will definitely get slower as your interaction goes on, but (at least for me) still at a tolerable speed.

Not OP, but I got it running on my 4090 (and RAM) by following this guide: https://unsloth.ai/docs/models/qwen3-coder-next

I see around 30 t/s
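For reference, the guide boils down to serving a GGUF quant with llama.cpp; roughly something like this (the filename and numbers here are placeholders, check the guide for the exact quant and offload flags for your hardware):

    llama-server \
      --model qwen3-coder-next-Q4_K_M.gguf \
      --n-gpu-layers 99 \
      --ctx-size 32768 \
      --port 8080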


Same here; CC gives me options to pick a direction after the planning stage.

Yes, you’re right for 4.5 and 5.2. Hence they’re focusing on improving the opposite thing and thus are actually converging.

Codex now lets you tell the LLM things in the middle of its thinking without interrupting it, so you can read the thinking traces and tell it to change course if it's going off track.

That just seems like a UI difference. I've always interrupted Claude Code, added a comment, and it's continued without much issue. Otherwise, if you just type, the message is queued for the next turn. There's no real reason to prefer one over the other, except it sounds like Codex can't queue messages?

Codex can queue messages, but the queue only gets flushed once the agent is done with whatever it was working on, whereas Claude will read messages and adjust accordingly in the middle of whatever it is doing. It sounds like OP is saying that Codex can now do this latter bit as well.

The problem is that if you're using subagents, the only way to interject is often to press Escape multiple times, which kills all the running subagents. All I wanted to do was add a minor steering guideline.

This might be better with the new teams feature.


They actually made a change a few weeks ago that made subagents more steerable.

When they ask for approval on a tool call, press Down until the selector is on "No" and press Tab; then you can add any extra instructions.


That is so annoying too because it basically throws away all the work the subagent did.

Another thing that annoys me is that the subagents never output durable findings unless you explicitly tell their parent to prompt the subagent to “write their output to a file for later reuse” (or something like that, anyway).
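For example, a standing instruction in the parent prompt might read (hypothetical phrasing):

    Whenever you delegate to a subagent, instruct it to write its full
    findings to notes/<task>.md before finishing, then read that file
    back rather than relying on the subagent's summary.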

I have no idea how, but there need to be ways to backtrack on context while somehow also maintaining the “future context”…


This is most likely an inference-serving problem, in terms of capacity and latency, given that Opus X and the latest GPT models available in the API have always responded quickly and slowly, respectively.

Your feeling is not my feeling; Codex is unambiguously the smarter model for me.

For those who care:

GPT-5.3-Codex dominates terminal coding with a roughly 12% lead (Terminal-Bench 2.0), while Opus 4.6 retains the edge in general computer use by 8% (OSWorld).

Does anyone know the difference between OSWorld and OSWorld Verified?


From Claude 4.6 Thinking:

OSWorld is the full 369-task benchmark. OSWorld Verified is a ~200-task subset where humans have confirmed the eval scripts reliably score success/failure — the full set has some noisy grading where correct actions can still get marked wrong.

Scores on Verified tend to run higher, so they're not directly comparable.


I think people here need to accept that software is becoming electricity: you get charged when you use it, and by how much. You don't pay for box-shaped electricity or purple-colored electricity; it is just electricity.

It is obvious.

A mid-sized firm of 100-500 heads doesn't need enterprise-level SaaS; a vibe-coded website will suit it better.

Fundamentally, those workflow/orchestration SaaS products need to answer the question of why people should pay you a premium while only getting 80% of the way to where they want to be.


Not to rain on the parade, but this app feels ... unpolished to me. Some of the options in the demo feel less thought out and just thrown together.

I will try it out, but is it just me, or is the product/UX side of recent OpenAI products sort of ... skipped over? It is good that agents help ship software quickly, but please, no half-baked stuff like Atlas 2.0 again ...


I don’t get why they announce it as a “Mac app” when the UI looks and feels nothing like a Mac app. Also Electron… again.

Why not flex some of those Codex skills on a proper native app…


What else do you expect from vibe coding? Even the announcement for this app is LLM-generated.

This is true. The font and animation feel basic to me, even for a programmer-focused app.

Regardless, knowing the syntax of a programming language or remembering some library API is a dead business.

I for one am quite happy to outsource this kind of simple memorisation to a machine. Maybe it's the thin end of the slippery slope? It doesn't FEEL like it is, but...

Why even learn how to read when you can just yell at the computers?

