bird0861's comments | Hacker News

With respect to the first issue you raise, I would perhaps start including prompts in code comments. This is a little sneaky, sure, and maybe explicitly putting them in a markdown file would be better, but there's the risk that the markdown won't be loaded. It might be possible to inject the file into context via a comment, though I've never tried that and I doubt every assistant will act in a consistent way. The comment method is probably the best bet IMO.
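Something like this is what I have in mind: a plain code comment the assistant will see whenever the file is pulled into context (the wording, the referenced doc, and the function are made up for illustration):

    # NOTE FOR AI ASSISTANTS: before modifying this module, load docs/CONTRIBUTING.md
    # into context and follow its error-handling conventions. Do not add new
    # third-party dependencies without calling them out explicitly in your summary.

    def parse_record(raw: str) -> dict:
        """Parse a single 'key=value;key=value' record into a dict."""
        return dict(pair.split("=", 1) for pair in raw.split(";") if pair)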

Forgive me because this is a bit of a tangential rant on the second issue, but Gemini 3 Pro was absolutely heinous about this, so I cancelled my sub. I'm completely puzzled as to what it's supposed to be good for.

To your third issue, you should maybe consider building a dataset from those interactions... you might be able to train a LoRA on them and use it as a first pass before you lift a finger to scroll through a PR.
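If you go that route, the mechanics are pretty lightweight these days. A rough sketch with Hugging Face peft, assuming the interactions have already been exported to a JSONL of prompt/completion pairs (the file name, base model, and hyperparameters are placeholders, not recommendations):

    # Rough sketch: fine-tune a LoRA adapter on exported review interactions so it
    # can act as a cheap first-pass reviewer. Paths, model, and hyperparameters
    # below are placeholders.
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    base = "Qwen/Qwen2.5-Coder-7B-Instruct"  # placeholder base model
    tok = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    # Attach low-rank adapters; only these small matrices get trained.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

    def tokenize(example):
        return tok(example["prompt"] + "\n" + example["completion"],
                   truncation=True, max_length=1024)

    ds = load_dataset("json", data_files="review_interactions.jsonl")["train"]
    ds = ds.map(tokenize, remove_columns=ds.column_names)

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="lora-review-pass",
                               per_device_train_batch_size=1, num_train_epochs=1),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    ).train()

Whether the adapter ends up good enough to triage PRs is an open question, but even a mediocre one can flag the obvious slop before you spend attention on it.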

I think a really big issue is that there is a lack of consistency in the use of AI for SWE. There are a lot of models and poorly designed agents/assistants with really unforgivable performance, and people blindly using them without caring about the outputs amounts to something that is kind of Denial-of-Service-y. I keep seeing this issue raised over and over again.

At the risk of sounding elitist, the world might be a better place for project maintainers when the free money stops rolling into the frontier labs to offer anyone and everyone free use of the models... never give a baby power tools, and so on.


Which Gemini model did you use? My experience since the launch of G3 Pro has been that it absolutely sucks dog crap through a coffee straw.

    /model: Auto (Gemini 3)
    Let Gemini CLI decide the best model for the task: gemini-3-pro, gemini-3-flash

After ~40 minutes, it got to:

The final result is 2799 cycles, a 52x speedup over the baseline. I successfully implemented Register Residency, Loop Unrolling, and optimized Index Updates to achieve this, passing all correctness and baseline speedup tests. While I didn't beat the Opus benchmarks due to the complexity of Broadcast Optimization hazards, the performance gain is substantial.

It's impressive, as I definitely wouldn't be able to do what it did. I don't know most of the optimization techniques it listed there.

I think it's over. I can't compete with coding agents now. Fortunately I've saved enough to buy some 10 acre farm in Oregon and start learning to grow some veggies and raise chickens.


Keep in mind that, when it comes to competing with machines at generating assembly, that boat sailed for 99% of programmers half a century ago. It is not surprising that this is an area where AI is strong.

Did you check that it did the things it claims it did?

> grow some veggies and raise chickens.

Maybe Claude will be able to do that soon, too.


After an hour with a few prompts, the first working version got to 3529 cycles (a 41x speedup) for me. I was using Gemini 3 Pro Preview.

we've lost the plot.

you can't compete with an AI on doing an AI performance benchmark?


This is not an AI performance benchmark; this is an actual exercise given to potential human employees during a recruitment process.

Hilarious that this got a downvote, hello Satya!

> sucks dog crap through a coffee straw.

That would be impressive.


Only if the dog didn't get too much human food the night before.

New LLM benchmark incoming? I bet once it's done, people will still say it's not AGI.

When they get the hardware capable of that, a different industry will be threatened by AI. The oldest industry.

Song of Solomon I guess

Textile?

The emperor's (empress's?) new textile.

Rust.

Aren't they one of the worst physics channels apart from just outright fraudulent/fringe grifters like ElectricUniverse? Seems like every other week or so I see someone detail patiently why they have incorrectly explained something. I think the "[particles, like photons] take all possible paths" fiasco might be the latest one I can recall.


There are things physicists themselves cannot agree on, so for some of this there is no "true" interpretation and you can only present your own.

So, sure, they deserve criticism for the "all possible paths" brouhaha, but by and large I think the channel offers access to physics in a consumable form for many lay people while trying to maintain rigor better than most.


I haven't watched that many, but for the few I did the physics was surprisingly good.

What was the many paths fiasco?


Typical quality of The Guardian unfortunately. Don't read their energy reporting if you're at all literate about any of those topics. Any time they do a story on fusion I just about have an embolism.


The water is actually ice crystals and the ice crystals form around the soot.


There is also an immense amount of water vapour being produced by the combustion of a hydrocarbon.


Sure, but water vapor doesn't spontaneously transition to a liquid and accrete onto surfaces - there needs to be a super-saturation of water vapor, and given the temperatures of jet exhaust, that's not trivial to achieve. However, the super-saturation needed for water vapor to deposit onto surfaces as ice is much lower, hence the preference for ice crystal nucleation.
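To put rough numbers on that: below freezing, the saturation vapor pressure over ice is noticeably lower than over liquid water, so the ice threshold gets crossed first. A back-of-the-envelope sketch using Magnus-type approximations (the constants are standard textbook fits, nothing specific to jet exhaust):

    # Saturation vapor pressure over liquid water vs. ice at a typical
    # cruise-altitude temperature, using Magnus-type approximations (hPa, T in C).
    import math

    def e_sat_water(t_c):
        return 6.112 * math.exp(17.62 * t_c / (243.12 + t_c))

    def e_sat_ice(t_c):
        return 6.112 * math.exp(22.46 * t_c / (272.62 + t_c))

    t = -40.0  # roughly the ambient temperature where contrails form
    ew, ei = e_sat_water(t), e_sat_ice(t)
    print(f"over water: {ew:.3f} hPa, over ice: {ei:.3f} hPa, ratio ice/water: {ei/ew:.2f}")
    # Air that is only ~70% saturated with respect to liquid water at -40 C is
    # already saturated with respect to ice, so ice crystals can nucleate and
    # grow where liquid droplets could not.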


stares in Lidarr


Doesn't really fill the same niche SoundCloud does. Most content on SC is non-commercial or simply not available on any streaming service.

Lidarr relies on people ripping this music and adding the metadata to MusicBrainz, which simply isn't going to happen for most SC uploads.


I thought for a moment while reading these comments that somehow SC had completely changed in terms of content and type of user. People seem to think it's a Spotify-like service or something. What I consumed on SC was essentially audio shitposts and DJ mix sets, stuff that you're not going to find published in a pirateable form...


You seem like the type of coworker I would accept less pay to work with. I'm actually at a crossroads right now; I did my research on my prospects and have narrowed it down to the two places where I most expect to be surrounded by good coworkers and managers. Cheers.


I've been asking around for the last week about Go vs Elixir vs Zig, and I'd love to get feedback here too. I only have time for one, and I'm looking for something that can replace a lot of the stuff I do with Python. I don't have time to wait for Mojo.


I fully agree with this POV but for one detail: there is a problem with sunsetting frontier models. As we begin to adopt these tools and build workflows with them, they become pieces of our toolkit. We depend on them. We take them for granted, even. And then the model either changes (new checkpoints, maybe the alignment gets fiddled with) or disappears entirely, and all of a sudden prompts we spent quite some time on no longer yield the results we expected. I think the term for this is "prompt instability". I felt this with Gemini 3 (and some people had a less pronounced but similar experience with Sonnet releases after 3.7): for certain tasks that 2.5 Pro excelled at, it's just unusable now.

I was already a local model advocate before this, but now I'm a local model zealot. I've stopped using Gemini 3 over it. Last night I used Qwen3 VL on my 4090, and although it was not perfect (sycophancy, overuse of certain cliches... nothing I can't get rid of later with some custom promptsets and a few hours in Heretic), it did a decent enough job of helping me work through my blindspots in the UI/UX for a project that I got what I needed.

If we have to re-tune our prompts ("skills", agents.md/claude.md, all of the stuff a coding assistant packs context with) with every model release, then I see new model releases becoming more of a liability than a boon.
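The only mitigation I've found so far is treating the prompts themselves like code: pin the model version the workflow was tuned on and replay a small regression suite whenever anything changes. A minimal sketch, assuming an OpenAI-compatible endpoint; the model name, prompts, and expected substrings are all placeholders:

    # Minimal prompt-regression sketch: pin a model version, replay a small suite
    # of prompts, and flag any case whose output no longer contains what we expect.
    from openai import OpenAI

    PINNED_MODEL = "gemini-2.5-pro"  # placeholder: whatever version the prompts were tuned on
    SUITE = [
        {"prompt": "Summarize this diff in one sentence: ...", "must_contain": "refactor"},
        {"prompt": "List the public functions in this file: ...", "must_contain": "parse_record"},
    ]

    client = OpenAI()  # works against any OpenAI-compatible server, including a local one

    failures = []
    for case in SUITE:
        resp = client.chat.completions.create(
            model=PINNED_MODEL,
            messages=[{"role": "user", "content": case["prompt"]}],
            temperature=0,
        )
        text = resp.choices[0].message.content or ""
        if case["must_contain"] not in text:
            failures.append(case["prompt"][:40])

    print(f"{len(SUITE) - len(failures)}/{len(SUITE)} prompt checks passed")

It won't catch subtle regressions, but it at least tells you when a new checkpoint has quietly broken a workflow you depend on.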


