This comment would have made sense 6 months ago. Now it is much, much, much more likely that any given textually answerable problem will be far easier for a bleeding-edge frontier AI than for a human, especially if you take time into account.
Juniors from non target schools are getting pushed out since the skill floor is too high.
I graduated 9 months ago. In that time I've merged more PRs than anyone else, reduced mean time to merge by 20% on a project with 300 developers with an automated code review tool, and in the past week vibe coded an entire Kubernetes cluster that can remotely execute our builds (working on making it more reliable before putting it into prod).
None of this matters.
The companies/teams like OpenAI or Google Deepmind that are allegedly hiring these super juniors at huge salaries only do so from target schools like Waterloo or MIT. If you don't work at a top company your compensation package is the same as ever. I am not getting promoted faster, my bonus went from 9% to 14% and I got a few thousand in spot bonuses.
From my perspective, this field is turning into finance or law, where the risk of a bad hire due to the heightened skill floor is so high that if you DIDN'T go to a target school you're not getting a top job no matter how good you are. Like how Yale goes to Big Law at $250k while non T14 gets $90k doing insurance defence and there's no movement between the categories. 20-30% of my classmates are still unemployed.
We cannot get around this by interviewing well because anyone can cheat on interviews with AI, so they don't even give interviews or coding assessments to my school. We cannot get around this with better projects because anyone can release a vibe coded library.
It appears the only thing that matters is pedigree of education because 4 years of in person exams from a top school aren't easy to fake.
Can I ask you, and the others who post things like this here: what are you actually developing?
People are posting about pull requests, use of AIs, yada yada. But they never tell us what they are trying to produce. Surely this should be the first thing in the post:
- I am developing an X
- I use an LLM to write some of the code for it ... etc.
- I have these ... testing problems
- I have these problems with the VCS/build system ...
Otherwise it is all generalised, well, "stuff". And maybe, dare I say it, slop.
I'm hosting a Kubernetes cluster on Azure and trying to autoscale it to tens of thousands of vCPUs. The goal is to transparently replace dedicated developer workstations (edit: transparently replace compiling) because our codebase is really big and we've hired enough people that this is viable.
edit: to clarify, I'm using recc, which wraps compiler commands like distcc or ccache do. It doesn't require developers to give up their workstations.
Right now I'm using buildbarn. Originally, I used sccache but there's a hard cap on parallel jobs.
In terms of how LLMs help, they got me through all the gruntwork of writing jsonnet and Dockerfiles. I had barely touched that syntax before, so having AI churn it out was helpful in driving towards the proof of concept. Otherwise I'd be looking up "how do I copy a file into my Docker container".
AI also meant I didn't have to spend a lot of time evaluating competing solutions. I got sccache working in a day and when it didn't scale I threw away all that work and started over.
In terms of where the LLM fell short, it constantly lies to me. For example, it mounted the host filesystem into the docker image so it could get access to the toolchains instead of making the docker images self-contained like it said it would.
It also kept trying not to do the work, e.g. it randomly decides in its thinking tokens "let's fall back to a local caching solution since the distributed option didn't work", then spams me with checkmark emojis and claims in the chat message that the distributed solution is complete.
A decent amount of it is slop, to be honest, but an 80% working solution means I am getting more money and resources to turn this into a real initiative. At which point I'll rewrite the code again but I'll pay closer attention now that I know docker better.
> The goal is to transparently replace dedicated developer workstation
Isn't there a less convoluted way of making the best engineers leave? I am half serious here. If you want your software to run slow, IT could equally well install corporate security software on developer laptops. Oops, I did it again. Oh well, in all seriousness, I have never seen any performance problem solved by running it on Azure's virtualization. I am afraid you are replacing the hardware layer with a software layer of ungodly complexity, which you can be sure will be functionally incomplete.
Are you sure they don't have to fix the build pipeline first? Tens of thousands of vCPUs for a single compilation run, or to accommodate 100 developers who try to compile their own changes?
> I have never seen any performance problem being solved by running it on Azure's virtualization
Sorry, I wasn't clear. I am not virtualizing the workspace. I'm using `recc` which is like `distcc` or `ccache` in that it wraps the compiler job. Every developer keeps their workstation. It just routes the actual `clang` or `gcc` calls to a Kubernetes cluster which provides distributed build and cache.
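For anyone unfamiliar with that family of tools, the core trick is small. Here's a toy local sketch of the wrap-the-compiler idea (this is not recc itself; the cache path and the "remote" step are made-up stand-ins):

```python
import hashlib
import os
import shutil
import subprocess

CACHE_DIR = "/tmp/compile-cache"  # hypothetical local stand-in for the remote CAS


def cache_key(cmd, source):
    """Hash the full compiler command plus source contents, roughly as ccache does."""
    h = hashlib.sha256(" ".join(cmd).encode())
    with open(source, "rb") as f:
        h.update(f.read())
    return h.hexdigest()


def wrapped_compile(cmd, source, output):
    """Run `cmd` only on a cache miss; on a hit, copy the cached object instead."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    cached = os.path.join(CACHE_DIR, cache_key(cmd, source))
    if os.path.exists(cached):
        shutil.copy(cached, output)
        return "hit"
    subprocess.run(cmd, check=True)  # in recc this would be a remote execution RPC
    shutil.copy(output, cached)
    return "miss"
```

The real tools key on the preprocessed source and compiler version, and ship the job to a remote worker instead of running it locally, but the wrapper shape is the same: the build system still thinks it's calling `clang` or `gcc`.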
> Isn't there a less convoluted way of making the best engineers leave?
We have 7000+ compiler jobs in a clean build because it is a big codebase. People are waiting hours for CI.
I'm sure that drives attrition and bringing that down to minutes will help retain talent.
> Tens of thousands of vCPUs for a single compilation run, or to accommodate 100 developers who try to compile their own changes?
Because it uses remote execution, it will ideally do both. My belief is that an individual developer launching 6000 compiler jobs because they changed a header will smooth out over 300 developers that generally do incremental builds. Likewise, this'll eliminate redundant recompilation when git pulling since this also serves as a cache.
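Here's a back-of-envelope simulation of why I expect the smoothing. The incremental build size and full-rebuild probability are assumptions I made up for illustration, not measurements:

```python
import random

random.seed(0)

DEVS = 300               # developers sharing the cluster
FULL_BUILD_JOBS = 6000   # a widely included header change triggers a near-full rebuild
INCR_BUILD_JOBS = 50     # typical incremental build (assumed)
P_FULL = 0.05            # assumed chance a given build is a full rebuild


def hourly_demand():
    """Compile jobs submitted in one hour, assuming each dev builds once."""
    return sum(
        FULL_BUILD_JOBS if random.random() < P_FULL else INCR_BUILD_JOBS
        for _ in range(DEVS)
    )


samples = [hourly_demand() for _ in range(1000)]
avg, peak = sum(samples) / len(samples), max(samples)
worst_case = DEVS * FULL_BUILD_JOBS  # everyone full-rebuilding simultaneously
print(f"avg {avg:.0f}/h, observed peak {peak}/h, naive worst case {worst_case}/h")
```

Under these assumptions, even the worst observed hour is a small fraction of provisioning every developer for a simultaneous full rebuild, which is why pooling the cluster should beat per-developer capacity.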
Thanks for expanding on it, now it's clearer what you want to achieve. If I see things like this, it seems Linus was onto something in banning C++. That sounds like a nasty compilation scheme, but I guess the org has painted itself too deep into that corner to get out of it.
This makes absolutely no sense to me. Are you really recompiling 6000 things each time a dev in the company needs to add a line somewhere in the codebase?
Have you thought about splitting that giant thing in smaller chunks?
> Are you really recompiling 6000 things each time a dev in the company needs to add a line somewhere in the codebase?
It happens when someone modifies a widely included header file. Which there are a lot of thanks to our use of templates. And this is just our small team of 300 people.
> Have you thought about splitting that giant thing in smaller chunks?
Yes. We've tried but it's not scaling. Unfortunately, we've banned tactics like pImpl and dynamic linking that would split a codebase unless they're profiled not to be on a hot path. Speed is important because I'm writing tests for a semiconductor fab and test time is more expensive than any other kind of factory on Earth.
I tried stuff like precompiled headers but the fact only one can be used per compilation job meant it didn't scale to our codebase.
Thanks for the detailed breakdown. The template header cascade problem makes total sense, I underestimated how bad it gets at scale with heavy template usage.
The semiconductor fab constraint is interesting too. When test time costs that much per minute, banning pImpl on hot paths is a pretty obvious call, even if it makes compile times painful.
Appreciate the real-world context.
You seem exceptionally bright. Most people are not like this. This is why they are struggling.
It sounds like you have a job, right out of college, but you're griping about not getting promoted faster. People generally don't get promoted 9 months into a job.
I'm reading your post and I am genuinely impressed by what you claim to have done. At the same time I am confused about what you would like to achieve within the first year of your professional career. You seem to be doing quite well, even in this challenging environment.
> At the same time I am confused about what you would like to achieve within the first year of your professional career.
I am in great fear of ending up on the wrong side of the K shaped recovery.
Everyone is telling me I need to be exceptional or unemployed because the middle won't exist in 2 years.
I want to secure the résumé that gives me the highest possibility of retaining employment if there's a sudden AI layoff tomorrow. A fast career trajectory catches HR's eye even if they don't understand the technicals.
I mean you don’t need your first job to be at a top-of-the-top company. Your first job is to get you into the industry; then you can flourish.
How many juniors are OpenAI or GDM going to hire in a year? Probably double digits at most. The chances are super slim, and they are by nature allowed to be as picky as they want.
That being said, I do agree this industry is turning into finance/law, but that won’t last long either. I genuinely can’t foresee what happens if/when AGI/ASI is really here; it should start generating its own ideas to better itself, and there will be no incentive to hire any human for a large sum anymore, except maybe a single-digit number of individuals on Earth.
I vibe coded a Kubernetes cluster in 2 days for a distributed compilation setup. I've never touched half this stuff before. Now I have a proof of concept that'll change my whole organization.
That would've taken me 3 months a year ago, just to learn the syntax and evaluate competing options. Now I can get sccache working in a day, find it doesn't scale well, and replace it with recc + buildbarn. And ask the AI questions like whether we should be sharding the CAS storage.
The downside is the AI is always pushing me towards half-assed solutions that don't solve the problem, like just setting up distributed caching instead of distributed compilation. It also keeps lying, which requires me to redirect and audit its work. But I'm also learning much more than I ever could without AI.
> I vibe coded a Kubernetes cluster in 2 days for a distributed compilation setup. I've never touched half this stuff before. Now I have a proof of concept that'll change my whole organization.
Dunning-Kruger as a service. Thank God software engineers are not in charge of building bridges.
I'm writing a HIP (AMD GPU kernels) linker in my job, and the calling convention is contained in a metadata section of the object file.
Whether the array is passed in registers or pointers can be chosen by the compiler based on factors like profile guided optimization. That the ABI isn't stable doesn't matter because the linker handles it for the programmer.
This is all publicly documented in the llvm docs too so you can write your own loader.
> Instead children would own special devices that are locked down and tagged with a "underage" flag when interacting with online services, while adults could continue as normal.
California is mandating OSes provide ages to app stores, and HN lost their mind because it's a ban on Linux.
> California is mandating OSes provide ages to app stores,
They forgot to put in a provision exempting apps which do not need an age rating? As in: everything OS-related.
Sounds like a good way to get rid of snap at least, since that is where all the commercial bloat is located. Last time I did a fresh Debian install I do not remember installing any app from the OS repository which would require age restrictions (afaik).
My company rewards impact because we work in a competitive industry and we feel like we cannot afford to waste time on unnecessary complexity. Our goal is to make money and if it doesn't meet business goals we don't waste time on it.
> Now, promotion time comes around. Engineer B’s work practically writes itself into a promotion packet: “Designed and implemented a scalable event-driven architecture, introduced a reusable abstraction layer adopted by multiple teams, and built a configuration framework enabling future extensibility.” That practically screams Staff+.
What are these companies where you have unlimited money to pay engineers to solve problems in less efficient ways??
Western militaries have a parallel commissioned officer and enlisted command structure where an O1 (junior officer) is technically senior to an E9 (senior enlisted NCO) and can order them around.
The idea is that command requires a separate set of skills and that experience needs to start early to have senior officers in their 50s.
In practice, junior officers are "advised" by senior enlisted on how to order people around and not taking that advice is a bad idea.
Kind of like how companies have managers and technical tracks where a line manager ignoring a senior technical person always blows up in their face.
I like this concept because, while everyone's thought of "commit the agent prompts and reproduce everything from scratch every time" as a "dumb idea", I'm not sure anyone has actually executed on it in a snappy git-like UI.
Now, because the author took the time to work on it, we can see if this is actually a better method of software development. If LLM development continues deflating the cost of quality software, maybe this will turn out to be the future.
I just avoided $1.8 million/year in review time w/ parallel agents for a code review workflow.
We have 500+ custom rules that are context sensitive because I work on a large and performance sensitive C++ codebase with cooperative multitasking. Many things that are good are non-intuitive and commercial code review tools don't get 100% coverage of the rules. This took a lot of senior engineering time to review.
Anyways, I set up a massively parallel agent infrastructure in CI that chunks the review guidelines into tickets, adds them to a queue, and has agents post GitHub code review comments. Then a manager agent validates the comments/suggestions using scripts and posts the review. Since these are coding agents, they can autonomously gather context or run code to validate their suggestions.
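The shape of the pipeline is roughly this (a stubbed-out sketch; the chunk size is a placeholder and the agent functions are stand-ins for the real thing, which calls LLMs and the GitHub API):

```python
from concurrent.futures import ThreadPoolExecutor

RULES = [f"rule-{i:03d}" for i in range(500)]  # stand-in for the real guidelines
CHUNK = 25                                     # guidelines per agent ticket


def chunk_rules(rules, size):
    """Split the guideline list into tickets small enough for one agent's context."""
    return [rules[i:i + size] for i in range(0, len(rules), size)]


def review_agent(ticket):
    """Stand-in for a coding agent reviewing the PR against one chunk of rules.
    The real agent gathers repo context and runs code to validate its findings."""
    return [{"rule": r, "comment": f"possible violation of {r}"} for r in ticket]


def manager_validate(findings):
    """Stand-in for the manager agent that filters comments before posting."""
    return [f for f in findings if f["comment"]]


tickets = chunk_rules(RULES, CHUNK)
with ThreadPoolExecutor(max_workers=8) as pool:
    raw = [c for batch in pool.map(review_agent, tickets) for c in batch]
review = manager_validate(raw)
print(f"{len(tickets)} tickets, {len(review)} candidate comments")
```

Chunking matters because dumping all 500 rules into one context dilutes attention; small tickets keep each agent focused and make the work embarrassingly parallel.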
Instantly reduced mean time to merge by 20% in an A/B test. Assuming 50% of time on review, my org would've needed 285 more review hours a week for the same effect. Super high signal as well, it catches far more than any human can and never gets tired.
Likewise, we can scale this to any arbitrary review task, so I'm looking at adding benchmarking and performance tuning suggestions for menial profiling tasks like "what data structure should I use".
This is what Google uses in their internal review systems - at least their AI team does this.
Heard a presentation from one of their AI engineers where they had a few slides about using multi-agent systems with different focuses looking through the code before a single human is pinged to look at the pull request.
Unfortunately I didn't graduate from Waterloo nor did I have referrals last year, so Google autorejects me from even forward deployed engineer roles without even giving me an OA.
Instead I get to maintain this myself for several hundred developers as a junior and get all my guidance from HN.
That sounds like a completely made up bullshit number that a junior engineer would put on a resume. There’s absolutely no way you have enough data to state that with anything approaching the confidence you just did.
It's definitely a resume number I calculated as a junior engineer. Feel free to give feedback on my math.
It is based on $125/hr and it assumes review time is inversely proportional to number of review hours.
Then time to merge can be modelled as
T_total = T_fixed + T_review
where fixed time is stuff like CI. For the sake of this, assume T_fixed = T_review, i.e. 50% of time is spent in review. (If 100% of time is spent in review it's more like $800k, so I'm being optimistic.)
T_review is proportional to 1/(review hours).
We know from an A/B test that this AI tool reduced T_total by roughly 23.4%, so I calculate how much equivalent human reviewer time would have been needed to get the same result under the above assumptions. This creates the following system of equations:
T_total_new = T_fixed + T_review_new
T_total_new = T_total * (1 - r)
where r = 23.4%. This simplifies to:
T_review_new = T_review - r * T_total
since T_review / T_review_new = capacity_new / capacity_old (by the inverse proportionality assumption). Call this capacity ratio `d`. Then d simplifies to:
d = 1/(1 - r/(T_review/T_total))
T_review/T_total is the fraction of total time spent in review, so we call that `a` and get the expression:
d = 1 / (1 - r/a)
Then at 50% of total time spent on review a=0.5 and r = 0.234 as stated. Then capacity ratio is calculated at:
d ≈ 1.8797
Likewise, we have about 40 reviewers devoting 20% of a 40-hour workweek, giving us 320 hours. Multiply by (d - 1) and get roughly 281.5 hours of additional time, or about $35,188/week, which over 52 weeks is a little over $1.8 million/year.
Ofc I think we cost more than $125 once you consider health insurance and all that, likewise our reviewers are probably not doing 20% of their time consistently, but all of those would make my dollar value higher.
The most optimistic assumption I made is 50% of time spent on review.
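If you want to poke at the arithmetic, the whole model fits in a few lines (same assumptions and parameters as stated above):

```python
def review_savings(r=0.234, a=0.5, reviewers=40, frac=0.20, week_hours=40, rate=125):
    """Equivalent extra reviewer hours implied by an r reduction in time-to-merge,
    assuming a fraction `a` of total time is review and review time scales as
    1 / (reviewer hours)."""
    d = 1 / (1 - r / a)                         # required capacity ratio
    base_hours = reviewers * frac * week_hours  # current weekly review hours
    extra_hours = base_hours * (d - 1)          # additional hours for same speedup
    annual = extra_hours * rate * 52
    return d, extra_hours, annual


d, extra, annual = review_savings()
print(f"d = {d:.4f}, extra hours/week = {extra:.1f}, annual = ${annual:,.0f}")
```

Plugging in r = 0.234 and a = 0.5 gives d ≈ 1.88, about 281.5 extra hours a week, and a little over $1.8M/year, so the headline figure follows mechanically from the stated assumptions; the assumptions themselves are the soft part.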
The feedback is don’t put it on a resume because it looks ridiculous. I can almost guarantee you that an A/B test design wasn’t rigorous enough for you to be that confident in your numbers.
But even if that is correct you need a much longer time frame to tell if reviews using this new tool are equivalent as a quality control measure.
And you have so many assumptions built in that your number is worthless. You aren’t controlling for all the variables you need to control for. How do you know that workers spend 8 hours a week on reviews vs spending 2 hours and slacking off the other 6? How do you know that the change of process created by this tool doesn’t just cause the reviewers to work harder, and that they won’t stop once the novelty wears off? What if reviewers start relying on this tool to catch a certain class of errors for which it has low sensitivity?
It’s also a moot point if they don’t actually end up saving the money you say they will. It could be that all the savings is eaten up because of the reviewers just use the extra time to dick around on hacker news. It could just be that people aren’t able to make productive use of their time saved. Maybe they were already maxing out their time doing other useful activities.
All of this screams junior engineer took very limited results and extrapolated to say “saved the company millions” without nearly enough supporting evidence. Run your tool for 6 months, take an actual business outcome like time to merge PRs, measure that, and put that on your resume.
It’s incredibly common for a junior engineer to create some new tooling, and come up with some numbers to justify how this new tooling saves the company millions in labor. I have never once seen these “savings” actually pan out.
I took it off LinkedIn and replaced it with the time-to-merge reduction of 20% over two weeks of PRs (rounding down). I expect to justify the expenditure to non-technical managers in my current role, which is why I picked $s.
> All of this screams junior engineer took very limited results and extrapolated to say “saved the company millions” without nearly enough supporting evidence.
That's what the only person in my major who got a job at FAANG in California did, which is why I borrowed the strategy since it seems to work.
> I can almost guarantee you that an A/B test design wasn’t rigorous enough for you to be that confident in your numbers.
Shoot me an email about methodology! It's my username at gmail. I'd be happy to get more mentorship about more rigorous strategies and I can respond to concerns in less of a PR voice.
Most normal people want the LLM to remember their interests and favourite things, so they don't have to manually re-explain when asking for advice.
They also don't know what "context" is or that the LLM has a limited number of tokens it can understand at any given time. They just believe it knows everything at once.
Do you have example prompts where this would be useful? Why would you want an LLM to know your favorite type of cheese? Now that I say that, I guess if you use it for recipes then it's useful if it remembers things like dietary restrictions. And even then a project seems like the better option.
I can't think of much else though so I'm still curious what you or others use it for.
ChatGPT knows what's in my bar and what types of base liquors I love and/or can't drink. It knows what fruit, syrups and mixes are in my fridge. It knows that my friend is allergic to mint. It knows that when I ask for recommendations, I tend to want a choice between spirit forward, tiki, martini and herbaceous.
ChatGPT knows the broad strokes of the 3-4 main hardware projects I have on the go, and depending on the questions I'm asking, it will often structure its responses in a way that differentiates based on which one I'm thinking about.
It knows what resistor and capacitor values I have on my pick and place machine, and when I ask for divider ratios it will do its best to calculate based on those values to the degree that it will chain 1-2 resistors together to achieve those ratios.
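If anyone's curious, the search it's doing is basically this brute force. A sketch, with invented stand-in values rather than what's actually on my feeders:

```python
from itertools import combinations_with_replacement

# hypothetical feeder inventory; the real list depends on the machine
STOCK = [100, 220, 470, 1000, 2200, 4700, 10000, 22000, 47000, 100000]


def reachable_legs(max_parts=2):
    """Every resistance reachable with 1..max_parts stocked resistors in series."""
    out = {}
    for n in range(1, max_parts + 1):
        for combo in combinations_with_replacement(STOCK, n):
            out.setdefault(sum(combo), combo)
    return out


def best_divider(target_ratio):
    """Pick top/bottom legs so Vout/Vin = Rb/(Ra+Rb) is closest to target."""
    legs = reachable_legs()
    best = None
    for ra, top in legs.items():
        for rb, bottom in legs.items():
            ratio = rb / (ra + rb)
            err = abs(ratio - target_ratio)
            if best is None or err < best[0]:
                best = (err, top, bottom, ratio)
    return best


err, top, bottom, ratio = best_divider(1 / 3)
print(f"top={top} bottom={bottom} ratio={ratio:.4f}")
```

With only ten values per leg and up to two in series, the pair search is a few thousand combinations, so exhaustive search is fine; for 1/3 it finds an exact hit (e.g. 100+100 over 100).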
It knows what kind of solder I use, and has warned me about components with sensitive reflow temperature concerns.
It's an extraordinarily useful feature for engineering and drinking, two things that are commonly found in the same Venn diagram.
Thank you! That helped me understand. Hobbies that you do regularly, and that an LLM is continuously helpful for, benefit from memory.
Personally, I would still be wary of the black box aspect (not knowing what it does and doesn't remember), so I would probably still use projects to make it more deterministic. But that's probably being overcautious and unnecessary in most common cases.
> It knows what resistor and capacitor values I have on my pick and place machine, and when I ask for divider ratios it will do its best to calculate based on those values to the degree that it will chain 1-2 resistors together to achieve those ratios.
Also relevant: it knows that you know what a resistor and capacitor is, and is able to tune responses to your level of knowledge. (It's not great at this, in my experience, since domain knowledge is still so jagged, but I think it's better than nothing.)
Can projects overlap? If not there’s general context information that’s often useful.
My job, my kids and time preferences around those things, my preferred tech setup and way of working and types of tech I’m better at. Things I already have (home assistant, little nuc, etc). I can throw a random question and not have to add this kind of information or manage it.
I get that those are the things that go into memory. What I don't get is what kind of prompt your job and kids are useful information for. Especially on the regular.
Science experiments explained at a few levels, finding good background info and where to read up about some safety information
Maths help for specific areas my kids are looking at and proposed games for that
Evaluation of coding options for my kids
How to link up some ideas on coding, electronics and using the home automation side as some fun outputs
LED strip info and work, again integrating with smart homes and what’s good around the kids
Framework evaluations for automation at work and home
Crystal identification
Looking up local council info
Relevant music suggestions for kids to play on the piano
Here some things cross over. I’m happy writing code, I typically want easy open source options, I have languages and tech I prefer, I’m moving things to Matter, I have Home Assistant, my son is excellent at maths given his age but I’m working more on comprehension of problems, and a lot more. All those are things that, with a bit of background info, change the types of answers I get and make it more useful.
The reply about knowledge of their job and family made me think.
The only thing I can now think of is using it as a personal therapist, or asking how to approach their kids. And they're a bit embarrassed about it, because it's still outside the Overton window (especially on HN), which is why they aren't sharing it.
If someone has different usecases, please do prove me wrong! Maybe I just lack imagination.
Such an incredible amount of personal, intimate knowledge to share with a company. Sure, Google can figure out where I live and who I visit because I have an Android phone, but they'll never know the contents of those relationships.
I have a line in the sand with the AI vendors. It's a work relationship. If I wouldn't share it with a colleague I didn't know super well, I'm not telling it to an AI vendor.
I recently asked about baby-led weaning. If my baby were 2 months old, it would have been smart to mention "not yet!" but it knows she's 8 months old and was able to give contextual advice.
I ask gpt a lot of questions about plants and gardening - I’m happy that it remembers where I live and understands the implications. I could remind it in every question, but this is convenient.
I was redoing our agency's website and thinking about new sections. Claude already knows who I am and what do we do, so it was able to offer extremely relevant suggestions based on this without any further prompting.
In my personal experience the memory in Claude works much better than in ChatGPT where it indeed feels forced and leads to "remember the user loves cheese" moments.
I use it for my work. So i went it to remember everything about my business, website, the domain, which country we operate and on and on. It’s a ton of context which I don’t want to repeat each time.
That's what projects are for. All the major chatbot companies have some equivalent and all support a standard instruction where you can include anything you need automatically.
I broke my ankle and have multiple chats related to medicine, physical therapy, pain management, lawyer questions, how to handle messaging to boss and HR
ChatGPT "knows" (has context that includes) some of the things I'm good at, and some of the things I'm not good at. I have my own tolerances for communication and it has context about that, too.
I use the bot for mostly techy things. So, for instance, I'm alright with using tools, and building electronics, and punting around on a Linux box so I don't need my hand held for that. But I'm terrible at writing code, so baby steps and detailed explanation there helps me a lot. I strongly prefer pragmatism and verifiable facts. I despise sycophant speech, the empty positivity of corpo-speak, assumptions, false praise, superfluous verbosity, and apologies and/or the implication of feelings from bots.
Through a combination of some deliberate training (custom instructions, memory), and just using it (shared context), it mostly does what I want in the way that I want it done -- the first time.
I don't have to steer in the right direction with every new session. There was a time when that was necessary, but it is no longer that way. Adjustments happen increasingly automatically these days.
That saves me time and frustration, and enhances the utility of the bot.
Meanwhile: Others have their own skills and preferences that may be very different in comparison to my own. That's OK. We each get to have our own experience.
In my experience, target schools are the only universities now that can make their assignments too hard for AI.
When my university tried that, the assignments were too hard for students. So they gave up.