
> how it normally takes him 4 to 8 hours to put together complicated, data-heavy reports. Now he fires off an agent request, goes to walk his dog, and comes back to a downloadable spreadsheet of dense data, which he pulls up and says "I think it got 98% of the information correct...

This is where the AI hype bites people.

A great use of AI in this situation would be to automate the collection and checking of data. Search all of the data sources and aggregate links to them in an easy place. Use AI to search the data sources again and compare against the spreadsheet, flagging any numbers that appear to disagree.
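
A minimal sketch of just the comparison step in Python with pandas - the file names, column names, and join key here are hypothetical, and the AI would only be doing the re-collection, not this part:

    import pandas as pd

    # What the agent produced vs. values re-pulled from the source of truth.
    # Both files are assumed to share a "record_id" key and a "value" column.
    report = pd.read_csv("agent_report.csv")
    source = pd.read_csv("source_extract.csv")

    merged = report.merge(source, on="record_id", suffixes=("_report", "_source"))

    # Flag numeric cells that disagree beyond a small tolerance, for human review.
    mismatch = merged["value_report"].sub(merged["value_source"]).abs().gt(0.01)
    flagged = merged.loc[mismatch, ["record_id", "value_report", "value_source"]]

    print(f"{mismatch.sum()} of {len(merged)} rows disagree with the source:")
    print(flagged.to_string(index=False))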

Yet the AI hype train takes this all the way to the extreme conclusion of having AI do all the work for them. The quip about 98% correct should be a red flag for anyone familiar with spreadsheets, because it’s rarely simple to identify which 2% is incorrect without reviewing everything.

This same problem extends to code. People who use AI as a force multiplier to do the thing for them and review each step as they go, while also disengaging and working manually when it’s more appropriate have much better results. The people who YOLO it with prompting cycles until the code passes tests and then submit a PR are causing problems almost as fast as they’re developing new features in non-trivial codebases.



From John Dewey's Human Nature and Conduct:

“The fallacy in these versions of the same idea is perhaps the most pervasive of all fallacies in philosophy. So common is it that one questions whether it might not be called the philosophical fallacy. It consists in the supposition that whatever is found true under certain conditions may forthwith be asserted universally or without limits and conditions. Because a thirsty man gets satisfaction in drinking water, bliss consists in being drowned. Because the success of any particular struggle is measured by reaching a point of frictionless action, therefore there is such a thing as an all-inclusive end of effortless smooth activity endlessly maintained.

It is forgotten that success is success of a specific effort, and satisfaction the fulfillment of a specific demand, so that success and satisfaction become meaningless when severed from the wants and struggles whose consummations they are, or when taken universally.”


The proper use of these systems is to treat them like an intern or new grad hire. You can give them the work that none of the mid-tier or senior people want to do, thereby speeding up the team. But you will have to review their work thoroughly because there is a good chance they have no idea what they are actually doing. If you give them mission-critical work that demands accuracy or just let them have free rein without keeping an eye on them, there is a good chance you are going to regret it.


What an awful way to think about internships.

The goal is to help people grow, so they can achieve things they would not have been able to handle before gaining that additional experience. This might include boring dirty work, yes. But then they prove they can overcome such a struggle, and more experienced people should also be expected to be able to go through it - if there is no obviously more pleasant way to go.

What you say of interns regarding checks is just as true for any human out there, and the more power they are given, the more relevant it is to be vigilant, no matter their level of experience. Not only will humans make errors, but power games are generally very permeable to corruptible souls.


I agree that it sounds harsh. But I worked for a company that hired interns, and this was the way managers talked about them - as cheap, unreliable labor. I once spoke with an intern hoping that they could help with a real task: using TensorFlow (it was a long time ago) to help analyze our work process history, but the company ended up putting them on menial IT tasks and they checked out mentally.


>The goal is to help people grow, so they can achieve things they would not have been able to deal with before gaining that additional experience.

You and others seem to be disagreeing with something I never said. This is 100% compatible with what I said. You don't just review and then silently correct an intern's work behind their back; the review process is part of the teaching. That doesn't really work with AI, so it wasn't explicitly part of my analogy.


What an awful way to think about other people, always assuming the very worst version of what they said.


Certainly such a message demonstrates that a significant amount of effort has been put into not falling into the kind of behavior it warns against. ;)


The goal of internships at a for-profit company is not the personal growth of the intern. This is a nice sentiment, but the function of the company is to make money, so an intern with net-negative productivity doesn't make sense when the goals are quarterly financials.


Sure, companies wouldn't do anything that negatively affects their bottom line, but consider the case that an intern is a net zero - they do some free labor equal to the drag they cause demanding attention of their mentor. Why have an intern in that case? Because long term, expanding the talent pool suppresses wages. Increasing the number of qualified candidates gives power to the employer. The "Learn to Code" campaign along with the litany of code bootcamps is a great example: it poses as personal growth / job training to increase the earning power of individuals, but on the other side of that is an industry that doesn't want to pay its workers 6 figures, so they want to make coding a blue-collar job.

But coding didn't become a low-wage job; now we're spending GPU credits to make pull requests instead and skipping the labor altogether. Anyway, I share the parent poster's chagrin at all the comparisons of AI to an intern. If all of your attention is spent correcting the work of a GPU, the next generation of workers will never have mentors giving them attention, choking off the supply of experienced entry-level employees. So what happens in 10 or 20 years? I guess anyone who actually knows how to debug computers, instead of handing the problem off to an LLM, will command extraordinary emergency-fix-it wages.


I’ve never experienced an intern who was remotely as mediocre and incapable of growth as an LLM.


I had an intern who didn’t shower. We had to have discussions about body odor in an office. AI/LLMs are an improvement in that regard. They also do better work than that kid did. At least he had rich parents.


I had a coworker who only showered once every few days after exercise, and never used soap or shampoo. He had no body odor, which could not be said about all employees, including management.

It’s that John Dewey quote from a parent post all over again.


Was he Asian? Seems like Asians somehow win the genetic lottery in the stink generation department.


Wait, you had Asmongold work for you? Tell us more! xD


I have always been told to expect an intern to be a net loss in productivity to you, and anything else is a bonus, since the point is to help them learn.


What about a coach's own ability improving through instruction?


The point of coaching a Junior is so they improve their skills for next time

What would be the point of coaching an LLM? You will just have to coach it again and again


Coaching a junior doesn’t just improve the junior. It also tends to improve the senior.


Coaching an LLM seems unlikely to improve you meaningfully


What about it?


Isn't the point of an intern or new grad that you are training them to be useful in the future, acknowledging that for now they are a net drain on resources?


An overly eager intern with short term memory loss, sure.


And working with interns requires more work for the final output compared to doing it yourself.


For this example, let’s replace the word “intern” with “initial-stage experts” or something.

There’s a reason people invest their time with interns.


Yeah, most of us are mortal, that’s the reason.


But LLMs will not move to another company after you train them. OTOH, interns can replace mid-level engineers as they learn the ropes, in case their boss departs.


Yeah, people complaining about accuracy of AI-generated code should be examining their code review procedures. It shouldn’t matter if the code was generated by a senior employee, an intern, or an LLM wielded by either of them. If your review process isn’t catching mistakes, then the review process needs to be fixed.

This is especially true in open source where contributions aren’t limited to employees who passed a hiring screen.


This is taking what I said further than intended. I'm not saying the standard review process should catch the AI-generated mistakes. I'm saying this work is at the level of someone who can and will make plenty of stupid mistakes. It therefore needs to be thoroughly reviewed by the person using it before it is even up to the standard of a typical employee's work that the normal review process generally assumes.


Yep, in the case of open source contributions as an example, the bottleneck isn't contributors producing and proposing patches, it's a maintainer deciding if the proposal has merit, whipping (or asking contributors to whip) patches into shape, making sure it integrates, etc. If contributors use generative AI to increase the load on the bottleneck it is likely to cause a negative net effect.


This very much. Most of the time, it's not a code issue, it's a communication issue. Patches are generally small; it's the whole communication around them until both parties have a common understanding that takes so much time. If the contributor comes with no understanding of their patch, that breaks the whole premise of the conversation.


I can still complain about the added workload of inaccurate code.


If 10 times more code is being created, you need 10 times as many code reviewers.


Plus the overhead of coordinating the reviewers as well!


"Corporate says the review process needs to be relaxed because its preventing our AI agents from checking in their code"


”The people who YOLO it with prompting cycles until the code passes tests and then submit a PR are causing problems almost as fast as they’re developing new features in non-trivial codebases.”

This might as well be the new definition of “script kiddie”, and it’s the kids that are literally going to be the ones birthed into this lifestyle. The “craft” of programming may not be carried by these coming generations and possibly will need to be rediscovered at some point in the future. The Lost Art of Programming is a book that’s going to need to be written soon.


Too bad no one will be able to read it. Better make it a video essay.


with subtitles flashing in the center of the screen


So it is here to stay. If you’re unable to write good code with it, that doesn’t mean everyone is writing bad code with it.


Oh come on, people have been writing code with bad, incomplete, flaky, or absent tests since automated testing was invented (possibly before).

It's having a good, useful and reliable test suite that separates the sheep from the goats.*

Would you rather play whack-a-mole with regressions and Heisenbugs, or ship features?

* (Or you use some absurdly good programming language that is hard to get into knots with. I've been liking Elixir. Gleam looks even better...)


It sounds like you’re saying that good tests are enough to ensure good code even when programmers are unskilled and just rewrite until they pass the tests. I’m very skeptical.


It may not be a provable take, but it’s also not absurd. This is the concept behind modern TDD (as seen in frameworks like cucumber):

Someone with product knowledge writes the tests in a DSL

Someone skilled writes the verbs to make the DSL function correctly

And from there, any amount of skill is irrelevant: either the tests pass, or they fail. One could hook up a markov chain to a javascript sourcebook and eventually get working code out.
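
A minimal sketch of that split in Python - not the real cucumber API; the phrases, verbs, and cart example are invented for illustration:

    # The "skilled person" side: a registry of verbs behind the DSL phrases.
    VERBS = {}

    def verb(phrase):
        def register(fn):
            VERBS[phrase] = fn
            return fn
        return register

    cart = []

    @verb("an empty cart")
    def empty_cart():
        cart.clear()

    @verb("I add a widget")
    def add_widget():
        cart.append("widget")

    @verb("the cart has 1 item")
    def cart_has_one_item():
        assert len(cart) == 1, f"expected 1 item, got {len(cart)}"

    # The "product person" side: a scenario is just a list of phrases.
    scenario = ["an empty cart", "I add a widget", "the cart has 1 item"]

    for step in scenario:
        VERBS[step]()  # the implementation is irrelevant here: it passes or it fails
    print("scenario passed")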


> One could hook up a markov chain to a javascript sourcebook and eventually get working code out.

Can they? Either the DSL is so detailed and specific as to be just code with extra steps, or there is a lot of ground not covered by the test cases, with landmines that a million monkeys with typewriters could unwittingly step on.

The bugs that exist while the tests pass are often the most brutal - first to find and understand, and second when they occasionally reveal that a fundamental assumption was wrong.


Tests are just for the bugs you already know about


They're also there to prevent future bugs.


“The quip about 98% correct should be a red flag for anyone familiar with spreadsheets”

I disagree. Receiving a spreadsheet from a junior means I need to check it. If this gives me infinite additional juniors I’m good.

It’s this popular pattern in HN comments - expecting AI to be deterministically correct - while the whole world operates on stochastically correct all the time…


In my experience the value of junior contributors is that they will one day become senior contributors. Their work as juniors tends to require so much oversight and coaching from seniors that they are a net negative on forward progress in the short term, but the payoff is huge in the long term.


I don't see how this can be true when no one stays at a single job long enough for this to play out. You would simply be training junior employees to become senior employees for someone else.


So this has been a problem in the tech market for a while now. Nobody wants to hire juniors for tech because even at FAANGs the average tenure is what, 2-3 years? There's no incentive for companies to spend the time, money, and productivity hit to train juniors properly. When the current cohort ages out, a serious problem is going to occur, and it won't be pretty.


It seems there's a distinct lack of enthusiasm for hiring people who've exceeded that 2-3 year tenure at any given place, too. Maintaining a codebase through its lifecycle seems often to be seen as a sign of complacency.


Exactly this

And it should go without saying that LLMs do not have the same investment/value tradeoff. Whether they contribute like a senior or a junior seems entirely up to luck.

Prompt skill is too flaky and unreliable to ensure good output from LLMs.


When my life was spreadsheets, we were expected to get to the point of being 99.99% right.

You went from “do it again” to “go check the newbie’s work”.

To get to that stage your degree of proficiency would be “can make out which font is wrong at a glance.”

You wouldn’t be looking at the sheet, you would be running the model in your head.

That stopped being a stochastic function, with the error rate dropping significantly - to the point that making a mistake had consequences tacked on to it.


98% sure each commit doesn’t corrupt the database, regress a customer feature, or open a security vulnerability. 50 commits later… (which is like one day for an agentic workflow)


It’s only a 64% chance of corruption after 50 such commits at a 98% success rate.
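
For anyone checking the arithmetic, assuming the 50 commits fail independently: 0.98^50 ≈ 0.364, so the chance of at least one bad commit is 1 − 0.364 ≈ 64%.

    # Quick check, assuming independent commits at a 98% per-commit success rate.
    p_bad = 1 - 0.98 ** 50
    print(f"{p_bad:.0%}")  # -> 64%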


I would be embarrassed to be at OpenAI, releasing this and pretending the last 9 months haven't happened... waxing poetic about the "age of agents" - absolutely cringe and pathetic.


Or, as I like to put it: LLM output is essentially the Library of Babel. Yes, it contains all of the correct answers, but it might as well be entirely useless.


> A great use of AI in this situation would be to automate the collection and checking of data. Search all of the data sources and aggregate links to them in an easy place. Use AI to search the data sources again and compare against the spreadsheet, flagging any numbers that appear to disagree.

Why would you need AI for that, though? Pull your sources. Run a diff. Straight to the known truth without the ChatGPT subscription. In fact, by that point you don’t even need the diff if you pulled from the sources. Just drop it into the spreadsheet at that point.
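
The deterministic version is a few lines of Python - the URL and sheet name here are hypothetical, and writing .xlsx needs openpyxl alongside pandas:

    import pandas as pd

    # Pull straight from the source of truth; no model in the loop.
    source = pd.read_csv("https://example.gov/quarterly_figures.csv")
    source.to_excel("report.xlsx", sheet_name="figures", index=False)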


In reality most people will just scan for something that is obviously wrong, check that, and call the rest "good enough". Government data is probably going to get updated later anyhow; it's just a target for a company to aim for. For many companies the cost savings are worth much more than a slightly larger margin of error on some projections. Other companies will just have to accept several hours of saved time rather than the full day.



