Cool. That sure sounds nice and simple. What do you do when the multiple LLMs disagree on what the correct tests are? Do you sit down and compare 5 different diffs to see which have the tests you actually want? That sure sounds like a task you would need an actual programmer for.
At some point a human has to actually use their brain to decide what the actual goals of a given task are. That person needs to be a domain expert to draw the lines correctly. There's no shortcut around that, and throwing more stochastic parrots at it doesn't help.
Just because you can't (yet) remove the human entirely from the loop, doesn't mean that economising on the use of the humans time is impossible.
For comparison have a look at compilers: nowadays approximately no one writes their software by hand, we write a 'prompt' in something like Rust or C, and ask another computer program to create the actual software.
We still need the human in the loop here, but it takes much less human time than creating the ELF directly.
It’s not “economizing” if I have to verify every test myself. To actually validate that tests are good I need to understand the system under test, and at that point I might as well just write the damn thing myself.
This is the fundamental problem with this “AI” mirage. If I have to be an expert to validate that the LLM actually did the task I set out, and isn’t just cheating on tests, then I might as well code the solution myself.
I can see how LLMs can help with testing, but one should never compare LLMs with deterministic tools like compilers. LLMs are entirely a separate category.
> otherwise they would have paid for the service they were "depending fundamentally" on.
It's a "*.ai" company. Deductive probably spent more human time on their fancy animated landing page than engineering their actual system. If they vibe coded most of their product, I wouldn't be surprised if they didn't even know they were using Datadog until they got the email.
Doesn’t “doxxing” refer to publicly linking a private/anonymous identity to a public one? It’s disingenuous to imply the congresswoman had a problem with “linking to a public web page.” She had a problem with linking to a public web page in the context of unmasking the anonymous identity of a US military member. The linkage, not the linking, is the doxxing.
I don’t agree with Luna here, and my exposure to this story is limited to what’s in the article. But I do agree with the GP comment regarding the lack of impartiality in this article.
It’s the military’s responsibility to protect the identity of their operators, by ensuring they don’t publish information that could lead to doxxing. If they miss something, that’s on them. And prosecuting a private citizen is a deflection of responsibility that ignores the risk of actual motivated attackers (e.g. Maduro loyalists) uncovering the same information, without publishing it to twitter, but using it to threaten or harm the doxxed victim.
That's not sufficient. If a user copies customer data into a public google sheet, I can reprimand and otherwise restrict the user. An LLM cannot be held accountable, and cannot learn from mistakes.
It's structurally impossible. LLMs, at their core, take trusted system input (the prompt) and multiply it against untrusted input from the users and the internet at large. There is no separation between the two, and there cannot be with the way LLMs work. They will always be vulnerable to prompt injection and manipulation.
The _only_ way to create a reasonably secure system that incorporates an LLM is to treat the LLM output as completely untrustworthy in all situations. All interactions must be validated against a security layer and any calls out of the system must be seen as potential data leaks - including web searches, GET requests, emails, anything.
You can still do useful things under that restriction but a lot of LLM tooling doesn't seem to grasp the fundamental security issues at play.
So many people here want to bury their heads in the sand, like that will protect them. It's disgraceful to see people calling themselves "hackers" while they support the feds with no reservations or critical thought.
A good number of people on this site don’t mind the boot as long as they are wearing it. It’s not particularly rare for groups of software engineers to have American style libertarians amongst them
I really want an HN overlay where we can act in public. The rank cowardice here is so low, these dogs hiding and cloaking the truth again and again, putting the veil over the truth. Being able to post publicly what we are up to, being able to see more directly, not have these obfuscating hiding folks hiding in their anonymity on the layer feels necessary. What a sad pathetic foe humanity faces. It's so sad than HN allows a couple people to destroy our ability to understand and see the world.
The stock seems completely disconnected from the antics of Musk. I would think that having a CEO who is clearly a heavy ketamine user and spends more time playing politician than actually running the company would have a negative impact on the stock, but tesla's stock has been divorced from reality for a long time.
The nature of our current political crisis is changing by the minute, and with every fascist act this administration emboldens the next wave of left wing opposition.
The moderate position for future liberal candidates is now the full dissolution of ICE.
The more radical position - which is rapidly gaining support - is the arrest and prosecution of everyone involved in this administration. Starting with the president but including his cabinet and the oligarchs who spent the last year fomenting corruption and enriching themselves.
I'm not sure that isn't the moderate position. Every one of them are certified criminals. They all need to stand trial. I pledge to do everything in my power to ensure that they do. So say we all.
See also the mind-boggling sign-on bonuses they get. They know ICE being the 8th most funded army in the world can't last forever. The left wasn't allowed to do a wealth transfer from rich to poor so instead the right did a massive wealth transfer from all of us to the most racist of us.
But is what you have now "right" wing? Or just oligarchy of no real wing? I'm not sure that ideology has a real role in what they actually do (looking beyond words), unless "I do what I please" is an ideology...
I've actually done a fair bit of ML work in Elixir, in practice I found:
1) It's generally harder to interface with existing libraries and models (example: whisperX [0] is a library that combines generic whisper speech recognition models with some additional tools like discrete-time-warping to create a transcription with more accurate time stamp alignment - something that was very helpful when generating subtitles. But because most of this logic just lives in the python library, using this in Elixir requires writing a lot more tooling around the existing bumblebee whisper implementation [1]).
but,
2) It's way easier to ship models I built and trained entirely with Elixir's ML ecosystem - EXLA, NX, Bumblebee. I trained a few models doing basic visual recognition tasks (detecting scene transitions, credits, title cards, etc), using the existing CLIP model as a visual frontend and then training a small classifier on the output of CLIP. It was pretty straightforward to do with Elixir, and I love that I can run the same exact code on my laptop and server without dealing with lots of dependencies and environment issues.
Livebook is also incredibly nice, my typical workflow has become prototyping things in Livebook with some custom visualization tools that I made and then just connecting to a livebook instance running on EC2 to do the actual training run. From there shipping and using the model is seamless, and I just publish the wrapping module as a library on our corporate github, which lets anyone else import it straight into livebook and use it.
> The whole idea of scripts is that there's nothing to install
and yet without fail, when I try to run basically any `little-python-script.py`, it needs 14 other packages that aren't installed by default and I either need to install some debian packages or set up a virtual environment.
Its up to the programmer whether they use external packages or not. I dont ser the problem with setting up a venv, but if this is packaged correctly you could do uv run.
You're hand waving away the core problem - if I'm running someone else's script, it's not up to me if they used external packages or not, and in python using someone else's script that relies on external packages is a pain in the ass because it either leaves setting up a venv and dealing with python's shortcomings up to me, or I have to juggle system packages.
I'm sure uv can handle this _if_ a given script is packaged "correctly", but most random python scripts aren't - that's the issue we're talking about in this thread.
The whole point of a scripting language IMO is that scripts should _just run_ without me doing a bunch of other crap to get the system into some blessed state. If I need to learn a bunch of python specific things, like "just add _pycache_ to all of your .gitignore files in every directory you might run this script", then it isn't a useful scripting language to me.
What tests? You can't trust the tests that the LLM writes, and if you can write detailed tests yourself you might as well write the damn software.
reply