LLMs aren't the only kind of AI, just one of the two current shiny kinds.
If a "cure for cancer" is what you're hoping for (cancer is not just one disease, so unfortunately that's not even as coherent a request as we'd all like it to be), look instead at things like AlphaFold: https://en.wikipedia.org/wiki/AlphaFold
I don't know how to tell where real science ends and PR bluster begins in such models, though the closest thing I've heard to a word against it is "sure, but we've got other things besides protein folding to solve", which is a good sign.
(I assume AlphaFold is also a mysterious black box, and that tools such as the one under discussion may help us demystify it too).
Is your argument that because AI can’t currently do the arbitrary things you wish it would do, it is therefore bullshit?
This perspective discounts two important things:
1. All the things it can obviously do very well today
2. Future advancements to the tech (billions are pouring in, but this takes time to manifest in prod)
I’m trying not to be one of the “guys” you’re talking about, but I just can’t comprehend your take. Do you not recognize that there is utility to current models? What makes it all bullshit?
> Is your argument that because AI can’t currently do the arbitrary things you wish it would do, it is therefore bullshit?
It is sold as a research tool, but it cannot be trusted to return facts, because it will happily recombine disconnected pieces of data. AI cannot tell truth from lies; it is good at constructing output that looks like an answer, but it does not care about factual correctness.

Google search result summaries are a good example of this problem. When I searched for "what happened to the inventor of Tetris?", it took the bios of two Russian-born developers, Pajitnov and a murderer, and combined them into one, presenting Pajitnov as a murderer. I thought that did not sound right, did some additional searching, and sure enough it wasn't true. But how many people who were shown that answer are now convinced he was a murderer? What if his neighbours saw it? What if such made-up summaries are fed into a system that decides who can board a plane?

When I bring this problem up, people tell me it's not an issue because you should always go back and verify the facts. But what if the sources I use to verify have the same problem of being made-up content? We are not telling people to stop, think, and verify outputs produced by AI; we are telling them AI makes them "more productive", so they use it to produce garbage content without checking the facts. Please explain to me the usefulness of a tool I cannot trust. Producing garbage faster is not something I wake up in the morning wanting to do more of.
> 2. Future advancements to the tech (billions are pouring in, but this takes time to manifest in prod)
Unlike AI, VCs can count, and they would like to see a return on their investments. I don't think there's much to show for it so far.
Your first point is very much a limitation of LLMs, but again, that's not all of AI: if you want an AI to play chess, and you actually want to win, you use Stockfish or AlphaZero, because an LLM will make illegal moves.
Why would I want an AI to win a game of chess for me? Where's the fun and challenge in that? "Go win me a chess tournament" is a wish nobody has, unless we're talking about someone who wants to pretend to be a chess master, and that's still a small market. Examples like this are very common in the AI community: solutions to problems nobody has.
Though such a question misses the point: use the right tool for the job.
(For a non-ML example, I don't know why 5/8 wrenches exist, but I'm confident of two things: (1) that's a very specific size, unlikely to be useful for a different-sized problem; (2) using one as a hammer would be sub-optimal.)
I'm not interested in creating a taxonomy of special purpose AI models which each do one thing well and nothing else. What I can do is give a handful of famous examples, such as chess.
Other examples at my disposal (purely off the top of my head, in no way systematic) include the use of OCR to read and hence sort post faster than any human, and more accurately than all but the very best. Or quality control in food processing (I passed up a student work placement doing that 20 years ago). Or the entirety of Google search, Gmail's spam filters, Maps' route finding and at least some of its knowledge of house numbers, their CAPTCHA system, and Translate (the one system on this list which is fundamentally the same kind of thing as an LLM). Or ANPR.
It's like you're saying "food is bad" because you don't like cabbage — the dislike is a valid preference, of course it is, but it doesn't lead to the conclusion.
There are two mindsets at play here: the cynics vs the optimists. I'm an optimist to a fault by nature, but I also think there is a kind of Pascal's wager to be made here.
If you bet sensibly on the current bubble/wave and turn out to be wrong - well you’re in the same place as everyone else with maybe some time and money lost.
In what way? If the current bubble/wave turns out to be right doesn't that mean we're all out of a job? Unless by betting on it you mean buying Nvidia stock?
We might all be out of a job if AI reached superhuman performance at everything (or even just human-level at much lower cost), but even without that, this could still be a 10x speedup over the rate of change of the industrial revolution.
What "out of a job" means is currently unclear: do we all get UBI, or starve? Just because we could get UBI doesn't mean we will, and the transition would have to be very fast (because the software won't be good enough until, one day, a software update makes it so), very well managed, and almost simultaneous worldwide, which probably means it will go wrong.
> 1. All the things it can obviously do very well today
I'm curious what those things are.
At least to me, it isn't obvious that LLMs solve any of their many applications from the past year "very well". I worry about failures (hallucinations, misinterpretation of prompts, regurgitation of incorrect facts, violation of copyright, and more). I don't have a good sense of when they fail, how often this happens, or how to identify these failures in scenarios where I'm not a domain expert.
But maybe some subset of these are solved problems, or at least problems that are actively being worked on for the next generation of models.
Yes, there are many other kinds of AI. Stockfish is better at chess than any human. But when you start talking about emergent behavior from machine learning, the failure modes are much harder to reason about.
> At least to me, it isn't obvious that LLMs solve any of their many applications from the past year "very well". I worry about failures (hallucinations, misinterpretation of prompts, regurgitation of incorrect facts, violation of copyright, and more)
That’s a list of things that gets clicks in the popular press.
Some solutions I love:
Recording a video and creating a transcript from it. Then you edit the transcript and the video gets edited to match.
Translating (dubbing) video and changing the lips of the speaker to match where they should be for the new audio.
Scanning invoices for mistakes (I know of entire businesses built on just this one thing).
Understanding edge cases. So many things have nice hard-and-fast rules that can fail, and an ML system of some sort can figure them out so the system can move on. I do this for processing SEC data, and soon for web scraping (e.g. they change the HTML but visually it's kinda the same, and the AI system can figure it out, give me a new HTML selector, and we're back in business).
> That’s a list of things that gets clicks in the popular press.
Are you saying they're nonissues in practice? I agree that most of those points have shown up in the news, but they're also things that I have personally observed when interacting with LLMs.
Of your (and sibling commenters') cited use cases, I see a number of scenarios where AI is used to perform a quick first pass, and a human then refines that output (transcript generation, scanning invoices, iterating on transformations for syntax trees, etc). That's great that it works for you. My worry here is that you might heuristically observe that it worked perfectly 20 times in a row, then decide to remove that human check even as it admits more errors than is acceptable for your use case.
> Scanning invoices for mistakes
This is one of those cases where I would like to better understand the false negatives. If a human reviews the output, then okay, false positives are easy enough to override. But how bad is a false negative? Is it just unnecessary expenses to the company, or does it expose them to liability?
> Translating (dubbing) video and changing the lips of the speaker to match where they should be for the new audio.
This is useful in itself, but surely you too can see the potential for abuse? (This is literally putting words in someone else's mouth.)
> This is useful in itself, but surely you too can see the potential for abuse? (This is literally putting words in someone else's mouth.)
If I was a famous actor, I would demand it. I don’t want people hearing different voices for different movies. I want them to hear my voice. And I’d want it to be authentic. Seeing lips move to the wrong words does not help make a connection.
As for abuse, sure. But I'm not sure anything has seen worse abuse than database technology. There should definitely be an avenue for the government to shut down any database instance anywhere (as California is doing with AI). I would have shut down data broker databases long ago.
> This is one of those cases where I would like to better understand the false negatives. If a human reviews the output, then okay, false positives are easy enough to override. But how bad is a false negative? Is it just unnecessary expenses to the company, or does it expose them to liability?
In the companies I know about, these invoices requesting overpayment just got paid. So, worst case, it’s the same. But best case there is way way way more money to save than the cost of the service.
I find them pretty good at reasoning about tree structures that have a depth which I myself find difficult to navigate. For instance, I've been working with libcst (a syntax tree library) and I can say:
1. observe this refactor rule which I like
2. here's the starting code
3. here's the desired code
4. write me a refactor rule, in the style of 1, which transforms 2 into 3
It sometimes takes a few iterations where I show it a diff which highlights how its rule fails to construct 3 given 2, but it usually gets me the transformation I need much faster than I'd have done so by hand.
And once I'm done, I have a rule which I can apply without the LLM in the loop, and which is much more robust than something like a patch file (which often fails to apply for irrelevant reasons, like whitespace or comments that have changed since I wrote it).
The key is to find cases, like this one, where you can sort of encircle the problem with context from multiple sides, one of which works as a pass/fail indicator. Hallucinations happen, but you use that indicator to ensure that they get retried with the failure text as correcting context.
It helps to design your code so that those context pieces stay small and reasonably self-describing without taking a foray deep into the dependencies, but then that's just a good idea anyway.
> But when you start talking about emergent behavior from machine learning, the failure modes are much harder to reason about.
Sure, this is basically why so many are concerned AI might kill us all:
Lots of observed emergent phenomena that were not expected (basically everything ChatGPT can do, given it was trained on next-token prediction); systems doing exactly what we said instead of what we meant (all computer bugs ever); doing it so hard that something breaks (Goodhart's law); doing it so fast that humans can't respond (stock market flash-crashes, many robotic control systems); and being so capable in small-scale tests that people are tempted to let go of the metaphorical steering wheel (the lawyers citing ChatGPT, but previously also a T-shirt company that dictionary-merged verbs into "keep calm and …" without checking, and even earlier either Amazon or eBay dictionary-merging nouns into "${x}: buy it cheap on {whichever site it was}", with nouns including "plutonium" and "slaves").
If people had a good model for the AI, it wouldn't be a problem, we'd simply use them only for what they are good at and nothing else.