I came into the comments to see how the discourse is shaping up in light of the author’s claims, and I’m seeing the same old boosterism trying to dismiss it wholesale without providing conclusive evidence.
So without further ado:
* If LLMs can indeed produce wholly novel research independently without any external sources, then prove it. Cite sources, unlike the chatbot that told you it can do that thing. Show us actual results from said research or products that were made from it. We keep hearing these things exponentially increase the speed of research and development but nobody seemingly has said proof of this that’s uniquely specific to LLMs and didn’t rely on older, proven ML techniques or concepts.
* If generative AI really can output Disney quality at a fraction of the cost, prove it with clips. Show me AI output that can animate on 2s, 4s, and 1s in a single video and knows when to use any of the above for specific effects. Show me output that’s as immaculate as old Disney animation, or heck, even modern ToonBoom-like animation. Show me the tweens.
* Prove your arguments. Stop regurgitating hypeslop from CEBros, actually cite sources, share examples, demonstrate its value relative to humanity.
All that people like us (myself and the author) have been politely asking for since this hype bubble inflated is for boosters to show actual evidence of their claims. Instead, we just get carefully curated sizzle reels and dense research papers making claims, rather than actual, tangible evidence that we can then attempt to recreate for ourselves to validate the claims in question.
Stop insulting us and show some f*king proof, or go back to playing with LLMs until you can make them do the things you claim they can do.
> If LLMs can indeed produce wholly novel research independently without any external sources, then prove it.
I was actually thinking about it, and there could be a simple test: remove all knowledge of X from the training corpus and train an LLM on that corpus. X could be anything: differential calculus, logarithms, the Riemann Hypothesis, the special theory of relativity, Fermat's theorems, ... Then ask the AI the questions that actually led to the discovery of X.
If AI is able to rediscover X while not knowing about X, we can say it is proof of intelligence.
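To make the filtering step concrete, here is a minimal sketch, assuming the corpus is just an iterable of strings and that a hand-written keyword list is enough to catch leaks. Real leakage detection would be far harder; the term list, names, and example concept (special relativity) are purely illustrative:

```python
# Hypothetical sketch of the proposed ablation test: scrub every document
# that mentions the target concept X before training.
import re

# Terms that would leak knowledge of X (here X = special relativity).
LEAK_TERMS = [
    r"special relativity",
    r"lorentz transformation",
    r"time dilation",
    r"e\s*=\s*mc\^?2",
]
leak_pattern = re.compile("|".join(LEAK_TERMS), re.IGNORECASE)

def scrub_corpus(corpus):
    """Yield only documents with no mention of the target concept."""
    for doc in corpus:
        if not leak_pattern.search(doc):
            yield doc

# The filtered corpus would then be used to train a model from scratch,
# after which one asks the questions that historically led to X
# (e.g. "what happens to Maxwell's equations in a moving frame?").
```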
> Stop insulting us and show some f*king proof, or go back to playing with LLMs until you can make them do the things you claim they can do.
Everything revolving around these LLMs so far has been tech hype culture and similar "think of the future" vibes. IMO we never see proof of this because right now it simply doesn't exist.
>Everything revolving around these LLMs so far as been tech hype culture...
compare with a headline from today:
>OpenAI’s ChatGPT to hit 700 million weekly users, up 4x from last year
I don't think there was that much hype when ChatGPT launched. Just an awful lot of people using it because it's kind of cool.
The critics seem to do a certain amount of goalpost moving: pointing out that it's doing well, with unprecedented user growth, is met with "f*king prove LLMs can indeed produce wholly novel research." Is anyone actually claiming they produce wholly novel research?
There is a similar effect with the featured blog post, where the author makes some perfectly reasonable arguments for why he doesn't really like LLMs and doesn't want to work on them, and then, instead of titling it "Why I hate LLMs," goes with "Why I hate AI." But there's some quite cool stuff in non-LLM AI, like AlphaFold and efforts to cure diseases. If you are talking about LLMs, why not put LLM in the title?
> Is anyone actually claiming they produce wholly novel research?
I would argue companies performing layoffs because they think LLMs can do the work of a human is practically the same thing. There was even another article posted here today saying a bunch of companies are hiring humans again to fix the terrible work LLMs are doing, so it's fair to assume C-level jerks do think LLMs can produce something to the level of novel research and are finding out LLMs can't.
It was quite convincing, and I could see lower-budget studios trying to make it work. (There is a truckload of garbage-tier animation on all platforms.)
The person who submitted it is an experienced producer who used something like 600 prompts to generate the end result, so it's not exactly few-shot prompting from novices with no film experience. But it happened.
Then again, the astroturfing done (presumably) by big LLM is off the charts, so who knows if this was actually what happened.
Completely nailed it. The comparison to Adderall and other drugs was exactly how I've been feeling about it all. I watch my boss excitedly post another LLM documentation PR, raving about how he never used to have time for these things, yet the actual output is unclear, ambiguous, and almost entirely useless. But it made him feel good to "get it done," and that alone has largely blinded him to how bad the output actually is, and how long it actually took to produce.
"If a machine is consuming and transforming incalculable amounts of training data produced by humans, discussed by humans, and explained by humans. Why would the output not look identical to human reasoning? If I were to photocopy this article, nobody would argue that my photocopier wrote it and therefore can think. But add enough convolutedness to the process, and it looks a lot like maybe it did and can."
But it's not copying it. That is the entire point. It's using the training data to adjust floating-point numbers. If you train on a single piece of data over and over again, then yes, it can replicate it, just like you can memorize lines of a school play, but it's still not copied/compressed in the traditional, deterministic sense.
You can't argue "we don't know how they work, or our own brains work with any certainty" and then over-trivialize what they do on the next argument.
People suffer brain damage and come out the other side with radically different personalities. What happened to their "qualia" or "sense of self"? Where is their "soul"? It's just a mechanistic emergent property of their biological neural network.
Who is to say our brains aren't just very highly parameterized biological floating-point machines? That is the true Occam's Razor here, as uncomfortable as that might make people.
> Who is to say our brains aren't just very highly parameterized biological floating-point machines? That is the true Occam's Razor here, as uncomfortable as that might make people.
I believe it's quite possible that what is happening during training is in certain ways similar to what is happening to a child learning the world, although there are many practical differences (and I don't even mean the difference between human neurons and the ones in a neural network).
Is there anything to feel uncomfortable about? It's been a long time since people started discussing the concept of "a self doesn't exist, we're just X" where X was the newest concept popular during that time. I'm 100% sure LLMs are not the last one.
(BTW, as for LLMs themselves, there are still two big engineering problems to solve: quite small context windows, and hallucinations. The first requires a lot of money to solve; the second needs special approaches and a lot of trial and error, and even then the last 1% might be almost impossible to get working reliably.)
I am not convinced hallucination is a solvable problem in a single self-contained model. On the one hand, we approach these things with the idea that we are building a real intelligence modeled after our assumptions about our own brain. On the other, we want them to have none of the failings of the thing they are modeled after.
Humans misremember and make things up all the time, completely unintentionally. It could be a fundamental flaw of large neural networks: impressive data compression and ability to generalize, but impossible to make "perfect".
If AI becomes cheap and fast enough, it's likely a simple council of models will be enough to alleviate 99% of the problem here.
Yes, I agree the council approach is the most reasonable option. But while you're right that mistakes are inherently human, there is a huge difference in both quantity and quality: humans often err in details, while LLMs can make false claims in a very confident way.
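For concreteness, a minimal sketch of what such a council could look like, assuming a hypothetical `ask_model` wrapper around each model's API. Exact-match voting is a deliberate simplification; real free-form answers would need semantic comparison (embeddings or a judge model), but the shape is the same:

```python
from collections import Counter

def ask_model(model, question):
    """Hypothetical stand-in for one model's API; swap in a real client."""
    raise NotImplementedError

def council_answer(question, models, quorum=0.66):
    # Ask each model independently, then require a supermajority to agree.
    answers = [ask_model(m, question) for m in models]
    answer, votes = Counter(answers).most_common(1)[0]
    if votes / len(models) >= quorum:
        return answer
    return None  # no consensus: escalate to a human instead of guessing
```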
> While Apple is still a major player in the AI space, they typically lean towards a preference for traditional AI. They’re almost certainly still doing LLM research in the background, but they’ve decided against dropping everything to go full throttle on the hype train. Ironically, this has started to make their investors, who have bought into the hype, quite upset. People are starting to question “why isn’t Apple doing things with AI?”
It may very well be the case that Apple too finds themselves pressured into going all out on LLMs.
Besides my stance that LLMs can serve specific tasks very well and are likely going to take a place similar to spreadsheets and databases over the coming years, hasn’t Apple already done so? Rarely has Apple tried to appear so unified on one goal across their product stack as they did with Apple Intelligence, the vast majority of which is heavily LLM-focused.
The author appears to skip entirely over that attempt and its subsequent failure, which leaves the central point the piece is trying to make rather unsubstantiated, and it made me check whether this wasn’t posted in 2022. That goes double for someone like myself who is also very confident that there is a large chasm between LLMs and whatever AGI may end up being.
Apple Intelligence message summaries, text editing, etc. have been deployed, and, using them right now, they have problems very much connected to the technology itself. The internal squabbles are independent of what was released and of the state the technology is in.
This post is perfect. It summarizes my position on today's variant of 'AI' exactly. I'm sending this article to everyone that asks me how I feel about it. Thanks for writing it.
One mistake I think many make is to look at technology X and ascribe outcomes primarily to technology X.
AI is one in a long long long line of new technologies. It is generating a lot of investment, new corporate processes and directives, declarations like "new era" and "civilizational milestone," etc.
If someone thinks any of the above are wrong or misguided, it's a mistake to "blame" or look to AI as the primary cause.
The primary cause is our system: humans are actors in the US economic system and when a new technology is rolling out, usually the response is the same and differs only in magnitude.
"Yet, every time I tried to get LLMs to perform novel research, they fail because they don’t have access to existing literature on the topic. Whereas, humans, on the other hand, discovered everything humanity knows."
Just because the author was unable to wrangle an LLM into doing novel research doesn't mean that it's impossible. We already have examples of LLMs either doing or aiding significantly with novel research.
This comment would be more useful if you actually provided those examples.
I'm also a researcher and agree wholeheartedly with the article. LLMs can maybe help you sift through existing literature or help with creative writing; at most they can be used for background research in hypothesis generation, finding pairs of related terms in the literature which can be put together into a network of relationships. They can help with a few tasks suitable for an undergrad research assistant.
> we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility.
It's a bit better than just finding related pairs. And that's with Sonnet 3.5, which is basically ancient at this point.
This paper centers "novelty" but also finds that human ideas are more feasible, and that LLM-generated ideas are not diverse and that LLMs cannot reliably evaluate ideas. None of the ideas were actually evaluated by performing experiments either.
Pretty much what I would expect. The paper also seems to be doing exactly what I described; I don't understand how the technique is better than that.
No, helping with chores is not the same as "performing research". It has limited utility for minor tasks, it is not essential, and it does not even necessarily have a positive productivity impact.
To illustrate the point, when I use vim to write LaTeX, would you say that vim is "performing research"?
That part we also agree on: generating hypotheses is done on the basis of existing literature, and lit review (and related reasoning) is likewise done on the basis of existing literature. The blog author is talking about reasoning in the context of a new research topic.
Well hold on now, are the LLMs doing novel research or not? "Aiding significantly" (as in, humans doing novel research are using LLMs to aid their process) is not remotely the same -- can you show us examples of LLMs doing novel research?
Researchers using GPT to summarize papers may be helping humans create novel research, but it certainly isn't GPT doing any such thing itself.
It's a really weird claim too, because we don't magically communicate research between minds either. You need a person to find, read, and process the new research, and the same applies to the LLM: either RAG it or provide the whole relevant thing in context.
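In code terms, the point is roughly this. A minimal sketch, where `chat_fn` is a hypothetical callable wrapping any chat-completion API, and `documents` could come from a retriever (the RAG case) or be pasted in whole (the full-context case); none of these names refer to a real library:

```python
def answer_with_sources(question, documents, chat_fn):
    """Stuff the relevant sources directly into the prompt.

    The model only "knows" new research if it is placed in its context,
    just as a human only knows a new paper after reading it.
    """
    context = "\n\n".join(documents)  # e.g. retrieved papers or sections
    prompt = (
        "Use only the sources below to answer.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )
    return chat_fn(prompt)
```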
I don't follow. SOTA models have access to tons of existing research. Basically orders of magnitude more than a human could read during a lifetime. And yet, they fail to produce anything new. Hey, even a simple "generate me an innovative list of ten marketing ideas for X, things that nobody has ever done before" gives ridiculous results.
I see the claims being levied against LLMs, but in the generative media world these models are nothing short of revolutionary.
In addition to being an engineer, I'm also a filmmaker. This tech has so many orders of magnitude changes to the production cycle:
- Films can be made 5,000x cheaper (a $100M Disney film will be matched by small studios on budgets of $20,000.)
- Films can be made 5x faster (end-to-end, not accounting for human labor hour savings. A 15 month production could feasibly be done in 3 months.)
- Films can be made with 100x fewer people. (Studios of the future will be 1-20 people.)
Disney and Netflix are going to be facing a ton of disruptive pressure. It'll be interesting to see how they navigate.
Advertising and marketing? We've already seen ads on TV that were made over a weekend [1] for a few thousand dollars. I've talked to customers that are bidding $30k for pharmaceutical ad spots they used to bid $300k for. And the cost reductions are just beginning.
While in theory you're right - AI tools can "democratize" filmmaking - what will make the results unique? There was another anecdote, someone saying "this AI can generate 30,000 screenplays in a day!" Which is cool, but... who is going to read them?
In theory (idk it probably exists already) you can generate a script and feed it into an AI that generates a film. Novelty aside, who is going to watch it? And what if you generate a hundred films a day? A thousand?
This probably isn't a hypothetical scenario, as low-effort / generated content is already a thing in writing, video, and music. It's an enormous long tail on e.g. YouTube, Amazon, etc., relying on people passively consuming content without paying too much attention to it. The background muzak of everything.
As someone smarter than me summarized, AI-generated stuff is content, not art. AI-generated films will be content, not art. There may be something compelling in there, but ultimately it'll flood the market, become ubiquitous, and disappear into the background as AI-generated noise that few people will seek out or watch intentionally.
Do you know how many dreamers and artists and great ideas wither away on the vine?
How many?
Movies are going to be like books today.
People are already creating movies with $1k cameras that look good enough to distribute. A lot of them are horror movies because of the budget and most of them are terrible with huge glaring mistakes in editing, pacing, framing, etc. but they can still make money.
And you don't have to obey some large capital distributor and mind their oversight and meddling.
How many movies have you worked on?
Most experienced executives can help guide priorities and make sure there aren't any big overlooked problems.
> People are already creating movies with $1k cameras
Nobody wants to make an iPhone movie. They want $200k glass optics and a VFX budget like Nolan's or Villeneuve's.
I'm tired of people from outside the profession telling us we should be happy with quaint little things. That's not how the ideal world works. In the ideal world, everyone can tell the exact story in their minds and not be beset by budget.
My imagination is my ideal world. I won't listen to naysayers, because you're so far behind my dreams.
If this wasn't about to be practical, I'd relent. But the technology works. It's tangible, usable, and practical.
I've been saying that on HN since Deep Dream. My friends in IATSE also called this crazy. It's not. It's not just coming, it's here.
> A lot of them are horror movies because of the budget
Tell me about it. Been there, done that. It sucks to be so creatively stifled when what I wanted to make as a youth were fantasy movies like Lord of the Rings.
I got creative. I did mocap and VFX. It still wasn't what I dreamed of.
> How many?
Film school attendance is over 100,000 students annually. Most of them were never able to land a high-autonomy role or follow through on their dreams. I know hundreds of people with this story.
> A lot of them are horror movies because of the budget and most of them are terrible with huge glaring mistakes in editing, pacing, framing, etc
Sound. Sound is the biggest thing people fuck up. But the rest are real failure cases too.
> Most experienced executives can help guide priorities and make sure there aren't any big overlooked problems.
They're not as important as you think. They're just there to mind the investment.
When the cost of production drops to $30k, this entire model falls apart.
Studios were only needed for two things: (1) distribution, (2) capital. The first one is already solved.
> customers that are bidding $30k for pharmaceutical ad spots they used to bid $300k for
How does this work? If the quality ads are easier to produce, wouldn't there be more competition for the same spot with more leftover money for bidding? Why would this situation reduce the cost of a spot?
Can you point to a film that has used this technology? Can you point to anything that substantiates those numbers? Genuinely asking, my mind is open to the possibility but I don't want to take it on faith. (I watched that Kalshi ad to be clear.)
Not those numbers, but you might be interested in Netflix using generative AI:
> Using AI-powered tools, they were able to achieve an amazing result with remarkable speed and, in fact, that VFX sequence was completed 10 times faster than it could have been completed with traditional VFX tools and workflows
> The cost of [the special effects without AI] just wouldn’t have been feasible for a show in that budget
Personally, I'm not particularly impressed. Yes, I'm impressed by the technology and the fact that we've reached a point where something like this is even possible, but in my opinion, it's soulless and suffers from the same problems as other AI videos. More emphasis was placed on length than quality, and I've seen shorter, traditionally produced videos that had more heart. That's probably because these videos were created by amateurs who thought the AI would fill in all the gaps, but that only underscores the need for human artists with a keen eye.
I agree the impact on generative media is huge. But I also do not think anyone is making a $100m-equivalent film for $20k anytime soon. Disprove me by doing it!