
Sure, but it's also reasonable to consider that the pace of progress is not always exponential, or even linear. Diminishing returns are a thing, and we already know that a 405b model is not 5 times better than a 70b model.


Yes, but!

Exponential pace of progress isn't usually just one thing; if you zoom in, any particular thing may plateau, but its impact compounds by enabling the growth of successors, variations, and related inventions. Nor is it a smooth curve, if you look closely. I feel statements like "a 405b model is not 5 times better than a 70b model" are zooming in on a specific class of models so much you can see the pixels of the pixel grid. There's plenty of open and promising research on tweaking the current architecture at training or inference time (see e.g. the other thread from yesterday[0]), on top of changes to architecture and methodology, and methods of controlling or running inference on existing models by lobotomizing them or grafting networks onto networks. The field is burning hot right now; we're counting the space between incremental improvements and interesting research directions in weeks. The overall exponent of language models' power may well continue when you zoom out a little further.

--

[0] - https://news.ycombinator.com/item?id=42093112


How do you determine the multiplier? For example, there are many problems GPT4 can solve that GPT3.5 can't; by that measure it is infinitely better.


Let's say your benchmark gets you to 60% with a 70b-parameter model and to 65% with a 405b one: it's fairly obvious that's just incremental progress, not sustainable growth in capability per added parameter. Also, most of the data used these days for training these very large models is synthetic, which is probably very low quality overall compared to human-sourced data.
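
On the "how do you determine the multiplier" question: there's no single agreed metric. A quick sketch in plain Python, using the hypothetical 60%/65% figures above (the two readings are just common conventions, nothing canonical):

    # Hypothetical benchmark scores from above: 60% at 70b params, 65% at 405b.
    small_acc, big_acc = 0.60, 0.65

    ratio = big_acc / small_acc  # naive "times better": ~1.08x
    # share of the remaining errors that the bigger model fixes:
    err_reduction = ((1 - small_acc) - (1 - big_acc)) / (1 - small_acc)

    print(f"score ratio: {ratio:.2f}x")                      # 1.08x
    print(f"relative error reduction: {err_reduction:.1%}")  # 12.5%

By raw score ratio the 405b model looks ~8% better; by errors fixed, 12.5%. Neither reading looks anything like 5x.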


But so if there's a benchmark on which a model scores 60%, does that mean it's literally impossible to make anything more than 67% better?

E.g. if someone scores 60% on a high school exam, is it impossible for anyone to be more than 67% smarter than that person at that subject?
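
To spell out where the 67% comes from (a sketch, assuming only the parent's 60% figure): the benchmark is capped at 100%, so by score ratio nothing can ever beat a 60% scorer by more than 100/60 ≈ 1.67x.

    # Ceiling effect on a bounded benchmark: a perfect 100% is the best
    # anything can do, so the maximum possible "improvement" over a 60%
    # scorer is 1.0 / 0.60 - 1 = 0.67, i.e. 67%.
    current = 0.60
    max_improvement = 1.0 / current - 1
    print(f"{max_improvement:.0%}")  # 67%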

Then what if you have another benchmark where GPT3.5 scores 0% but GPT4 scores 2%? Does that make GPT4 infinitely better?

E.g. supposedly there was one LLM that scored 2% on FrontierMath.
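
At the floor the same ratio arithmetic simply blows up (sketch, using the 0% vs 2% numbers above):

    # Any nonzero score divided by a zero score is undefined, so "times
    # better" comes out infinite -- the metric is what's unbounded here,
    # not the model.
    old_score, new_score = 0.00, 0.02
    ratio = new_score / old_score if old_score > 0 else float("inf")
    print(ratio)  # inf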



