RAG is far more accessible and cheaper than finetuning. But it is true that finetuning is severely overlooked in situations where it would outperform alternatives like RAG.
This assumes the team has equal ability to either engineer a RAG-based system or to finetune an LLM. Those are different skillsets, and even selecting which LLM should be finetuned is a complex question, let alone aligning it, deploying it, optimizing inference, etc.
The budget question comes into play as well. Even if the same text is repeatedly fed to the LLM, that cost is spread out over time, whereas finetuning is a sort of upfront capex; that spread makes RAG financially more accessible.
Now bear in mind, I'm a big proponent of finetuning where applicable and I try to raise awareness of the possibilities it opens. But one cannot deny RAG is a lot more accessible to teams that are more likely developers / AI engineers than ML engineers/researchers.
You are certainly right, managed platforms make finetuning much easier. But managed/closed model finetuning is pretty limited and in fact should be named “distribution modeling” or something.
Results with this method are significantly more limited compared to all the power open-weight finetuning gives you (and the skillset needed in return).
And in either case don’t forget alignment and evals.
> Results with this method are significantly more limited compared to all the power open-weight finetuning gives you (and the skillset needed in return).
I am not sure I understand why you are so certain that finetuned top market models, built by top researchers, will be significantly worse than whatever open-source model you pick.
It’s significantly harder to get right; it’s a very big stepwise increase in technical complexity over in-context learning/RAG.
There are now some lighter versions of finetuning that don’t update all the model weights but instead train small adapter layers, a technique called LoRA, which is way more viable commercially atm in my opinion.
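For a sense of what that looks like in practice, here's a minimal LoRA sketch using Hugging Face's peft library; the base model name and target modules are placeholder assumptions, not a recommendation:

```python
# Minimal LoRA sketch (Hugging Face transformers + peft).
# The base model and target_modules below are placeholders; adjust for your model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling applied to the adapter output
    target_modules=["q_proj", "v_proj"],   # attention projections that get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the full parameter count
# Train `model` as usual; only the adapter weights are updated and saved.
```

Because only the small adapter matrices are trained, the VRAM and storage cost is a fraction of full finetuning, which is what makes it commercially attractive.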
I'm not an expert in either, but RAG is like dropping some 'useful' info into the prompt context, while fine tuning is more like performing a mix of retraining, appending re-interpretive model layers, and/or brain surgery.
I'll leave it to you to guess which one is harder to do.
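To make the contrast concrete, here is a rough sketch of the RAG side: rank your stored snippets against the question and paste the best ones into the prompt. The `embed()` callable is a stand-in for whatever embedding model you use; it's an assumption, not a specific API.

```python
# Simplified RAG sketch: rank stored snippets by embedding similarity,
# then drop the top matches into the prompt context. No model weights change.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_prompt(question: str, docs: list[str], embed, top_k: int = 3) -> str:
    q_vec = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(embed(d), q_vec), reverse=True)
    context = "\n\n".join(ranked[:top_k])
    return (
        "Use the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# The finished prompt goes to any LLM, local or API-hosted, unchanged.
```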
There were initial difficulties in finetuning that made it less appealing early on, and that's snowballed a bit into having more of a focus on RAG.
Some of the issues still exist, of course:
* Finetuning takes time and compute; for one-off queries, using in-context learning is vastly more efficient (i.e., look it up with RAG).
* Early finetuning efforts had trouble reliably memorizing information. We've got a much better idea of how to add information to a model now, though it takes more training data.
* Full finetuning is very VRAM intensive; optimizations like LoRA were initially good at transferring style but not content. Today, LoRA content training is viable but requires training code that supports it [1].
* If you need a very specific memorized result and it's costly to get it wrong, good RAG is pretty much always going to be more efficient, since it injects the exact text in context. (Bad RAG makes the problem worse, of course).
* Finetuning requires more technical knowledge: you've got to understand the hyperparameters, avoid underfitting and overfitting, evaluate the results, etc. (a rough sketch of the evaluation side follows this list).
* Finetuning requires more data. RAG works with a handful of datapoints; finetuning requires at least three orders of magnitude more data.
* Finetuning requires extra effort to avoid forgetting what the model already knows.
* RAG works pretty well when the task that you are trying to perform is well-represented in the training data.
* RAG works when you don't have direct control over the model (i.e., API use).
* You can't finetune most of the closed models.
* Big, general models have outperformed specialized models over the past couple of years; if it doesn't work now, just wait for OpenAI to make their next model better on your particular task.
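As a rough illustration of the evaluation point above, here is a minimal eval-harness sketch: score the finetuned model on a held-out task set and on a general set to watch for forgetting. The `generate` callable and the tiny example sets are assumptions for illustration, not a specific API.

```python
# Minimal eval-harness sketch: check that the finetune learned the task
# without forgetting general ability. `generate` is a stand-in for your
# model's inference call (an assumption here).
def accuracy(eval_set: list[tuple[str, str]], generate) -> float:
    """Fraction of prompts whose output contains the expected answer."""
    hits = sum(1 for prompt, expected in eval_set
               if expected.lower() in generate(prompt).lower())
    return hits / len(eval_set)

# Tiny illustrative sets; in practice these are held-out splits of real data.
task_set = [("What is our refund window?", "30 days")]
general_set = [("What is the capital of France?", "Paris")]

def report(generate) -> None:
    print(f"task: {accuracy(task_set, generate):.2%}, "
          f"general: {accuracy(general_set, generate):.2%}")
```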
On the other hand:
* Finetuning generalizes better.
* Finetuning has more influence on token distribution.
* Finetuning is better at learning new tasks that aren't as present in the pretraining data.
* Finetuning can change the style of output (e.g., instruction training).
* When finetuning pays off, it gives you a bigger moat (no one else has that particular model).
* You control which tasks you are optimizing for, without having to wait for other companies to maybe fix your problems for you.
* You can run a much smaller, faster specialized model because it's been optimized for your tasks.
* Finetuning + RAG outperforms just RAG. Not by a lot, admittedly, but there are some advantages.
Plus, RL training for reasoning has been demonstrating unexpectedly effective improvements on relatively small amounts of data & compute.
So there are reasons to do both, but the larger investment that finetuning requires means that RAG has generally been more popular. In general, the past couple of years have been won by the bigger models scaling fast, but with finetuning difficulty dropping there is a bit more reason to do your own finetuning.
That said, for the moment the expertise + expense + time of finetuning makes it a tough business proposition if you don't have a very well-defined task to perform, a large dataset to leverage, or some other way to get an advantage over the multi-billion dollar investment in the big models.
Nuance is hard. Binary choices are fast, comforting, and require less thought. Certainty feels safer than ambiguity — especially in conflict, where complexity threatens identity. And in most arenas (tech, media, politics), decisive hot takes get applause. Fence-sitters get ignored.
I worked at a startup where the CEO swore up and down that real-time fine-tuning was the future — that models would continuously update with company data. It sounded cool until you remember:
That’s not how LLMs work.
It’s not efficient.
It’s not flexible.
And it’s not even necessary — we already have RAG.
Pipedreams make good pitch decks. But they break when you hit production.
It's a fucking pipedream, this. That's not how LLMs work, it's not efficient, it's not useful (we have RAG for reference augmentation), and it’s not even desirable unless you want your model overfitting on stale, internal narratives every night.
Hope one day it will be practical to do nightly finetunes of a model per company on all core corporate data stores.
This could create a seamless native model experience that knows about (almost) everything you’re doing.