> However, the literature is unclear on how well LoRA performs relative to FullF...

adhi01 · 2025-10-04T05:02:19 1759554139

To say that the 'literature is clear on that' while citing a single paper, which has been rejected from ICLR, is a bit of an overstatement.

kouteiheika · 2025-10-04T10:35:20 1759574120

> which has been rejected from ICLR

Oh, you mean rejected just like these papers?

Efficient Estimation of Word Representations in Vector Space[1], one of the most influential papers in the space with tens of thousands of citations[2]? Or the RoBERTa[3] paper (dramatically improved upon BERT; RoBERTa and derived models currently have tens of millions of downloads on HF and still serve as a reliable industry workhorse)? Or the Mamba paper[4] (pretty much the only alternative to transformers that actually gets used)? Do you want me to keep going?

Honestly, I find that whether a paper gets rejected or not means diddly squat, considering how broken the review system is and how many honestly terrible papers I have to wade through every time I look through the conference submissions for anything good.

[1] -- https://openreview.net/forum?id=idpCdOWtqXd60

[2] -- https://scholar.google.com/scholar?cites=7447715766504981253

[3] -- https://openreview.net/forum?id=SyxS0T4tvS

[4] -- https://openreview.net/forum?id=AL1fq05o7H

moralestapia · 2025-10-05T10:57:29 1759661849

Based.

This guy knows his stuff.

muragekibicho · 2025-10-04T06:56:43 1759561003

Thanks for this comment.

p1esk · 2025-10-04T17:29:49 1759598989

Even that paper itself does not provide any "clear" conclusions about which method is better.

lelanthran · 2025-10-04T09:51:02 1759571462

> I'm surprised they didn't cite this; it's a well known paper.

I'm surprised you copied and pasted all of that without explaining what it means.

Does LoRA perform worse than FullFT, better, or not statistically significantly differently?

You aren't able to tell from what you pasted, are you?

cheald · 2025-10-04T19:38:25 1759606705

Standard LoRA (W_delta = B@A with standard inits) generally underperforms FT, primarily because of "intruder dimensions" (new high-ranking singular vectors that are nearly orthogonal to the singular vectors of the underlying pre-trained weights), as outlined in the paper.
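For readers unfamiliar with the parameterization: in standard LoRA the frozen weight W gets an additive update W_delta = (alpha/r) * B@A, where A (r × d_in) starts random and B (d_out × r) starts at zero, so the update is exactly zero at init and can never exceed rank r. The plain-Python sketch below only illustrates that structure; the Gaussian init scale for A, the alpha value, and the helper names (`lora_init`, `lora_delta`, `matrix_rank`) are illustrative assumptions, and it does not reproduce the paper's intruder-dimension analysis (that would additionally need an SVD of W + W_delta compared against the SVD of W).

```python
import random

def matmul(X, Y):
    """Plain-Python product of X (n x k) and Y (k x m)."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matrix_rank(M, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    rows, cols = len(A), len(A[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) < tol:
            continue  # no usable pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for i in range(rows):
            if i != rank:
                f = A[i][c] / A[rank][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[rank])]
        rank += 1
    return rank

def lora_init(d_out, d_in, r, seed=0):
    """Hypothetical helper: A random Gaussian (std is an arbitrary choice), B all zeros."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0 / r) for _ in range(d_in)] for _ in range(r)]  # r x d_in
    B = [[0.0] * r for _ in range(d_out)]                                  # d_out x r
    return A, B

def lora_delta(A, B, alpha, r):
    """W_delta = (alpha / r) * B @ A, shape d_out x d_in, rank <= r."""
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]

d_out, d_in, r = 6, 8, 2
A, B = lora_init(d_out, d_in, r)
delta0 = lora_delta(A, B, alpha=16, r=r)  # zero at init because B == 0

# Pretend training has moved B away from zero.
rng = random.Random(1)
B = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(d_out)]
delta = lora_delta(A, B, alpha=16, r=r)
print(matrix_rank(delta))  # bounded above by r
```

Because every column of W_delta lies in the span of B's r columns, the update is confined to an r-dimensional subspace no matter how long it is trained, which is the structural difference from full fine-tuning that the intruder-dimension argument builds on.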

There are techniques like PiCa and SVFT which can mitigate much of the loss, though.

tangjurine · 2025-10-05T02:22:55 1759630975

PiCa came out two days ago; how did you find out about it?

cheald · 2025-10-05T07:39:18 1759649958

The one I was referring to was from this paper, first published in May: https://arxiv.org/abs/2505.20211v1

I don't recall how I found out about it, but it was either paperswithcode or an LLM research session working through the intruder dimensions problem.

In my Stable Diffusion tests, it substantially improves LoRA training speed and fidelity, though I've got some experiments that seem to improve on it substantially further by adding learnable rotations of the singular vectors.

crimsoneer · 2025-10-04T15:54:09 1759593249

If you're going to be snarky, could you at least clarify what the answer is for those of us who don't stay on top of ML research...?

lelanthran · 2025-10-04T19:44:26 1759607066

> If you're going to be snarky, could you at least clarify what the answer is for those of us who don't stay on top of ML research...?

The answer is "There's a difference, perhaps", but the GP appeared to imply that LoRA performed worse.

My understanding is that the paper found differences but did not conclude that either method was quantifiably better or worse, which is not what GP's post implied.

p1esk · 2025-10-04T17:32:28 1759599148

The paper does not make any clear conclusions about LoRA vs FullFT performance, beyond "the two methods seem to be learning different things".

richardvsu · 2025-10-04T16:19:36 1759594776

Why would they cite a paper that’s not helping with their Tinker API that was released soon after? :)