LoRA: intuition w.r.t. catastrophic forgetting

PEFT was discussed as one of the approaches to mitigate the catastrophic forgetting problem.
Intuitively, I am completely in sync with the scaling benefits (computational efficiency, etc.) of PEFT.
However, I couldn’t follow the intuition behind how or why LoRA should mitigate the catastrophic forgetting issue. Although we train two low-rank matrices A and B with LoRA, we multiply them to get a matrix of the same shape as the original weight matrix, and for self-attention we then use the original weight matrix + (A * B). Now A * B has no special structure restricting which entries it touches, so potentially every entry of the matrix we actually apply at inference has changed. Why, then, should it be any better than fully fine-tuned weights w.r.t. catastrophic forgetting?
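
To make the shapes concrete, here is a minimal sketch of what I mean (illustrative sizes, plain PyTorch, not tied to any particular LoRA implementation):

```python
import torch

d, r = 512, 8                       # hidden size and LoRA rank (illustrative values)
W = torch.randn(d, d)               # frozen pretrained weight matrix
A = torch.randn(d, r) * 0.01        # trainable low-rank factor
B = torch.randn(r, d) * 0.01        # trainable low-rank factor

delta = A @ B                       # same shape as W, so every entry of the sum may differ from W
W_eff = W + delta                   # matrix actually used at inference

print(delta.shape)                        # torch.Size([512, 512])
print(torch.linalg.matrix_rank(delta))    # at most r: the update itself is low rank
```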


The A*B matrix is an add-on to the model weights; it is not permanently baked into them. When you want to use the LLM for another task, you can remove the LoRA adapter and the model goes back to its original behaviour.
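
For example (a minimal sketch using a plain PyTorch wrapper rather than any specific library), the adapter can be switched off at any time and the untouched base weights take over again:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a removable low-rank add-on path."""
    def __init__(self, base: nn.Linear, r: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # the original weights are never updated
        self.A = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, base.out_features))  # delta starts at zero
        self.enabled = True

    def forward(self, x):
        out = self.base(x)                       # always computed from the untouched weights
        if self.enabled:
            out = out + x @ self.A @ self.B      # the LoRA add-on; drop it to recover the base model
        return out

layer = LoRALinear(nn.Linear(512, 512))
layer.enabled = False   # "remove the LoRA": inference is exactly the original layer again
```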


Thank you for your response!

I was thinking the same.
However, in a practical LLM application, this LoRA / no-LoRA decision would require:

  • either a pre-classification LLM call to decide LoRA / No-LoRA
  • or a second (No-LoRA) call (on somehow realizing that the first call result wasn’t as expected)

This (pre-classification / follow-up no-LoRA call) strategy is equally applicable to fully fine-tuned models as well. Doesn’t that reduce the benefits of LoRA to just scaling and computational efficiency?
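
For concreteness, something like this toy routing sketch is what I have in mind (all names here are hypothetical, not from any library):

```python
class Model:
    """Stand-in for an LLM endpoint; only here to make the sketch runnable."""
    def __init__(self, name: str):
        self.name = name
    def generate(self, prompt: str) -> str:
        return f"[{self.name}] answer to: {prompt}"

base_model = Model("base")             # original weights only
adapted_model = Model("base + LoRA")   # weights with the LoRA delta applied

def needs_adapter(prompt: str) -> bool:
    # Stand-in for the pre-classification call; in practice a cheap classifier or heuristic.
    return "domain-specific" in prompt.lower()

def answer(prompt: str) -> str:
    model = adapted_model if needs_adapter(prompt) else base_model
    return model.generate(prompt)

print(answer("a domain-specific question"))   # routed to the adapted model
print(answer("a general question"))           # routed to the untouched base model
```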

If I understand right, you want to automate whether to use the LoRA adapter or the original model itself. I suppose that could be done in some way.

The benefits of LoRA are that you don’t change the original model, you don’t need heavy computational power to fine-tune it, and you don’t forget previously learned tasks. Full fine-tuning is definitely a better way to adapt the model to a specific task.
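
The computational side is easy to see from parameter counts alone (a rough sketch with illustrative sizes):

```python
# One 4096x4096 projection matrix, LoRA rank 8 (illustrative sizes)
d, r = 4096, 8
full_ft_params = d * d          # 16,777,216 weights updated by full fine-tuning
lora_params = d * r + r * d     #     65,536 weights updated by LoRA (A and B)
print(lora_params / full_ft_params)   # ~0.0039, i.e. well under 1% of the parameters are trained
```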

Other than that, you could possibly devise other usage mechanisms.

Right, there is no guarantee that LoRA won’t change the model’s inference so much that it induces catastrophic forgetting when the adapter is applied.

The only way to guarantee this is not to apply the LoRA adapter at all and to keep using the original weights.