Week 2 Question 7 - Description of LoRA Method

In the week 2 quiz, question 7 asks “Which of the following best describes how LoRA works?”. I got this question wrong, but after reviewing the suggested course material and referring to the original LoRA paper, I believe all of the answers are incorrect.

  • My Answer: LoRA freezes all weights in the original model layers and introduces new components which are trained on new data.
  • The answer I think the quiz is looking for: LoRA decomposes weights into two smaller rank matrices and trains those instead of the full model weights.

My problems with this answer:

  1. The two smaller (sic) rank matrices A and B are not decompositions of the weights. Instead, they are a decomposition of the change in the weights of the existing model under new training data.
  2. These matrices A and B, which LoRA injects into the model and whose product is added to the weight matrix, could reasonably be described as “new components,” using the commonly understood meaning of those words (see the sketch after this list).
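To make point 1 concrete, here is a minimal PyTorch sketch of a LoRA-style linear layer (my own illustration with made-up names, not code from the course or the paper). A and B never factor W0; their product is an additive low-rank update that sits alongside the frozen weight:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA-style layer: y = x @ W0^T + (x @ A^T @ B^T) * scale."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # Frozen pre-trained weight W0 (stands in for an existing dense layer).
        self.W0 = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Trainable low-rank factors of the *update* delta_W = B @ A, not of W0.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # random Gaussian init
        self.B = nn.Parameter(torch.zeros(out_features, r))        # zero init
        self.scale = alpha / r

    def forward(self, x):
        # Original frozen path, plus the additive low-rank update path.
        return x @ self.W0.T + (x @ self.A.T @ self.B.T) * self.scale
```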

The following three quotes from the LoRA paper (Hu, 2021) seem to support my take on this:

  • “LoRA allows us to train some dense layers in a neural network indirectly by optimizing rank decomposition matrices of the dense layers’ change during adaptation instead, while keeping the pre-trained weights frozen” - The change, not the weights themselves.

  • “For a pre-trained weight matrix W0 ∈ R^{d×k}, we constrain its update by representing the latter [referring to the update] with a low-rank decomposition” - The update, not the weights themselves.

  • “We use a random Gaussian initialization for A and zero for B, so ∆W = BA is zero at the beginning of training” - As opposed to a decomposition of the weights, as suggested in the quiz answer (the quick check below makes this concrete).
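Using the toy layer above (again my own illustration), a quick check shows what the zero initialization of B implies: the decomposed quantity BA starts at zero, so it can only ever represent the change learned during fine-tuning, and the output initially matches the frozen pre-trained layer exactly:

```python
torch.manual_seed(0)
layer = LoRALinear(in_features=4, out_features=3)
x = torch.randn(2, 4)

delta_W = layer.B @ layer.A                                # the decomposed *change*, not W0
print(torch.allclose(delta_W, torch.zeros_like(delta_W)))  # True: delta_W = BA = 0 at init
print(torch.allclose(layer(x), x @ layer.W0.T))            # True: same output as the frozen layer
```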

The lecture video also seems to agree with the LoRA paper, and disagree with the quiz answer:

  • “LoRA is a strategy that reduces the number of parameters to be trained during fine-tuning by freezing all of the original model parameters and then injecting a pair of rank decomposition matrices alongside the original weights.” - This is different from “decomposes weights” as stated in the quiz answer choice.

I understand that the term “new components” is associated in this course with additive methods, and that you consider LoRA to be a reparametrization method. But the language “new components” is not precisely defined, whereas the change in the weights is certainly a different thing from the weights themselves.

Given all this, I think the answer I chose is less specific but plausibly more correct (depending on how precisely we are defining the term “new components”). But really, neither answer is great. If my take here is correct, I suggest a better answer:

  • A better answer could be: “LoRA freezes the original weights, decomposes the change in the weights into two low-rank matrices, and trains those instead of the full model weights.” (See the sketch below.)
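Merging a trained adapter back into the model (sketched here with the same toy layer; again my own illustration) makes the wording in that answer explicit: what gets added to the original weights is the learned change, so B and A form a low-rank factorization of the update rather than of W0:

```python
# After fine-tuning A and B, fold the learned change back into the frozen weight:
with torch.no_grad():
    W_adapted = layer.W0 + layer.scale * (layer.B @ layer.A)
# Equivalently, layer.B @ layer.A is a low-rank decomposition of the update
# (W_adapted - layer.W0) / layer.scale, not of the original weight matrix.
```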

Thanks for your consideration.


You probably are more accurate here, but I think they chose that answer because they did not want to complicate the learner’s understanding of the overall process, so they simplified it. My thought anyway!