How can you be confident that the low-rank decomposition matrices you end up choosing are indeed the ones that will be relevant to fine-tuning the model for a particular task (such as summarization)?
Thank you for your post!
The LoRA matrices are trained by you. Your dataset should be one that helps the model learn the format and style, and even a little bit of knowledge, in your desired direction. So you do not choose a set of matrices; you train them.
Does that make sense?
Can you share the understanding that led to your question? I am curious.
Thanks for your response. Maybe my understanding of LoRA was wrong then. From what you say, it sounds like LoRA only works on refining an LLM to be better at a task (such as summarization) that it is already quite good at?
Thanks for your reply! Yes, with LoRA we try to make an LLM better at a given task. The initial model can be general-purpose, and of course if you pick a model that is already strong at the target task and fine-tune it further on a more specific aspect of that task, like a particular domain, then we would expect it to get even better.
Please feel free to ask about anything on this topic that is still unclear.
Thanks so much for your response.
My only remaining question is the following.
Suppose you are trying to fine-tune an LLM which is good at two tasks: (1) sentiment analysis and (2) summarization. If your intent is to fine-tune it to do summarization better, how would you know the LoRA matrices aren’t going to make the sentiment analysis task perform worse than before?
In other words, you would not know a priori which weights should be tweaked for making summarization better.
That’s a good question and I am afraid I cannot give a definitive answer.
The base model itself will not change. Remember that you are not modifying the model's weights; instead, you are training a new set of matrices whose product is then added to the weights to alter the model's behavior.
You can run your model with or without the LoRA-trained matrices. When you instantiate your model with the LoRA-adapted matrices, the result could be better for summarization (as expected), and for sentiment analysis it could be the same, or worse, or perhaps better.
And this is one of the great things about PEFT + LoRA: for the same base model you can do multiple fine-tunes on different tasks, and all you have to do is instantiate the model with the adapter for the task you need.
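To illustrate the "with or without" idea, here is a minimal toy sketch in plain Python. It is not how a library such as PEFT implements adapters; the 2x2 weights, the `adapters` dictionary, and the `effective_weights` helper are all made up for illustration. It only shows that the base weights stay frozen while a per-task pair of small matrices is added on top.

```python
# Toy illustration only: all names and numbers here are hypothetical.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Frozen base weights (2x2). Nothing below ever modifies them.
F = [[1.0, 0.0],
     [0.0, 1.0]]

# One rank-1 (A, B) pair per task; A is 2x1 and B is 1x2.
adapters = {
    "summarization": ([[0.5], [0.0]], [[1.0, 1.0]]),
    "sentiment": ([[0.0], [-0.5]], [[1.0, 0.0]]),
}

def effective_weights(task=None):
    """Return F when no adapter is selected, else F + A * B for that task."""
    if task is None:
        return F
    A, B = adapters[task]
    return add(F, matmul(A, B))

print(effective_weights())                 # base model only
print(effective_weights("summarization"))  # base + summarization adapter
print(effective_weights("sentiment"))      # base + sentiment adapter
```

Because each adapter is kept separate from F, switching tasks is just a matter of picking a different (A, B) pair, and dropping the adapter gives back the original model exactly.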
Thanks so much!
Hi @Juan_Olano -
I have an additional question about LoRA training.
Let me start out by labeling the four matrices I will be referencing:
Matrix F: The frozen weights of the original model.
Matrix A: One of the matrices we will be training and multiplying to get our additive matrix.
Matrix B: The other matrix we will be training and multiplying to get our additive matrix.
Matrix Q: The additive matrix which is derived from multiplying A * B. The dimensions of this matrix should equal the dimensions of F.
After we train the weights on A and B, we need to multiply these matrices together to get matrix Q.
How can we be sure that matrix Q when added to matrix F will “get us going in the right direction” and improve the model’s performance on a specific task?
I understand how we can train an entire model using stochastic gradient descent to ensure we move the model in the right direction. In theory, we should be able to do the same for matrices A and B. But when we multiply these matrices, we introduce a new mathematical operation. How can we be sure the weights of F + Q will be an improvement over F?
I hope that makes sense. Thanks Juan and community!
Thank you for your follow-up question. I’d like to start by saying that we don’t have to create matrices A and B ourselves, nor specifically freeze F, nor calculate Q. I say this just to make sure we are on the same page. All these operations are done by the library you use to apply PEFT with LoRA. Of course, this is only true if you are using a library. If you are venturing into building the process from scratch, that’s a different story, and honestly, above my level.
Now, how can we know if the fine-tuning (FT) is going in the right direction? By testing it: by running metrics once the FT is done. If it is getting you closer to your goals, great. If not, the first place to look is your data. Is there enough of it? Is it clearly labeled and conducive to the task? Another option is changing your base model. If your data is good, then we can try different models. Instead of Flan-T5, maybe BERT or BLOOM?
But in my experience, success usually comes from the right data, and of course from an appropriate model. For classification I usually pick BERT, for example.
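On the worry that multiplying A and B introduces a new operation: the product is differentiable, so the chain rule carries the gradient of the loss on F + Q back to A and B. Gradient descent therefore updates A and B to lower the loss of the combined weights, not of A and B in isolation. Below is a minimal sketch of that, using the thread's notation (F frozen, Q = A * B). Everything in it is an assumption for illustration: the 2x2 sizes, the `TARGET` weights standing in for "what the task needs", the squared-error loss, and the learning rate. A real run would use a library and a real dataset, and lower training loss does not guarantee better task metrics, which is why measuring before and after still matters.

```python
# Toy sketch: train A and B by gradient descent while F stays frozen.
# All values are hypothetical and chosen only to keep the example small.

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(X, s):
    return [[s * x for x in row] for row in X]

F = [[1.0, 0.0], [0.0, 1.0]]        # frozen base weights (2x2)
TARGET = [[2.0, 1.0], [0.0, 1.0]]   # stand-in for weights that solve the task
A = [[0.0], [0.0]]                  # 2x1, starts at zero so Q = 0 at first
B = [[0.5, 0.5]]                    # 1x2, small non-zero start

def loss(A, B):
    """Half the squared distance between F + A * B and TARGET."""
    W = add(F, matmul(A, B))
    return 0.5 * sum((w - t) ** 2 for rw, rt in zip(W, TARGET) for w, t in zip(rw, rt))

def train(A, B, lr=0.1, steps=200):
    """Plain gradient descent on A and B only; F is read but never updated."""
    for _ in range(steps):
        W = add(F, matmul(A, B))
        G = add(W, scale(TARGET, -1.0))    # gradient of the loss w.r.t. W
        grad_A = matmul(G, transpose(B))   # chain rule through Q = A * B
        grad_B = matmul(transpose(A), G)
        A = add(A, scale(grad_A, -lr))
        B = add(B, scale(grad_B, -lr))
    return A, B

loss_before = loss(A, B)                   # equals the loss of F alone, since Q = 0
A_trained, B_trained = train(A, B)
loss_after = loss(A_trained, B_trained)
print("loss with F only:", loss_before)
print("loss with F + Q: ", loss_after)
```

Comparing `loss_before` and `loss_after` is the toy version of "running metrics once the FT is done": the same check, with real evaluation data and a task metric, is what tells you whether F + Q is an improvement over F.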