In Week 3 of the Generative AI with LLMs course:
My question is, in cell 12 of Lab 3, is_trainable is set to True even though we are using PEFT, but it was set to False in Lab 2, when we were fine-tuning with PEFT. Why is that?
I have not seen these notebooks, but this is likely due to the different objectives and stages of fine-tuning in each lab.
In Lab 2, setting is_trainable to False when loading the PEFT adapter means none of the parameters are updated. This is useful when the aim is only to use an already fine-tuned adapter while keeping all of the pre-trained model weights unchanged (e.g. using a model purely as a frozen feature extractor in ASR, etc.).
In Lab 3, setting is_trainable to True signals that the model still needs to adapt to the new data and the new objective. This is necessary when training continues, for example to integrate more complex or diverse patterns from the training data.
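To make the contrast concrete, here is a minimal sketch using the Hugging Face peft library (I am assuming that is what the labs use; the checkpoint name and adapter path below are placeholders, not the labs' actual code):

```python
# Minimal sketch (not the labs' exact code): the same trained LoRA adapter
# loaded two ways, differing only in is_trainable.
from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel

# Lab-2 style: attach the adapter purely for inference.
# With is_trainable=False (the default), the adapter and the base weights
# stay frozen, so nothing can be updated.
base_for_inference = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
inference_model = PeftModel.from_pretrained(
    base_for_inference,
    "./peft-checkpoint",   # placeholder path to the trained adapter
    is_trainable=False,
)

# Lab-3 style: attach the same adapter but allow further training.
# Only the LoRA weights become trainable; the base model remains frozen.
base_for_training = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
trainable_model = PeftModel.from_pretrained(
    base_for_training,
    "./peft-checkpoint",
    is_trainable=True,
)
trainable_model.print_trainable_parameters()  # reports only the LoRA parameters as trainable
```

The is_trainable argument to PeftModel.from_pretrained is what decides whether the loaded LoRA weights can receive gradient updates.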
Hope this helps!
This cell refers to the PPO model (the reinforcement learning model), which is used to steer the PEFT model (and the entire LLM) in the proper direction. This text is from the lab:
“Add the adapter to the original FLAN-T5 model. In the previous lab you were adding the fully trained adapter only for inferences, so there was no need to pass LoRA configurations doing that. Now you need to pass them to the constructed PEFT model, also putting is_trainable=True.

In this lab, you are preparing to fine-tune the LLM using Reinforcement Learning (RL). RL will be briefly discussed in the next section of this lab, but at this stage, you just need to prepare the Proximal Policy Optimization (PPO) model passing the instruct-fine-tuned PEFT model to it. PPO will be used to optimize the RL policy against the reward model.”
So in this lab we need the PEFT model to be trainable so that PPO can steer it in the right direction!
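For reference, here is a rough sketch of how that wiring might look with peft + trl (the model name and adapter path are placeholders, not the lab's exact code):

```python
# Rough sketch (not the lab's exact code): load the instruct-fine-tuned PEFT
# adapter as trainable, then wrap it for PPO with trl's value-head model.
from transformers import AutoModelForSeq2SeqLM
from peft import PeftModel
from trl import AutoModelForSeq2SeqLMWithValueHead

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# is_trainable=True: PPO will keep updating the LoRA adapter weights
# against the reward model, so they must not be frozen.
peft_model = PeftModel.from_pretrained(
    base_model,
    "./peft-dialogue-summary-checkpoint",  # placeholder adapter path
    is_trainable=True,
)

# Add a value head so trl's PPOTrainer can optimize the policy; a frozen
# copy of this model would typically serve as the PPO reference model.
ppo_model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(peft_model)
```

Because the adapter is loaded with is_trainable=True, the PPO updates computed against the reward model can actually flow into the LoRA weights; with False they would stay frozen and PPO would have nothing to optimize.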
So basically, in Lab 2 we weren't actually training the PEFT model, just learning how it works, and we simply loaded the already trained adapter. But in Lab 3 we were actually training the model, so is_trainable was set to True. Is that right?
Yeah, I think that's right, because in order to use PPO you need to be able to change the PEFT model…
Okay, thanks!