Creating the ref model in lab3 for the RLHF algorithm

#week3 lab: fine-tuning a model with RLHF

Why is the frozen model used as the reference model for the RLHF algorithm a copy of the PPO model, and not just the PEFT model?
ref_model = create_reference_model(ppo_model)
instead of: ref_model = peft_model

Hello Jorge, welcome to the community!

When fine-tuning a model with RLHF using PPO, the frozen reference model serves as a stable baseline: at each step, the trainer computes a KL-divergence penalty between the token-level log-probabilities of the updated PPO policy and those of the reference model, and subtracts that penalty from the reward. This keeps the updated model from drifting too far from its original behavior while it optimizes for the reward model's score.

For that comparison to be meaningful, the reference has to be a frozen snapshot of exactly the policy that PPO is updating. create_reference_model(ppo_model) produces such a snapshot: a copy of the full PPO wrapper (base model, PEFT adapters, and value head) with every parameter frozen, so it stays fixed throughout training. The peft_model, on the other hand, is the very model sitting inside ppo_model whose LoRA weights are being updated, so pointing the reference at it would not give a stable baseline (the "reference" would change along with the policy), and it also isn't the wrapped model type that the PPO trainer expects. That would break the comparison and lead to inconsistent reward calculations.
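Here is a minimal sketch of the difference, following the lab's setup with the trl 0.4-era API (the base checkpoint and LoRA hyperparameters below are illustrative, not necessarily the lab's exact values):

```python
# Minimal sketch, assuming the lab's flan-t5 + LoRA setup and trl 0.4-era API.
import torch
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM
from trl import AutoModelForSeq2SeqLMWithValueHead, create_reference_model

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
lora_config = LoraConfig(
    r=32, lora_alpha=32, lora_dropout=0.05, task_type=TaskType.SEQ_2_SEQ_LM
)
peft_model = get_peft_model(base_model, lora_config)

# Trainable PPO policy: the PEFT model wrapped with a value head.
ppo_model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(
    peft_model, is_trainable=True
)

# Frozen reference: a detached copy of the wrapped policy whose parameters
# all have requires_grad=False, so it cannot move during PPO updates.
ref_model = create_reference_model(ppo_model)

print(any(p.requires_grad for p in ppo_model.parameters()))  # True  (LoRA + value head train)
print(any(p.requires_grad for p in ref_model.parameters()))  # False (frozen snapshot)
```

Because ref_model is a detached, frozen copy, the KL penalty measures exactly how far PPO has moved the policy from where it started; a reference that shares the trainable LoRA weights would move along with the policy and measure nothing.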

Hope this helps!