#week3 lab: fine-tuning a model with RLHF
Why is the frozen model used as the reference model for the RLHF algorithm a copy of the PPO model and not just the peft model?
ref_model = create_reference_model(ppo_model)
instead of: ref_model = peft_model
Hello Jorge, welcome to the community!
When fine-tuning a model with RLHF using PPO, a frozen reference model provides a stable baseline: at each step, a KL-divergence penalty between the outputs of the updated PPO model and those of the reference model is folded into the reward. This penalty keeps the policy from drifting too far from the behavior it started with. The reference has to be a frozen copy of the PPO model, which is exactly what create_reference_model(ppo_model) produces, so that the baseline stays fixed while training proceeds. Setting ref_model = peft_model would not give you that: the peft_model shares the very adapter weights PPO is updating, so the "reference" would move along with the policy, the KL penalty would collapse, and the reward calculations would no longer constrain the model.
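If it helps to see the wiring end to end, here is a minimal sketch following the lab's pattern. The gpt2 base checkpoint and the LoRA hyperparameters are placeholders for illustration, and the exact from_pretrained arguments may vary with your TRL/PEFT versions:

from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM
from trl import AutoModelForCausalLMWithValueHead, create_reference_model

# Illustrative base model; the lab uses its own checkpoint.
base = AutoModelForCausalLM.from_pretrained("gpt2")

# Attach trainable LoRA adapters on top of the frozen base weights.
lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.05)
peft_model = get_peft_model(base, lora_config)

# The PPO policy: the peft model plus a value head, with the adapters kept trainable.
ppo_model = AutoModelForCausalLMWithValueHead.from_pretrained(peft_model, is_trainable=True)

# A detached, frozen copy of the policy. Its weights never change during PPO,
# so it provides a fixed distribution to compute the KL penalty against.
ref_model = create_reference_model(ppo_model)

# ref_model = peft_model would instead share the adapter weights that PPO is
# updating, so the "reference" would drift along with the policy and the KL
# penalty would no longer anchor the model to its starting behavior.

Because create_reference_model freezes every parameter of the copy, ppo_model can be updated freely while ref_model stays put, which is exactly what the KL term in the PPO reward needs.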
Hope this helps!