Creating the ref model in lab3 for the RLHF algorithm

#week3 lab: fine-tuning a model with RLHF

Why is the frozen model used as the reference model for the RLHF algorithm a copy of the PPO model, and not just the PEFT model?
ref_model = create_reference_model(ppo_model)
instead of: ref_model = peft_model

Hello Jorge, welcome to the community!

When fine-tuning a model with RLHF using PPO, the frozen reference model serves as a stable baseline: at each step, the trainer computes a KL-divergence penalty between the token-level log-probabilities of the updated PPO policy and those of the reference model, and subtracts that penalty from the reward. This keeps the updated model from drifting too far from its original behavior while it optimizes for the reward model's score.

For that comparison to be meaningful, the reference has to be a frozen snapshot of exactly the policy that PPO is updating. create_reference_model(ppo_model) produces such a snapshot: a copy of the full PPO wrapper (base model, PEFT adapters, and value head) with every parameter frozen, so it stays fixed throughout training. The peft_model, on the other hand, is the very model sitting inside ppo_model whose LoRA weights are being updated, so pointing the reference at it would not give a stable baseline (the "reference" would change along with the policy), and it also isn't the wrapped model type that the PPO trainer expects. That would break the comparison and lead to inconsistent reward calculations.
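Here is a minimal sketch of the difference, following the lab's setup with the trl 0.4-era API (the base checkpoint and LoRA hyperparameters below are illustrative, not necessarily the lab's exact values):

```python
# Minimal sketch, assuming the lab's flan-t5 + LoRA setup and trl 0.4-era API.
import torch
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM
from trl import AutoModelForSeq2SeqLMWithValueHead, create_reference_model

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
lora_config = LoraConfig(
    r=32, lora_alpha=32, lora_dropout=0.05, task_type=TaskType.SEQ_2_SEQ_LM
)
peft_model = get_peft_model(base_model, lora_config)

# Trainable PPO policy: the PEFT model wrapped with a value head.
ppo_model = AutoModelForSeq2SeqLMWithValueHead.from_pretrained(
    peft_model, is_trainable=True
)

# Frozen reference: a detached copy of the wrapped policy whose parameters
# all have requires_grad=False, so it cannot move during PPO updates.
ref_model = create_reference_model(ppo_model)

print(any(p.requires_grad for p in ppo_model.parameters()))  # True  (LoRA + value head train)
print(any(p.requires_grad for p in ref_model.parameters()))  # False (frozen snapshot)
```

Because ref_model is a detached, frozen copy, the KL penalty measures exactly how far PPO has moved the policy from where it started; a reference that shares the trainable LoRA weights would move along with the policy and measure nothing.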

Hope this helps!