Model configurations for RLHF

Pixies · November 11, 2023, 2:42am

I have a question about the week 3 lab. I am not sure what role AutoModelForSeq2SeqLMWithValueHead plays in RLHF. Is it the model being trained, the instructor (used in the KL divergence to avoid reward hacking), or the classifier? I don't think it is the classifier, since there is AutoModelForSequenceClassification for that. I also don't understand why it has a value head. What is its role?
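For context on the value-head part of the question: in PPO-style RLHF, a value head is typically a small extra layer on top of the language model's hidden states that predicts one scalar value (an estimate of expected return) per generated token. PPO uses those estimates to turn a single end-of-sequence reward into per-token advantages. The sketch below is plain Python with hypothetical helpers (value_head, gae_advantages); it is not TRL's implementation, and gamma and lam are illustrative values, not the lab's settings.

```python
from typing import List, Tuple


def value_head(hidden_states: List[List[float]], weights: List[float], bias: float) -> List[float]:
    # Hypothetical value head: a linear projection of each token's hidden
    # state down to a single scalar value estimate.
    return [sum(h * w for h, w in zip(state, weights)) + bias for state in hidden_states]


def gae_advantages(
    rewards: List[float], values: List[float], gamma: float = 1.0, lam: float = 0.95
) -> Tuple[List[float], List[float]]:
    # Generalized Advantage Estimation, computed backwards over the tokens.
    advantages = [0.0] * len(rewards)
    last_gae = 0.0
    for t in reversed(range(len(rewards))):
        # The value after the final token is 0 because the episode ends there.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        last_gae = delta + gamma * lam * last_gae
        advantages[t] = last_gae
    # Returns are the targets the value head is trained to predict.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

For example, with a reward of 1.0 only on the last token and a value head that predicts 0.5 everywhere, gae_advantages([0.0, 0.0, 1.0], [0.5, 0.5, 0.5], gamma=1.0, lam=1.0) spreads an advantage of 0.5 to every token, which is how the reward model's single score can influence each generation step.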

Also, shouldn't ref_model = create_reference_model(ppo_model) be ref_model = create_reference_model(peft_model)? I thought the reference model was the model without the new LoRA matrices, i.e. just the model fine-tuned with LoRA in lab 2.
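On the reference-model question: as far as I know, TRL documents create_reference_model as returning a static (frozen) copy of the model it is given, and PPO-based RLHF penalizes the policy with a per-token KL estimate against that copy. A minimal sketch of both ideas in plain Python, assuming a dict of parameter lists as a stand-in for model weights; create_frozen_copy and kl_penalized_rewards are hypothetical names, and beta is an illustrative KL coefficient, not the lab's value.

```python
import copy
from typing import Dict, List


def create_frozen_copy(params: Dict[str, List[float]]) -> Dict[str, List[float]]:
    # Hypothetical stand-in for a reference-model copy: a deep copy taken
    # before PPO starts, which is never updated afterwards.
    return copy.deepcopy(params)


def kl_penalized_rewards(
    policy_logprobs: List[float], ref_logprobs: List[float], score: float, beta: float
) -> List[float]:
    # Per-token KL estimate: log pi(y_t) - log pi_ref(y_t), scaled by -beta.
    rewards = [-beta * (lp - rlp) for lp, rlp in zip(policy_logprobs, ref_logprobs)]
    # The reward model's score (e.g. a classifier output) is added on the final token.
    rewards[-1] += score
    return rewards


policy = {"lora_A": [0.1, 0.2]}
reference = create_frozen_copy(policy)
policy["lora_A"][0] += 0.05  # simulate a PPO update to the policy only
```

Because the copy is taken before any PPO step, it starts out identical to whatever weights the model had at that moment, and only the policy moves afterwards. The KL term then grows as the policy's token log-probabilities drift away from the frozen copy's, which is what discourages reward hacking.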

Thanks!

gent.spah · November 11, 2023, 8:00am

That seems to be right.
