Week 3: Video RLHF Reward Model

Ashwin_Sateesh_Kumar · November 18, 2023, 8:55pm

Hi All,

I was able to understand the training process of the reward model. What I want to know is how we feed (Prompt X, Y_j) and (Prompt X, Y_k), along with the labels [0,1], to the data loaders, and how these are preprocessed before being passed to the reward model.

Since each prompt X has three sets of pairwise completions, I can't work out how they are given to the reward model as inputs and labels. Please explain in detail, with a code snippet if possible.
Thank you
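
For anyone reading along, here is a minimal sketch of one common way to set this up. It is not the course's actual code. It assumes each prompt comes with a human ranking of its completions, that the ranking is expanded into pairs, and that each pair is reordered so the preferred completion Y_j always comes first, which means no separate label column is needed. The names `build_pairs`, `pairwise_loss`, and `toy_reward` are hypothetical, and `toy_reward` only stands in for a real reward model that would score the tokenized prompt plus completion.

```python
import math
from itertools import combinations


def build_pairs(prompt, completions, ranks):
    """Expand ranked completions into (prompt, chosen, rejected) records.

    ranks[i] is the human rank of completions[i]; 1 is the best.
    """
    pairs = []
    for a, b in combinations(range(len(completions)), 2):
        if ranks[a] == ranks[b]:
            continue  # a tie carries no preference signal, so skip it
        # Reorder so the preferred completion (Y_j) is always "chosen".
        j, k = (a, b) if ranks[a] < ranks[b] else (b, a)
        pairs.append({
            "prompt": prompt,
            "chosen": completions[j],
            "rejected": completions[k],
        })
    return pairs


def pairwise_loss(r_chosen, r_rejected):
    """Pairwise ranking loss: -log(sigmoid(r_j - r_k))."""
    return math.log1p(math.exp(-(r_chosen - r_rejected)))


def toy_reward(prompt, completion):
    # Hypothetical stand-in for the reward model: returns one scalar score.
    return 0.1 * len(completion)


# Three completions for one prompt X give three pairwise comparisons.
pairs = build_pairs("X", ["short", "a longer answer", "mid one"], [3, 1, 2])
losses = [
    pairwise_loss(toy_reward(p["prompt"], p["chosen"]),
                  toy_reward(p["prompt"], p["rejected"]))
    for p in pairs
]
```

In a real pipeline each record would be tokenized as prompt plus completion, the list of records would be wrapped in a dataset and batched by a data loader, and the reward model would score the chosen and rejected sequences separately before the loss is averaged over the batch.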
