Week 3 lab: where is human feedback given to the reward model?


Hello, I am taking the Week 3 lab.
The lab practices RLHF. Is there a part where human feedback is passed to the reward model?

From the lecture, I understood that human feedback is collected and passed to the reward model, which then learns from it. But I can't find where this happens in the lab.

Or does this exercise just use something called scaling-human-feedback instead?

1 Like

The reward model (as far as I remember) has already been trained on human feedback outside the lab!

1 Like

To complement the answer: yes, the reward model is a pre-trained model that was prepared using human feedback before the lab. You can choose any reward model suited to your needs when performing reinforcement learning, and you can even train your own reward model if you want to further customize your results.
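If it helps to picture the step that happens outside the lab, here is a rough sketch of how human feedback could be turned into a reward model. It is not the lab's code. It assumes the feedback comes as pairs of (preferred, rejected) responses and uses a common pairwise loss, `-log(sigmoid(r_chosen - r_rejected))`. The two toy features, the data, and the function names are all hypothetical and only there for illustration.

```python
import math

def reward(weights, features):
    """Score a response with a linear reward model (toy stand-in for a neural network)."""
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(weights, chosen, rejected):
    """-log(sigmoid(margin)): small when the chosen response scores higher."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    return math.log1p(math.exp(-margin))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit the weights with plain gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            sigmoid = 1.0 / (1.0 + math.exp(-margin))
            # Gradient of the loss w.r.t. the weights is -(1 - sigmoid) * (chosen - rejected).
            grad_scale = -(1.0 - sigmoid)
            for i in range(dim):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights

# Hypothetical human feedback: each pair is (features of preferred response,
# features of rejected response). Feature 0 ~ "politeness", feature 1 ~ "length".
pairs = [
    ([0.9, 0.2], [0.1, 0.3]),
    ([0.8, 0.7], [0.2, 0.6]),
    ([0.7, 0.1], [0.3, 0.9]),
]

weights = train_reward_model(pairs, dim=2)

for chosen, rejected in pairs:
    print(reward(weights, chosen) > reward(weights, rejected))
```

After training, the model scores every preferred response above its rejected counterpart. In the RL stage, a model like this is kept frozen and only used to score the outputs of the LLM being fine-tuned, which would explain why no feedback-collection step appears in the lab itself.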

2 Likes