Question on the loss function of the reward model

Hi,
I'm taking the Week 3 lecture "[RLHF: Obtaining feedback from humans](https://www.coursera.org/learn/generative-ai-with-llms/lecture/lQBGW/rlhf-obtaining-feedback-from-humans#)".

In the lecture, the instructor says that the loss function of the reward model is `log(sigmoid(r_j - r_k))`. But I think the loss should be `-log(sigmoid(r_j - r_k))` when the preferred completion is y_j. When r_j - r_k > 0, sigmoid(r_j - r_k) is close to 1, so log(sigmoid(r_j - r_k)) is close to 0. That log term is always negative and gets larger as the preferred completion's reward pulls ahead of the other one, so it is something we want to maximize. To turn it into a loss that we minimize, we need a minus sign: `-log(sigmoid(r_j - r_k))`.
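To sanity-check the sign argument, here is a minimal Python sketch of the pairwise loss I have in mind, assuming y_j is the preferred completion with reward r_j and y_k is the other completion with reward r_k. The function names `log_sigmoid` and `reward_pair_loss` are my own, not from the course. `log_sigmoid` uses the standard sign split so `exp` cannot overflow for large reward gaps.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)); the result is always <= 0."""
    # log(sigmoid(x)) = -log(1 + exp(-x)); split on the sign of x so exp() never overflows
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_pair_loss(r_j: float, r_k: float) -> float:
    """Pairwise loss when y_j is preferred: -log(sigmoid(r_j - r_k))."""
    return -log_sigmoid(r_j - r_k)


# Preferred completion ahead, tied, and behind
for r_j, r_k in [(2.0, -1.0), (0.0, 0.0), (-1.0, 2.0)]:
    print(f"r_j={r_j:+.1f}, r_k={r_k:+.1f}  "
          f"log(sigmoid)={log_sigmoid(r_j - r_k):+.4f}  "
          f"loss={reward_pair_loss(r_j, r_k):.4f}")
```

With the minus sign, the loss is close to 0 when r_j is well above r_k, equals log 2 ≈ 0.693 when the two rewards are tied, and grows when the ranking is reversed, which is the behavior you would want to minimize. Without the minus sign, minimizing would push r_j below r_k.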

Is my understanding correct, or is there another way to interpret the loss function as it is presented in the lecture?

Hello, I watched the video but couldn't find the point where she gives this formula. Where in the lecture does it appear?

Maybe this post can help you!
