Question on the loss function of the reward model

Hello, I watched the video, but I couldn't find where she gives this formula. Where does it appear?

Maybe this post can help you!
