As far as I know, the loss function of a reward model should be the negative log probability, not the positive log probability.

I think that is self-evident, right?

In the InstructGPT paper they wrote it correctly as a negative log-sigmoid:

`loss(θ) = -1/(K choose 2) * E_{(x, y_w, y_l) ~ D} [ log( σ( r_θ(x, y_w) - r_θ(x, y_l) ) ) ]`

where `y_w` is the preferred completion.

But in Stiennon et al. 2020 they wrote `loss = log(sigmoid(r_j - r_k))`, while still interpreting `j` as the summary preferred over `k`.

It’s obvious to experienced readers, but it’s still a mistake for sure.

Sorry, my fault. I should have added a question mark, because I only suspect it.

Intuitively, we want `r_j > r_k`, i.e., we want to maximize `log(sigmoid(r_j - r_k))`, the log probability that `j` is preferred. Since training minimizes the loss, the loss should be the *negative* of that quantity. That’s why I suspect the loss function should carry a minus sign.
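A minimal numeric sketch of the point above (the function name `pairwise_rm_loss` is my own, not from either paper): with the minus sign, the loss shrinks as the margin `r_j - r_k` grows, so minimizing it pushes the preferred reward `r_j` above `r_k`. Without the minus sign, minimizing would do the opposite.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_rm_loss(r_j, r_k):
    # Negative log probability that j beats k (Bradley-Terry style).
    # Minimizing this drives r_j above r_k, as intended.
    return -math.log(sigmoid(r_j - r_k))

# Larger margin r_j - r_k  =>  smaller loss, as a correct loss should behave:
assert pairwise_rm_loss(2.0, 0.0) < pairwise_rm_loss(0.0, 0.0) < pairwise_rm_loss(0.0, 2.0)

# Tie case: sigmoid(0) = 0.5, so the loss is log(2).
assert abs(pairwise_rm_loss(0.0, 0.0) - math.log(2.0)) < 1e-9
```

If the minus sign were dropped, as in the Stiennon et al. formula, the inequalities above would reverse and the optimizer would learn to rank the preferred summary *lower*.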

Besides, in Stiennon et al. 2020, Sec. 3.4, the authors wrote the loss function as `loss = log(sigmoid(r_j - r_k))`, with `j` the preferred summary.

However, Figure 2 of the same paper appears inconsistent with that. It really confuses me. Hopefully someone can answer my doubts.