Why Log Sigmoid log(σ(r_j - r_k)) as loss function to train reward model?

Hello Gen AI community,

I’m struggling to understand why this loss function is used to train the reward model. I’m referring to the “RLHF: Reward model” video at min 1:00.

Given we want to train the model to favor completion y_j over y_k, my understanding is that we want to maximize the reward gap r_j - r_k, i.e. get σ(r_j - r_k) as close to 1 as possible. But log(σ(r_j - r_k)) ranges from -∞ to 0, so minimizing it seems to yield the opposite of what we want.

I would understand if we were, say, minimizing log(σ(r_k - r_j)) or -log(σ(r_j - r_k)).
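To make my confusion concrete, here is a quick sanity check in plain Python (the gap values d = r_j - r_k are made up):

```python
import math

def log_sigmoid(d):
    """log(sigma(d)) for a reward gap d = r_j - r_k (hypothetical values)."""
    return math.log(1.0 / (1.0 + math.exp(-d)))

# log(sigma(d)) increases with d and approaches 0 from below, so it
# favors r_j > r_k only if we MAXIMIZE it (or minimize its negative).
for d in (-2.0, 0.0, 2.0):
    print(f"d = {d:+.1f}  ->  log(sigma(d)) = {log_sigmoid(d):.4f}")
```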

Anyone able to clarify this for me?


1 Like

I think we can find support for one of our suggestions in the paper listed in the lower left corner of those slides.

1 Like

That’s a good observation @cedricvidal. In the paper the loss is defined as -E_{x\sim D}[log(\sigma(r_j-r_k))] where x is a summary input.

1 Like

Yes, I looked at the paper, but I’m not familiar with the E_{x\sim D}[f] notation. Could you explain it to me? Especially what E and D are, and what the relation is between E and what’s in the brackets?

Note: how do you write equations in a post here?

1 Like

The loss has to be computed over all samples in the dataset. In practice, it is just an expectation (average) of the loss computed for each sample (summary pair) in the dataset D. You can therefore move the negative sign inside E[\cdot], which results in -\log(\cdot) for each individual sample.
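As a sketch of that averaging, here is the per-pair negative log sigmoid computed over a small, made-up dataset of preference pairs (all reward values invented for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical reward pairs: r_j is the reward of the preferred summary,
# r_k that of the rejected one.
pairs = [(1.2, 0.3), (0.8, 1.1), (2.0, -0.5)]

# loss = -E_{x ~ D}[log(sigma(r_j - r_k))]: move the minus sign inside
# the expectation and average -log(sigma(r_j - r_k)) over the dataset.
loss = sum(-math.log(sigmoid(rj - rk)) for rj, rk in pairs) / len(pairs)
print(f"average loss = {loss:.4f}")
```

Note that each per-pair term -log(σ(r_j - r_k)) is positive and shrinks toward 0 as r_j pulls ahead of r_k, so minimizing this loss does push the model to rank the preferred completion higher.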

To write equations, just use TeX code inside dollar signs, like $E_{x\sim D}$, which will be displayed as E_{x\sim D}.

1 Like

Yes, I believe the course slide should be updated to indicate that the loss (being minimized) is the negative log sigmoid of r_j - r_k.

1 Like

Agreed on this point.

I think the course material comes from ‘Figure 2’ of the paper.

That figure does not seem to match the mathematical formula on page 6 of the paper, which does include the negative sign.

1 Like