W3 - RLHF Reward Model - loss of reward model

Why is the loss log(σ(r_j - r_k))? shouldn’t it be -log(σ(r_j - r_k))?

1 Like

The choice between these two loss functions depends on the specific task and objective of your machine learning or deep learning model.
The former log(σ(r_j - r_k)))is more of a continuous measure of prediction quality,
while the latter -log(σ(r_j - r_k)) is typically used in binary classification problems where you want to penalize the model for making incorrect predictions.