W3 - RLHF Reward Model - loss of reward model

Mohamed_Akram · September 30, 2023, 1:27pm

Why is the loss log(σ(r_j - r_k))? Shouldn’t it be -log(σ(r_j - r_k))?

Mongonshagai · October 1, 2023, 10:55am

The two expressions differ only in sign, so the choice between them depends on whether your training objective is being maximized or minimized.
The former, log(σ(r_j - r_k)), is the log-likelihood that completion j is ranked above completion k; it is always ≤ 0 and is a quantity you maximize,
while the latter, -log(σ(r_j - r_k)), is the corresponding negative log-likelihood, the same logistic loss used in binary classification, which you minimize so that the model is penalized for making incorrect predictions (here, ranking k above j).
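
To make the sign question concrete, here is a minimal sketch in plain Python. The function names `log_sigmoid` and `pairwise_loss` are my own, not from the course materials, and I am assuming r_j is the reward for the human-preferred completion and r_k the reward for the rejected one.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)); the result is always <= 0."""
    if x >= 0:
        # log(1 / (1 + e^-x)) = -log(1 + e^-x)
        return -math.log1p(math.exp(-x))
    # log(e^x / (1 + e^x)) = x - log(1 + e^x), avoids overflow for very negative x
    return x - math.log1p(math.exp(x))


def pairwise_loss(r_j: float, r_k: float) -> float:
    """Negative log-likelihood that j is preferred over k (assumed convention)."""
    return -log_sigmoid(r_j - r_k)


# Reward model agrees with the human label (r_j > r_k): small loss.
loss_correct = pairwise_loss(2.0, 0.0)
# Reward model disagrees (r_j < r_k): large loss.
loss_wrong = pairwise_loss(0.0, 2.0)

print(f"log sigma, correct ranking: {log_sigmoid(2.0):.4f}")
print(f"loss, correct ranking:      {loss_correct:.4f}")
print(f"loss, wrong ranking:        {loss_wrong:.4f}")
```

With the minus sign, the value is non-negative and shrinks toward 0 as r_j - r_k grows, which is what you want from something you minimize. Without the minus sign you get the same number negated, so minimizing it would push the model toward ranking the pair the wrong way round.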
