RNN Cost Function

In the video “Backpropagation through time” (see last slide) Andrew uses the cost function -y * log(yhat) - (1 - y) * log(1 - yhat).

But in the video “Language model and sequence generation” (see last slide) he uses the cost function -SUM(y * log(yhat)).

Is there an error in this last video, or am I missing something?

It looks like in the first case the output is from a sigmoid activation function (binary classification), whereas in the second case it’s a softmax output (multiclass classification).

The loss functions are really the same.
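
To make that concrete, here is a minimal NumPy sketch (the numbers are made up, not from the course) showing that the two formulas compute the same value for a single time step with a binary label:

```python
import numpy as np

y = 1          # true label, 0 or 1
yhat = 0.8     # sigmoid output, interpreted as P(y = 1)

# Two-term formula (sigmoid / binary cross-entropy)
binary_loss = -y * np.log(yhat) - (1 - y) * np.log(1 - yhat)

# One-term formula (softmax / categorical cross-entropy), after writing the
# label and the prediction as 2-element vectors over the classes {0, 1}
y_vec = np.array([1 - y, y])           # "one hot" label: [class 0, class 1]
yhat_vec = np.array([1 - yhat, yhat])  # 2-class softmax-style output
softmax_loss = -np.sum(y_vec * np.log(yhat_vec))

print(binary_loss, softmax_loss)  # both are about 0.2231
```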

Hi Paul, thanks for your reply. I can see the difference now: in the softmax case, ‘y’ is already a vector of zeros and ones, which makes the one-term version of the cost function behave the same way the two-term version does for the single scalar output of the sigmoid.

Exactly. In the softmax case the y values are “one hot” vectors. In the binary (sigmoid) case, the loss formula takes the single 0 or 1 label and manually converts it to a 2 element “one hot” vector.
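
Writing that conversion out explicitly: if the label is $y \in \{0, 1\}$ and the sigmoid output is $\hat{y}$, the corresponding one-hot label is $(1-y,\; y)$ and the 2-class prediction is $(1-\hat{y},\; \hat{y})$, so

$$-\sum_k y_k \log \hat{y}_k \;=\; -(1-y)\log(1-\hat{y}) \;-\; y\log\hat{y},$$

which is exactly the two-term formula from the backpropagation video.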