Week 1 questions

I have some questions on this week's material:

Loss function for RNN
(1) The loss function on page 15 of the slides appears to be a vector of the same size as the vocabulary. Shouldn't it be a sum over all entries?

(2) Later, on page 24, there's a different formula for the loss function, this time with only one term from the logistic loss. What happened to the other term [+ (1-y)*log(1-y_hat)]?
It implies that the loss only takes note of one element of the vocabulary, i.e., only one term survives in the sum over i (since y is a one-hot vector).

Implementation of RNN (ex1)
(3) In the first exercise (1.1 RNN Cell) it is noted that T_x is taken from the longest sentence. What happens in the calculation for the other, shorter sentences? Doesn't it affect the model when "training" on nulls (padding, not real words)?
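On (3): the usual approach (I believe this is what the exercise does too, but check the notebook) is to zero-pad the shorter sequences up to T_x and mask the loss so the padded positions contribute nothing to the gradient. A minimal sketch of that idea, with illustrative names of my own (not from the exercise):

```python
import math

# Hypothetical per-timestep losses for a padded batch of 2 sentences,
# with a 0/1 mask marking real words (1) vs padding (0).
losses = [[0.9, 0.4, 0.2], [1.1, 0.5, 0.0]]   # second sentence is shorter
mask   = [[1,   1,   1  ], [1,   1,   0  ]]   # its last position is padding

# Masked mean: multiply each loss by its mask entry, then average
# over the number of real (unmasked) positions only.
total = sum(l * m for row_l, row_m in zip(losses, mask)
                  for l, m in zip(row_l, row_m))
count = sum(m for row in mask for m in row)
mean_loss = total / count
print(round(mean_loss, 2))  # → 0.62
```

So the padded timesteps are computed, but they never influence the parameters because their loss terms are zeroed out.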


The first form of the cross-entropy loss is for the binary classification case (sigmoid as the output activation). The second form covers the case where softmax is the output activation (multiclass classification).
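For reference, the two forms written out (my reconstruction of the slide notation; check against pages 15 and 24):

```latex
% Binary case (sigmoid output), one scalar label y:
\mathcal{L}(\hat{y}, y) = -\left[\, y \log \hat{y} + (1-y)\log(1-\hat{y}) \,\right]

% Multiclass case (softmax output), one-hot label vector y^{<t>} at timestep t:
\mathcal{L}^{<t>}\!\left(\hat{y}^{<t>}, y^{<t>}\right) = -\sum_i y_i^{<t>} \log \hat{y}_i^{<t>}
```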

Notice that the first one is a sum over the label = 1 and label = 0 terms, whereas the second is a sum over all the classes. In the multiclass case, only one of the y_i^{<t>} terms will be non-zero (the one for the correct label of the particular sample). Of course, that is true in the binary case as well.
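To make the one-hot point concrete, here is a small sketch (the vocabulary and probabilities are made up) showing that the full sum over classes collapses to the single term for the correct label:

```python
import math

# Hypothetical softmax output over a 4-word vocabulary at one timestep
y_hat = [0.1, 0.7, 0.15, 0.05]
y = [0.0, 1.0, 0.0, 0.0]  # one-hot: index 1 is the correct word

# Full sum over all classes: every term with y_i = 0 vanishes ...
loss_full = -sum(yi * math.log(yhi) for yi, yhi in zip(y, y_hat))
# ... so it equals the single term for the correct label
loss_single = -math.log(y_hat[1])

print(abs(loss_full - loss_single) < 1e-12)  # → True
```

That is why the page-24 formula can be written with just one log term per timestep: the other vocabulary entries are multiplied by zero.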