Hey.
I have some questions on this week's material:
Loss function for RNN
(1) The loss function on page 15 of the slides should be a vector of the same size as the vocabulary. Shouldn’t it be summed over all entries?
(2) Later, on page 24, there’s a different formula for the loss function, this time with only one term of the logistic loss. What happened to the other term, [+ (1-y)*log(1-y_hat)]?
It implies that the loss only takes note of one element of the vocabulary, and that only one element of the sum over i contributes (y is a one-hot vector).
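To make (2) concrete, here is a small Python sketch of how I read it. It assumes the page 24 loss is softmax cross-entropy against a one-hot y; the function names, the logits, and the 4-word vocabulary are made up for illustration and are not from the slides. If that assumption holds, the sum over i and the single-term formula give the same number:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_full(y, y_hat):
    # Sum over every vocabulary entry i: -sum_i y_i * log(y_hat_i)
    return -sum(y_i * math.log(p_i) for y_i, p_i in zip(y, y_hat))

def cross_entropy_single(true_index, y_hat):
    # Only the entry where y is 1: -log(y_hat[true_index])
    return -math.log(y_hat[true_index])

logits = [2.0, 0.5, -1.0, 0.1]  # hypothetical scores for a 4-word vocabulary
y_hat = softmax(logits)
y = [0, 1, 0, 0]                # one-hot target: the true word is index 1

full = cross_entropy_full(y, y_hat)
single = cross_entropy_single(1, y_hat)
print(full, single)
```

Every term with y_i = 0 drops out of the sum, so only the true word's probability is left, which is how I understand the shorter formula.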
Implementation of RNN (ex1)
(3) In the first exercise (1.1 RNN Cell), it is noted that T_x is taken from the longest sentence. What happens in the calculation for the shorter sentences? Doesn’t it impact the model when “training” on nulls (no words)?
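To illustrate what I am worried about in (3), here is a sketch. It assumes shorter sentences are padded up to T_x, and that one common remedy is to mask the padded steps out of the loss. I don't know whether the exercise does this; `PAD`, `masked_average_loss`, and all the numbers are hypothetical:

```python
PAD = None  # hypothetical marker for "no word" positions

def masked_average_loss(step_losses, tokens):
    # Keep only the losses at time steps that hold a real word.
    kept = [loss for loss, tok in zip(step_losses, tokens) if tok is not PAD]
    return sum(kept) / len(kept)

tokens = ["the", "cat", PAD, PAD]   # a 2-word sentence padded to T_x = 4
step_losses = [0.5, 1.5, 3.0, 3.0]  # made-up per-step losses

# Averaging over all T_x steps lets the padded steps affect the result.
naive = sum(step_losses) / len(step_losses)

# Averaging over real words only ignores the padded steps.
masked = masked_average_loss(step_losses, tokens)
print(naive, masked)
```

Without a mask, the padded steps change the average loss, which is the effect I am asking about.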
Thanks!