What is the loss function in the dinosaur assignment?
You can determine this by reading the rnn_forward() function in the utils.py file.
Yes, I have seen that, but it's not clear to me. (It does not look like the one described in the lecture.) Can you please write the exact mathematical formula?
Forward propagation uses softmax at the output layer, so it is the standard cross entropy loss that is used with softmax. At each timestep it is:
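L(y, \hat{y}) = -\sum_i y_i \log(\hat{y}_i)
where y is the one hot label vector and \hat{y} is the softmax output for that timestep.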
The only other subtlety is that the code sums the loss across all the timesteps. The comment could be a little clearer there: it describes the update as subtraction (well, actually “substraction” (sic)), but each log term being subtracted is negative, so you’re really adding positive terms.
All the indexing business is just selecting the element of the vector that corresponds to the one hot label at that timestep. It all boils down to -log(\hat{y}) for the element that corresponds to the “true” label for that timestep, which is absolutely “flavor vanilla” cross entropy loss.
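If it helps to see it concretely, here is a minimal sketch of that idea. It is not the code from utils.py: the function names, the 1-D shape of each \hat{y}, and the toy probabilities are all assumptions made for illustration. It computes the summed loss once by indexing the true label's element and once with the full one hot formula, and the two agree.

```python
import numpy as np

def sequence_loss_indexed(y_hats, labels):
    """Sum of -log(y_hat[label]) over timesteps (hypothetical helper)."""
    loss = 0.0
    for y_hat, label in zip(y_hats, labels):
        # log of a probability is <= 0, so subtracting it adds a positive term
        loss -= np.log(y_hat[label])
    return loss

def sequence_loss_one_hot(y_hats, labels):
    """Same loss written as -sum_i y_i * log(y_hat_i) with explicit one hot vectors."""
    loss = 0.0
    for y_hat, label in zip(y_hats, labels):
        y = np.zeros_like(y_hat)
        y[label] = 1.0  # one hot encoding of the true label
        loss += -np.sum(y * np.log(y_hat))
    return loss

# Toy example (assumed values): 2 timesteps, vocabulary of 3 characters.
y_hats = [np.array([0.5, 0.25, 0.25]), np.array([0.1, 0.8, 0.1])]
labels = [0, 1]  # index of the "true" character at each timestep

loss_indexed = sequence_loss_indexed(y_hats, labels)
loss_one_hot = sequence_loss_one_hot(y_hats, labels)
print(loss_indexed, loss_one_hot)
```

Since every element of the one hot vector except the true label is zero, the sum over i collapses to a single term, which is why the indexed version is all you need.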