Hi, I hope you are doing well.
In the first week of the Sequence Models course, I think there is a mistake in how the loss for the softmax function in RNNs is calculated.
Since i indexes the i-th training example and t indexes the t-th word in the sentence, I think the right way to calculate the loss is:
first, we should sum over t, not i: we calculate the loss for each word and sum those to get the loss for the whole sentence;
second, we should sum over i to add up the losses of all the sentences in the training set.
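To make it concrete, here is how I would write it in the course's (i) / ⟨t⟩ notation (this is a sketch of what I mean, not the official formula from the slides):

```latex
% Per-word cross-entropy loss for word t of training example i,
% where c ranges over the vocabulary classes of the softmax output
\mathcal{L}^{(i)\langle t \rangle}
  = -\sum_{c} y_c^{(i)\langle t \rangle} \log \hat{y}_c^{(i)\langle t \rangle}

% Loss for one sentence: sum over its words t (first step above)
\mathcal{L}^{(i)} = \sum_{t=1}^{T_y^{(i)}} \mathcal{L}^{(i)\langle t \rangle}

% Total cost: sum over the m training examples i (second step above)
J = \sum_{i=1}^{m} \mathcal{L}^{(i)}
  = \sum_{i=1}^{m} \sum_{t=1}^{T_y^{(i)}} \mathcal{L}^{(i)\langle t \rangle}
```

So the inner sum should run over t (words within a sentence) and the outer sum over i (sentences in the set), rather than the other way around.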