Misunderstanding of softmax loss in RNNs


Hi, I hope you're doing well.
In the first week of the Sequence Models course, I think there is a mistake in how we calculate the loss for the softmax function in RNNs.

Since i denotes the i-th training example and t denotes the t-th word in the sentence, I think the right way to calculate the loss is:

  1. First, we should sum over t, not i: we calculate the loss for each word and sum those to get the loss for one sentence.
  2. Second, we should sum over i to add up the losses for all the sentences in the set (see the formula below).
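
In symbols, what I mean is something like this (I'm writing \mathcal{L}^{\langle t \rangle (i)} for the loss of the t-th word of the i-th example; this notation is my guess at the slide's convention):

```latex
% My suggested ordering (notation assumed): per-word loss summed over the
% words t of sentence i first, then over all m sentences i in the set.
\mathcal{L} = \sum_{i=1}^{m} \sum_{t=1}^{T_y}
  \mathcal{L}^{\langle t \rangle (i)}\left(\hat{y}^{\langle t \rangle (i)},\, y^{\langle t \rangle (i)}\right)
```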
Could someone please clarify if I'm wrong?

Hello @Mohamed-Amine_BENHIM,

From your screenshot, \mathcal{L} has two summations: the outer one sums over t and the inner one sums over i. Let me know if you disagree.

Then, note that the two summations commute. You can see this by expanding the sums, as sketched below.
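
For finite sums this is a general identity, nothing specific to RNNs. Writing a_{i,t} for the per-word loss term:

```latex
% Expanding a finite double sum term by term shows the order doesn't matter:
\sum_{t=1}^{T}\sum_{i=1}^{m} a_{i,t}
  = (a_{1,1} + \cdots + a_{m,1}) + \cdots + (a_{1,T} + \cdots + a_{m,T})
  = \sum_{i=1}^{m}\sum_{t=1}^{T} a_{i,t}
```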

Since they commute, the ordering in the slide is correct, and the one you suggest is also correct.
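
If you prefer to see it numerically, here is a minimal NumPy sketch. The toy shapes and the per-step cross-entropy below are my own assumptions, just to illustrate the point:

```python
import numpy as np

rng = np.random.default_rng(0)
m, T, V = 4, 5, 10  # assumed: 4 examples, 5 time steps, vocabulary of 10

# Toy predictions (softmax over the vocabulary) and one-hot targets.
logits = rng.normal(size=(m, T, V))
y_hat = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
y = np.eye(V)[rng.integers(0, V, size=(m, T))]

# Cross-entropy loss per example i and time step t, shape (m, T).
loss = -(y * np.log(y_hat)).sum(axis=-1)

# Sum over t first, then over i -- and the other way around.
sum_t_then_i = loss.sum(axis=1).sum()
sum_i_then_t = loss.sum(axis=0).sum()
print(np.isclose(sum_t_then_i, sum_i_then_t))  # True: the order doesn't matter
```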

Cheers,
Raymond

Yeah, you're right, thanks for the clarification!