Misunderstanding of softmax loss in RNNs


Hi, I hope you're doing well.
In the first week of the Sequence Models course, I think there is a mistake in how we calculate the loss for the softmax function in RNNs.

Since i denotes the i-th training example and t denotes the t-th word in the sentence, I think the right way to calculate the loss is:

  1. First, we should sum over t, not i: we calculate the loss for each word and sum those to get the loss for one sentence.
  2. Second, we should sum over i to add up the losses for all the sentences in the set (see the formula below).
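
In symbols, what I mean is something like this (I'm writing \mathcal{L}^{\langle t \rangle (i)} for the loss of the t-th word of the i-th example; this notation is my guess at the slide's convention):

```latex
% My suggested ordering (notation assumed): per-word loss summed over the
% words t of sentence i first, then over all m sentences i in the set.
\mathcal{L} = \sum_{i=1}^{m} \sum_{t=1}^{T_y}
  \mathcal{L}^{\langle t \rangle (i)}\left(\hat{y}^{\langle t \rangle (i)},\, y^{\langle t \rangle (i)}\right)
```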
Could someone please clarify if I'm wrong?

Hello @Mohamed-Amine_BENHIM,

From your screenshot, \mathcal{L} has two summations: the outer one sums over t and the inner one sums over i. Let me know if you disagree.

Then, note that the two summations commute. You can see this by expanding the sums, as sketched below.
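
For finite sums this is a general identity, nothing specific to RNNs. Writing a_{i,t} for the per-word loss term:

```latex
% Expanding a finite double sum term by term shows the order doesn't matter:
\sum_{t=1}^{T}\sum_{i=1}^{m} a_{i,t}
  = (a_{1,1} + \cdots + a_{m,1}) + \cdots + (a_{1,T} + \cdots + a_{m,T})
  = \sum_{i=1}^{m}\sum_{t=1}^{T} a_{i,t}
```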

Since they commute, the ordering in the slide is correct, and the one you suggest is also correct.
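
If you prefer to see it numerically, here is a minimal NumPy sketch. The toy shapes and the per-step cross-entropy below are my own assumptions, just to illustrate the point:

```python
import numpy as np

rng = np.random.default_rng(0)
m, T, V = 4, 5, 10  # assumed: 4 examples, 5 time steps, vocabulary of 10

# Toy predictions (softmax over the vocabulary) and one-hot targets.
logits = rng.normal(size=(m, T, V))
y_hat = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
y = np.eye(V)[rng.integers(0, V, size=(m, T))]

# Cross-entropy loss per example i and time step t, shape (m, T).
loss = -(y * np.log(y_hat)).sum(axis=-1)

# Sum over t first, then over i -- and the other way around.
sum_t_then_i = loss.sum(axis=1).sum()
sum_i_then_t = loss.sum(axis=0).sum()
print(np.isclose(sum_t_then_i, sum_i_then_t))  # True: the order doesn't matter
```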

Cheers,
Raymond

Yeah, you're right, thanks for the clarification!