A question about lecture: Language Model and Sequence Generation

yezhang.hydro · August 13, 2021, 12:16am

Hi,

I have a question about the lecture "Language Model and Sequence Generation". At 10:19, the loss function for a single time step has a subscript ‘i’. See the screenshot below. My first guess is that ‘i’ sums over all the training examples, although the index is a subscript, whereas one of the earlier lectures defined a superscript for training examples. Another, more likely guess is that it sums over all 10,003 probabilities, because the predicted y is a softmax output with 10,003 values. Any explanation is welcome.

arosacastillo · December 2, 2021, 12:48pm

Hi Ye,

Thanks for your question. Yes, I would say so too. From that week's assignment for the lecture:

𝑦̂ is a 3D tensor of shape (𝑛𝑦,𝑚,𝑇𝑦)

𝑛𝑦 : number of units in the vector representing the prediction
𝑚: number of examples in a mini-batch
𝑇𝑦: number of time steps in the prediction

So you need to compute the loss for all the examples in the batch.
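
To make the indices concrete, here is a minimal sketch, assuming the usual softmax cross-entropy from the lecture, L<t> = -Σ_i y_i<t> log ŷ_i<t>. In it the subscript i runs over the 𝑛𝑦 softmax outputs (the 10,003 vocabulary entries), and the examples in the batch are handled by a separate outer loop. The function names and the toy 3-entry vocabulary are only for illustration, and averaging over the batch is an assumption; the assignment may normalize differently.

```python
import math

def step_loss(y_hat_t, y_t):
    """Cross-entropy at one time step; the index i runs over the softmax outputs."""
    # Skip entries where y_i == 0: they contribute nothing to the sum.
    return -sum(y_i * math.log(p_i) for y_i, p_i in zip(y_t, y_hat_t) if y_i > 0)

def sequence_loss(y_hat, y):
    """Sum of the per-step losses over the time steps of one example."""
    return sum(step_loss(p_t, y_t) for p_t, y_t in zip(y_hat, y))

def batch_cost(batch_y_hat, batch_y):
    """Average of the sequence losses over the m examples (an assumed normalization)."""
    m = len(batch_y_hat)
    return sum(sequence_loss(p, y) for p, y in zip(batch_y_hat, batch_y)) / m

# Toy data: a vocabulary of 3 entries instead of 10,003, one example, 2 time steps.
y_hat = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # softmax outputs per time step
y = [[1, 0, 0], [0, 1, 0]]                  # one-hot targets per time step
loss = sequence_loss(y_hat, y)              # equals -(log 0.7 + log 0.8)
```

Because y<t> is one-hot, only the term for the correct word survives the sum over i at each time step.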

Best and happy learning,

Rosa
