So y<3> is the combination of all previous words, right? In this case:
y<3> = cats average 15
No. y^{<3>} is the third word only. But it is influenced by all previous words.
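To make "influenced by all previous words" concrete, here is how I would write it for the example sentence. This is a sketch in the usual language-model notation, not a quote from the slide: the output at step 3 is a probability distribution over the third word alone, conditioned on the first two words.

$$
\hat{y}^{<3>} = P\left(y^{<3>} \mid y^{<1>} = \text{cats},\ y^{<2>} = \text{average}\right)
$$

So "cats" and "average" appear only as the conditioning context, and the word being predicted is "15".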
My mistake, I meant y<2> [See the attached].
Now, if y<2> is the second word, and we know from slide 5 that x<3> is the notation for the third word, then how can x<3> = y<2>? Kindly see the attachment.
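In the language model, the input sequence is the output sequence shifted by one time step: x^{<t>} = y^{<t-1>}. The second word, y^{<2>}, is therefore fed in as the input at time step 3, which is why x^{<3>} = y^{<2>}. During training this is the true second word of the sentence; when sampling from a trained model, the model's own prediction at step 2 is used instead.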
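A small sketch of that one-step shift may help. It uses only the three words from the example above; the function name `shift_inputs` and the `"<zero>"` placeholder (standing in for the all-zeros input at the first step) are my own assumptions for illustration, not anything from the course code.

```python
def shift_inputs(targets, start_token="<zero>"):
    """Build the input sequence x from the target sequence y.

    x<1> is a placeholder (assumed here to stand for a zero vector),
    and x<t> = y<t-1> for every later time step.
    """
    return [start_token] + targets[:-1]


# Target words y<1>, y<2>, y<3> from the example sentence.
y = ["cats", "average", "15"]
x = shift_inputs(y)

# Print the input and target side by side for each time step.
for t, (x_t, y_t) in enumerate(zip(x, y), start=1):
    print(f"t={t}: x<{t}> = {x_t!r}, y<{t}> = {y_t!r}")
```

Python lists are 0-indexed, so `x[2]` is x<3> and `y[1]` is y<2>. Both hold "average", which is the equality asked about.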