Hi,
In the week 3 lecture titled ‘Attention Model’, it is said that a small neural network is trained to compute e<t,t'>. I wanted to know some details about it. Even if we ‘trust’ backpropagation and gradient descent to find the correct value of e<t,t'>, what are we training against as the target, i.e. the ‘ground truth’? The ‘ground truth’ value of e<t,t'> isn’t known beforehand, so I am failing to understand how the training works.
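To make my question concrete, here is a minimal NumPy sketch of how I understand the small alignment network from the lecture: a one-hidden-layer net that maps [s<t-1>; a<t'>] to a scalar score e<t,t'>, followed by a softmax over t'. All shapes and weights below are made-up placeholders, not the course's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
Tx, n_a, n_s, n_e = 5, 4, 3, 8           # encoder steps, activation/state/hidden sizes

a = rng.standard_normal((Tx, n_a))       # encoder activations a<t'>, t' = 1..Tx
s_prev = rng.standard_normal(n_s)        # previous decoder state s<t-1>

# Hypothetical weights of the "small neural network" (randomly initialised;
# in training they would be updated by backprop, which is what my question is about)
W = rng.standard_normal((n_e, n_s + n_a))
v = rng.standard_normal(n_e)

# One scalar score e<t,t'> per encoder position t'
e = np.array([v @ np.tanh(W @ np.concatenate([s_prev, a_tp])) for a_tp in a])

# Softmax over t' turns the scores into attention weights alpha<t,t'>
alpha = np.exp(e - e.max())
alpha /= alpha.sum()

context = alpha @ a                      # context vector c<t> fed to the decoder
```

My confusion is exactly about the `W` and `v` above: there is no separate target for `e`, so I assume the only learning signal reaching them would have to come through `context` from the decoder's loss.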