C5W3 | Attention model: how to compute alpha<t,t'>

aledesa · April 27, 2024, 4:50pm

In the Attention model class, minutes 7-8, when explaining the equation for \alpha:

\alpha^{<t,t'>} = \frac{\exp(e^{<t,t'>})}{\sum_{t'=1}^{T_x}\exp(e^{<t,t'>})}

How do you compute the e^{<t,t'>}?

It is said that it comes from a NN which takes as inputs the previous state s^{<t-1>} and the activation a^{<t'>}, but what is the target of this NN? And how do you get e^{<t,t'>} from that?

[Image: attention_weights]

Thank you.

TMosh · April 27, 2024, 8:41pm

The embeddings are pre-computed by a sequence model that uses a large corpus of sample text as the training set.

aledesa · April 28, 2024, 3:37pm

Thank you for your reply. I see from the paper "Neural Machine Translation by Jointly Learning to Align and Translate" that this part refers to the alignment model.

If I understand correctly, the value of e^{<t,t'>} is not pretrained but computed within the model as e^{<t,t'>} = v_a^T \tanh(W_a s^{<t-1>} + U_a a^{<t'>}), where v_a, W_a, U_a are additional parameters to be estimated.

I understand that, as explained, the key is that e^{<t,t'>} depends on (s^{<t-1>}, a^{<t'>}).
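
To check my understanding, here is a minimal sketch in plain Python of how e^{<t,t'>} and then \alpha^{<t,t'>} could be computed for one decoder step t. The sizes (T_x, n_s, n_a, n_e), the helper names, and the random values are my own assumptions for illustration only. In the real model v_a, W_a, U_a would not be random: as far as I can tell from the paper they are learned jointly with the rest of the network through the translation loss, so the small NN has no separate target of its own.

```python
import math
import random


def alignment_score(s_prev, a_tp, W_a, U_a, v_a):
    """e^{<t,t'>} = v_a^T tanh(W_a s^{<t-1>} + U_a a^{<t'>}) for a single t'."""
    hidden = [
        math.tanh(
            sum(w * s for w, s in zip(W_row, s_prev))
            + sum(u * x for u, x in zip(U_row, a_tp))
        )
        for W_row, U_row in zip(W_a, U_a)
    ]
    return sum(v * h for v, h in zip(v_a, hidden))


def attention_weights(scores):
    """Softmax over t': alpha^{<t,t'>} = exp(e^{<t,t'>}) / sum_{t'} exp(e^{<t,t'>})."""
    m = max(scores)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    return [x / total for x in exps]


def rand_vec(n):
    return [random.gauss(0, 1) for _ in range(n)]


def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]


random.seed(0)
T_x, n_s, n_a, n_e = 5, 4, 6, 3  # hypothetical sizes, not taken from the course

s_prev = rand_vec(n_s)     # previous decoder state s^{<t-1>}
a = rand_mat(T_x, n_a)     # encoder activations a^{<t'>}, one row per t'
W_a = rand_mat(n_e, n_s)   # stand-ins for the learned parameters
U_a = rand_mat(n_e, n_a)
v_a = rand_vec(n_e)

e = [alignment_score(s_prev, a_tp, W_a, U_a, v_a) for a_tp in a]
alpha = attention_weights(e)
print(alpha)  # T_x positive weights that sum to 1
```

The same s_prev is reused for every t', and only a^{<t'>} changes, which is why the scores for one output step can be normalized together by the softmax.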

TMosh · April 29, 2024, 12:08am

Thanks for the reference.
