C5W3 | Attention model: how to compute alpha<t,t'>

In the Attention Model video, around minutes 7-8, when the equation for \alpha is explained:

\alpha^{<t,t'>} = \frac{\exp(e^{<t,t'>})}{\sum_{t''=1}^{T_x}\exp(e^{<t,t''>})}

How do you compute e^{<t,t'>}?

It is said that it comes from a small neural network which takes as inputs the previous state s^{<t-1>} and the activation a^{<t'>}, but what is the target of this network? And how do you get e^{<t,t'>} from it?

[image: attention_weights]

Thank you.

The embeddings are pre-computed by a sequence model that uses a large corpus of sample text as the training set.

Thank you for your reply. I see from the paper *Neural Machine Translation by Jointly Learning to Align and Translate* (Bahdanau et al., 2015) that this part refers to the alignment model.

If I understand correctly, the value of e^{<t,t'>} is not pretrained but computed within the model as e^{<t,t'>} = v_a^T \tanh(W_a s^{<t-1>} + U_a a^{<t'>}), where v_a, W_a, and U_a are additional parameters to be estimated.

I understand that, as explained, the key point is that e^{<t,t'>} depends on the pair (s^{<t-1>}, a^{<t'>}).
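
To make that concrete, here is a rough NumPy sketch of one decoder step t. The dimensions (n_s, n_a, n_e) and all variable names are made up for illustration; the course's programming exercise builds the same idea out of small Dense layers instead.

```python
import numpy as np

n_s, n_a, n_e, T_x = 64, 128, 10, 30   # hypothetical sizes

rng = np.random.default_rng(0)

# Trainable parameters of the alignment model, learned with the rest of the network.
W_a = rng.normal(scale=0.1, size=(n_e, n_s))
U_a = rng.normal(scale=0.1, size=(n_e, n_a))
v_a = rng.normal(scale=0.1, size=(n_e,))

s_prev = rng.normal(size=(n_s,))   # decoder state s^<t-1>
a = rng.normal(size=(T_x, n_a))    # encoder activations a^<1>, ..., a^<T_x>

# e^<t,t'> = v_a^T tanh(W_a s^<t-1> + U_a a^<t'>): one scalar "energy" per t'.
e = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ a[tp]) for tp in range(T_x)])

# alpha^<t,t'> = softmax of the energies over t' (max subtracted for stability).
alpha = np.exp(e - e.max())
alpha /= alpha.sum()
print(alpha.shape, alpha.sum())    # (30,) 1.0
```

So the network has no separate target: v_a, W_a, and U_a are trained jointly with everything else by backpropagation, since the whole chain from e^{<t,t'>} through \alpha^{<t,t'>} to the output is differentiable.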

Thanks for the reference.