It is said that e^{<t,t'>} comes from a NN which takes as inputs the previous state s^{<t-1>} and the activation a^{<t'>}, but what is the target of this NN? And how do you get e^{<t,t'>} from it?
If I understand correctly, the value of e^{<t,t'>} is not pretrained against a separate target but computed within the model as e^{<t,t'>} = v_a^T \tanh(W_a s^{<t-1>} + U_a a^{<t'>}), where v_a, W_a, U_a are additional parameters estimated jointly with the rest of the network.
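To make the formula concrete, here is a minimal sketch in plain Python of how e^{<t,t'>} could be computed for one decoder step. The dimensions, the random initial values, and the function names `energy` and `attention_weights` are assumptions for illustration only. The softmax step that turns the energies into attention weights is the usual next step in this kind of attention model, not something stated above. In a real model v_a, W_a, U_a would be updated by backpropagation through the overall loss rather than fixed at random values.

```python
import math
import random


def energy(s_prev, a_t, W_a, U_a, v_a):
    """Compute e = v_a^T tanh(W_a s_prev + U_a a_t) using lists as vectors/matrices."""
    hidden = []
    for w_row, u_row in zip(W_a, U_a):
        pre = sum(w * s for w, s in zip(w_row, s_prev))
        pre += sum(u * a for u, a in zip(u_row, a_t))
        hidden.append(math.tanh(pre))
    return sum(v * h for v, h in zip(v_a, hidden))


def attention_weights(s_prev, activations, W_a, U_a, v_a):
    """Softmax over the energies of all input positions t' (assumed next step)."""
    energies = [energy(s_prev, a, W_a, U_a, v_a) for a in activations]
    largest = max(energies)  # subtract the max for numerical stability
    exps = [math.exp(e - largest) for e in energies]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical sizes: decoder state, encoder activation, hidden layer, input length.
n_s, n_a, n_h, T_x = 3, 4, 5, 6
rng = random.Random(0)


def rand_vector(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def rand_matrix(rows, cols):
    return [rand_vector(cols) for _ in range(rows)]


# Parameters of the small network; randomly initialised here, learned in practice.
W_a = rand_matrix(n_h, n_s)
U_a = rand_matrix(n_h, n_a)
v_a = rand_vector(n_h)

s_prev = rand_vector(n_s)                             # s^{<t-1>}
activations = [rand_vector(n_a) for _ in range(T_x)]  # a^{<t'>} for each t'

alphas = attention_weights(s_prev, activations, W_a, U_a, v_a)
print(alphas, sum(alphas))
```

Note that there is no target for e^{<t,t'>} anywhere in this sketch: the energies are just an intermediate quantity, and the only training signal would come from the loss on the model's final output.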
I understand. As explained, the key is that e^{<t,t'>} depends on (s^{<t-1>}, a^{<t'>}).