Training Attention Weights

Hi,

I have a question about training the attention weights. In the lecture we are told that the weights are computed by a small network with a single hidden layer that takes s^{<t-1>} and a^{<t'>} as inputs. Is this small network part of the larger model, and does information pass through it during both forward and backward propagation?
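
To make the question concrete, here is a rough NumPy sketch of what I understood the lecture to describe: the previous decoder state s^{<t-1>} is concatenated with each encoder activation a^{<t'>}, passed through one hidden layer to produce a score e^{<t,t'>}, and the scores are softmaxed to give the attention weights α^{<t,t'>}. The dimensions and variable names here are just my own assumptions, not the course implementation.

```python
import numpy as np

def softmax(x):
    x = x - np.max(x)                # for numerical stability
    e = np.exp(x)
    return e / np.sum(e)

# Illustrative sizes (assumptions, not from the lecture)
n_s, n_a, n_h, Tx = 64, 128, 10, 30

rng = np.random.default_rng(0)
W1 = rng.standard_normal((n_h, n_s + n_a)) * 0.01   # hidden layer of the small network
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h)) * 0.01            # output layer -> scalar score e
b2 = np.zeros((1, 1))

s_prev = rng.standard_normal((n_s, 1))   # decoder state s^{<t-1>}
a = rng.standard_normal((n_a, Tx))       # encoder activations a^{<t'>}, t' = 1..Tx

# One forward pass of the small alignment network for every t'
energies = []
for t_prime in range(Tx):
    x = np.concatenate([s_prev, a[:, [t_prime]]], axis=0)  # [s^{<t-1>}; a^{<t'>}]
    h = np.tanh(W1 @ x + b1)                               # single hidden layer
    e = W2 @ h + b2                                        # scalar score e^{<t,t'>}
    energies.append(e.item())

alphas = softmax(np.array(energies))      # attention weights alpha^{<t,t'>}
context = a @ alphas.reshape(-1, 1)       # context vector fed to the decoder at step t
```

My understanding is that, because the context vector feeds into the decoder, the gradient of the loss would flow back through W1 and W2 during backprop, so these weights would be learned jointly with the rest of the model. Is that right?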

Thank you.


Do you still need an answer to this question?