Training Attention Weights


I have a question about training the attention weights. In the lecture we are told that the alignment scores are computed by a small network with a single hidden layer that takes s^<t-1> and a^<t'> as inputs. Is this small network part of the larger attention model, and does information pass through it during both forward and backward propagation?
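For concreteness, here is a minimal NumPy sketch of what I understand that small network to be (all dimensions and weight values below are made up for illustration): one shared one-hidden-layer net scores s^<t-1> against each encoder activation a^<t'>, and a softmax over those scores gives the attention weights.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical dimensions: decoder state size, encoder activation size,
# number of input timesteps, and hidden-layer size of the alignment net.
n_s, n_a, Tx, n_h = 4, 6, 5, 10

rng = np.random.default_rng(0)
s_prev = rng.normal(size=(n_s,))   # decoder state s^<t-1>
a = rng.normal(size=(Tx, n_a))     # encoder activations a^<t'>, t' = 1..Tx

# Parameters of the small single-hidden-layer alignment network.
# They are learned jointly with the rest of the model by backprop.
W1 = rng.normal(size=(n_h, n_s + n_a))
b1 = np.zeros(n_h)
W2 = rng.normal(size=(1, n_h))
b2 = np.zeros(1)

# The SAME small network scores every encoder timestep t'.
e = np.array([
    (W2 @ np.tanh(W1 @ np.concatenate([s_prev, a_t]) + b1) + b2).item()
    for a_t in a
])

alphas = softmax(e)   # attention weights alpha^<t,t'>, sum to 1
context = alphas @ a  # context vector c^<t> fed to the decoder
```

Since `context` feeds into the decoder's output, gradients from the loss flow back through `alphas` into `W1`, `W2`, `b1`, and `b2`, which is what I meant by information passing through the network in both directions.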

Thank you.


Do you still need an answer to this question?