In C4_W1_Ungraded_Lab_1_Basic_Attention, it says the dimensions of W_a, U_a are (n x m), where n: hidden state size and m: layer size in alignment network.
I’m confused about what “n” and “m” are exactly. Could you please explain with an actual example?
Hi @Anivader
When I was learning, I reproduced this lab in an Excel sheet. I can share it if it would help:
- Here are the inputs to the alignment function, as in this part of the lab:

- Then there is a linear transformation. Here are the weights (note that they’re already transposed for convenience, as in the code): their shape is (2n, m), or (2*hidden_size, attention_size), which is (2*16, 10). So in this lab n = 16 (the hidden state size) and m = 10 (the layer size in the alignment network). These weights are the stacked version of W_a and U_a, because W_a \cdot s_{i-1} + U_a \cdot h_j is equivalent to [W_a | U_a] \cdot [s_{i-1} ; h_j].
When you take the dot product of the inputs and these weights, you get the pre-activations, as in the lab code.

- Then you apply tanh to get the activations, as in the lab code.

- Then comes the second layer, v_a. When you take the dot product of the activations with this layer’s weights, you get the “alignment scores”, as in the lab code.
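The steps above can be sketched in NumPy to make the shapes concrete. This is a minimal sketch, not the lab's own code: the values are random, `input_length = 5` and the (m, 1) shape of the second layer are assumptions, and the names `encoder_states`, `decoder_state`, `layer_1` and `layer_2` are chosen for illustration. It also checks that the stacked (2n, m) weights give the same scores as separate W_a and U_a matrices.

```python
import numpy as np

hidden_size = 16      # n: size of each hidden state
attention_size = 10   # m: layer size in the alignment network
input_length = 5      # assumed number of encoder states

rng = np.random.default_rng(0)
encoder_states = rng.standard_normal((input_length, hidden_size))  # h_j, one per row
decoder_state = rng.standard_normal((1, hidden_size))              # s_{i-1}

# Stacked, already-transposed first-layer weights: (2n, m) = (32, 10)
layer_1 = rng.standard_normal((2 * hidden_size, attention_size))
# Second layer v_a; the (m, 1) shape is an assumption for this sketch
layer_2 = rng.standard_normal((attention_size, 1))


def alignment(encoder_states, decoder_state):
    """Return one alignment score per encoder state, shape (input_length, 1)."""
    steps = encoder_states.shape[0]
    # Pair the same decoder state with every encoder state: [s_{i-1} | h_j].
    # The order only has to match the order in which the weights are stacked.
    repeated = np.repeat(decoder_state, steps, axis=0)            # (5, 16)
    inputs = np.concatenate([repeated, encoder_states], axis=1)   # (5, 32)
    activations = np.tanh(inputs @ layer_1)                       # (5, 10)
    return activations @ layer_2                                  # (5, 1)


scores = alignment(encoder_states, decoder_state)

# Unstack the weights: the first n rows act on s_{i-1}, the last n rows on h_j
W_a = layer_1[:hidden_size]   # (16, 10)
U_a = layer_1[hidden_size:]   # (16, 10)
separate = np.tanh(decoder_state @ W_a + encoder_states @ U_a) @ layer_2

print(layer_1.shape)                  # (32, 10)
print(scores.shape)                   # (5, 1)
print(np.allclose(scores, separate))  # True
```

Each of W_a and U_a is (n, m) = (16, 10) here, which is the shape the question asks about; stacking them is what produces the (2n, m) matrix.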

Just to complete the lab, here are the remaining calculations:
- Then you apply softmax to get the “attention weights” (the variable weights in the attention function).
- Then you multiply the “encoder states” element-wise (Hadamard product) by these “attention weights” to get the weighted encoder states (weighted_scores in the attention function).
- Lastly, you sum these weighted encoder states along axis 0 to get the “context”.
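The remaining calculations can be sketched the same way. This is a hedged illustration, not the lab's implementation: the `softmax` helper and the `attention_context` function are hypothetical names, the scores are random stand-ins for the alignment scores, and the 5 encoder states of size 16 are assumed shapes.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)


def attention_context(encoder_states, scores):
    """Turn alignment scores into attention weights and a context vector."""
    # One weight per encoder state; the weights sum to 1
    weights = softmax(scores, axis=0)               # (input_length, 1)
    # Element-wise product; the (input_length, 1) weights broadcast across columns
    weighted_scores = encoder_states * weights      # (input_length, hidden_size)
    # Sum over the encoder states (axis 0) to get a single vector
    context = np.sum(weighted_scores, axis=0)       # (hidden_size,)
    return weights, context


rng = np.random.default_rng(0)
encoder_states = rng.standard_normal((5, 16))   # assumed: 5 states of size 16
scores = rng.standard_normal((5, 1))            # stand-in alignment scores

weights, context = attention_context(encoder_states, scores)
print(weights.shape)   # (5, 1)
print(context.shape)   # (16,)
```

The multiply-then-sum pair is equivalent to a single matrix product, `weights.T @ encoder_states`, which is a quick way to sanity-check the result.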