Hello,
In C4_W1_Ungraded_Lab_1_Basic_Attention, it says the dimensions of W_a and U_a are (n x m), where n is the hidden state size and m is the layer size in the alignment network.
I’m confused as to what “n” and “m” are exactly. Could you please explain with the help of an actual example?
Thanks
Ani
Hi @Anivader
I reproduced this lab in an Excel sheet when I was learning. I can share it if it helps:
- Here are the inputs in the alignment function, as in this part:
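To make the shapes concrete, here is a minimal numpy sketch of those inputs. I’m assuming hidden_size = 16, attention_size = 10 and an input sequence of length 5, as in the lab; the random values are only placeholders:

```python
import numpy as np

hidden_size = 16      # n: size of each encoder/decoder hidden state
attention_size = 10   # m: size of the alignment layer
input_length = 5      # assumed number of encoder states (sequence length)

np.random.seed(42)
encoder_states = np.random.randn(input_length, hidden_size)  # h_1 ... h_5, shape (5, 16)
decoder_state = np.random.randn(1, hidden_size)              # s_{i-1}, shape (1, 16)
```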

- Then there is a linear transformation. Here are the weights (note that they are already transposed for convenience, as in the code): shape (2n, m), i.e. (2*hidden_size, attention_size) = (2*16, 10). They are a stacked version of W_a and U_a, because $W_a \cdot s_{i-1} + U_a \cdot h_j$ is equivalent to $[W_a \,|\, U_a] \cdot [s_{i-1} ; h_j]$ (the two matrices concatenated side by side, applied to the two vectors stacked on top of each other).
When you dot-product these matrices you get the following, as in here:
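Continuing the numpy sketch (layer_1 below stands in for the stacked, pre-transposed [W_a | U_a]):

```python
# Stacked, pre-transposed weights: shape (2 * hidden_size, attention_size) = (32, 10)
layer_1 = np.random.randn(2 * hidden_size, attention_size)

# Concatenate the (repeated) decoder state with every encoder state: shape (5, 32)
inputs = np.concatenate([np.repeat(decoder_state, input_length, axis=0),
                         encoder_states], axis=1)

# Linear transformation: (5, 32) @ (32, 10) -> (5, 10)
pre_activations = np.matmul(inputs, layer_1)
```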

- Then you apply tanh and get the activations, as in here:
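In the sketch that is just the elementwise non-linearity:

```python
# Elementwise tanh, shape stays (5, 10)
activations = np.tanh(pre_activations)
```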

- Then comes the second layer v_a. Here are the weights:
When you dot-product the activations with this layer’s weights, you get the “alignment scores”:

as in here:
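Continuing the sketch, layer_2 below stands in for $v_a$ (also pre-transposed):

```python
# v_a, pre-transposed: shape (attention_size, 1) = (10, 1)
layer_2 = np.random.randn(attention_size, 1)

# Alignment scores: (5, 10) @ (10, 1) -> (5, 1), one score per encoder state
scores = np.matmul(activations, layer_2)
```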

Just to complete the lab, here are the remaining calculations:
- Then you apply softmax to get the “attention weights” (variable weights in the attention function):
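A plain numpy softmax over the input positions (axis 0) would look like this:

```python
# Softmax along axis 0 -> attention weights, shape (5, 1), summing to 1
weights = np.exp(scores) / np.sum(np.exp(scores), axis=0)
```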

- Then you just multiply (Hadamard product) the encoder states with these “attention weights” to get the weighted encoder states (variable weighted_scores in the attention function):
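In numpy this is a broadcasted elementwise product:

```python
# (5, 16) * (5, 1) -> (5, 16): each encoder state scaled by its attention weight
weighted_scores = encoder_states * weights
```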
- Lastly, you sum these weighted encoder states along axis 0 to get the “context”:
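And the final step of the sketch:

```python
# Sum over the input positions -> context vector, shape (16,)
context = np.sum(weighted_scores, axis=0)
```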
Cheers