Dimension of weight matrices

Hello,

In C4_W1_Ungraded_Lab_1_Basic_Attention, it says the dimensions of W_a and U_a are (n x m), where n is the hidden state size and m is the layer size of the alignment network.

I’m confused as to what “n” and “m” are exactly. Could you please explain with the help of an actual example?

Thanks
Ani

Hi @Anivader

I reproduced this lab in an Excel sheet when I was learning. I can share it if it’ll help:

1. Here are the inputs to the alignment function:

as in this part:
[image]
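
To make the shapes concrete, here is a minimal numpy sketch of this step. hidden_size = 16 and attention_size = 10 come from the lab; the sequence length of 5, the random values, and the variable names are my own assumptions for illustration.

```python
import numpy as np

np.random.seed(42)

hidden_size = 16     # n: size of the encoder/decoder hidden states (from the lab)
attention_size = 10  # m: size of the alignment layer (from the lab)
seq_len = 5          # number of encoder states; assumed value for illustration

# Inputs to the alignment function
encoder_states = np.random.randn(seq_len, hidden_size)  # h_1 ... h_T, one row per position
decoder_state = np.random.randn(1, hidden_size)         # s_{i-1}

# Repeat s_{i-1} for every encoder position and concatenate along the feature
# axis, so each row holds h_j and s_{i-1} side by side (2n = 32 features)
inputs = np.concatenate(
    [encoder_states, np.repeat(decoder_state, seq_len, axis=0)], axis=1
)
print(inputs.shape)  # (5, 32)
```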

2. Then there is a linear transformation:
    Here are the weights. Note that they are already transposed for convenience, as in the code: the shape is (2n, m), i.e. (2*hidden_size, attention_size) = (2*16, 10). So, for the question above, n is the hidden state size (16 here) and m is the size of the alignment layer (10 here). These weights are a stacked version of W_a and U_a, because W_a \cdot s_{i-1} + U_a \cdot h_j is equivalent to [W_a | U_a] \cdot [s_{i-1} ; h_j] (the two weight matrices concatenated, applied to the decoder state and encoder state stacked together).

When you take the dot product of the inputs with these weights, you get:

as in here:
[image]
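
Continuing the sketch from step 1, the stacked weight matrix is (2*hidden_size, attention_size) = (32, 10); random values stand in for the lab's actual weights.

```python
# Stacked, transposed weights: W_a and U_a side by side, shape (2n, m) = (32, 10)
layer_1 = np.random.randn(2 * hidden_size, attention_size)

# Linear transformation: projects each 32-feature row down to attention_size features
projections = inputs @ layer_1
print(projections.shape)  # (5, 10)
```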

3. Then you apply tanh and get the activations:

as in here:
[image]
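
In the sketch this is a single line:

```python
# Element-wise tanh non-linearity gives the activations of the alignment layer
activations = np.tanh(projections)
print(activations.shape)  # (5, 10)
```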

4. Then comes the second layer, v_a. Here are the weights:
    [image]

When you take the dot product of the activations with this layer’s weights, you get the “alignment scores”:
[image]

as in here:
[image]
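
In the sketch, v_a is an (attention_size, 1) matrix (again with placeholder random values), so each 10-dimensional activation row collapses to a single score:

```python
# Second layer v_a: maps each attention_size-dim activation to one alignment score
layer_2 = np.random.randn(attention_size, 1)  # (10, 1)

scores = activations @ layer_2
print(scores.shape)  # (5, 1) - one alignment score per encoder state
```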


Just to complete the lab, here are the remaining calculations:
5. Then you apply softmax to get the “attention weights” (the weights variable in the attention function):
[image]
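
Continuing the sketch, the softmax is taken over the sequence axis (axis 0), so the weights for all encoder states sum to 1:

```python
# Softmax over the sequence axis turns alignment scores into attention weights
weights = np.exp(scores) / np.sum(np.exp(scores), axis=0)
print(weights.shape)         # (5, 1)
print(weights.sum(axis=0))   # [1.]
```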

6. Then you just multiply (Hadamard product) the “Encoder states” with these “attention weights” to get the weighted encoder states (weighted_scores in the attention function):
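
In the sketch, the (5, 1) weights broadcast across the 16 features of each encoder state:

```python
# Hadamard product: each encoder state is scaled by its attention weight
weighted_scores = encoder_states * weights
print(weighted_scores.shape)  # (5, 16)
```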

7. Lastly, you sum these weighted encoder states along axis 0 to get the “context”:
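
And the final line of the sketch:

```python
# Sum the weighted encoder states over the sequence axis (axis 0) to get the context vector
context = np.sum(weighted_scores, axis=0)
print(context.shape)  # (16,)
```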

Cheers
