Hi All,
To get a better understanding of how the Attention model works, I tried to run the assignment code in my local environment and ran into a few issues.
One of them concerns the encoder output dimensions in the exercise code:
# Test your function!
key_dim = 12
n_heads = 16
decoderLayer_test = DecoderLayer(embedding_dim=key_dim, num_heads=n_heads, fully_connected_dim=32)
q = np.ones((1, 15, key_dim))
encoder_test_output = tf.convert_to_tensor(np.random.rand(1, 7, 8))
In that code, the last dimension of Q is 12, but encoder_test_output, which is passed to the decoder as the K and V values, has a last dimension of 8.
This generates an error in TensorFlow. The documentation specifically states that the last dimensions of Q, K, and V must be the same, yet that code runs fine in the lecture notebook.
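To make the problem concrete, here is a minimal sketch that isolates the failing call for me, using only the public tf.keras.layers.MultiHeadAttention API (the variable names and dtypes are mine, not the assignment's):

import numpy as np
import tensorflow as tf

# Layer arguments mirror the test cell above
mha = tf.keras.layers.MultiHeadAttention(num_heads=16, key_dim=12)

q = np.ones((1, 15, 12), dtype=np.float32)       # query: last dim 12
kv = np.random.rand(1, 7, 8).astype(np.float32)  # key/value: last dim 8

# On my local (newer) TensorFlow install this raises while building the
# layer; in the course notebook's older version it runs and returns a
# tensor of shape (1, 15, 12).
out = mha(query=q, value=kv, key=kv)
print(out.shape)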
Am I missing something?
Thank you!
Hi @albertumyarov,
I don’t think the issue lies only in the decoder test cell, since numpy.random.rand() creates an array of the specified shape and fills it with random values generated by NumPy.
So your Q has taken the embedding dimension value of 12, but K and V have not got the same value: you used np.ones((1, 15, key_dim)) for q, which creates a new array of the assigned shape and datatype with every element set to 1.
The fully connected dimension is 32, so check with print statements whether the value at each step matches between your local environment and the course-provided environment, as the random values created in the decoder test may differ between the two; see the sketch below.
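For example, something like this (a sketch; the tensors mirror the test cell quoted above):

import numpy as np
import tensorflow as tf

# Rebuild the test-cell tensors and print their shapes; compare these
# prints between your local run and the Coursera notebook.
q = np.ones((1, 15, 12))
encoder_test_output = tf.convert_to_tensor(np.random.rand(1, 7, 8))
print("q shape:", q.shape)                                      # (1, 15, 12)
print("encoder_test_output shape:", encoder_test_output.shape)  # (1, 7, 8)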
Hope this helps!!!
Also, when sharing such a query, make sure not to post any of the code. What you could do instead is provide a screenshot comparing the outputs in your environment and the Coursera environment, without any code visible in the screenshot, for better understanding and a quicker response.
Regards
DP
The code I mentioned is not my code; it is the test cell provided by the course.
As you can see, encoder_test_output has a hardcoded shape of (1, 7, 8).
That tensor is passed to decoderLayer_test and later, as the K and V parameters, to MultiHeadAttention. That is where I get the error.
I more or less understand why I get the error in my environment, and, by the way, changing 8 to 12 makes it work perfectly (see the sketch below); what I do not understand is why it works in the course notebook.
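For reference, this standalone sketch shows the change that makes it work for me (my own repro against the public Keras layer, not the assignment code):

import numpy as np
import tensorflow as tf

key_dim = 12
mha = tf.keras.layers.MultiHeadAttention(num_heads=16, key_dim=key_dim)
q = np.ones((1, 15, key_dim), dtype=np.float32)

# Changing the hardcoded 8 to key_dim (12) makes the call succeed locally
enc_output = tf.convert_to_tensor(np.random.rand(1, 7, key_dim).astype(np.float32))

out = mha(query=q, value=enc_output, key=enc_output)
print(out.shape)  # (1, 15, 12)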
Please share a screenshot of the error you are getting in your environment.
In DecoderLayer's call(), the padding mask is assigned as None, and that seems to be causing the error, since in the self-attention the attention mask is used as the padding mask. Make sure you are passing the values correctly; see the sketch below for how the mask is passed.
You can also check by adding a print statement to each layer and comparing with the course-provided output, starting with the self-attention, to find where the values vary, since something is giving the query shape and the value shape different values.
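To illustrate, this is roughly how a mask reaches the attention layer; the attention_mask argument is part of the public Keras call signature, while the surrounding names and shapes are assumed:

import numpy as np
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=16, key_dim=12)
x = np.ones((1, 15, 12), dtype=np.float32)

# attention_mask may be a boolean tensor or None; passing None simply
# disables masking, so print the mask you actually pass at each step.
attn_output = mha(query=x, value=x, key=x, attention_mask=None)
print(attn_output.shape)  # (1, 15, 12)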
Did some more digging. The exception is raised by the build method of the MultiHeadAttention class. That code was changed recently in the repo: it looks like I am using the newest version, while the course notebook uses an older one. I do not really understand how it would work when the shapes are different; I guess that is why the check was added. Most likely the assignment code will fail once the TensorFlow library is updated to the latest version (a quick way to compare versions is sketched below). I attached a screenshot of the corresponding change on GitHub. Thank you!
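For anyone comparing environments, a quick sanity check is to print the versions in both notebooks (a sketch; prints only):

import tensorflow as tf

# Compare these between the local environment and the course notebook;
# the MultiHeadAttention build-time behavior differs across versions.
print("TensorFlow version:", tf.__version__)
print("MultiHeadAttention defined in:", tf.keras.layers.MultiHeadAttention.__module__)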
I thought you were using the latest version of the notebook!
Yes, there were some changes, though not very recently; probably in July or August.
Thank you, @albertumyarov, for bringing this up so other learners can learn from it too.