Hi All,
To get a better understanding of how the Attention model works, I tried to run the assignment code in my local environment and ran into a few issues.
One of them concerns the encoder output dimensions in the exercise code:
# Test your function!
key_dim = 12
n_heads = 16
decoderLayer_test = DecoderLayer(embedding_dim=key_dim, num_heads=n_heads, fully_connected_dim=32)
q = np.ones((1, 15, key_dim))
encoder_test_output = tf.convert_to_tensor(np.random.rand(1, 7, 8))
In that code, the last dimension of Q is 12, but encoder_test_output, which is passed to the decoder as the K and V values, has a last dimension of 8.
This generates an error in TensorFlow. The documentation specifically states that the last dimensions of Q, K, and V must be the same, yet that code runs fine in the lecture notebook.
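To make the problem concrete, here is a minimal sketch that isolates the failing call for me, using only the public tf.keras.layers.MultiHeadAttention API (the variable names and dtypes are mine, not the assignment's):

import numpy as np
import tensorflow as tf

# Layer arguments mirror the test cell above
mha = tf.keras.layers.MultiHeadAttention(num_heads=16, key_dim=12)

q = np.ones((1, 15, 12), dtype=np.float32)       # query: last dim 12
kv = np.random.rand(1, 7, 8).astype(np.float32)  # key/value: last dim 8

# On my local (newer) TensorFlow install this raises while building the
# layer; in the course notebook's older version it runs and returns a
# tensor of shape (1, 15, 12).
out = mha(query=q, value=kv, key=kv)
print(out.shape)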
Am I missing something?
Thank you!
Hi @albertumyarov,
I don’t think the issue lies only in the decoder test cell, since numpy.random.rand() creates an array of the specified shape and fills it with random values generated by NumPy.
So your Q has taken the embedding dimension value of 12, but K and V have not got the same value: you used np.ones((1, 15, key_dim)) for q, which creates a new array of the assigned shape and datatype with every element set to 1.
The fully connected dimension is 32, so check with print statements whether the value at each step matches between your local environment and the course-provided environment, as the random values created in the decoder test may differ between the two; see the sketch below.
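For example, something like this (a sketch; the tensors mirror the test cell quoted above):

import numpy as np
import tensorflow as tf

# Rebuild the test-cell tensors and print their shapes; compare these
# prints between your local run and the Coursera notebook.
q = np.ones((1, 15, 12))
encoder_test_output = tf.convert_to_tensor(np.random.rand(1, 7, 8))
print("q shape:", q.shape)                                      # (1, 15, 12)
print("encoder_test_output shape:", encoder_test_output.shape)  # (1, 7, 8)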
Hope this helps!!!
Also, when sharing such a query, make sure not to post any of the code. What you could do instead is provide a screenshot comparing the outputs in your environment and the Coursera environment, without any code visible in the screenshot, for better understanding and a quicker response.
Regards
DP
The code I mentioned is not my code; it is the test cell provided by the course.
As you can see, encoder_test_output has a hardcoded shape of (1, 7, 8).
That tensor is passed to decoderLayer_test and later, as the K and V parameters, to MultiHeadAttention. That is where I get the error.
I more or less understand why I get the error in my environment, and, by the way, changing 8 to 12 makes it work perfectly (see the sketch below); what I do not understand is why it works in the course notebook.
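For reference, this standalone sketch shows the change that makes it work for me (my own repro against the public Keras layer, not the assignment code):

import numpy as np
import tensorflow as tf

key_dim = 12
mha = tf.keras.layers.MultiHeadAttention(num_heads=16, key_dim=key_dim)
q = np.ones((1, 15, key_dim), dtype=np.float32)

# Changing the hardcoded 8 to key_dim (12) makes the call succeed locally
enc_output = tf.convert_to_tensor(np.random.rand(1, 7, key_dim).astype(np.float32))

out = mha(query=q, value=enc_output, key=enc_output)
print(out.shape)  # (1, 15, 12)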
Please share a screenshot of the error you are getting in your environment.
In DecoderLayer's call(), the padding mask is assigned as None, and that seems to be causing the error, since in the self-attention the attention mask is used as the padding mask. Make sure you are passing the values correctly; see the sketch below for how the mask is passed.
You can also check by adding a print statement to each layer and comparing with the course-provided output, starting with the self-attention, to find where the values vary, since something is giving the query shape and the value shape different values.
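To illustrate, this is roughly how a mask reaches the attention layer; the attention_mask argument is part of the public Keras call signature, while the surrounding names and shapes are assumed:

import numpy as np
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=16, key_dim=12)
x = np.ones((1, 15, 12), dtype=np.float32)

# attention_mask may be a boolean tensor or None; passing None simply
# disables masking, so print the mask you actually pass at each step.
attn_output = mha(query=x, value=x, key=x, attention_mask=None)
print(attn_output.shape)  # (1, 15, 12)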
Did some more digging. The exception is raised by the build method of the MultiHeadAttention class. That code was changed recently in the repo: it looks like I am using the newest version, while the course notebook uses an older one. I do not really understand how it would work when the shapes are different; I guess that is why the check was added. Most likely the assignment code will fail once the TensorFlow library is updated to the latest version (a quick way to compare versions is sketched below). I attached a screenshot of the corresponding change on GitHub. Thank you!
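For anyone comparing environments, a quick sanity check is to print the versions in both notebooks (a sketch; prints only):

import tensorflow as tf

# Compare these between the local environment and the course notebook;
# the MultiHeadAttention build-time behavior differs across versions.
print("TensorFlow version:", tf.__version__)
print("MultiHeadAttention defined in:", tf.keras.layers.MultiHeadAttention.__module__)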
I thought you were using the latest version of the notebook!
Yes, there were some changes, though not very recently; probably in July or August.
Thank you, @albertumyarov, for bringing this up so other learners can learn from it too.