C5W4 EncoderLayer test Error


I am having a hard time understanding this and getting the code to work. After several attempts, below is the current error message I get, which seems to be a TensorFlow version problem. Has anyone had the same problem? Thanks a lot!

AttributeError Traceback (most recent call last)
----> 2 EncoderLayer_test(EncoderLayer)

~/work/W4A1/public_tests.py in EncoderLayer_test(target)
88 assert tf.is_tensor(encoded), "Wrong type. Output must be a tensor"
---> 89 assert tuple(tf.shape(encoded).numpy()) == (1, q.shape[1], q.shape[2]), f"Wrong shape. We expected ((1, {q.shape[1]}, {q.shape[2]}))"
91 assert np.allclose(encoded.numpy(),

AttributeError: 'KerasTensor' object has no attribute 'numpy'

Are you running the notebook in the Coursera labs environment, or did you install it on your own local or cloud system?

I am running in the Coursera labs environment.

This is interesting output.
In this EncoderLayer exercise, the output of each function you call (self.mha, self.layernorm1, ...) should be a tf.Tensor.
But your output is a KerasTensor, which is the typical output of a Keras layer called on symbolic input. A KerasTensor has no numpy attribute, as we see in your traceback.
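To make the distinction concrete, here is a standalone sketch (not the assignment's code, just an illustration with made-up shapes): the same MultiHeadAttention layer returns an ordinary tf.Tensor when called on a concrete tensor in eager mode, but a KerasTensor when called on a symbolic tf.keras.Input.

```python
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)

# Concrete (eager) input: the output is an ordinary tf.Tensor with .numpy()
x_eager = tf.random.uniform((1, 3, 8))
out_eager = mha(x_eager, x_eager, x_eager)
print(hasattr(out_eager, "numpy"))  # True

# Symbolic Keras input: the output is a KerasTensor without .numpy()
x_sym = tf.keras.Input(shape=(3, 8))
out_sym = mha(x_sym, x_sym, x_sym)
print(type(out_sym).__name__, hasattr(out_sym, "numpy"))  # KerasTensor False
```

So if a KerasTensor shows up in your traceback, somewhere in your call chain a layer is being fed symbolic input instead of the eager tensor x that the test passes in.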

As the next step, my recommendation is to print the output of every step in EncoderLayer and see where the output changes from tf.Tensor to KerasTensor. There are only 5 functions you call, so adding 5 print statements should show you what to do next.
And, please share the result with us.

Um… Is this the correct way to call self.mha?

    attn_output = self.mha(query = query, value = value, training = training, attention_mask = mask)

And how do we initialize query and value?

This is the Encoder, so the role of multi-head attention here is self-attention.
If you look at the TensorFlow documentation, MultiHeadAttention takes (query, value, key, attention_mask, return_attention_scores, training) as parameters, and as the guide says, we can use the default values for "return_attention_scores" and "training".
So what we need are "query", "value", "key", and "attention_mask". "attention_mask" is the easiest one: it is passed in as "mask", as you wrote.
Then, in the case of self-attention, we use "x" for all of "query", "value", and "key" to create the self-attention weights. That's the key point.
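Put together, the self-attention call can be sketched like this (a standalone illustration with made-up shapes and an all-ones mask, not the assignment's exact code):

```python
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=4)

x = tf.random.uniform((1, 3, 8))  # (batch, seq_len, d_model)
mask = tf.ones((1, 3, 3))         # 1 = attend; broadcastable to (batch, T, S)

# Self-attention: query, value, and key are all the same tensor x
out = mha(query=x, value=x, key=x, attention_mask=mask)
print(out.shape)  # (1, 3, 8)
```

Note that the output keeps the query's shape (batch, seq_len, d_model), which is exactly what the shape assertion in public_tests.py checks.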

Hope this helps.

I see. So we set query = x, value = x, key = x, and training = training. This seems to work. Thanks!