C4W1_Assignment - Translator

I’m getting the following error when running the w1_unittest.test_translator() cell.
All the previous tests are passing, and on review the code looks correct.
The error seems to indicate a problem with the attention layer in the Decoder, but I can’t work out what the issue is.
Reviewing the course videos again (especially “NMT Model with Attention”), I noted that the input to the Attention layer should be the hidden states from the previous layers. I can see how to obtain the hidden state from the pre-attention decoder, but not from the encoder output, so I’m not sure whether that is the issue.

Any suggestions welcome.

The error message is:

Cell In[159], line 69, in Decoder.call(self, context, target, state, return_state)
     66 x, hidden_state, cell_state = self.pre_attention_rnn(x, initial_state=state)
     68 # Perform cross attention between the context and the output of the LSTM (in that order)
---> 69 x = self.attention(context, x)
     71 # Do a pass through the post attention LSTM
     72 x = self.post_attention_rnn(x)

Cell In[140], line 39, in CrossAttention.call(self, context, target)
     25 """Forward pass of this layer
     26
     27 Args:
    (...)
     32     tf.Tensor: Cross attention between context and target
     33 """
     34 ### START CODE HERE ###
     35
     36 # Call the MH attention by passing in the query and value
     37 # For this case the query should be the translation and the value the encoded sentence to translate
     38 # Hint: Check the call arguments of MultiHeadAttention in the docs
---> 39 attn_output = self.mha(
     40     query=target,
     41     value=context
     42 )
     44 ### END CODE HERE ###
     46 x = self.add([target, attn_output])

InvalidArgumentError: Exception encountered when calling layer 'key' (type EinsumDense).

{{function_node __wrapped__Einsum_N_2_device_/job:localhost/replica:0/task:0/device:GPU:0}} Expected dimension 8 at axis 0 of the input shaped [256,1,12000] but got dimension 256 [Op:Einsum] name:

Call arguments received by layer 'key' (type EinsumDense):
  • inputs=tf.Tensor(shape=(4, 7, 8), dtype=float32)

Hi @Bill_Matthews

Clearly the problem is that the dimensions do not match. Make sure your previous outputs match the expected outputs (not only the tests).

Also, make sure that you return the state in your pre-attention RNN (it is needed for inference); the state is not needed in the post-attention RNN.

The hidden states are probably not the issue, since TensorFlow handles the states for you if you specify the parameters correctly.

Your encoder should have return_sequences=True; that way it returns the hidden states of every step in its output. The same applies to the decoder’s pre- and post-attention RNNs.
If your encoder and decoder outputs match the expected outputs, then the Translator should have no problem, since you’re using them.
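For concreteness, here is a minimal sketch of those settings, assuming the assignment’s Keras layers; the unit count and variable names here are illustrative, not the assignment’s exact code:

```python
import tensorflow as tf

units = 256  # illustrative size, not the assignment's exact value

# Encoder LSTM: return_sequences=True so the output contains the hidden
# state of every timestep, which is what the attention layer consumes.
encoder_rnn = tf.keras.layers.LSTM(units, return_sequences=True)

# Pre-attention LSTM: returns the full sequence AND the final hidden/cell
# states (return_state=True), because the states are needed at inference.
pre_attention_rnn = tf.keras.layers.LSTM(
    units, return_sequences=True, return_state=True
)

# Post-attention LSTM: only the sequence output is needed, no states.
post_attention_rnn = tf.keras.layers.LSTM(units, return_sequences=True)

# With return_state=True the call returns three tensors, matching the
# line in your traceback:
#   x, hidden_state, cell_state = pre_attention_rnn(x, initial_state=state)
```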

Let me know if that makes sense.
Cheers

Thanks for the response - I could see it was a mismatch in dimensions but couldn’t understand why. The issue turned out to be a typo in the Translator call method. I’d used “decoder(…)” instead of “self.decoder(…)”, so it was picking up an earlier instance of the decoder whose embedding layer was set to 256. Once corrected, it ran fine.
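For anyone hitting the same thing, a minimal sketch of the shape of that bug; the Encoder/Decoder classes and call signatures here are stand-ins for the assignment’s, not its exact code:

```python
import tensorflow as tf

class Translator(tf.keras.Model):
    def __init__(self, vocab_size, units):
        super().__init__()
        self.encoder = Encoder(vocab_size, units)  # assignment's Encoder (assumed)
        self.decoder = Decoder(vocab_size, units)  # assignment's Decoder (assumed)

    def call(self, inputs):
        context, target = inputs
        encoded_context = self.encoder(context)
        # Bug: calling the bare name picks up a stale global instance
        # built with a different embedding size (256 in this case):
        #   logits = decoder(encoded_context, target)
        # Fix: call the instance owned by this model:
        logits = self.decoder(encoded_context, target)
        return logits
```

That explains the shape mismatch in the traceback: the stale global decoder was built with a different size, so its attention projections received tensors of the wrong dimension.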

Thanks again for your input.
