I’m getting the following error when running the w1_unittest.test_translator() cell. All the previous tests pass, and reviewing the code it looks correct to me.
The error seems to indicate a problem with the attention layer in the Decoder, but I can’t work out what the issue is.
Reviewing the course videos again (especially “NMT Model with Attention”), I noted that the input to the attention layer should be the hidden states from the previous layers. I can see how to obtain the hidden state from the pre-attention decoder, but not from the encoder output, so I’m not sure whether that is actually the issue.
Any suggestions welcome.
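For anyone comparing shapes while debugging this: in cross attention the pre-attention decoder output supplies the query and the encoder output supplies the key/value, and only the feature (last) dimension has to match between them; the two sequence lengths can differ. Below is a minimal numpy sketch of scaled dot-product cross attention to illustrate the expected shapes. It is not the course's Keras code, and the shapes (batch 4, source length 7, target length 5, feature size 8) are chosen purely for illustration:

```python
import numpy as np

def cross_attention(query, context):
    """Scaled dot-product cross attention.

    query:   decoder states,  shape (batch, target_len, d)
    context: encoder states,  shape (batch, source_len, d)
    Returns: attended output, shape (batch, target_len, d)
    """
    d = query.shape[-1]
    # Attention scores: (batch, target_len, source_len)
    scores = query @ context.transpose(0, 2, 1) / np.sqrt(d)
    # Softmax over the source positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the encoder states (the "values")
    return weights @ context

batch, source_len, target_len, d = 4, 7, 5, 8
context = np.random.randn(batch, source_len, d)  # encoder output
target = np.random.randn(batch, target_len, d)   # pre-attention decoder output
out = cross_attention(target, context)
print(out.shape)  # (4, 5, 8) -- one attended vector per target position
```

Note the output has the target's sequence length, so the dimension-12000 tensor in the traceback (which looks like a vocabulary-sized logits tensor) would never appear inside the attention computation itself; that may be a clue about what is being fed in.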
The error message is:
Cell In[159], line 69, in Decoder.call(self, context, target, state, return_state)
     66 x, hidden_state, cell_state = self.pre_attention_rnn(x, initial_state=state)
     68 # Perform cross attention between the context and the output of the LSTM (in that order)
---> 69 x = self.attention(context, x)
     71 # Do a pass through the post attention LSTM
     72 x = self.post_attention_rnn(x)

Cell In[140], line 39, in CrossAttention.call(self, context, target)
     25 """Forward pass of this layer
     26
     27 Args:
    (...)
     32     tf.Tensor: Cross attention between context and target
     33 """
     34 ### START CODE HERE ###
     35
     36 # Call the MH attention by passing in the query and value
     37 # For this case the query should be the translation and the value the encoded sentence to translate
     38 # Hint: Check the call arguments of MultiHeadAttention in the docs
---> 39 attn_output = self.mha(
     40     query=target,
     41     value=context
     42 )
     44 ### END CODE HERE ###
     46 x = self.add([target, attn_output])

InvalidArgumentError: Exception encountered when calling layer 'key' (type EinsumDense).

{{function_node __wrapped__Einsum_N_2_device_/job:localhost/replica:0/task:0/device:GPU:0}} Expected dimension 8 at axis 0 of the input shaped [256,1,12000] but got dimension 256 [Op:Einsum] name:

Call arguments received by layer 'key' (type EinsumDense):
  • inputs=tf.Tensor(shape=(4, 7, 8), dtype=float32)