I believe I have implemented the class correctly, but there may be an issue with some of the inputs I am passing to the layers.
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
Cell In[94], line 11
8 encoder_test_output = tf.convert_to_tensor(np.random.rand(1, 7, 8))
9 look_ahead_mask = create_look_ahead_mask(q.shape[1])
---> 11 out, attn_w_b1, attn_w_b2 = decoderLayer_test(q, encoder_test_output, False, look_ahead_mask, None)
13 print(f"Using embedding_dim={key_dim} and num_heads={n_heads}:\n")
14 print(f"q has shape:{q.shape}")
File /usr/local/lib/python3.8/dist-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
Cell In[93], line 67, in DecoderLayer.call(self, x, enc_output, training, look_ahead_mask, padding_mask)
61 Q1 = self.layernorm1(x+mult_attn_out1)
63 # BLOCK 2
64 # calculate self-attention using the Q from the first block and K and V from the encoder output.
65 # Dropout will be applied during training
66 # Return attention scores as attn_weights_block2 (~1 line)
---> 67 mult_attn_out2, attn_weights_block2 = scaled_dot_product_attention(Q1, enc_output, enc_output, padding_mask)
69 # # apply layer normalization (layernorm2) to the sum of the attention output and the Q from the first block (~1 line)
70 mult_attn_out2 = self.layernorm2(Q1+mult_attn_out2)
Cell In[65], line 23, in scaled_dot_product_attention(q, k, v, mask)
3 """
4 Calculate the attention weights.
5 q, k, v must have matching leading dimensions.
(...)
18 output -- attention_weights
19 """
20 ### START CODE HERE ###
21
22 # Multiply q and k transposed.
---> 23 matmul_qk = tf.matmul(q, k, transpose_b=True)
25 # scale matmul_qk with the square root of dk
26 dk = tf.cast(len(k), tf.float32)
InvalidArgumentError: Exception encountered when calling layer 'decoder_layer_11' (type DecoderLayer).
cannot compute BatchMatMulV2 as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:BatchMatMulV2] name:
Call arguments received by layer 'decoder_layer_11' (type DecoderLayer):
• x=tf.Tensor(shape=(1, 15, 12), dtype=float32)
• enc_output=tf.Tensor(shape=(1, 7, 8), dtype=float64)
• training=False
• look_ahead_mask=tf.Tensor(shape=(1, 15, 15), dtype=float32)
• padding_mask=None
This is the full error I get, and I am unable to figure out what is causing it. Thanks in advance for any help.
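For context on where the "double tensor" in the error likely originates: the call arguments show `enc_output` has dtype float64 while `x` has dtype float32, and `tf.matmul` requires both operands to share a dtype. `np.random.rand` returns float64 by default, so `tf.convert_to_tensor(np.random.rand(1, 7, 8))` yields a float64 tensor, whereas Keras layers default to float32. This is not a confirmed fix, just a minimal NumPy-only sketch of the dtype mismatch and one way to avoid it:

```python
import numpy as np

# np.random.rand returns float64 ("double") by default, so
# tf.convert_to_tensor would produce a float64 tensor from it,
# while Keras layer outputs default to float32.
enc = np.random.rand(1, 7, 8)
print(enc.dtype)  # float64

# Casting before conversion keeps everything in float32, matching
# the dtype that x and the layer weights use:
enc32 = enc.astype(np.float32)
print(enc32.dtype)  # float32
```

If this is the cause, casting `encoder_test_output` to float32 (or passing `dtype=tf.float32` to `tf.convert_to_tensor`) before calling the decoder layer should resolve the `BatchMatMulV2` error.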