This error indicates that the shape of the mask is not the same as the shape of the inputs (in the `scaled_dot_product_attention` case, the inputs are the `scaled_attention_logits`).
So the most probable place to look for a mistake is below the code comment:
`# add the mask to the scaled tensor.`
Here you would calculate the masked "inputs" for the softmax. The hint suggests:
- Reminder: The boolean `mask` parameter can be passed in as `None` or as either a padding or a look-ahead mask.
- Multiply `(1. - mask)` by `-1e9` before adding it to the scaled attention logits.
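For reference, a minimal toy sketch of that masking step (the shapes and values below are made up for illustration; the real `scaled_attention_logits` and `mask` come from your notebook):

```python
import tensorflow as tf

# Toy shapes, purely illustrative: (batch, seq_len_q, seq_len_k)
scaled_attention_logits = tf.random.uniform((1, 4, 4))

# Here 1 means "attend" and 0 means "mask out"; the mask must be
# broadcastable to the shape of the logits.
mask = tf.constant([[1., 1., 1., 0.]])

# Masked positions (mask == 0) get -1e9 added, so the softmax
# pushes their attention weights towards 0.
scaled_attention_logits += (1. - mask) * -1e9
```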
The other probable mistake could be how the softmax is calculated (TensorFlow intricacies…). Are you using `tf.keras.activations.softmax(...)` for the `attention_weights` computation?
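Continuing the toy tensors from the sketch above, that call would look something like this, with the softmax taken over the last axis (the key dimension):

```python
# Each query's attention weights sum to 1; the output keeps the
# same shape as scaled_attention_logits.
attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)
```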
Let me know if you found the problem.
Cheers