Hi

I am working on `scaled_dot_product_attention()` and I am getting the error "Wrong masked weights".

Below is the code for the function:

Let me know how to proceed.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask):
    matmul_qk = tf.matmul(q, k, transpose_b=True)  # (..., seq_len_q, seq_len_k)

    # scale matmul_qk
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # add the mask to the scaled tensor.
    if mask is not None:  # Don't replace this None
        scaled_attention_logits += (mask * -1e9)

    # softmax is normalized on the last axis (seq_len_k) so that the scores
    # add up to 1.
    attention_weights = tf.nn.softmax(scaled_attention_logits, axis=-1)  # (..., seq_len_q, seq_len_k)
    output = tf.matmul(attention_weights, v)  # (..., seq_len_q, depth_v)

    return output, attention_weights
```
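One thing worth checking is which mask convention the test expects. The code adds `mask * -1e9`, which suppresses the positions where the mask is 1. If the mask instead uses 1 for positions to *keep* and 0 for positions to block (an assumption about this setup, not something stated above), the bias would need to be `(1. - mask) * -1e9`. The minimal pure-Python sketch below applies both conventions to a single row of logits so the difference is visible; `masked_softmax` and its `mask_is_keep` flag are hypothetical names used only for illustration, not part of TensorFlow or the assignment.

```python
import math

def masked_softmax(logits, mask, mask_is_keep):
    """Softmax over one row of attention logits after adding a large negative bias.

    Hypothetical helper for illustration only.
    mask_is_keep=True:  1 = attend, 0 = block  -> bias = (1 - mask) * -1e9
    mask_is_keep=False: 1 = block,  0 = attend -> bias = mask * -1e9
    """
    if mask_is_keep:
        biased = [x + (1.0 - m) * -1e9 for x, m in zip(logits, mask)]
    else:
        biased = [x + m * -1e9 for x, m in zip(logits, mask)]
    # subtract the row max before exponentiating for numerical stability
    top = max(biased)
    exps = [math.exp(x - top) for x in biased]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 2.0, 3.0]
mask = [1, 1, 0]

# "1 = keep": the last position gets (almost) zero weight
print(masked_softmax(logits, mask, mask_is_keep=True))
# "1 = block": the first two positions are suppressed instead
print(masked_softmax(logits, mask, mask_is_keep=False))
```

With the same mask, the two conventions select opposite positions, so using the wrong one produces attention weights that look plausible but put zero weight on exactly the wrong keys. In the TensorFlow code, the "1 = keep" variant would be `scaled_attention_logits += (1. - mask) * -1e9`.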