C5 W4 A1 Wrong Weights for scaled_dot_product_attention (exercise 3)

The weights differ from the expected weights (I tried all of the different axis= values).

The most likely errors are:

  • your value for scaled_attention_logits is incorrect,
  • or your call to tf.keras.activations.softmax() is incorrect.

For the logits:
The value of dk is the dimension of the keys, i.e. the size of the last axis of k (the docstring gives k the shape (..., seq_len_k, depth), so dk is the depth, not the number of rows). You can get it with tf.shape(k)[-1].

You need the correct dk in order to compute scaled_attention_logits. Separately, tf.keras.activations.softmax() needs the correct axis= parameter: apply it along the last axis (axis=-1) of the scaled logits, so that for each query the attention weights across all keys sum to 1. See the sketch below.
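
For reference, here is a minimal sketch of the full computation. This is my own generic version of scaled dot-product attention, not the graded solution; the function name sdpa_sketch, the shapes in the comments, and the (1.0 - mask) * -1e9 masking convention are assumptions based on the assignment docstring:

```python
import tensorflow as tf

def sdpa_sketch(q, k, v, mask=None):
    # Assumed shapes: q: (..., seq_len_q, depth), k: (..., seq_len_k, depth),
    # v: (..., seq_len_k, depth_v)
    matmul_qk = tf.matmul(q, k, transpose_b=True)  # (..., seq_len_q, seq_len_k)

    # dk is the size of the LAST axis of k (the key depth), cast to float
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Assumed mask convention: 1 means "attend", 0 means "block"
    if mask is not None:
        scaled_attention_logits += (1.0 - mask) * -1e9

    # Softmax over the last axis (the keys), so each query's weights sum to 1
    attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)

    output = tf.matmul(attention_weights, v)  # (..., seq_len_q, depth_v)
    return output, attention_weights
```

A quick sanity check: with random q, k, and v, tf.reduce_sum(attention_weights, axis=-1) should return all ones, since each query's weights over the keys form a probability distribution.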