Hi, I have tried all of the solutions suggested in the two threads link1 and link2, but none of them worked.
To be specific, I did the following (a sketch of how these steps fit together follows the list):
applied -1e9 to the mask before adding it to the scaled tensor
used tf.keras.activations.softmax(scaled_attention_logits, axis=-1)
used tf.cast(tf.shape(k)[-1], tf.float32) to get dk
used tf.matmul(q, k, transpose_b=True)
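For reference, here is a minimal sketch of how I understand those four steps should fit together (this is not my actual code; the function name and the mask convention of 1 = keep / 0 = pad are assumptions taken from the reminder below):

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask):
    # QK^T, with k transposed on its last two axes
    matmul_qk = tf.matmul(q, k, transpose_b=True)

    # Scale by sqrt(dk), where dk is the depth of the keys
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    scaled_attention_logits = matmul_qk / tf.math.sqrt(dk)

    # Add the mask term BEFORE the softmax; per the reminder, the added
    # quantity is (1. - mask) * -1e9, so positions where mask == 0 are
    # pushed toward -inf and receive ~0 attention weight
    if mask is not None:
        scaled_attention_logits += (1. - mask) * -1e9

    # Softmax over the last axis so each row of weights sums to 1
    attention_weights = tf.keras.activations.softmax(scaled_attention_logits, axis=-1)

    output = tf.matmul(attention_weights, v)
    return output, attention_weights
```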
but I am still getting the error: AssertionError: Wrong masked weights
I also refreshed my notebook, but the same issue happened again.
Since I can't post my full code, I am posting only parts of it. (I can send my code in a message if you want to have a look at it.)
Reminder: … Multiply (1. - mask) by -1e9 before applying the softmax.
Adding the wrong quantity to scaled_attention_logits might be the problem: per the reminder, the term added should be (1. - mask) * -1e9, not mask * -1e9.
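To illustrate the difference with toy numbers (my own example, assuming mask == 1 marks positions to keep):

```python
import tensorflow as tf

mask = tf.constant([[1., 1., 0.]])     # 1 = attend, 0 = padding
logits = tf.constant([[0.5, 0.2, 0.9]])

wrong = logits + mask * -1e9           # kills the positions you want to KEEP
right = logits + (1. - mask) * -1e9    # kills only the padded position

print(tf.keras.activations.softmax(wrong))  # ~[0., 0., 1.]: all weight lands on padding
print(tf.keras.activations.softmax(right))  # ~[0.57, 0.43, 0.]: padding gets ~0 weight
```

With the wrong quantity, the masked positions end up with nonzero weight, which is exactly the kind of thing a "Wrong masked weights" assertion would catch.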