How do I calculate dk by scaling matmul_qk?

I’m trying to do this: `dk = tf.keras.layers.experimental.preprocessing.Normalization()(matmul_qk)`

But something is going wrong.

The equation is given in the instructions:

The “scaling” part I have circled in red.

Yeah, that part I did get right. I’m struggling to understand what scaling means here. Is it like min-max scaling on QK.T? If so, is `tf.keras.layers.experimental.preprocessing.Normalization()` the right way to do it?

Scaling just means “divide by some value”.

No, don’t use Normalization(). That’s a totally different concept.
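
To make "divide by some value" concrete, here is a minimal sketch. It assumes the equation in the instructions is the usual scaled dot-product attention, where `dk` is the size of the last dimension of the keys and the scores are divided by `sqrt(dk)`. It uses NumPy rather than TensorFlow so it runs anywhere, and `scale_qk` is a made-up name for illustration, not something from the assignment.

```python
import numpy as np

def scale_qk(q, k):
    """Return Q K^T divided by sqrt(dk) (illustrative helper, name is hypothetical)."""
    # Q K^T: swap the last two axes of k so this also works with batch dimensions
    matmul_qk = q @ np.swapaxes(k, -1, -2)
    # Assumption: dk is the depth (last dimension) of the keys, a plain number
    dk = k.shape[-1]
    # "Scaling" here is just a division by sqrt(dk); no normalization layer involved
    return matmul_qk / np.sqrt(dk)

q = np.ones((2, 4))  # 2 queries, depth 4
k = np.ones((3, 4))  # 3 keys, depth 4
scaled = scale_qk(q, k)
print(scaled.shape)
print(scaled)
```

With all-ones inputs every entry of QK.T is 4, and `sqrt(dk)` is 2, so every scaled entry comes out as 2.0. In TensorFlow the same steps would likely use `tf.matmul(q, k, transpose_b=True)`, `tf.cast(tf.shape(k)[-1], tf.float32)` and `tf.math.sqrt`, but check that against the assignment's own starter code.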