C5_W4_A1_Transformer_Subclass_v1 - UNQ_C3

How do I calculate dk by scaling matmul_qk?

I’m trying to do this: dk = tf.keras.layers.experimental.preprocessing.Normalization()(matmul_qk)

But something is going wrong.

The equation is given in the instructions:


The “scaling” part I have circled in red.
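
For reference, assuming the instructions use the standard scaled dot-product attention formula (the assignment's version may also include a mask term, which is omitted here), the equation would read:

$$
\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V
$$

Under that assumption, the "scaling" part is the division by $\sqrt{d_k}$, where $d_k$ is the size of the last dimension of the keys.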

Yeah, I did get that part right. I’m struggling to understand what scaling means here. Is it like min-max scaling on QK.T? If so, is tf.keras.layers.experimental.preprocessing.Normalization() the right way to do it?

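Scaling just means “divide by some value”. Here that value is the square root of dk, as shown in the equation.
No, don’t use Normalization(). That’s a totally different concept: it standardizes its input to zero mean and unit variance, which is not what the equation asks for.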
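
To make the "divide by some value" idea concrete, here is a minimal sketch in NumPy rather than TensorFlow. It is not the assignment solution: the function name scale_scores and the example arrays are made up for illustration, and it assumes dk means the size of the last dimension of the keys.

```python
import numpy as np

def scale_scores(q, k):
    """Hypothetical helper: compute q @ k^T and divide it by sqrt(dk)."""
    # Raw attention scores, shape (..., seq_len_q, seq_len_k).
    matmul_qk = q @ np.swapaxes(k, -1, -2)
    # dk is a single number (the depth of the keys), not a transformed copy of matmul_qk.
    dk = k.shape[-1]
    # "Scaling" is just this division by a constant.
    return matmul_qk / np.sqrt(dk)

# Made-up example: 2 queries and 3 keys, each of depth 4.
q = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 2.0]])
k = np.array([[1.0, 1.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [2.0, 0.0, 2.0, 0.0]])

scaled = scale_scores(q, k)
print(scaled)
```

In TensorFlow the same steps would likely use tf.matmul(q, k, transpose_b=True), tf.cast(tf.shape(k)[-1], tf.float32), and tf.math.sqrt. Note that dk comes from the shape of k, so no preprocessing layer is involved.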