How do I calculate dk by scaling matmul_qk?
I’m trying to do this: dk = tf.keras.layers.experimental.preprocessing.Normalization()(matmul_qk)
But something is going wrong.
The equation is given in the instructions:
The “scaling” part I have circled in red.
Yeah, I did get that part right. I’m struggling to understand what scaling means here. Is it like min-max scaling on QK.T? If so, is tf.keras.layers.experimental.preprocessing.Normalization() the right way to do it?
Scaling just means “divide by some value”.
No, don’t use Normalization(). That’s a totally different concept.
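To make "divide by some value" concrete, here is a minimal sketch. It assumes the equation in the instructions is the standard scaled dot-product attention formula, where QK.T is divided by the square root of the key depth; check this against your own instructions. It uses NumPy so it runs on its own, and the function name scale_logits is made up for illustration. Under that assumption, dk is just a number read off the shape of k, not something computed from matmul_qk by a layer.

```python
import numpy as np

def scale_logits(q, k):
    """Divide Q K^T by sqrt(dk), assuming the standard scaled dot-product formula."""
    # Raw attention scores: (..., seq_len_q, seq_len_k)
    matmul_qk = q @ np.swapaxes(k, -1, -2)
    # dk is the size of the last axis of k (the key depth), a plain scalar
    dk = k.shape[-1]
    # "Scaling" here is an elementwise division by one constant
    return matmul_qk / np.sqrt(dk)

# Toy inputs (hypothetical): 2 queries and 3 keys, each of depth 4
q = np.ones((2, 4))
k = np.ones((3, 4))
scaled = scale_logits(q, k)
print(scaled.shape)
```

In TensorFlow the same idea would likely use tf.cast(tf.shape(k)[-1], tf.float32) for dk and tf.math.sqrt for the square root. Normalization() instead standardizes features with a mean and variance, which is a different operation from this single division.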