Vidush
July 28, 2021, 2:27pm
1
My model is built as follows,
I start with a matmul of the query with the key.
Then I take the dimension of k, as suggested in the notes, before taking its square root.
But because the sqrt is of shape (2,) while qk is of shape (3,4) and k has shape (4,4), something seems to be going wrong and I get
InvalidArgumentError: Incompatible shapes: [3,4] vs. [2] [Op:RealDiv]
when I calculate scaled_attention_logits.
TMosh
July 28, 2021, 7:01pm
2
It looks like you forgot to transpose the ‘k’ matrix.
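To illustrate why the missing transpose does not raise an error by itself, here is a minimal NumPy sketch using the shapes from the question (q is (3,4), k is (4,4)). The random values and the variable names `wrong` and `right` are only for illustration and are not from the assignment. Because k happens to be square, both products have the same shape, but only the transposed one computes the dot product of each query with each key.

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))  # (seq_len_q, depth)
k = rng.normal(size=(4, 4))  # (seq_len_k, depth)

# Runs without error only because k is square; the values are not query-key dot products.
wrong = q @ k

# Each entry [i, j] is the dot product of query i with key j.
right = q @ k.T

print(wrong.shape, right.shape)    # both are (3, 4)
print(np.allclose(wrong, right))   # the two results differ in general
```

In TensorFlow the same product can be written as `tf.matmul(q, k, transpose_b=True)`.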
TMosh
July 28, 2021, 7:02pm
3
Also, ‘dk’ must be a scalar. np.shape(k) will not return a scalar.
You also did not apply the mask correctly. Please read the instructions closely.
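For reference, here is a rough NumPy sketch of scaled dot-product attention that shows the difference between a scalar ‘dk’ and the tuple returned by np.shape(k). This is not the assignment's solution. The mask convention used here (1 means "attend", 0 means "block", with a large negative number added to blocked positions) is an assumption, so check the notebook's instructions for the convention it actually expects.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch of attention for 2-D inputs: q (Tq, d), k (Tk, d), v (Tk, dv)."""
    matmul_qk = q @ k.T                      # (Tq, Tk)

    # dk must be a scalar: the size of the last axis of k.
    # np.shape(k) is the whole tuple, e.g. (4, 4), and its sqrt has shape (2,).
    dk = k.shape[-1]
    scaled_attention_logits = matmul_qk / np.sqrt(dk)

    if mask is not None:
        # Assumed convention: mask == 1 keeps a position, mask == 0 blocks it.
        scaled_attention_logits = scaled_attention_logits + (1.0 - mask) * -1e9

    # Numerically stable softmax over the key axis.
    shifted = scaled_attention_logits - scaled_attention_logits.max(axis=-1, keepdims=True)
    attention_weights = np.exp(shifted)
    attention_weights = attention_weights / attention_weights.sum(axis=-1, keepdims=True)

    return attention_weights @ v, attention_weights


rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(4, 4))
v = rng.normal(size=(4, 2))
mask = np.array([[1, 1, 0, 0],
                 [1, 1, 1, 0],
                 [1, 1, 1, 1]], dtype=float)

output, attention_weights = scaled_dot_product_attention(q, k, v, mask)
print(output.shape, attention_weights.shape)

# The shape from the error message: sqrt of the shape tuple is a length-2 array.
print(np.sqrt(np.shape(k)).shape)
```

Dividing the (3,4) logits by that length-2 array cannot broadcast, which matches the "Incompatible shapes: [3,4] vs. [2]" message in the first post.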
Vidush
July 28, 2021, 9:01pm
4
Hi @TMosh
Thank you for the help, I didn’t realise I had made those small, silly mistakes. I changed them around, read the answer on “order of operands”, and managed to fix my function.