Vidush
July 28, 2021, 2:27pm
1
My model is built as follows,

I start with a matmul of the query with the key.

Then I take the dimension of k, as suggested in the notes, before taking its square root.

But because the sqrt is of shape (2,) while qk is of shape (3,4) and k has shape (4,4), something seems to be going wrong and I get

InvalidArgumentError: Incompatible shapes: [3,4] vs. [2] [Op:RealDiv]

when I calculate `scaled_attention_logits`.
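
For reference, the shape mismatch described above can be reproduced in a few lines. This is only a sketch in NumPy rather than TensorFlow, and the array contents and the names `bad_dk` and `broadcast_ok` are made up for illustration; only the shapes come from the post. NumPy raises a `ValueError` where TensorFlow raises `InvalidArgumentError`, but the cause is the same: the square root is taken of a shape tuple rather than of a single number.

```python
import numpy as np

# Shapes taken from the post; the values themselves are placeholders.
q = np.ones((3, 4))
k = np.ones((4, 4))

qk = q @ k  # shape (3, 4)

# np.shape(k) is the tuple (4, 4), so its square root is an array of shape (2,),
# not a scalar.
bad_dk = np.sqrt(np.shape(k))

try:
    qk / bad_dk  # (3, 4) cannot broadcast against (2,)
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

print(qk.shape, bad_dk.shape, broadcast_ok)
```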

TMosh
July 28, 2021, 7:01pm
2
It looks like you forgot to transpose the ‘k’ matrix.

TMosh
July 28, 2021, 7:02pm
3
Also, ‘dk’ must be a scalar. np.shape(k) will not return a scalar.
You also did not apply the mask correctly. Please read the instructions closely.
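
To illustrate the three points (transposed ‘k’, scalar ‘dk’, mask applied to the scaled logits), here is a rough sketch in NumPy. It is not the assignment's reference solution and it does not use the TensorFlow calls the assignment expects. The function signature and the mask convention (1 = attend, 0 = masked out) are assumptions made for this example, so check them against the assignment instructions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch only. Assumed shapes: q (..., seq_len_q, depth),
    k (..., seq_len_k, depth), v (..., seq_len_k, depth_v)."""
    # Multiply q by the transpose of k (last two axes swapped).
    matmul_qk = q @ np.swapaxes(k, -1, -2)

    # dk is a single number: the size of the last axis of k, not the whole shape.
    dk = k.shape[-1]
    scaled_attention_logits = matmul_qk / np.sqrt(dk)

    # Assumed convention: mask is 1 where attention is allowed and 0 where it is
    # blocked, so blocked positions get a large negative value before the softmax.
    if mask is not None:
        scaled_attention_logits = scaled_attention_logits + (1.0 - mask) * -1e9

    # Numerically stable softmax over the key axis.
    shifted = scaled_attention_logits - scaled_attention_logits.max(axis=-1, keepdims=True)
    attention_weights = np.exp(shifted)
    attention_weights = attention_weights / attention_weights.sum(axis=-1, keepdims=True)

    return attention_weights @ v, attention_weights
```

With q of shape (3,4) and k of shape (4,4), the logits come out as (3,4), and dividing by the scalar square root of dk broadcasts without any error.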

Vidush
July 28, 2021, 9:01pm
4
Hi @TMosh

Thank you for the help, I didn’t realise I had made those small, silly mistakes. I corrected them, read the answer on “order of operands”, and managed to fix my function.