Model expects tensor output

My model is built as follows,

I start with a matmul of the query with the key.

Then I take the dimension of k, as suggested in the notes, before taking its square root.

But because the sqrt is of shape (2,) while qk is of shape (3,4) and k has shape (4,4), something seems to be going wrong and I get

InvalidArgumentError: Incompatible shapes: [3,4] vs. [2] [Op:RealDiv]

when I calculate scaled_attention_logits.
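For reference, the shape mismatch can be reproduced outside the model. This is only a minimal NumPy sketch: the arrays are hypothetical placeholders with the shapes quoted above, and NumPy reports the failure as a `ValueError` rather than TensorFlow's `InvalidArgumentError`.

```python
import numpy as np

# Hypothetical inputs with the shapes from this post: q is (3, 4), k is (4, 4).
q = np.ones((3, 4))
k = np.ones((4, 4))

qk = q @ k                      # shape (3, 4); k is not transposed here
dk_sqrt = np.sqrt(np.shape(k))  # np.shape(k) is the tuple (4, 4), so the result has shape (2,)

print(qk.shape, dk_sqrt.shape)

try:
    qk / dk_sqrt                # a (3, 4) array cannot broadcast against a (2,) array
except ValueError as err:
    print("broadcast failure:", err)
```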

It looks like you forgot to transpose the ‘k’ matrix.

Also, ‘dk’ must be a scalar. np.shape(k) will not return a scalar.
You also did not apply the mask correctly. Please read the instructions closely.
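
As a rough illustration of those three points, here is a minimal NumPy sketch of scaled dot-product attention. It is not the assignment's TensorFlow solution. The function name, the argument shapes, and in particular the mask convention (1 means "attend", 0 means "block") are assumptions made for this example, so check them against the assignment instructions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Sketch only. q: (seq_q, depth), k: (seq_k, depth), v: (seq_k, depth_v)."""
    # Transpose k so the product has shape (seq_q, seq_k).
    matmul_qk = q @ k.T

    # dk is a single integer (the depth of k), not the whole shape tuple.
    dk = k.shape[-1]
    scaled_attention_logits = matmul_qk / np.sqrt(dk)

    if mask is not None:
        # Assumed convention: mask == 1 keeps a position, mask == 0 blocks it.
        # Blocked positions get a large negative logit before the softmax.
        scaled_attention_logits = scaled_attention_logits + (1.0 - mask) * -1e9

    # Numerically stable softmax over the last axis (the key positions).
    shifted = scaled_attention_logits - scaled_attention_logits.max(axis=-1, keepdims=True)
    attention_weights = np.exp(shifted)
    attention_weights = attention_weights / attention_weights.sum(axis=-1, keepdims=True)

    return attention_weights @ v, attention_weights

# Hypothetical inputs with the shapes from the question.
q = np.ones((3, 4))
k = np.ones((4, 4))
v = np.arange(8.0).reshape(4, 2)
mask = np.array([[1.0, 1.0, 0.0, 0.0]])  # block the last two key positions

output, weights = scaled_dot_product_attention(q, k, v, mask)
print(output.shape, weights.shape)
```

With a scalar `dk` the division broadcasts cleanly, so the shape error from the question does not occur.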

Hi @TMosh

Thank you for the help. I didn’t realise I had made those small, silly mistakes. I corrected them, read the answer on “order of operands”, and managed to fix my function.