Error I get:
ValueError: non-broadcastable output operand with shape (3,4) doesn't match the broadcast shape (1,3,4)
Result I get:
matmul_qk [[2. 3. 1. 1.]
[2. 2. 2. 1.]
[2. 2. 0. 1.]]
dk 4
scaled_attention_logits [[1. 1.5 0.5 0.5]
[1. 1. 1. 0.5]
[1. 1. 0. 0.5]]
mask [[[-0.e+00 -0.e+00 -1.e+09 -0.e+00]
[-0.e+00 -0.e+00 -1.e+09 -0.e+00]
[-0.e+00 -0.e+00 -1.e+09 -0.e+00]]]
Where the problem seems to be:
{mentor edit: code removed}
In this line of code, the program is unable to add scaled_attention_logits and mask because they have different shapes: (3,4) and (1,3,4) respectively.
I am unsure how to fix this.
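For reference, here is a minimal sketch that reproduces the same message. It assumes the removed line does an in-place addition (something like `+=`), which is the usual source of the "non-broadcastable output operand" wording in NumPy. The arrays `logits` and `mask` are stand-ins that only copy the shapes from the output above, not the real values.

```python
import numpy as np

# Hypothetical stand-ins with the shapes from the printed output.
logits = np.ones((3, 4))
mask = np.zeros((1, 3, 4))

# Out-of-place addition broadcasts (3,4) against (1,3,4) and
# allocates a new array for the result.
summed = logits + mask

# In-place addition has to write the broadcast result back into the
# existing (3,4) array, which cannot hold it, so NumPy raises.
try:
    logits += mask
    inplace_error = None
except ValueError as exc:
    inplace_error = str(exc)

print(summed.shape)
print(inplace_error)
```

If that assumption holds, the plain `a = a + b` form would also avoid the error, since the result is then written to a new array.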
This is how I calculate scaled_attention_logits:
{mentor edit: code removed}
How do I make scaled_attention_logits have shape (1,3,4)?
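One possible approach, sketched under assumptions: add a leading axis with `np.newaxis` so the array already has the broadcast shape before the mask is added. The values below are copied from the printed output. Dividing by `np.sqrt(dk)` is a guess that happens to match the printed numbers, since the original calculation was removed.

```python
import numpy as np

# Values copied from the printed output above.
matmul_qk = np.array([[2., 3., 1., 1.],
                      [2., 2., 2., 1.],
                      [2., 2., 0., 1.]])
dk = 4

# Assumed scaling step; 2.0 / sqrt(4) = 1.0 matches the printed logits.
scaled_attention_logits = matmul_qk / np.sqrt(dk)

# Mask with shape (1,3,4): a large negative value in the third column.
mask = np.zeros((1, 3, 4))
mask[..., 2] = -1e9

# Add a leading axis so the shape becomes (1,3,4).
scaled_attention_logits = scaled_attention_logits[np.newaxis, ...]

# Both operands now share a shape, so in-place addition is allowed.
scaled_attention_logits += mask

print(scaled_attention_logits.shape)
```

`np.expand_dims(scaled_attention_logits, axis=0)` or `scaled_attention_logits.reshape(1, 3, 4)` should give the same shape.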