C5 W4 A1 E3 - "ValueError: non-broadcastable output operand with shape (3,4) doesn't match the broadcast shape (1,3,4)"

Error I get:

ValueError: non-broadcastable output operand with shape (3,4) doesn't match the broadcast shape (1,3,4)

Result I get:

matmul_qk [[2. 3. 1. 1.]
 [2. 2. 2. 1.]
 [2. 2. 0. 1.]]
dk 4
scaled_attention_logits [[1.  1.5 0.5 0.5]
 [1.  1.  1.  0.5]
 [1.  1.  0.  0.5]]
mask [[[-0.e+00 -0.e+00 -1.e+09 -0.e+00]
  [-0.e+00 -0.e+00 -1.e+09 -0.e+00]
  [-0.e+00 -0.e+00 -1.e+09 -0.e+00]]]

Where the problem seems to be:

{mentor edit: code removed}

In this line of code, the program is unable to add scaled_attention_logits and mask because they have different shapes: (3,4) and (1,3,4), respectively.
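For anyone hitting the same message: NumPy can normally broadcast (3,4) against (1,3,4), so this particular error usually points to an in-place operation (such as `+=`) that tries to write a (1,3,4) result back into the (3,4) array. Here is a minimal sketch of that behaviour, assuming plain NumPy arrays. The values are copied from the printout above, and the variable names are illustrative only, not the assignment's code:

```python
import numpy as np

# Values taken from the debug output in this post.
logits = np.array([[1.0, 1.5, 0.5, 0.5],
                   [1.0, 1.0, 1.0, 0.5],
                   [1.0, 1.0, 0.0, 0.5]])        # shape (3, 4)
mask = np.array([[[0.0, 0.0, -1e9, 0.0]] * 3])  # shape (1, 3, 4)

# Out-of-place addition broadcasts both operands and allocates a new array.
summed = logits + mask
print(summed.shape)  # (1, 3, 4)

# In-place addition has to store the result in the left operand,
# and a (3, 4) array cannot hold a (1, 3, 4) result.
in_place_error = None
try:
    logits_copy = logits.copy()
    logits_copy += mask
except ValueError as exc:
    in_place_error = str(exc)
print(in_place_error)
```

If that is what is happening, writing the sum to a new variable (or reassigning with `x = x + mask`) avoids the error without changing either shape.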

I am unsure about how I can fix this.

The way I calculate scaled_attention_logits is with:

{mentor edit: code removed}

How do I make scaled_attention_logits have dimensions (1,3,4)?
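For reference, a leading axis can be added to a (3,4) NumPy array with `np.expand_dims` or by indexing with `np.newaxis`. This is a minimal sketch, assuming the logits are a plain NumPy array; the values come from the printout above and the variable names are hypothetical:

```python
import numpy as np

logits = np.array([[1.0, 1.5, 0.5, 0.5],
                   [1.0, 1.0, 1.0, 0.5],
                   [1.0, 1.0, 0.0, 0.5]])        # shape (3, 4)
mask = np.array([[[0.0, 0.0, -1e9, 0.0]] * 3])  # shape (1, 3, 4)

# Two equivalent ways to add a leading axis of length 1.
via_expand_dims = np.expand_dims(logits, axis=0)
via_newaxis = logits[np.newaxis, ...]
print(via_expand_dims.shape, via_newaxis.shape)  # (1, 3, 4) (1, 3, 4)

# Copy so the in-place add below does not also modify `logits`
# (expand_dims and newaxis indexing return views of the original data).
expanded = via_expand_dims.copy()
expanded += mask  # shapes now match, so the in-place add succeeds
print(expanded.shape)  # (1, 3, 4)
```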

I decided to use np.squeeze() to turn the mask array into shape (3,4) instead, and it works.
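One caveat with this approach: without an `axis` argument, `np.squeeze` removes every axis of length 1, not just the first one, so it can drop more dimensions than intended on other inputs. A small sketch of the difference, assuming NumPy arrays; the `tiny` array is a made-up example, not something from the assignment:

```python
import numpy as np

mask = np.array([[[0.0, 0.0, -1e9, 0.0]] * 3])  # shape (1, 3, 4)

squeezed_all = np.squeeze(mask)             # removes all length-1 axes
squeezed_first = np.squeeze(mask, axis=0)   # removes only axis 0
print(squeezed_all.shape, squeezed_first.shape)  # (3, 4) (3, 4)

# Hypothetical input with two length-1 axes, to show where the two calls differ.
tiny = np.zeros((1, 1, 4))
print(np.squeeze(tiny).shape)          # (4,)
print(np.squeeze(tiny, axis=0).shape)  # (1, 4)
```

Squeezing also means the result no longer has the leading dimension, which may matter if later code expects it.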

{mentor edit: code removed}

If anyone thinks this is the wrong way to go, please do tell me.

The only difference between your code and mine is that I followed the instructions that said to use tf.matmul(…), not np.matmul(…)
