C5 W4 A1 E3 - "ValueError: non-broadcastable output operand with shape (3,4) doesn't match the broadcast shape (1,3,4)"

Jaime_Gonzalez · February 24, 2022, 4:51pm

Error I get:

ValueError: non-broadcastable output operand with shape (3,4) doesn't match the broadcast shape (1,3,4)

Result I get:

matmul_qk [[2. 3. 1. 1.]
 [2. 2. 2. 1.]
 [2. 2. 0. 1.]]
dk 4
scaled_attention_logits [[1.  1.5 0.5 0.5]
 [1.  1.  1.  0.5]
 [1.  1.  0.  0.5]]
mask [[[-0.e+00 -0.e+00 -1.e+09 -0.e+00]
  [-0.e+00 -0.e+00 -1.e+09 -0.e+00]
  [-0.e+00 -0.e+00 -1.e+09 -0.e+00]]]

Where problem seems to be:

{mentor edit: code removed}

In this line of code, the program is unable to add scaled_attention_logits and mask because their shapes differ: (3,4) and (1,3,4), respectively.
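The wording of the error ("non-broadcastable output operand") is what NumPy reports when an in-place operation would need to grow its left-hand array. The sketch below reproduces it, assuming the removed line used an in-place add (`+=`) on NumPy arrays. The names `logits` and `mask` and their values are illustrative stand-ins, not the assignment code.

```python
import numpy as np

# Illustrative stand-ins for the arrays printed above (values are assumptions).
logits = np.ones((3, 4))            # shape (3, 4)
mask = np.zeros((1, 3, 4))          # shape (1, 3, 4)
mask[..., 2] = -1e9                 # mask out the third key position

# Out-of-place addition broadcasts (3, 4) against (1, 3, 4) and allocates
# a new array with the broadcast shape.
summed = logits + mask
print(summed.shape)

# In-place addition must write the result back into `logits`, which only
# has room for shape (3, 4), so NumPy raises ValueError.
message = None
try:
    logits += mask
except ValueError as err:
    message = str(err)
print(message)
```

In other words, the two shapes are compatible for broadcasting. Only the in-place form fails, because the (1,3,4) result cannot be stored in a (3,4) array.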

I am unsure about how I can fix this.

The way I calculate scaled_attention_logits is with:

{mentor edit: code removed}

How do I make scaled_attention_logits have dimensions (1,3,4)?
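As a general NumPy answer to the question above, a leading axis of length 1 can be added with `np.newaxis` or `np.expand_dims`. This is only a sketch on a standalone array with the values printed earlier. Whether it is the right fix for the assignment is not confirmed by the thread.

```python
import numpy as np

# Values copied from the scaled_attention_logits printout above.
logits = np.array([[1.0, 1.5, 0.5, 0.5],
                   [1.0, 1.0, 1.0, 0.5],
                   [1.0, 1.0, 0.0, 0.5]])

# Two equivalent ways to prepend an axis of length 1.
batched_a = logits[np.newaxis, ...]          # shape (1, 3, 4)
batched_b = np.expand_dims(logits, axis=0)   # shape (1, 3, 4)

print(batched_a.shape, batched_b.shape)
```

Both forms return a view of the same data, so no values are copied or changed.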

Jaime_Gonzalez · February 24, 2022, 5:03pm

I decided to use np.squeeze() to turn the mask array into shape (3,4) instead, and it works.
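A minimal sketch of that workaround, with illustrative arrays rather than the removed assignment code. Passing `axis=0` is an added precaution, not something the thread specifies: without it, `np.squeeze` removes every axis of length 1, which could also drop a sequence dimension that happens to have length 1.

```python
import numpy as np

logits = np.ones((3, 4))        # illustrative stand-in, shape (3, 4)
mask = np.zeros((1, 3, 4))      # illustrative stand-in, shape (1, 3, 4)
mask[..., 2] = -1e9

# Remove only the leading axis; this raises ValueError if that axis
# does not have length 1.
mask_2d = np.squeeze(mask, axis=0)
print(mask_2d.shape)

# With matching shapes, the in-place add no longer needs to broadcast.
logits += mask_2d
print(logits.shape)
```

Note that this only works when the mask has a single entry along the leading axis. A mask with a larger leading dimension cannot be squeezed this way.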

{mentor edit: code removed}

If anyone thinks this is the wrong way to go, please do tell me.

TMosh · February 24, 2022, 7:19pm

The only difference between your code and mine is that I followed the instructions, which said to use tf.matmul(…), not np.matmul(…).
