C5 W4 A1 E7: Why are we able to add shapes (2,3,4) and (1,3,512)?

My code for this exercise works, but I’m confused about this addition operation:

        x += self.pos_encoding[:, :seq_len, :] 

If we print the shapes of x and self.pos_encoding[:, :seq_len, :], we get:

x.shape BEFORE addition (2, 3, 4)
pos_encoding[:, :seq_len, :].shape (1, 3, 512)
x.shape AFTER addition (2, 3, 4)

My reasoning is that for this addition to happen, broadcasting must be occurring in the background of the operation: x is stretched to (2, 3, 512) and pos_encoding[:, :seq_len, :] is stretched to (2, 3, 512), so that the two can be added.

But then I realised that x AFTER the addition still has shape (2, 3, 4), so this broadcasting is not occurring, which left me confused.
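To double-check my understanding of the rule (as far as I know, broadcasting can only stretch axes of size 1, so each pair of trailing dimensions must either match or contain a 1), I first confirmed a case that should work:

    import numpy as np

    a = np.ones((2, 3, 4))
    b = np.ones((1, 3, 4))  # leading axis of size 1 can be stretched to 2
    print((a + b).shape)    # (2, 3, 4)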

I then tried this code on the side:

    a = np.ones((2,3,4))
    print(a)
    print('-----------------------')
    b = np.ones((1,3,512))*3
    print(b)
    print('-----------------------')
    print(a+b)

and got this result:

[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]
-----------------------
[[[3. 3. 3. ... 3. 3. 3.]
  [3. 3. 3. ... 3. 3. 3.]
  [3. 3. 3. ... 3. 3. 3.]]]
-----------------------
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-62-5262f54e85b0> in <module>
      5 print(b)
      6 print('-----------------------')
----> 7 print(a+b)

ValueError: operands could not be broadcast together with shapes (2,3,4) (1,3,512) 

What’s going on? How is this addition taking place if it shouldn’t be possible?

Hi Jaime_Gonzales,

If I change the relevant code of ex. 7 to

    print("x.shape=",x.shape)
    print("self.pos_encoding[:, :seq_len, :].shape=",self.pos_encoding[:, :seq_len, :])
    x += self.pos_encoding[:, :seq_len, :]
    print(("x.shape_after=",x.shape))

I get the following output from the unit test:

x.shape= (2, 3, 4)
self.pos_encoding[:, :seq_len, :]= tf.Tensor(
[[[ 0.          1.          0.          1.        ]
  [ 0.84147096  0.5403023   0.00999983  0.99995   ]
  [ 0.9092974  -0.41614684  0.01999867  0.9998    ]]], shape=(1, 3, 4), dtype=float32)
x.shape_after= (2, 3, 4)
x.shape= (2, 3, 4)
self.pos_encoding[:, :seq_len, :]= tf.Tensor(
[[[ 0.          1.          0.          1.        ]
  [ 0.84147096  0.5403023   0.00999983  0.99995   ]
  [ 0.9092974  -0.41614684  0.01999867  0.9998    ]]], shape=(1, 3, 4), dtype=float32)
x.shape_after= (2, 3, 4)
x.shape= (2, 3, 4)
self.pos_encoding[:, :seq_len, :]= tf.Tensor(
[[[ 0.          1.          0.          1.        ]
  [ 0.84147096  0.5403023   0.00999983  0.99995   ]
  [ 0.9092974  -0.41614684  0.01999867  0.9998    ]]], shape=(1, 3, 4), dtype=float32)
x.shape_after= (2, 3, 4)

So in the unit test the positional encoding is built with d_model = 4: the slice self.pos_encoding[:, :seq_len, :] has shape (1, 3, 4), not (1, 3, 512). Its trailing dimensions (3, 4) match x exactly, and broadcasting only has to repeat it across the batch dimension (1 → 2), which the rules allow. A (1, 3, 512) encoding could never be added to a (2, 3, 4) tensor, as your numpy experiment shows; the (1, 3, 512) shape you printed presumably came from an encoding created with d_model = 512.
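To see the broadcast in isolation, here is a minimal numpy sketch with the same shapes as the unit test (numpy and TensorFlow follow the same broadcasting rules; the variable names are just for illustration):

    import numpy as np

    x = np.ones((2, 3, 4))        # batch of 2, like the test input
    pos = np.ones((1, 3, 4)) * 3  # stands in for pos_encoding[:, :seq_len, :]

    # Trailing dimensions (3, 4) match exactly; only the leading axis of
    # size 1 is repeated across the batch, so the result keeps x's shape.
    print((x + pos).shape)  # (2, 3, 4)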
