My code for this exercise works, but I’m confused about this addition operation:
x += self.pos_encoding[:, :seq_len, :]
If we print the shapes of x and self.pos_encoding[:, :seq_len, :], we get:
x.shape BEFORE addition (2, 3, 4)
pos_encoding[:, :seq_len, :].shape (1, 3, 512)
x.shape AFTER addition (2, 3, 4)
My reasoning is that, for this addition to go through, broadcasting needs to be happening in the background: x gets stretched to (2, 3, 512) and pos_encoding[:, :seq_len, :] gets stretched to (2, 3, 512) so that the two can be added. But then I realised that x AFTER the addition has shape (2, 3, 4), so this broadcasting is not occurring, which leaves me confused.
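Just to double-check the rule as I understand it (shapes are compared from the right, and each pair of dimensions must either be equal or one of them must be 1), np.broadcast_shapes (available in NumPy >= 1.20) applies exactly that rule to shape tuples without building any arrays:
import numpy as np

# trailing dimensions are 4 and 512: neither equal nor 1, so the rule says no
try:
    print(np.broadcast_shapes((2, 3, 4), (1, 3, 512)))
except ValueError as err:
    print('not broadcastable:', err)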
I then tried building actual arrays with these shapes on the side:
import numpy as np

a = np.ones((2,3,4))        # same shape as x in the exercise
print(a)
print('-----------------------')
b = np.ones((1,3,512))*3    # same shape as pos_encoding[:, :seq_len, :]
print(b)
print('-----------------------')
print(a+b)
and got this result:
[[[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]

 [[1. 1. 1. 1.]
  [1. 1. 1. 1.]
  [1. 1. 1. 1.]]]
-----------------------
[[[3. 3. 3. ... 3. 3. 3.]
  [3. 3. 3. ... 3. 3. 3.]
  [3. 3. 3. ... 3. 3. 3.]]]
-----------------------
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-62-5262f54e85b0> in <module>
      5 print(b)
      6 print('-----------------------')
----> 7 print(a+b)

ValueError: operands could not be broadcast together with shapes (2,3,4) (1,3,512)
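For comparison, here is a made-up side check (not the exercise code): if b's trailing dimension matched a's, the broadcast would go through and the result would keep shape (2, 3, 4), with only b's leading 1 being stretched:
import numpy as np

a = np.ones((2, 3, 4))
b = np.ones((1, 3, 4)) * 3   # trailing dims now match a's
print((a + b).shape)         # (2, 3, 4): only b's leading 1 is stretched to 2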
What’s going on? How is the addition in the exercise going through if these shapes aren’t supposed to be broadcastable?