Thank you for your answer. But how can the second dimension be 100?
The first dimension agrees between u and v, while the second dimension does not exist in u and is 1 in v.
How can this be broadcast to 100 in the second dimension?
Yes, Mubsi’s guess must be what’s happening. The type coercion rules turn the 1D vector with 100 elements into a 1 x 100 2D row vector, and then broadcasting kicks in to make the addition work, so you end up with 100 x 100.
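For instance (just a quick check with ones, where only the shapes matter), you can see the coercion and the resulting shape directly:

import numpy as np

u = np.ones(100)         # 1D vector, shape (100,)
v = np.ones((100, 1))    # 2D column vector, shape (100, 1)

# u gets treated as a 1 x 100 row vector, then both operands are
# stretched against each other, so the sum has shape (100, 100)
print((u + v).shape)     # (100, 100)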
Maybe it’s better just to avoid nasty surprises by not playing that game of mixing 1D and 2D objects.
Here’s a better experiment to really see what happens:
import numpy as np

u = np.ones(10)            # 1D vector, shape (10,)
v = np.ones((10, 1)) * 2   # 2D column vector, shape (10, 1)
uplusv = u + v             # u is treated as a 1 x 10 row vector and broadcast against v

print(f"u.shape = {u.shape}")
print(f"u = {u}")
print(f"v.shape = {v.shape}")
print(f"v = {v}")
print(f"uplusv.shape = {uplusv.shape}")
print(f"uplusv = {uplusv}")
Yes, that makes sense. Still, I feel this is rather counter-intuitive broadcasting behavior from NumPy.
I remember that somewhere in this specialization Andrew Ng warned of stranger things happening when working with 1D objects.
I should have listened to him.
I don’t disagree that it seems like strange behavior. In the “type coercion” step where they convert the 1D vector to a 2D object, it would have made more sense to match the shape of the other object, but I’ll bet the algorithm only knows that it needs a 2D object at that point and doesn’t actually compare against the other operand’s shape. But once you have a row vector and a column vector and you do an “elementwise” operation between them, what we’re seeing is the definition of “broadcasting”.
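Here’s a tiny example of that last point (nothing to do with the course code, just arange for visibility): adding a row vector to a column vector gives you the full grid of pairwise sums.

import numpy as np

row = np.arange(3).reshape(1, 3)   # shape (1, 3)
col = np.arange(3).reshape(3, 1)   # shape (3, 1)

# Each size-1 dimension is stretched to match the other operand,
# so the "elementwise" sum is really a 3 x 3 grid of all pairs.
print(row + col)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]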
My background is in math, and I found the whole idea of broadcasting offensive when I first ran into it. I thought it should simply throw an error if you did an elementwise operation on objects of different shapes, meaning that it was the programmer’s job to convert the vector to a matrix first, e.g. by manually doing something like this:
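(Toy shapes here, and np.tile is just one way to do the expansion by hand.)

import numpy as np

M = np.arange(12).reshape(3, 4)    # a (3, 4) matrix
v = np.array([10, 20, 30, 40])     # a (4,) vector we want to add to every row of M

v_matrix = np.tile(v, (3, 1))      # explicitly copy v into a (3, 4) matrix
result = M + v_matrix              # shapes now match exactly, no broadcasting needed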
But once you get used to the idea, it does save you a lot of work. E.g. it would be a pain to have to do that expansion manually every time we do forward propagation:
$A = W \cdot X + b$
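With some made-up layer sizes (these numbers are just for illustration), broadcasting takes care of adding b to every column of the product for us:

import numpy as np

# Made-up sizes: 4 units in this layer, 3 input features, 5 training examples.
W = np.random.randn(4, 3)    # weights, shape (4, 3)
X = np.random.randn(3, 5)    # inputs, one example per column, shape (3, 5)
b = np.random.randn(4, 1)    # bias, shape (4, 1)

A = np.dot(W, X) + b         # b is broadcast across all 5 columns
print(A.shape)               # (4, 5)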
So we can all agree on the bottom line that we should follow Prof Ng’s advice when he recommended against mixing 1D objects into our computations here.