I think the confusion comes from the way NumPy handles 1-D (rank-1) arrays.
I changed the shape of x, y, and a from 1-dimensional arrays of shape (vocab_size,) to 2-D arrays.
If you initialize x like np.zeros((vocab_size,)), then x is a 1-D array. You can see that its shape is (vocab_size,). It is neither a row vector nor a column vector; it simply has vocab_size elements and no second axis. If you initialize it with np.zeros((vocab_size,1)), then it is a 2-D array, i.e. a true column vector of shape (vocab_size, 1). This difference causes several problems. I wrote some details in this thread.
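Here is a minimal sketch of the difference. The value vocab_size = 27 is only a hypothetical size for illustration. Note that transposing a 1-D array does nothing, which is one reason it cannot behave like a column vector.

```python
import numpy as np

vocab_size = 27  # hypothetical vocabulary size, for illustration only

x1 = np.zeros((vocab_size,))    # 1-D (rank-1) array: no row/column orientation
x2 = np.zeros((vocab_size, 1))  # 2-D array: a true column vector

print(x1.shape, x1.ndim)  # (27,) 1
print(x2.shape, x2.ndim)  # (27, 1) 2
print(x1.T.shape)         # (27,)   transposing a 1-D array is a no-op
print(x2.T.shape)         # (1, 27) transposing a column gives a row
```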
All parameters passed to this function are 2-D arrays, so your approach is right. Reshaping vectors into 2-D arrays is the better way to avoid unintended broadcasting.
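To illustrate the unintended broadcasting, here is a small sketch of a typical RNN step, a = tanh(Wax x + Waa a_prev + b). The names Wax, Waa, b, n_a and the sizes are assumptions chosen for illustration, not necessarily the exact function from the assignment. With 1-D x and a_prev, the matrix products come out 1-D, and adding the (n_a, 1) bias silently broadcasts the result to (n_a, n_a) instead of raising an error.

```python
import numpy as np

vocab_size, n_a = 27, 5  # hypothetical sizes

rng = np.random.default_rng(0)
Wax = rng.standard_normal((n_a, vocab_size))  # input-to-hidden weights (assumed shape)
Waa = rng.standard_normal((n_a, n_a))         # hidden-to-hidden weights (assumed shape)
b = np.zeros((n_a, 1))                        # bias stored as a 2-D column

# 1-D inputs: Wax @ x_1d has shape (n_a,), and (n_a,) + (n_a, 1) broadcasts to (n_a, n_a)
x_1d = np.zeros((vocab_size,))
a_prev_1d = np.zeros((n_a,))
a_bad = np.tanh(Wax @ x_1d + Waa @ a_prev_1d + b)

# 2-D inputs: every term is (n_a, 1), so the result stays a column vector
x_2d = np.zeros((vocab_size, 1))
a_prev_2d = np.zeros((n_a, 1))
a_good = np.tanh(Wax @ x_2d + Waa @ a_prev_2d + b)

print(a_bad.shape)   # (5, 5)
print(a_good.shape)  # (5, 1)
```

Because NumPy broadcasts instead of failing, the wrong shape only shows up later, which is why keeping everything 2-D from the start is safer.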