Welcome to the community.
I think this is a good question for understanding how numpy handles vectors and matrices (both are ndarrays), which is fairly particular to numpy.
It is also related to the question of how you control dimensions throughout your project.
Let's start with some numpy basics.
import numpy as np
a = np.ones((3,))    # 1D array (vector)
print(type(a)) ; print(a.ndim) ; print(a.shape)
b = np.ones((3,1))   # 2D array (column matrix)
print(type(b)) ; print(b.ndim) ; print(b.shape)
c = a+b              # broadcasting applies here
print(type(c)) ; print(c.ndim) ; print(c.shape)
The results are:
a: <class 'numpy.ndarray'>, ndim=1, shape=(3,)
b: <class 'numpy.ndarray'>, ndim=2, shape=(3,1)
c: <class 'numpy.ndarray'>, ndim=2, shape=(3,3)
"c" does not become (3,1); it becomes (3,3). This is consistent with what you see here:
(Wax @ x + Waa @ a_prev + b) → (100,100)
As I mentioned, numpy handles a "vector" differently from other major tools like Matlab. (The others are simpler: a vector is just a matrix with a single column, like (m,1).)
A vector of shape (m,) is a 1D array with "m" elements. In broadcasting it behaves like a row, (1,m), but it is not a matrix, since it cannot be "transposed". If we try to transpose it with a.T, the result has the same shape, (m,).
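Here is a small sketch of the transpose point above. The names col and row are just illustrative; reshape and np.newaxis are the usual ways to get an explicit 2D column or row from a 1D array.

```python
import numpy as np

a = np.ones((3,))
a_t = a.T                     # transposing a 1D array changes nothing
print(a.shape, a_t.shape)     # (3,) (3,)

col = a.reshape(-1, 1)        # explicit 2D column
row = a[np.newaxis, :]        # explicit 2D row
print(col.shape, row.shape)   # (3, 1) (1, 3)
print(col.T.shape)            # (1, 3): a 2D array can be transposed
```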
In your case, (Wax@x).shape = (100,).
This depends on how x is defined. If you define x with shape (vocab_size,), then that is the result. If you explicitly define x with shape (vocab_size,1), then (Wax@x).shape = (100,1).
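A quick way to check this yourself. This is only a sketch: vocab_size = 27 is an assumed value for illustration, and Wax is filled with random numbers rather than your trained parameters.

```python
import numpy as np

n_a, vocab_size = 100, 27              # vocab_size = 27 is an assumption
Wax = np.random.randn(n_a, vocab_size)

x_vec = np.zeros((vocab_size,))        # x as a 1D vector
x_col = np.zeros((vocab_size, 1))      # x as a 2D column

out_vec = Wax @ x_vec
out_col = Wax @ x_col
print(out_vec.shape)   # (100,)
print(out_col.shape)   # (100, 1)
```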
(Wax @ x + Waa @ a_prev + b) → (100,100)
This is not what you wanted, but it is what numpy is supposed to do. In your case, both "Wax @ x" and "Waa @ a_prev" have shape (100,), while "b" is a 2D array of shape (100,1). So the result is just like what I showed above: (100,100), because it works like (1,100) + (100,1). numpy's broadcasting stretches both operands to (100,100), which is the surprise here.
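The same broadcasting effect at size 100, as a minimal sketch. Here v is a stand-in for "Wax @ x" or "Waa @ a_prev" (not your actual values), and b is filled with ones just for the demo.

```python
import numpy as np

n_a = 100
v = np.ones((n_a,))       # stand-in for Wax @ x, shape (100,)
b = np.ones((n_a, 1))     # bias as a 2D column, shape (100, 1)

s = v + b                 # (100,) is treated like (1, 100)
print(s.shape)            # (100, 100)

same = np.array_equal(s, v.reshape(1, n_a) + b)
print(same)               # True

fixed = v.reshape(n_a, 1) + b   # make both operands columns
print(fixed.shape)        # (100, 1)
```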
So, it all depends on how you want to control the dimensions.
Since all entries in the "parameters" dictionary (Wax, Waa, Wya, b, by) are matrices (2D arrays), I prefer to handle everything as 2D arrays rather than vectors. But it's up to you.
In either case, we need to reshape either b/by or y from a 2D array into a vector, using either ravel() or flatten().
If you want to go with vectors for local variables like "x", "a_prev", etc., then you may want to convert "b" and "by" from 2D arrays to vectors early on.
If you want to keep everything as 2D arrays, then you just need to flatten "y" when you pass it to np.random.choice().
Of course, depending on your choice, the shapes used to initialize "x" and "a_prev" will differ (like (vocab_size,) or (vocab_size,1)).
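To make the two options concrete, here is a rough sketch of one sampling step done both ways. Several things are my assumptions and not taken from your code: vocab_size = 27, the random and zero parameter values, the tanh activation, and the softmax helper. Only the shapes matter here.

```python
import numpy as np

np.random.seed(0)
n_a, vocab_size = 100, 27                 # assumed sizes
Wax = np.random.randn(n_a, vocab_size)
Waa = np.random.randn(n_a, n_a)
Wya = np.random.randn(vocab_size, n_a)
b = np.zeros((n_a, 1))                    # 2D, as in "parameters"
by = np.zeros((vocab_size, 1))

def softmax(z):
    # hypothetical helper; works for (n,) and (n, 1) inputs
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

# Option A: keep everything 2D, flatten y only for np.random.choice
x = np.zeros((vocab_size, 1))
a_prev = np.zeros((n_a, 1))
a = np.tanh(Wax @ x + Waa @ a_prev + b)   # (100, 1)
y = softmax(Wya @ a + by)                 # (27, 1)
idx_2d = np.random.choice(vocab_size, p=y.ravel())

# Option B: use vectors, flatten b and by once up front
b1, by1 = b.ravel(), by.ravel()
x1 = np.zeros((vocab_size,))
a_prev1 = np.zeros((n_a,))
a1 = np.tanh(Wax @ x1 + Waa @ a_prev1 + b1)   # (100,)
y1 = softmax(Wya @ a1 + by1)                  # (27,)
idx_1d = np.random.choice(vocab_size, p=y1)

print(a.shape, y.shape)     # (100, 1) (27, 1)
print(a1.shape, y1.shape)   # (100,) (27,)
```

Either way works. The point is to pick one convention and apply it to every variable, so that no (100,) array ever gets added to a (100,1) array.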
Hope this clarifies.