Week 1 Dinosaur Island Sample Method

This is my first time posting on discourse so I apologize in advance if I’m not requesting help as expected!

For some reason, the shape of my a vector isn’t as expected. I’m getting a 100 x 100 vector for a when I presume it should be (27,1) or something of that sort. Since I can’t share my code, I’ll describe the shapes of each operand below:
Wax @ x → (100,)
Waa @ a_prev → (100,)
(Wax @ x + Waa @ a_prev) → (100,)
(Wax @ x + Waa @ a_prev + b) → (100,100)
b → (100, 1)

a → (100,100)

Why is it that b is increasing the shape? I thought if we’re broadcasting and adding b, a column vector, it shouldn’t be augmenting the shape. Some help would be appreciated. Thanks!

Welcome to the community.

I think this is a good question for understanding how numpy handles vectors and matrices (and ndarrays in general), which is somewhat unique to numpy.

It is also related to the question of how you control dimensions throughout your project.

Let’s start with numpy basics.

import numpy as np

a = np.ones((3,))    # 1-D array ("vector")
print(type(a), a.ndim, a.shape)
b = np.ones((3, 1))  # 2-D array with a single column
print(type(b), b.ndim, b.shape)
c = a + b            # broadcasting happens here
print(type(c), c.ndim, c.shape)

The results are:

a: <class 'numpy.ndarray'>, dim=1, shape=(3,)
b: <class 'numpy.ndarray'>, dim=2, shape=(3,1)
c: <class 'numpy.ndarray'>, dim=2, shape=(3,3)

“c” does not become (3,1); it becomes (3,3). This is consistent with what you see here:

(Wax @ x + Waa @ a_prev + b) → (100,100)

As I mentioned, numpy handles a “vector” differently from other major tools like MATLAB. (The others are simpler: a vector is handled as a matrix with a single column, like (m,1).)

A vector of shape (m,) is a 1-D array of size “m”. In broadcasting it behaves like a row vector (1,m), but it is not a matrix, since it cannot be “transposed”: if we try to transpose it with a.T, the result has the same shape.
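To make the point about transposing concrete, here is a small sketch. The variable names (v, col, row) are only for illustration and are not from the assignment.

```python
import numpy as np

v = np.ones((3,))        # 1-D array, shape (3,)
v_t = v.T                # transposing a 1-D array changes nothing
col = v.reshape(-1, 1)   # explicit column vector, shape (3, 1)
row = v[np.newaxis, :]   # explicit row vector, shape (1, 3)

print(v.shape, v_t.shape)    # both are (3,)
print(col.shape, row.shape)  # (3, 1) and (1, 3)
print(col.T.shape)           # a real 2-D array does transpose: (1, 3)
```

So if you need a column, you have to ask for one explicitly with reshape (or np.newaxis); .T alone will not do it.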

In your case, (Wax@x).shape = (100,).
This depends on how x is defined. If you define x with shape (vocab_size,), that is the result. If you explicitly define x with shape (vocab_size,1), then (Wax@x).shape = (100,1).
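A quick way to check this yourself is the sketch below. The sizes (100 hidden units, a vocabulary of 27) are taken from the numbers in your post, and the random Wax is just a stand-in for the real parameter.

```python
import numpy as np

n_a, vocab_size = 100, 27               # sizes assumed from the question
Wax = np.random.randn(n_a, vocab_size)  # stand-in for the real Wax

x_vec = np.zeros((vocab_size,))    # vector route
x_col = np.zeros((vocab_size, 1))  # 2D array route

out_vec = Wax @ x_vec   # matrix @ 1-D array gives a 1-D array
out_col = Wax @ x_col   # matrix @ column gives a column

print(out_vec.shape)    # (100,)
print(out_col.shape)    # (100, 1)
```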

(Wax @ x + Waa @ a_prev + b) → (100,100)

This is not what you wanted, but it is what numpy is supposed to do. In your case, both “Wax @ x” and “Waa @ a_prev” are (100,), while “b” is a 2D array of shape (100,1). The result is then just like what I showed above: it is (100,100), since the sum is treated like (1,100) + (100,1), and numpy’s broadcasting expands both operands to (100,100).
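Here is a minimal reproduction of that sum, together with two possible ways to keep the shape under control. The array s is a hypothetical stand-in for (Wax @ x + Waa @ a_prev); it is not your actual code.

```python
import numpy as np

n_a = 100
s = np.zeros((n_a,))    # stand-in for Wax @ x + Waa @ a_prev, shape (100,)
b = np.zeros((n_a, 1))  # bias as stored in "parameters", shape (100, 1)

bad = s + b                         # (100,) is treated as (1, 100)
fixed_vec = s + b.ravel()           # flatten b: result stays (100,)
fixed_col = s.reshape(-1, 1) + b    # make s a column: result is (100, 1)

print(bad.shape)        # (100, 100)
print(fixed_vec.shape)  # (100,)
print(fixed_col.shape)  # (100, 1)
```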

So, it all depends on how you want to control the dimensions.
Since all entries in the “parameters” dictionary, like Wax, Waa, Wya, b, and by, are matrices (2D arrays), I prefer to handle everything as 2D arrays, not vectors. But it’s up to you.

In either case, we need to reshape either (b/by) or (y) from a 2D array into a vector, using either ravel() or flatten().

If you want to use vectors for local variables like “x”, “a_prev”, etc., then you may want to convert “b” and “by” from 2D arrays to vectors early on.
If you want to keep everything as 2D arrays, then you just need to convert “y” to a vector when you pass it to np.random.choice().
Of course, depending on your choice, the shape used to initialize “x” and “a_prev” will be different (for example, (vocab_size,) or (vocab_size,1) for “x”).
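The two routes can be sketched side by side as below. This is only an illustration under assumptions: the parameters are random stand-ins, the softmax helper is hypothetical, and the tanh/softmax step is the usual RNN cell form rather than anything copied from the assignment.

```python
import numpy as np

np.random.seed(0)
n_a, vocab_size = 100, 27  # sizes assumed from the question

# Stand-in parameters, all 2D arrays as in the "parameters" dictionary
Wax = np.random.randn(n_a, vocab_size)
Waa = np.random.randn(n_a, n_a)
Wya = np.random.randn(vocab_size, n_a)
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

def softmax(z):
    # Hypothetical helper: numerically stable softmax over all entries
    e = np.exp(z - z.max())
    return e / e.sum()

# Route 1: everything stays 2D; only y is flattened for np.random.choice
x = np.zeros((vocab_size, 1))
a_prev = np.zeros((n_a, 1))
a = np.tanh(Wax @ x + Waa @ a_prev + b)          # (100, 1)
y = softmax(Wya @ a + by)                        # (27, 1)
idx = np.random.choice(vocab_size, p=y.ravel())  # p must be 1-D

# Route 2: local variables are vectors; b and by are flattened up front
b_vec, by_vec = b.ravel(), by.ravel()
x_v = np.zeros((vocab_size,))
a_prev_v = np.zeros((n_a,))
a_v = np.tanh(Wax @ x_v + Waa @ a_prev_v + b_vec)  # (100,)
y_v = softmax(Wya @ a_v + by_vec)                  # (27,)
idx_v = np.random.choice(vocab_size, p=y_v)

print(a.shape, y.shape, a_v.shape, y_v.shape)
```

Either route works; the (100,100) problem only appears when the two are mixed in one expression.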

Hope this clarifies.

Thank you so much for this!

To clarify, explicitly saying x = np.zeros((vocab_size, 1)) would be going down the 2D array route, whereas simply np.zeros((vocab_size)) is the vector route? Also, I just need to remain consistent between the two to avoid the row vector added to a column array?

Thanks

To clarify, explicitly saying x = np.zeros((vocab_size, 1)) would be going down the 2D array route, whereas simply np.zeros((vocab_size)) is the vector route?

Yes, that’s right.

Also, I just need to remain consistent between the two to avoid the row vector added to a column array?

Right. The fact is that all variables in the “parameters” dictionary are 2D arrays. You need to choose one way or the other.

And your dimension analysis is the right way to diagnose this kind of problem. Keep going!
