Hello, I’m having trouble sampling with np.random.choice. The instructions say to use the distribution y, which they acknowledge is 2D. I’ve noticed that the columns of this array sum to 1, and I think it’s 2D because when you add the bias ‘b’ to ‘a’, ‘b’ is a (100,100) array, so the other two 1D arrays in the sum broadcast onto it. I have no idea whether I’m supposed to choose a specific column from b, or whether I’m supposed to flatten it somehow. How am I supposed to extract a 1D vector from the 2D array to use as p for np.random.choice?
You are just sampling an index in the range of the vocabulary size. The input probability distribution is the softmax output y. If you print the shape of that, you’ll find it’s 27 x 1. Not surprising since 27 is the vocabulary size, right? They gave you the hint of using numpy ravel to unroll that into a 1D array in the instructions and even show you a demo of how to use it. It should work just as well if you want to use reshape.
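For illustration, here is a minimal sketch of that sampling step. The random logits are only a stand-in for the real softmax output, and the names rng, logits, p, and idx are placeholders, not the assignment's own code:

```python
import numpy as np

vocab_size = 27
rng = np.random.default_rng(0)

# Stand-in for the softmax output: a (27, 1) column that sums to 1.
logits = rng.normal(size=(vocab_size, 1))
y = np.exp(logits) / np.sum(np.exp(logits))

# np.random.choice needs a 1D probability vector, so flatten (27, 1) to (27,).
p = y.ravel()  # y.reshape(-1) would give the same result

# Sample one index in the range [0, vocab_size) according to p.
idx = np.random.choice(range(vocab_size), p=p)
print(y.shape, p.shape)  # (27, 1) (27,)
```

If y has any shape other than (27, 1), ravel will still flatten it, but the result will no longer have 27 entries, which is a sign that something upstream is wrong.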
The problem is that my y has shape (27,100) instead of (27,1).
Sorry if this sounds like I’m just stating the obvious, but that means there is something wrong with your code. The next step in debugging is to figure out why the dimension of y turns out that way.
I added print statements to show the various dimensions of the intermediate values:
vocab_size = 27
Wax (100, 27) x (27, 1) Waa (100, 100) a_prev (100, 1)
Wya (27, 100) x (100, 1) + by (27, 1)
y.shape (27, 1)
len(y.ravel()) 27
It’s ok, I know that there’s something wrong, but I can’t seem to figure it out. Can I show my code for a, z, and y here and ask for advice?
I feel like the problem is that when I add b, which has shape (100,1), to compute a, the 1-dimensional term W_{ax}x+W_{aa}a, which has shape (100,), broadcasts against it, giving the result a shape of (100,100).
In the equations for a, z, and y, I’ve tried reshaping b/by and the other terms, but nothing seems to work.
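The broadcasting behaviour described here can be reproduced in a few lines. This is only a sketch with zero-filled placeholder arrays (pre stands in for W_{ax}x+W_{aa}a), and it assumes the softmax is normalised along axis 0:

```python
import numpy as np

vocab_size, n_a = 27, 100

b = np.zeros((n_a, 1))   # bias as a column vector, shape (100, 1)
pre = np.zeros(n_a)      # placeholder for Wax x + Waa a_prev as a 1D array, shape (100,)

# NumPy treats (100,) as (1, 100), so adding (100, 1) yields (100, 100).
a_bad = np.tanh(pre + b)
print(a_bad.shape)  # (100, 100)

# The extra dimension then propagates to the output.
Wya = np.zeros((vocab_size, n_a))
by = np.zeros((vocab_size, 1))
z_bad = Wya @ a_bad + by
y_bad = np.exp(z_bad) / np.sum(np.exp(z_bad), axis=0)  # assumed: softmax over axis 0
print(y_bad.shape)  # (27, 100)
```

With this normalisation each of the 100 columns of y_bad sums to 1, which matches the symptom of a 2D y whose columns sum to 1.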
Hello, it seems I had the same problem.
In my case, it was indeed something with ‘b’, and I was able to get it working by reshaping some arrays so that their dimensions were explicitly (n,1) instead of (n,). That way, when ‘b’ was broadcast against them, the result was computed properly and y had shape (27, 1). If I recall correctly, reshaping like this was even advised in a past course as a way to prevent errors.
Hope this helps.
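As a small sketch of that kind of reshape (the array v and its length of 5 are made up for the example):

```python
import numpy as np

v = np.arange(5.0)        # 1D array, shape (5,)
b = np.ones((5, 1))       # column vector, shape (5, 1)

# Two equivalent ways to turn (n,) into an explicit (n, 1) column.
col = v.reshape(-1, 1)
col_alt = v[:, np.newaxis]

print((v + b).shape)      # (5, 5): unintended broadcast
print((col + b).shape)    # (5, 1): element-wise sum of two columns
```

An assertion such as assert col.shape == (5, 1) placed right after the reshape is a cheap way to catch this class of bug early.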
Thank you for the help. I realized the problem was in my initial instantiation of a_prev, and both initial and later declarations of x.
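For anyone landing here later, this is a rough sketch of what column-vector initialisation might look like for one sampling step. The parameter values are random placeholders with the shapes printed earlier in the thread, and names such as x_next are hypothetical, not the assignment's solution:

```python
import numpy as np

vocab_size, n_a = 27, 100
rng = np.random.default_rng(1)

# Placeholder parameters with the shapes shown in the printed trace.
Wax = rng.normal(size=(n_a, vocab_size)) * 0.01
Waa = rng.normal(size=(n_a, n_a)) * 0.01
Wya = rng.normal(size=(vocab_size, n_a)) * 0.01
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

# Initialise x and a_prev as explicit columns, not 1D arrays.
x = np.zeros((vocab_size, 1))
a_prev = np.zeros((n_a, 1))

a = np.tanh(Wax @ x + Waa @ a_prev + b)   # (100, 1)
z = Wya @ a + by                          # (27, 1)
e = np.exp(z - np.max(z))                 # shift by the max for numerical stability
y = e / np.sum(e)                         # (27, 1)

idx = np.random.choice(range(vocab_size), p=y.ravel())

# Later declarations of x need the same column shape: a one-hot (27, 1) vector.
x_next = np.zeros((vocab_size, 1))
x_next[idx] = 1
print(a.shape, y.shape, x_next.shape)  # (100, 1) (27, 1) (27, 1)
```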