Course 5 Week 1 Assignment 2 Exercise 2

Hi, I’m hitting an error with the numpy.random.choice function that seems to come from probabilities slightly above the tolerance of this function. I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-78-a13d660e654a> in <module>
     19     print("\033[92mAll tests passed!")
     20 
---> 21 sample_test(sample)

<ipython-input-78-a13d660e654a> in sample_test(target)
      7 
      8 
----> 9     indices = target(parameters, char_to_ix, 0)
     10     print("Sampling:")
     11     print("list of sampled indices:\n", indices)

<ipython-input-77-ff3cc52aa630> in sample(parameters, char_to_ix, seed)
     57         # (see additional hints above)
     58         probs = y.ravel()
---> 59         idx = np.random.choice(range(len(probs)), p = probs)
     60 
     61         # Append the index to "indices"

mtrand.pyx in numpy.random.mtrand.RandomState.choice()

ValueError: probabilities do not sum to 1

I’ve tried normalizing the distribution first but it still seems to run into the same error. What am I missing?

See the hints provided above the function’s cell. The y array is two-dimensional. You need to flatten it using the ravel() function.

I’ve already flattened it with y.ravel() — see line 58 in the code above.

When I check the shape of y it is (27, 100), and when I check the sum of y.ravel() I get ~100.0000000024.

Since the random.choice function requires that the probabilities sum to 1, I’m trying to normalize the values. When I do that, the index selected is greater than the size of x.
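(For later readers: a minimal sketch of why normalizing makes things worse. The shapes here are assumed from the thread — vocab_size = 27, n_a = 100 — and the softmax input is random placeholder data.)

```python
import numpy as np

# If y mistakenly has shape (27, 100), each of the 100 columns sums
# to 1, so the raveled vector has 2700 entries summing to ~100.
rng = np.random.default_rng(0)
z = rng.standard_normal((27, 100))
y = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)  # column-wise softmax

probs = y.ravel()
print(len(probs), probs.sum())  # 2700 entries summing to ~100 -> ValueError

# Normalizing "fixes" the sum, but the sampled index now ranges over
# 0..2699, far beyond the 27-character vocabulary.
idx = np.random.choice(range(len(probs)), p=probs / probs.sum())
print(idx)
```

So normalizing only hides the real bug: the shape of y.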

range(len(probs))

You’re using the wrong variable for the length.

The Step 3 instructions say “…from the probability distribution y”

That might not be the problem; I tried your method and it seems to be OK.
But I recommend you check whether you used softmax() correctly to compute y.

The error message says your probabilities don’t sum to 1, but summing to 1 is exactly what softmax() should guarantee.

I think it comes down to a numpy summation error: because of rounding, the values do not sum appropriately, so I had to scale them manually. The softmax() works fine, but the sum of 100 columns that each total 1 comes out to roughly 100, which exceeds the tolerance of the random.choice function.

Do not scale the results manually. That is not necessary if you wrote the correct code.

OP did you find a solution? I am having the same problem.

I figured it out. You have to give x and a_prev proper 2-D shapes when initializing them so that y comes out with shape (vocab_size, 1). Currently you are getting y of shape (vocab_size, n_a). In that case, while each column sums to 1, the sum of all elements of y is approximately 1 * number of columns = 1 * n_a = 100.
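To illustrate, here is a hedged sketch of the forward step with random placeholder weights (vocab_size = 27, n_a = 100; the parameter names mirror the assignment, but the values are illustrative only):

```python
import numpy as np

vocab_size, n_a = 27, 100
rng = np.random.default_rng(1)
Wax = rng.standard_normal((n_a, vocab_size))
Waa = rng.standard_normal((n_a, n_a))
Wya = rng.standard_normal((vocab_size, n_a))
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

def step(x, a_prev):
    a = np.tanh(Waa @ a_prev + Wax @ x + b)
    z = Wya @ a + by
    return np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)  # softmax

a_prev = np.zeros((n_a, 1))

# 1-D x: Wax @ x is (100,), which broadcasts everything to (100, 100).
y_bad = step(np.zeros(vocab_size), a_prev)
print(y_bad.shape, y_bad.ravel().sum())           # (27, 100) ~100.0

# Column-vector x: every shape stays (n, 1) as intended.
y_good = step(np.zeros((vocab_size, 1)), a_prev)
print(y_good.shape, y_good.ravel().sum())         # (27, 1) 1.0
```

Only the second version produces a valid probability vector for np.random.choice.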


Then what dimensions of x, a_prev, and y should we keep in order to remove the ValueError?

Hey All,

Just adding my input in case other people encounter the same issue.
I had a problem with the probabilities summing to 1.
As mentioned by @Suhail_Amiri, this was resolved by explicitly assigning strict vector shapes when initializing x and a_prev. Basically I changed the code from x = np.zeros(27) → x = np.zeros((27, 1)).


Dear All,

I think the problem is not related to the dimensions of x and a_prev. It should be related to the softmax function defined in the course.

Below is the proof. You can see that after using scipy’s softmax, even with 1-D x and a_prev, we can still pass the result into the random.choice function without error.

Notice it’s not related to numerical round-off error. The proof below shows that random.choice does accept a sum very close to 1, not exactly 1.

Then, what is the conclusion?
Is it an issue with the softmax implemented in the course?

Hello @bblanc! This thread is too old, so please create a new post and share your full error.

If x is initialized with shape (vocab_size,) then y will have shape (vocab_size, n_a), but if you make it a 2-D matrix of shape (vocab_size, 1), it will give y.shape = (vocab_size, 1).

Details:
x:(27,), Wax@x:(100,), a_prev:(100, 1), Waa@a_prev:(100, 1) => a:(100, 100), Wya@a:(27, 100), z:(27, 100), y:(27, 100)
but
x:(27, 1), Wax@x:(100, 1), a_prev:(100, 1), Waa@a_prev:(100, 1) => a:(100, 1), Wya@a:(27, 1), z:(27, 1), y:(27, 1)

In the first case Wax@x gets broadcast up in Waa@a_prev + Wax@x + b to (100,100).
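The broadcast above can be sketched in isolation (assumed size n_a = 100; `col` and `vec` are stand-ins for the products named above):

```python
import numpy as np

col = np.zeros((100, 1))   # like Waa @ a_prev, a column vector
vec = np.zeros(100)        # like Wax @ x when x is 1-D

# A (100, 1) column plus a (100,) row-like vector broadcasts to a
# (100, 100) matrix, which is how a 1-D x inflates every later shape.
print((col + vec).shape)                  # (100, 100)

# Reshaping the vector to a column keeps the intended shape.
print((col + vec.reshape(-1, 1)).shape)   # (100, 1)
```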

Note that there are two places where x is initialized: before the while loop and inside it. Make sure both are 2-D!