Week 1 Assignment 2 Exercise 2 - sample

Hello there,

I have an issue with this exercise. My code gets only the first 4 indexes right:
[23, 16, 26, 26, 19, 25, 4, 1, 15, 6, 4, 25, 23, 23, 10, 14, 16, 12, 11, 3, 20, 16, 22, 23, 14, 15, 6, 9, 26, 2, 6, 1, 26, 11, 2, 21, 0]
instead of
[23, 16, 26, 26, 24, 3, 21, 1, 7, 24, 15, 3, 25, 20, 6, 13, 10, 8, 20, 12, 2, 0]
It also appears to output more indexes than expected.

Trying to debug this code, I want to verify if my assumptions are correct:

  1. x is a vector of the same size as the dictionary: 27
  2. y is of the shape (27, 100)
  3. each of the 100 entries adds up to 1, which I understand is a probability for each of 100 characters in the word
  4. the input to np.random.choice is a vector of length 27. If I pass all 100 vectors at once I get an error that the probabilities do not add up to 1. How do I choose which of the 100 vectors goes to the np.random.choice function? I tried using idx and counter.
  5. In step 4, why do we have x = None and x[idx] = None? Shouldn’t x be set to y? Looking at Figure 3, x<t+1> is y. So why do we want to update a particular element in the vector, and what should it be set to? I tried setting it to idx, but then I only get one index right.

Thank you,
Alex

The shape of y is incorrect: it should be (27, 1). So it would be a good idea to check your calculations there. I added prints to my code to show the shapes of everything and here’s what I get:

vocab_size = 27
Wax (100, 27) x (27, 1) Waa (100, 100) a_prev (100, 1)
Wya (27, 100) x (100, 1) + by (27, 1)
y.shape (27, 1)
len(y) 27
len(y.ravel()) 27
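
For anyone who wants to reproduce the shape check above, here is a minimal sketch of a single forward step. It assumes randomly initialised stand-in parameters rather than the trained ones from the notebook, and the names b and softmax are my own assumptions, so only the shapes are meaningful, not the values.

```python
import numpy as np

vocab_size, n_a = 27, 100          # sizes taken from the printed shapes above
rng = np.random.default_rng(0)     # arbitrary seed, only for this illustration

# Stand-in parameters (assumption: random values, not the trained weights).
Wax = rng.standard_normal((n_a, vocab_size))
Waa = rng.standard_normal((n_a, n_a))
Wya = rng.standard_normal((vocab_size, n_a))
b = np.zeros((n_a, 1))             # hidden bias, hypothetical name
by = np.zeros((vocab_size, 1))

x = np.zeros((vocab_size, 1))      # first input is the zero column vector
a_prev = np.zeros((n_a, 1))

def softmax(z):
    """Column-wise softmax, shifted by the max for numerical stability."""
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

a = np.tanh(Wax @ x + Waa @ a_prev + b)   # (100, 1)
z = Wya @ a + by                          # (27, 1), no tanh here
y = softmax(z)                            # (27, 1), sums to 1

print("a", a.shape, "z", z.shape, "y", y.shape)
print("len(y.ravel())", len(y.ravel()))
```

If y comes out as (27, 100), one of the inputs is most likely a 1D array or has its dimensions swapped, so printing every shape like this is a quick way to find the culprit.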

Thank you @paulinpaloalto! I did fix that issue and now y is (27,1). However, it didn’t fix the output. I still only get the first 4 indexes right. I wonder if that’s because I update x incorrectly. My understanding is that x should be set to y (x = y). I don’t understand why we need to complete x[idx].

It is the list of array indices that is the actual answer. At each iteration, we take the input x (a zero vector for the first iteration) and create the x for the next time step. It’s not directly equal to y, but is a random choice based on y to make it more interesting (less predictable). The other point about setting x[idx] = 1 is just that x is formatted as a “one hot” vector, right?
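
To make that concrete, here is a small sketch of going from y to the next x. It assumes a uniform stand-in for y instead of a real softmax output, and the seed is arbitrary, so the sampled index itself is not meaningful.

```python
import numpy as np

vocab_size = 27
np.random.seed(0)  # arbitrary seed, only for this illustration

# Stand-in for the softmax output (assumption: uniform probabilities).
y = np.full((vocab_size, 1), 1.0 / vocab_size)

# Sample one index according to the distribution in y.
# p must be 1D, hence the ravel().
idx = np.random.choice(vocab_size, p=y.ravel())

# Build the next input as a one-hot column vector: all zeros, 1 at idx.
x = np.zeros((vocab_size, 1))
x[idx] = 1

print(idx, x.shape, x.sum())
```

So x is not y itself. It only records which index was drawn from y, and the drawn index is also what gets appended to the list of sampled indices.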

I set x to a zero vector of shape (27,1) and x[idx] to 1, and it worked for me. Thank you! My understanding is that x, in one hot encoded form, carries the index of the symbol predicted in the previous step over to the next step. Setting the element at idx to 1 tells the next cell that this is the index that was selected in the previous step; everything else wasn’t selected, so it stays 0.

Yes, that is a good description of how one hot encoding works.

I have a similar problem, sampletest() finds that sample() produces different values than expected:
list of sampled indices:
[17, 13, 26, 23, 24, 19, 7, 17, 7, 17, 15, 3, 26, 8, 18, 18, 24, 1, 17, 14, 11, 10, 21, 22, 0]
list of sampled characters:
['q', 'm', 'z', 'w', 'x', 's', 'g', 'q', 'g', 'q', 'o', 'c', 'z', 'h', 'r', 'r', 'x', 'a', 'q', 'n', 'k', 'j', 'u', 'v', '\n']

I noticed that the documentation above the code, just after step 3, mentioned the function ravel(), which I did not use. None of the objects in step 3 (or beyond) seem to need ravel(). I wonder what the documentation has in mind.
Thanks!

You need to make sure that the “probability distribution” argument to np.random.choice is a 1 dimensional object. np.ravel is one way to do that, but there are plenty of others. I tried using np.squeeze instead and the tests pass just fine with that. Note that y is a (27, 1) 2D array. If you just use that directly as p, it throws an error.

So my guess is that the absence of ravel in your solution is not the issue. There must be something else wrong with your logic. Please check everything again against the instructions and the comments that give directions.
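
As a quick illustration of the point about the 1-dimensional probability argument, here is a sketch that assumes a uniform stand-in for y. It shows that ravel and squeeze both give a shape of (27,), while passing the (27, 1) array directly is rejected.

```python
import numpy as np

# Stand-in for the softmax output (assumption: uniform probabilities).
y = np.full((27, 1), 1.0 / 27)

p_ravel = y.ravel()        # flattens to shape (27,)
p_squeeze = np.squeeze(y)  # drops the length-1 axis, also shape (27,)
print(p_ravel.shape, p_squeeze.shape)

# Either 1D version is accepted by np.random.choice.
idx = np.random.choice(27, p=p_squeeze)

# The 2D column vector is not accepted as p.
try:
    np.random.choice(27, p=y)
    raised = False
except ValueError as err:
    raised = True
    print("2D p rejected:", err)
```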

Thanks! I used squeeze, not ravel, and the function just simply wasn’t working. Finding out that ravel or squeeze works the same, I looked much more closely at the code, and found that I had a spurious application of “tanh” in step 2, the statement setting “z”. It’s working now.

A thing to keep in mind (I keep forgetting) is to reshape the zero vectors, e.g. np.zeros(3).reshape((3,1)).

Note that np.zeros takes a “tuple” as an argument, so you could have achieved the same result more simply this way:

np.zeros((3,1))

Thanks Paul! Keeping track of dimensions is always the hardest part for me, and this was a key hint. An extra hint to readers: when initializing arrays, use np.zeros((M,1)), not np.zeros(M). This will save some debugging time!

That’s a great point! The difference between a 2D array and a 1D array is key here.
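
Here is a small sketch of why the distinction matters. The array names are only for illustration. Adding a 1D array to a column vector broadcasts silently to a matrix instead of raising an error, which is how a wrong shape can sneak through the rest of the computation.

```python
import numpy as np

flat = np.zeros(3)         # 1D array, shape (3,)
col = np.zeros((3, 1))     # 2D column vector, shape (3, 1)
same = np.zeros(3).reshape((3, 1))  # equivalent to col, just more verbose

ones_col = np.ones((3, 1))

ok = ones_col + col        # (3, 1) + (3, 1) stays (3, 1)
oops = ones_col + flat     # (3, 1) + (3,) broadcasts to (3, 3)

print(flat.shape, col.shape, same.shape)
print(ok.shape, oops.shape)
```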