C5W1E2: Dinosaur Island Sample() Wrong Values

Step 1 is simple. Step 2, I’m sure I got the formulas correct because I got outputs with an acceptable shape. Step 4 is pretty straightforward. But at Step 3, I am given this:

  • Example of how to use np.random.choice():
probs = np.array([0.1, 0.0, 0.7, 0.2])
idx = np.random.choice(range(len(probs)), p = probs)
  • This means that you will pick the index (idx) according to the distribution: P(index=0)=0.1, P(index=1)=0.0, P(index=2)=0.7, P(index=3)=0.2
  • Note that the value that’s set to p should be set to a 1D vector.
  • Also notice that ŷ⟨t+1⟩, which is y in the code, is a 2D array.

So y holds the probabilities in a 2D array, but np.random.choice needs a 1D array of the same size as input. I noticed that the probabilities sharing the same second index add up to 1. So at first I indexed y with the counter, but that gave wrong values.

list of sampled indices:
 [23, 7, 15, 26, 25, 23, 21, 14, 23, 23, 7, 16, 26, 24, 18, 14, 10, 0]
list of sampled characters:
 ['w', 'g', 'o', 'z', 'y', 'w', 'u', 'n', 'w', 'w', 'g', 'p', 'z', 'x', 'r', 'n', 'j', '\n']
AssertionError                            Traceback (most recent call last)
<ipython-input-110-5ed45dfbad4e> in <module>
     19     print("\033[92mAll tests passed!")
---> 21 sample_test(sample)

<ipython-input-110-5ed45dfbad4e> in sample_test(target)
     15     assert indices[-1] == char_to_ix['\n'], "All samples must end with \\n"
     16     assert min(indices) >= 0 and max(indices) < len(char_to_ix), f"Sampled indexes must be between 0 and len(char_to_ix)={len(char_to_ix)}"
---> 17     assert np.allclose(indices[0:6], [23, 16, 26, 26, 24, 3]), "Wrong values"
     19     print("\033[92mAll tests passed!")

AssertionError: Wrong values

How are you supposed to aggregate a bunch of probability distributions into one probability distribution for the np.random.choice?


probs is logically a 1D array per our model architecture, since we predict the probability of the next character based on the characters seen so far. As another hint, the shape of probs should be (27, 1) at each timestep. You can use ravel / squeeze to get rid of the dummy dimension since, as you explained, numpy accepts a 1D array for the p parameter.
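
As a rough illustration of the ravel hint (not the assignment solution), the sketch below builds a stand-in y of shape (27, 1) and flattens it before sampling. The random logits, the softmax stand-in, and the name vocab_size are assumptions made up for this example; in the notebook, y comes from the forward step.

```python
import numpy as np

vocab_size = 27  # assumed: 26 letters plus the newline character

# Stand-in for the model output at one timestep (assumption: random logits
# pushed through a softmax), kept 2D with shape (27, 1) as in the hint.
rng = np.random.default_rng(0)
logits = rng.standard_normal((vocab_size, 1))
y = np.exp(logits) / np.sum(np.exp(logits))

# ravel() drops the dummy dimension: (27, 1) -> (27,)
p = y.ravel()

# np.random.choice expects a 1D probability vector for p.
idx = np.random.choice(range(vocab_size), p=p)

print(y.shape, p.shape)
```

Nothing is averaged or aggregated here: there is a single distribution per timestep, and ravel only changes its shape.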

To be clear, I have a matrix of shape (27, 100), which holds 100 distributions over [a-z\n], each of which sums to 1, and I need to turn it into (27, 1), which should also sum to 1. So should I take the average of the 100 distributions? I still don’t get it.

In method sample, if y.shape is not (27, 1), you’ve made a coding error.

Feedback on your code:
The way you’ve initialized x and a_prev is incorrect. Always make tensors 2D to avoid surprises.
For instance, if you want to initialize zeros for a vocabulary of 27 characters, make zeros of shape (27, 1) and not (27,)
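
To see why 1D initializations can produce a (27, 100) output, here is a minimal sketch of a single RNN step under assumed shapes. The parameter names (Wax, Waa, Wya, b, by), the hidden size n_a = 100, and the helper step_output_shape are assumptions for illustration, not code from the notebook.

```python
import numpy as np

n_a, vocab_size = 100, 27  # assumed hidden size and vocabulary size

rng = np.random.default_rng(0)
Wax = rng.standard_normal((n_a, vocab_size))
Waa = rng.standard_normal((n_a, n_a))
Wya = rng.standard_normal((vocab_size, n_a))
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

def step_output_shape(x, a_prev):
    """Hypothetical helper: run one forward step and return the output shape."""
    a = np.tanh(Wax @ x + Waa @ a_prev + b)
    z = Wya @ a + by
    return z.shape

# 1D inputs: Wax @ x has shape (100,), and adding b of shape (100, 1)
# silently broadcasts the sum to (100, 100).
bad = step_output_shape(np.zeros(vocab_size), np.zeros(n_a))

# 2D column inputs keep every intermediate result a column vector.
good = step_output_shape(np.zeros((vocab_size, 1)), np.zeros((n_a, 1)))

print(bad, good)  # (27, 100) (27, 1)
```

With column vectors throughout, no broadcasting happens, so y stays (27, 1) and there is only one distribution to sample from.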


I implemented your suggestion on all initializations. Thank you so much. I was going insane looking at this :face_holding_back_tears:
