Dinosaurus_Island_Character_level_language_model - sample()

In sample(), the notes say:

  • Also notice that 𝑦̂⟨𝑡+1⟩, which is y in the code, is a 2D array.
    Wouldn’t the dimension of yhat usually be equal to the size of the character vocabulary? How is it a 2D array, and what is its shape?


You’re right that the size of \hat{y}^{<t>} will be the size of your vocabulary, which consists of characters in this case. The second dimension is 1, meaning that it is a column vector of shape (vocab_size, 1). That’s because we frequently want to store multiple such entries in a matrix, by “stacking” the vectors as the columns of the matrix.

I added some print statements to my sample() function to print the various shapes and here’s what I see:

vocab_size = 27
Wya (27, 100) x (100, 1) + by (27, 1)
y.shape (27, 1)
len(y) 27
len(y.ravel()) 27
type(y.ravel()) <class 'numpy.ndarray'>
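
For anyone who wants to reproduce those shapes outside the notebook, here is a minimal sketch. It does not use the assignment's parameters: the logits are random stand-ins, and the softmax is written inline, so treat it only as an illustration of why ravel() is needed before sampling.

```python
import numpy as np

vocab_size = 27  # same value as in the printed output above
rng = np.random.default_rng(0)

# Stand-in for the output of one time step: random logits, NOT real model values.
logits = rng.standard_normal((vocab_size, 1))
y = np.exp(logits - logits.max())
y = y / y.sum()  # softmax over the single column

print(y.shape)          # (27, 1), a column vector
print(y.ravel().shape)  # (27,), flattened to 1-D

# np.random.choice expects a 1-D probability vector, hence the ravel().
idx = np.random.choice(range(vocab_size), p=y.ravel())
print(idx)
```

The flattened array has the same 27 entries as the column vector; only the second dimension of size 1 is dropped.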

Thanks for the clarification. I have one more question about this function:

# Step 4: Overwrite the input x with one that corresponds to the sampled index idx.
Do I need to reset x to zeros and then set x[idx] to 1?

Yes, the goal is for x to be the “one hot” representation of the letter.
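
A minimal sketch of that reset, assuming x is a (vocab_size, 1) column vector; the value of idx here is a made-up example rather than a real sample:

```python
import numpy as np

vocab_size = 27
idx = 5  # hypothetical sampled index, for illustration only

# Start from a fresh all-zero column so no earlier 1 is left behind.
x = np.zeros((vocab_size, 1))
x[idx] = 1  # mark the sampled character

print(x.shape)     # (27, 1)
print(int(x.sum()))  # 1
```

If x were reused without being zeroed, the entries set in earlier iterations would stay at 1 and x would no longer be one-hot.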

Got it, thanks. I have one more related question:

When implementing:
# Step 2: Forward propagate x using the equations (1), (2) and (3)
For equation 1, how can I take the dot product of Waa, with shape (n_a, n_a), and a_prev, with shape (vocab_size, 1)?

I think a_prev is (n_a,1)?

Right, the dimension of the hidden state (“memory state”) is independent of the size of the data input vectors (vocab_size in this instance).
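
To make the shapes concrete, here is a small sketch of one hidden-state update. It assumes the usual RNN cell form a = tanh(Wax x + Waa a_prev + b); the names Wax and b and the random initial values are assumptions for illustration, not taken from this thread.

```python
import numpy as np

n_a, vocab_size = 100, 27
rng = np.random.default_rng(1)

# Assumed parameter shapes for a standard RNN cell (values are random stand-ins).
Wax = rng.standard_normal((n_a, vocab_size)) * 0.01  # input -> hidden
Waa = rng.standard_normal((n_a, n_a)) * 0.01         # hidden -> hidden
b = np.zeros((n_a, 1))

x = np.zeros((vocab_size, 1))  # input vector: size of the vocabulary
a_prev = np.zeros((n_a, 1))    # hidden state: size n_a, not vocab_size

# (100, 27) @ (27, 1) and (100, 100) @ (100, 1) both give (100, 1).
a = np.tanh(Wax @ x + Waa @ a_prev + b)
print(a.shape)  # (100, 1)
```

With a_prev shaped (vocab_size, 1), the product Waa @ a_prev would fail because the inner dimensions 100 and 27 do not match.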


Hopefully, this is the last question for this assignment. When I run sample(), in the second iteration I get an error:
ValueError: ‘a’ and ‘p’ must have same size
at this line:
idx = np.random.choice(vocablist, p = y.ravel())

When calculating “a” for equation 1, a.shape is (100,1) on the first iteration and (100,100) on the second, which leads to this error.

Any clues what I am doing wrong?

I figured it out. Thanks for the help.

I’m glad to hear you figured it out, but a is not involved in the line that you are showing there. That can’t be the line that throws that error, right? In my previous post, I showed the dimensions of y.

Sorry, I can’t help myself. If we’re acting like scientists here (which I always think of as the goal), then we can’t just let it pass when our theory doesn’t agree with the evidence presented. Either the evidence is wrong or the theory is wrong and we need to understand which it is. :nerd_face:

Yeah, the arrow in the error output was pointing to that line, which gave me the clue. I verified that the two arguments to np.random.choice() were not the same size and later figured out the root cause of the problem.
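
The root cause is never stated in this thread, so the following is only a hypothetical way to reproduce the same symptom. It assumes a 1-D array of shape (100,) gets added to the (100, 1) hidden state, which NumPy broadcasts to (100, 100); the names bad_bias and a_bad are invented for the illustration.

```python
import numpy as np

n_a, vocab_size = 100, 27
rng = np.random.default_rng(2)

Wya = rng.standard_normal((vocab_size, n_a)) * 0.01
by = np.zeros((vocab_size, 1))

def softmax(z):
    # Column-wise softmax, so it also works when z has more than one column.
    e = np.exp(z - z.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

a_good = np.zeros((n_a, 1))
bad_bias = np.zeros(n_a)            # hypothetical mistake: (100,) instead of (100, 1)
a_bad = np.tanh(a_good + bad_bias)  # (100, 1) + (100,) broadcasts to (100, 100)

y_bad = softmax(Wya @ a_bad + by)   # (27, 100) instead of (27, 1)
print(a_bad.shape, y_bad.shape)

try:
    np.random.choice(range(vocab_size), p=y_bad.ravel())
    message = None
except ValueError as err:
    message = str(err)
print(message)
```

The flattened y_bad has 2700 entries for 27 candidate indices, so the error surfaces at the np.random.choice line even though the wrong shape was introduced earlier, when the hidden state was computed.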