DLS Course 5 W2 Assignment 2, Dinosaurs, Can’t get probabilities to sum to 1

Hello guys,

I have been trying to implement this function for quite a while. It would be nice if someone could explain the shapes of these objects which we receive as parameters:

Wax.shape (100, 27)
Waa.shape (100, 100)
Wya.shape (27, 100)
by.shape (27, 1)
b.shape (100, 1)

I understand that we have 27 characters, but why do we use 100 everywhere? If we forward propagate just one step, we need one weight per step, not 100. The formula is quite simple: a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b). So we would pass one weight for Wax, 27 weights for Waa (one per character), and one weight for b.



def sample(parameters, char_to_ix, seed):
Sample a sequence of characters according to a sequence of probability distributions output of the RNN

parameters -- Python dictionary containing the parameters Waa, Wax, Wya, by, and b. 
char_to_ix -- Python dictionary mapping each character to an index.
seed -- Used for grading purposes. Do not worry about it.

indices -- A list of length n containing the indices of the sampled characters.

# Retrieve parameters and relevant shapes from "parameters" dictionary
Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
vocab_size = by.shape[0]
n_a = Waa.shape[1]

# Step 1: Create a zero vector x that can be used as the one-hot vector
# representing the first character (initializing the sequence generation). (≈1 line)
x = np.zeros((27,))
# Step 1': Initialize a_prev as zeros (≈1 line)
a_prev = np.zeros((100,100))

# Create an empty list of indices. This is the list which will contain the list of indices of the characters to generate (≈1 line)
indices = []

# idx is the index of the one-hot vector x that is set to 1
# All other positions in x are zero.
# Initialize idx to -1
idx = -1 

# Loop over time-steps t. At each time-step:
# Sample a character from a probability distribution 
# And append its index (`idx`) to the list "indices". 
# You'll stop if you reach 50 characters 
# (which should be very unlikely with a well-trained model).
# Setting the maximum number of characters helps with debugging and prevents infinite loops. 
counter = 0
newline_character = char_to_ix['\n']

while (idx != newline_character and counter != 50):
    # Step 2: Forward propagate x using the equations (1), (2) and (3)
    # a⟨t+1⟩ = tanh(Wax x⟨t+1⟩ + Waa a⟨t⟩ + b)
    #print("n_a shape", n_a.shape)
    #print("vocab_size shape", vocab_size.shape)
    #print("char_to_ix shape", char_to_ix.shape)

    ff = np.dot(Waa, a_prev)
    print("ff is", ff.shape)
    a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
    print("a is", a)
    print("a shape is", a.shape)
    z = np.dot(Wya, a) + by
    print("z is ", np.sum(z))
    #𝑦̂ βŸ¨π‘‘+1⟩=π‘ π‘œπ‘“π‘‘π‘šπ‘Žπ‘₯(π‘§βŸ¨π‘‘+1⟩)(3)
    y = softmax(z)
    print("y is ", np.sum(y))
    # For grading purposes
    np.random.seed(counter + seed) 
    ys = np.ravel(y)
    print("ys shape is", ys.shape)
    yr = np.ravel(y)
    print("yr is", np.sum(yr))
    xx2 = len(np.ravel(y))
    print("xx2 shape is", xx2)
    # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
    # (see additional hints above)
    idx = np.random.choice(range(len(np.ravel(y))), p =  np.ravel(y))

    # Append the index to "indices"
    # Step 4: Overwrite the input x with one that corresponds to the sampled index `idx`.
    # (see additional hints above)
    x = idx
    x[idx] = y
    # Update "a_prev" to be "a"
    a_prev = a
    # for grading purposes
    seed += 1
    counter +=1

if (counter == 50):

return indices

At the end I get an error saying that my probabilities do not sum to 1, raised at this line: idx = np.random.choice(range(len(np.ravel(y))), p = np.ravel(y))

When I print each of the variables, I get this:

a shape is (100, 100)
z is 7729.441975516234
y is 99.99999999999997
ys shape is (2700,)
yr is 99.99999999999997
xx2 shape is 2700

So p sums to 99.99999999999997, which of course is not 1. But if we are using this odd size of 100, how can it ever sum to 1?

Best regards,

Please don’t post your code unless a mentor asks to see it. Posting your code isn’t allowed by the course Honor Code.

100 is the number of hidden units (n_a) in the RNN layer, so there are that many weights to learn for each label.
Your sum is 99 and some change because there’s an error in your code: with a_prev of shape (100, 100), y ends up with 100 columns that each sum to 1.
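
A minimal sketch of where a sum near 100 can come from, assuming the course's softmax helper normalizes each column (axis 0). The `softmax` below is a stand-in written for this illustration, and the random parameters are made up; only the shapes match the question.

```python
import numpy as np

def softmax(z):
    # Stand-in for the course helper: normalizes each column independently.
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n_a, vocab_size = 100, 27
Wax = rng.standard_normal((n_a, vocab_size))
Waa = rng.standard_normal((n_a, n_a))
Wya = rng.standard_normal((vocab_size, n_a))
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

x = np.zeros((vocab_size,))      # 1-D input, as in the question
a_prev = np.zeros((n_a, n_a))    # wrong: behaves like 100 hidden states side by side

a = np.tanh(Wax @ x + Waa @ a_prev + b)   # broadcasts to (n_a, n_a)
y = softmax(Wya @ a + by)                 # (vocab_size, n_a)

print(a.shape, y.shape)   # (100, 100) (27, 100)
print(y.sum())            # close to 100: each of the 100 columns sums to 1
```

Flattening such a y gives 2700 values whose total is about 100, which matches the numbers printed in the question.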

Do not use any hard-coded values in your code, such as 27 or 100, because the grader is going to test your code with a different size of RNN layer, which won’t have 27 labels or 100 units.

For example, initialize x with shape (vocab_size, 1) and a_prev with shape (n_a, 1).
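
As a rough illustration of reading the sizes from the parameters instead of hard-coding them, here is a small sketch. The helper name `initial_state` and the toy sizes 5 and 8 are hypothetical and not part of the assignment.

```python
import numpy as np

def initial_state(parameters):
    """Build zero column vectors whose sizes come from the parameters."""
    vocab_size = parameters["by"].shape[0]   # number of labels
    n_a = parameters["Waa"].shape[1]         # number of hidden units
    x = np.zeros((vocab_size, 1))
    a_prev = np.zeros((n_a, 1))
    return x, a_prev

# Hypothetical sizes, deliberately different from 27 and 100.
params = {"Waa": np.zeros((8, 8)), "by": np.zeros((5, 1))}
x, a_prev = initial_state(params)
print(x.shape, a_prev.shape)  # (5, 1) (8, 1)
```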

One more tip: in np.random.choice, sample from range(vocab_size), not from range(len(np.ravel(y))).
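
One possible reason this tip helps, sketched with made-up uniform distributions: when the population is fixed to vocab_size, a y with the wrong number of entries is rejected for its length, which points at the shape problem directly.

```python
import numpy as np

vocab_size = 27
y_good = np.full((vocab_size, 1), 1.0 / vocab_size)    # one distribution
y_bad = np.full((vocab_size, 100), 1.0 / vocab_size)   # 100 distributions

np.random.seed(0)
idx = np.random.choice(range(vocab_size), p=y_good.ravel())
print(0 <= idx < vocab_size)  # True

raised = False
try:
    np.random.choice(range(vocab_size), p=y_bad.ravel())
except ValueError as err:
    # The mismatch between 27 choices and 2700 probabilities is reported here.
    raised = True
    print("ValueError:", err)
```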

One more:
‘x’ does not equal idx. You’re creating x as a one-hot vector, so you first create a vector of zeros, then set one element to 1.
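
A short sketch of building such a one-hot column vector. The helper name `one_hot` is hypothetical, used here only for illustration.

```python
import numpy as np

def one_hot(idx, vocab_size):
    """Return a (vocab_size, 1) column with a single 1 at row idx."""
    x = np.zeros((vocab_size, 1))
    x[idx] = 1
    return x

x = one_hot(3, 5)
print(x.ravel())  # [0. 0. 0. 1. 0.]
```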


Dear Tom,

Thank you very much for your kind explanations.
Finally got it working!

Best regards,

I stumbled upon this issue as well.
Although the error message says the probabilities don’t sum to 1, that was NOT the real problem in my case. The cause was a shape error in some variables that I had forgotten to fix.

I was curious how fixing the shapes would fix the “sum not equal to 1” error, so I printed the sum of y. Even with the correct shapes, it is sometimes still 0.99999998 or 1.00000002.
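
When checking the sum by hand, it may be more useful to compare with a tolerance than to expect exactly 1. A minimal sketch with made-up logits and an inline softmax (not the course helper):

```python
import numpy as np

z = np.array([[2.0], [1.0], [0.1], [-3.0]])   # made-up logits, shape (4, 1)
e = np.exp(z - z.max())
y = e / e.sum()

print(y.sum())                    # may differ from 1.0 in the last digits
print(np.isclose(y.sum(), 1.0))   # True
```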

Hope this finding helps future learners.