DLS Course 5 W2 Assignment 2, Dinosaurs, Can't get probabilities to 1

Franko999 · October 9, 2021, 4:45am

Hello guys,

I have been trying to implement this function for quite a while. It would be nice if someone could explain the shapes of these objects which we receive as parameters:

Wax.shape (100, 27)
Waa.shape (100, 100)
Wya.shape (27, 100)
by.shape (27, 1)
b.shape (100, 1)

I understand that we have 27 characters, but why do we use 100 everywhere? If we forward propagate just one step, it seems we would need one weight per step, not 100. The formula is quite simple: a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b). So I would expect to pass one weight for Wax, 27 weights for Waa (one per character), and one weight for b.

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: sample

def sample(parameters, char_to_ix, seed):
"""
Sample a sequence of characters according to a sequence of probability distributions output of the RNN

Arguments:
parameters -- Python dictionary containing the parameters Waa, Wax, Wya, by, and b. 
char_to_ix -- Python dictionary mapping each character to an index.
seed -- Used for grading purposes. Do not worry about it.

Returns:
indices -- A list of length n containing the indices of the sampled characters.
"""

# Retrieve parameters and relevant shapes from "parameters" dictionary
Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
vocab_size = by.shape[0]
n_a = Waa.shape[1]

### START CODE HERE ###
# Step 1: Create a zero vector x that can be used as the one-hot vector
# representing the first character (initializing the sequence generation). (≈1 line)
x = np.zeros((27,))
# Step 1': Initialize a_prev as zeros (≈1 line)
a_prev = np.zeros((100,100))

# Create an empty list of indices. This is the list which will contain the list of indices of the characters to generate (≈1 line)
indices = []

# idx is the index of the one-hot vector x that is set to 1
# All other positions in x are zero.
# Initialize idx to -1
idx = -1 

# Loop over time-steps t. At each time-step:
# Sample a character from a probability distribution 
# And append its index (`idx`) to the list "indices". 
# You'll stop if you reach 50 characters 
# (which should be very unlikely with a well-trained model).
# Setting the maximum number of characters helps with debugging and prevents infinite loops. 
counter = 0
newline_character = char_to_ix['\n']

while (idx != newline_character and counter != 50):
    
    # Step 2: Forward propagate x using the equations (1), (2) and (3)
    
    
    # a<t+1> = tanh(Wax x<t+1> + Waa a<t> + b)    (1)
    
    print(Wax.shape)
    print(Waa.shape)
    print(Wax)
    print()
    print()
    print(Waa)
    #print("n_a shape", n_a.shape)
    #print("vocab_size shape", vocab_size.shape)
    #print("char_to_ix shape", char_to_ix.shape)

    #print(x.shape)
    
    
    ff = np.dot(Waa, a_prev)
    print("ff is", ff.shape)
    
    a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
    
    print("a is", a)
    print("a shape is", a.shape)
    
    # z<t+1> = Wya a<t+1> + by    (2)
    z = np.dot(Wya, a) + by
    print("z is ", np.sum(z))
    
    # y_hat<t+1> = softmax(z<t+1>)    (3)
    y = softmax(z)
    print("y is ", np.sum(y))
    
    # For grading purposes
    np.random.seed(counter + seed) 
    
    ys = np.ravel(y)
    print("ys shape is", ys.shape)
    
    yr = np.ravel(y)
    print("yr is", np.sum(yr))
    
    xx2 = len(np.ravel(y))
    print("xx2 shape is", xx2)
    
    # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
    # (see additional hints above)
    idx = np.random.choice(range(len(np.ravel(y))), p =  np.ravel(y))

    # Append the index to "indices"
    indices.append(idx)
    
    # Step 4: Overwrite the input x with one that corresponds to the sampled index `idx`.
    # (see additional hints above)
    x = idx
    x[idx] = y
    
    # Update "a_prev" to be "a"
    a_prev = a
    
    # for grading purposes
    seed += 1
    counter +=1
    
### END CODE HERE ###

if (counter == 50):
    indices.append(char_to_ix['\n'])

return indices

At the end I get an error saying that my probabilities don't sum to 1. It is raised by the p argument in this line: idx = np.random.choice(range(len(np.ravel(y))), p = np.ravel(y))

When I print each of the variables, I get this:

a shape is (100, 100)
z is 7729.441975516234
y is 99.99999999999997
ys shape is (2700,)
yr is 99.99999999999997
xx2 shape is 2700

So p sums to 99.99999999999997, which of course is not one. But if we are using this weird 100 shape, how can it ever be one?

Best regards,
Roberts

TMosh · October 9, 2021, 5:01am

Please don’t post your code unless a mentor asks to see it. Posting your code isn’t allowed by the course Honor Code.

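100 is the number of hidden units in the RNN cell (n_a), so there are that many weights to learn for each label: Wax holds 100 weights per input character and Wya holds 100 per output label.
The reason your sum is 99 and some change is that there’s an error in your code: a_prev is initialized as (100, 100) instead of a column vector, so y ends up with 100 columns, each of which is a softmax distribution that sums to 1.

Do not use any hard-coded values in your code, such as 27 or 100, because the grader is going to test your code with a different hidden size and vocabulary, and it won’t have 27 labels or 100 units.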
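
To see where a sum of about 100 can come from, here is a minimal shape check. It is only a sketch: the weights are random, the sizes are the ones from the question, and the softmax below is a stand-in written for this illustration (assumed to normalize each column, which matches the printed sums), not the notebook's helper.

```python
import numpy as np

def softmax(z):
    # Stand-in softmax for illustration: normalizes each column independently.
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

n_a, vocab_size = 100, 27  # sizes from the question, used only for this demo
rng = np.random.default_rng(0)
Wax = rng.standard_normal((n_a, vocab_size))
Waa = rng.standard_normal((n_a, n_a))
Wya = rng.standard_normal((vocab_size, n_a))
b = np.zeros((n_a, 1))
by = np.zeros((vocab_size, 1))

def forward_step(x, a_prev):
    # One RNN step: new hidden state, then a distribution over characters.
    a = np.tanh(Wax @ x + Waa @ a_prev + b)
    y = softmax(Wya @ a + by)
    return a, y

x = np.zeros((vocab_size, 1))
x[0] = 1  # arbitrary one-hot input

# Column-vector hidden state: y is a single distribution over the vocabulary.
_, y_good = forward_step(x, np.zeros((n_a, 1)))
# Square hidden state: broadcasting yields 100 distributions side by side.
_, y_bad = forward_step(x, np.zeros((n_a, n_a)))

print(y_good.shape, y_good.sum())
print(y_bad.shape, y_bad.sum())
```

With a column-vector a_prev the output has shape (vocab_size, 1). With a square a_prev, broadcasting silently produces a (vocab_size, n_a) output, which is why no shape error is raised before the sampling line.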

TMosh · October 9, 2021, 5:03am

For example, initialize x with shape (vocab_size, 1) and a_prev with shape (n_a, 1).
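
As a rough illustration of avoiding hard-coded sizes, the sketch below reads both dimensions from the parameter shapes. The helper name init_state is hypothetical and not part of the assignment; the dictionary keys follow the ones shown in the question.

```python
import numpy as np

def init_state(parameters):
    # Hypothetical helper: derive sizes from the weights instead of typing 27 or 100.
    vocab_size = parameters["by"].shape[0]
    n_a = parameters["Waa"].shape[1]
    x = np.zeros((vocab_size, 1))   # input placeholder, one row per character
    a_prev = np.zeros((n_a, 1))     # initial hidden state as a column vector
    return x, a_prev

# Works unchanged for sizes other than 27 and 100 (values here are arbitrary).
demo_parameters = {"by": np.zeros((10, 1)), "Waa": np.zeros((50, 50))}
x0, a0 = init_state(demo_parameters)
print(x0.shape, a0.shape)
```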

TMosh · October 9, 2021, 5:06am

One more tip: In the random choice, you should sample from range(vocab_size), not from the range of the raveled y.

One more:
‘x’ does not equal idx. You’re creating x as a one-hot vector, so you first create a vector of zeros, then set the element at index idx to 1.
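
The two tips can be sketched in isolation as follows. The function names sample_index and one_hot are made up for this example, and the uniform distribution is placeholder data rather than real model output.

```python
import numpy as np

def sample_index(y, vocab_size, seed):
    # Draw one character index from the distribution y of shape (vocab_size, 1).
    np.random.seed(seed)
    return np.random.choice(range(vocab_size), p=y.ravel())

def one_hot(idx, vocab_size):
    # Start from zeros, then set only the sampled position to 1.
    x = np.zeros((vocab_size, 1))
    x[idx] = 1
    return x

vocab_size = 5                                 # arbitrary size for the demo
y = np.full((vocab_size, 1), 1 / vocab_size)   # placeholder uniform distribution
idx = sample_index(y, vocab_size, seed=0)
x = one_hot(idx, vocab_size)
print(idx, x.ravel())
```

Keeping x as a fresh (vocab_size, 1) array on every step means its shape stays compatible with Wax for the next forward pass.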

Franko999 · October 9, 2021, 8:26am

Dear Tom,

Thank you very much for your kind explanations.
Finally got it working!

Best regards,
Roberts

Syabith_Umar_Ahdan · September 6, 2022, 12:01am

I stumbled upon this issue as well.
Although the error message says the probabilities don’t sum to one, that was NOT the actual issue in my case. The real cause was a shape problem in some variables that I had forgotten to fix.

I was curious how fixing the shapes would fix the “sum not equal to 1” issue, so I printed the sum of y: it sometimes still comes out as 0.99999998 or 1.00000002.

But IT PASSED THE TEST REGARDLESS.
Hope this finding helps future learners.
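
A small sketch of that observation, assuming the only deviation is floating-point rounding: compare the sum against 1 with a tolerance instead of ==. The logits below are arbitrary example values, and the exact tolerance np.random.choice applies internally is not something this example relies on.

```python
import numpy as np

z = np.array([[2.0], [1.0], [0.1], [-3.0]])  # arbitrary example logits
e = np.exp(z - z.max())
y = e / e.sum(axis=0)                        # softmax over the single column

total = y.sum()
# An exact comparison may or may not hold; a tolerance-based one is the safer check.
print(total == 1.0, np.isclose(total, 1.0))

# Sampling works when the sum is off from 1 only by rounding error.
idx = np.random.choice(range(y.shape[0]), p=y.ravel())
print(idx)
```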
