Hello guys,
I have been trying to implement this function for quite a while. It would be nice if someone could explain the shapes of these objects which we receive as parameters:
Wax.shape (100, 27)
Waa.shape (100, 100)
Wya.shape (27, 100)
by.shape (27, 1)
b.shape (100, 1)
I understand that we have 27 characters, but why do we use 100 everywhere? If we forward propagate just one step, I would expect to need one weight per step, not 100. The formula is quite simple: a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b). So I would expect to pass one weight for Wax, 27 weights for Waa (one per character), and one weight for b.
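For reference, here is a small shape check of that formula. It is only a sketch under two assumptions that are not stated in the assignment text quoted here: x is a one-hot column of shape (27, 1), and a_prev is a column of shape (100, 1), where 100 is the hidden state size n_a (the code reads it as Waa.shape[1]) rather than a number of time steps. The weights are random stand-ins with the shapes listed above, not the course's trained parameters.

```python
import numpy as np

# Random stand-ins with the shapes quoted above (not the real parameters).
rng = np.random.default_rng(0)
n_a, vocab_size = 100, 27
Wax = rng.standard_normal((n_a, vocab_size))   # (100, 27)
Waa = rng.standard_normal((n_a, n_a))          # (100, 100)
Wya = rng.standard_normal((vocab_size, n_a))   # (27, 100)
b = np.zeros((n_a, 1))                         # (100, 1)
by = np.zeros((vocab_size, 1))                 # (27, 1)

x = np.zeros((vocab_size, 1))   # assumption: one-hot input as a column vector
a_prev = np.zeros((n_a, 1))     # assumption: hidden state as a column vector

# One forward step: every term in the sum has shape (100, 1).
a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
z = np.dot(Wya, a) + by

print("a:", a.shape)  # (100, 1)
print("z:", z.shape)  # (27, 1)
```

With these shapes, Wax maps the 27 input entries to 100 hidden units, Waa maps the 100 hidden units to 100 hidden units, and Wya maps them back to 27 scores, one per character.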
# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
# GRADED FUNCTION: sample
def sample(parameters, char_to_ix, seed):
    """
    Sample a sequence of characters according to a sequence of probability distributions output of the RNN
    Arguments:
    parameters -- Python dictionary containing the parameters Waa, Wax, Wya, by, and b.
    char_to_ix -- Python dictionary mapping each character to an index.
    seed -- Used for grading purposes. Do not worry about it.
    Returns:
    indices -- A list of length n containing the indices of the sampled characters.
    """
    # Retrieve parameters and relevant shapes from "parameters" dictionary
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]
    n_a = Waa.shape[1]
    ### START CODE HERE ###
    # Step 1: Create a zero vector x that can be used as the one-hot vector
    # representing the first character (initializing the sequence generation). (≈1 line)
    x = np.zeros((27,))
    # Step 1': Initialize a_prev as zeros (≈1 line)
    a_prev = np.zeros((100,100))
    # Create an empty list of indices. This is the list which will contain the list of indices of the characters to generate (≈1 line)
    indices = []
    # idx is the index of the one-hot vector x that is set to 1
    # All other positions in x are zero.
    # Initialize idx to -1
    idx = -1
    # Loop over time-steps t. At each time-step:
    # Sample a character from a probability distribution
    # And append its index (`idx`) to the list "indices".
    # You'll stop if you reach 50 characters
    # (which should be very unlikely with a well-trained model).
    # Setting the maximum number of characters helps with debugging and prevents infinite loops.
    counter = 0
    newline_character = char_to_ix['\n']
    while (idx != newline_character and counter != 50):
        # Step 2: Forward propagate x using the equations (1), (2) and (3)
        # $a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t+1 \rangle} + W_{aa} a^{\langle t \rangle} + b)$ (1)
        print(Wax.shape)
        print(Waa.shape)
        print(Wax)
        print()
        print()
        print(Waa)
        #print("n_a shape", n_a.shape)
        #print("vocab_size shape", vocab_size.shape)
        #print("char_to_ix shape", char_to_ix.shape)
        #print(x.shape)
        ff = np.dot(Waa, a_prev)
        print("ff is", ff.shape)
        a = np.tanh(np.dot(Wax, x) + np.dot(Waa, a_prev) + b)
        print("a is", a)
        print("a shape is", a.shape)
        # $z^{\langle t+1 \rangle} = W_{ya} a^{\langle t+1 \rangle} + b_y$ (2)
        z = np.dot(Wya, a) + by
        print("z is ", np.sum(z))
        # $\hat{y}^{\langle t+1 \rangle} = \mathrm{softmax}(z^{\langle t+1 \rangle})$ (3)
        y = softmax(z)
        print("y is ", np.sum(y))
        # For grading purposes
        np.random.seed(counter + seed)
        ys = np.ravel(y)
        print("ys shape is", ys.shape)
        yr = np.ravel(y)
        print("yr is", np.sum(yr))
        xx2 = len(np.ravel(y))
        print("xx2 shape is", xx2)
        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        # (see additional hints above)
        idx = np.random.choice(range(len(np.ravel(y))), p = np.ravel(y))
        # Append the index to "indices"
        indices.append(idx)
        # Step 4: Overwrite the input x with one that corresponds to the sampled index `idx`.
        # (see additional hints above)
        x = idx
        x[idx] = y
        # Update "a_prev" to be "a"
        a_prev = a
        # for grading purposes
        seed += 1
        counter += 1
    ### END CODE HERE ###
    if (counter == 50):
        indices.append(char_to_ix['\n'])
    return indices
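Step 4 in the comments above asks for an input that "corresponds to the sampled index". One reading of that, sketched here as an assumption rather than as the assignment's reference solution, is a fresh one-hot column vector with a 1 at position idx. The value of idx below is a hypothetical sampled index chosen only for illustration.

```python
import numpy as np

vocab_size = 27
idx = 5  # hypothetical sampled index, for illustration only

# Build a new one-hot column vector: all zeros except a 1 at row idx.
x = np.zeros((vocab_size, 1))
x[idx] = 1

print(x.shape)      # (27, 1)
print(int(x.sum())) # 1
```

Creating a new zero vector on each step avoids leaving the previous character's 1 in place.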
At the end I get an error saying that the probabilities do not sum to 1, raised for the p argument in idx = np.random.choice(range(len(np.ravel(y))), p = np.ravel(y))
When I check each of the variables, I get this:
a shape is (100, 100)
z is 7729.441975516234
y is 99.99999999999997
ys shape is (2700,)
yr is 99.99999999999997
xx2 shape is 2700
So p sums to 99.99999999999997, which of course is not 1. But if we are using this weird shape of 100, how can it ever sum to 1?
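These numbers are consistent with how the shapes propagate: with x of shape (27,) and a_prev of shape (100, 100), broadcasting against b of shape (100, 1) makes a come out as (100, 100), so z and y are (27, 100) and np.ravel gives 2700 entries. The sketch below assumes that the softmax helper normalizes along axis 0, so each column sums to 1 on its own. The softmax defined here is a stand-in written for illustration, not the course's file, and the inputs are random.

```python
import numpy as np

def softmax(z):
    # Stand-in helper (assumption): normalize along axis 0, i.e. per column.
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

rng = np.random.default_rng(0)
z_wide = rng.standard_normal((27, 100))  # shape produced by a (100, 100) hidden state
z_col = rng.standard_normal((27, 1))     # shape produced by a (100, 1) hidden state

y_wide = softmax(z_wide)
y_col = softmax(z_col)

# 100 columns, each summing to 1: the flattened array has 2700 entries.
print(np.ravel(y_wide).shape, np.sum(y_wide))
# A single column is one probability distribution over the 27 characters.
print(np.ravel(y_col).shape, np.sum(y_col))
```

Under that assumption, a total of about 100 means y holds 100 separate distributions, while a (27, 1) output holds exactly one, which is what the p argument of np.random.choice expects.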
Best regards,
Roberts