Understanding rnn_forward in utils.py of W1 A2 sampling dinosaur names

yc2984 · October 26, 2021, 11:16am

This is the code provided by the exercise, but I don’t understand it at all.
What’s the difference between X and x?
What is len(X)? Is it the number of time steps, or the number of features, i.e. the number of possible characters (27 in this case)?

def rnn_forward(X, Y, a0, parameters, vocab_size=27):
    # Initialize x, a and y_hat as empty dictionaries
    x, a, y_hat = {}, {}, {}

    a[-1] = np.copy(a0)

    # initialize your loss to 0
    loss = 0

    for t in range(len(X)):

        # Set x[t] to be the one-hot vector representation of the t'th character in X.
        # if X[t] == None, we just have x[t]=0. This is used to set the input for the first timestep to the zero vector.
        x[t] = np.zeros((vocab_size, 1))
        if (X[t] != None):
            x[t][X[t]] = 1

        # Run one step forward of the RNN
        a[t], y_hat[t] = rnn_step_forward(parameters, a[t - 1], x[t])

        # Update the loss by subtracting the cross-entropy term of this time-step from it.
        loss -= np.log(y_hat[t][Y[t], 0])

    cache = (y_hat, a, x)

    return loss, cache

paulinpaloalto · October 26, 2021, 4:57pm

We wrote code very similar to this in the RNN Step by Step exercise that was the first assignment in Week 1 of Sequence Models, right? It might be worth comparing this code to what you wrote earlier.

Yes, len(X) is the number of “timesteps” or elements in the sequence, not the number of features. The number of features is vocab_size, right?

The difference between x and X is explained in the comments: X holds the characters of the sequence as indices, and x[t] is the “one hot” vector built from X[t].
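To make the X versus x distinction concrete, here is a minimal sketch. The index values in X are made up for illustration, and it assumes X is a plain Python list of character indices, which is what the loop in the quoted code suggests.

```python
import numpy as np

vocab_size = 27          # number of features: one slot per possible character
X = [2, 6, 25, 16]       # hypothetical character indices, one per timestep

# Build x the same way the quoted loop does: one (vocab_size, 1) column per timestep.
x = {}
for t in range(len(X)):
    x[t] = np.zeros((vocab_size, 1))
    x[t][X[t]] = 1       # put a 1 in the row given by the character index

print("timesteps:", len(X))          # length of the sequence
print("x[0] shape:", x[0].shape)     # one-hot column vector
print("hot row in x[0]:", int(np.argmax(x[0])))
```

So len(X) counts positions in the sequence, while vocab_size fixes the height of each x[t].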

yc2984 · November 1, 2021, 8:07am

Thanks for your reply, Paul. Yes, I did compare it with the function I wrote, but I still don’t understand this one. So does that mean the X here is not yet one-hot encoded? In the previous exercise it was already one-hot encoded. X has dimensions (n_x, m, T_x), right? n_x here is vocab_size.

Since the comment says “x[t] to be the one-hot vector representation of the t’th character in X”, it looks like t runs over the training examples m.
So X in this case has dimensions (m, T_x), and X[t] has dimensions (1, T_x).
x[t] has dimensions (n_x, 1).
But I’m lost again at X[t] != None: how can you test whether a vector is None or not?

paulinpaloalto · November 1, 2021, 3:02pm

Yes, they tell you that in the comments: X is not one-hot encoded. They are literally writing out the logic for you to create x as the one-hot encoded version of X, one timestep at a time. You can print the shapes of the inputs if you want to confirm what they are. The business about testing X[t] for None is also explained in the comments: when X[t] is None, x[t] is left as the zero vector, which is how the input for the first timestep is set.
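The None test makes more sense once you see that each X[t] is a single value, not a vector. The sketch below is an assumption about how the inputs are prepared: the vocabulary layout (newline at index 0, then a to z), the sample name, and the way X and Y are built are all illustrative rather than taken from the notebook. It uses `is not None`, the usual Python idiom for this check.

```python
# Hypothetical vocabulary: newline at index 0, then 'a'..'z' at indices 1..26.
chars = "\n" + "abcdefghijklmnopqrstuvwxyz"
char_to_ix = {ch: i for i, ch in enumerate(chars)}

name = "rex"  # made-up training example

# Inputs: None for the first timestep, then the index of each character.
X = [None] + [char_to_ix[ch] for ch in name]
# Targets: the next character at each step, ending with the newline index.
Y = X[1:] + [char_to_ix["\n"]]

for t in range(len(X)):
    # X[t] is a plain Python int or None, so the comparison is on a scalar.
    kind = "zero vector" if X[t] is None else f"one-hot at row {X[t]}"
    print(t, X[t], Y[t], kind)
```

Under these assumptions X and Y have the same length, and Y[t] is simply the character the network should predict after seeing X[t].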

yc2984 · November 3, 2021, 10:19am

Thanks so much, Paul, I think I understand now. So we only consider one training example here. X is a one-dimensional sequence containing the index (0-26) of each of its characters, and could be something like [2, 6, 25, 16]. The length of X is the number of timesteps of the RNN. x is a dict whose keys are the timesteps of the RNN and whose values are the one-hot encoded versions of those timesteps: 2 would become [0, 0, 1, 0, …, 0], a vector of length 27.
But why does the input in the first exercise have dimensions (n_x, m, T_x), where m is the number of training examples in the mini-batch, while in this exercise we only use one example at a time?
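The single-example picture described above can be run end to end with a stand-in step function. This is only a sketch: the `rnn_step_forward` here is an assumed form (tanh hidden state, softmax output), not necessarily the course helper, and the parameter shapes, the hidden size n_a, and the index values are all made up for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))          # shift for numerical stability
    return e / e.sum(axis=0)

def rnn_step_forward(parameters, a_prev, x):
    # Stand-in for the course helper (assumed form).
    a_next = np.tanh(parameters["Wax"] @ x + parameters["Waa"] @ a_prev + parameters["b"])
    y_hat = softmax(parameters["Wya"] @ a_next + parameters["by"])
    return a_next, y_hat

rng = np.random.default_rng(0)
vocab_size, n_a = 27, 5                # n_a is a hypothetical hidden size
parameters = {
    "Wax": rng.normal(size=(n_a, vocab_size)) * 0.01,
    "Waa": rng.normal(size=(n_a, n_a)) * 0.01,
    "Wya": rng.normal(size=(vocab_size, n_a)) * 0.01,
    "b": np.zeros((n_a, 1)),
    "by": np.zeros((vocab_size, 1)),
}

X = [None, 2, 6, 25]                   # one training example, indices are made up
Y = [2, 6, 25, 0]                      # X shifted by one, ending with an assumed newline index 0

a_prev = np.zeros((n_a, 1))
loss = 0.0
for t in range(len(X)):
    x_t = np.zeros((vocab_size, 1))
    if X[t] is not None:               # first timestep keeps the zero vector
        x_t[X[t]] = 1
    a_prev, y_hat_t = rnn_step_forward(parameters, a_prev, x_t)
    # Add the negative log-probability assigned to the correct next character.
    loss -= np.log(y_hat_t[Y[t], 0])

print("loss:", loss)
```

With weights this small the predictions are nearly uniform over the 27 characters, so the loss should come out close to 4 · ln 27, roughly 13.2.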

paulinpaloalto · November 3, 2021, 4:15pm

They specifically say in the comments that they are using Stochastic Gradient Descent here, so that means you only need to handle one sample at a time. Maybe they did that for simplification? Or maybe they have a priori knowledge that it works better than Minibatch for this particular case? I don’t know why they did it that way here. Of course the previous exercise was for the fully general case …
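One way to relate the two exercises is to treat the single example as a mini-batch with m = 1. The sketch below assumes the (n_x, m, T_x) layout mentioned earlier in the thread, and the index values are made up.

```python
import numpy as np

n_x = 27                         # vocab_size
X = [2, 6, 25, 16]               # hypothetical character indices for one example
T_x = len(X)

# Pack the single example into the general (n_x, m, T_x) layout with m = 1.
x_batch = np.zeros((n_x, 1, T_x))
for t, idx in enumerate(X):
    x_batch[idx, 0, t] = 1

# The slice for one timestep has shape (n_x, m) = (27, 1),
# the same shape as x[t] in the dinosaur code.
x_t = x_batch[:, :, 0]
print(x_batch.shape, x_t.shape)
```

Seen this way, the per-timestep input has the same shape in both exercises, and the dinosaur version just never stacks more than one example along the m axis.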
