In the basic RNN assignment, the input x is one hot encoded. For the Dino Island one, the input x is a list of integer representations of the characters. Is the one hot encoding done inside helper functions like rnn_forward?
Another question: in the basic RNN, forward propagation is done by looping through time steps, with each time step having m examples. For Dino Island, it is done with one example at a time. This is because stochastic gradient descent is used instead, right?
Hi @yeoh_zhewei ,
The rnn_forward() function is taking care of the one hot encoding. You can see the source code by clicking:
The model is given a collection of dino names as training examples, and the learning is done on one real dino name at a time. There is nothing in the model() function that uses stochastic gradient descent. Not sure how stochastic gradient descent would help the generation of dino names if we want the model to generate names as close to the real dino names as possible.
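On the one hot encoding question, here is a minimal sketch of how an integer index can be turned into a one hot column vector inside a forward pass. This is not the course's actual source code: the helper name one_hot_column, the vocabulary size of 27, and the use of None for an all-zero first input are assumptions made for illustration.

```python
import numpy as np

def one_hot_column(index, vocab_size):
    """Return a (vocab_size, 1) one hot column vector for an integer index.

    Hypothetical helper: an index of None is assumed to mean
    "no character yet" and produces an all-zero vector.
    """
    x = np.zeros((vocab_size, 1))
    if index is not None:
        x[index] = 1
    return x

# Assumed vocabulary size: 26 letters plus a newline character.
vocab_size = 27

# X is a list of integers, one per time step, as described in the thread.
X = [None, 4, 9, 14]

# One column vector per time step, built on the fly from the integers.
columns = [one_hot_column(ix, vocab_size) for ix in X]
```

Each time step then receives a single column vector rather than a matrix with m columns, which matches the one example at a time processing discussed in this thread.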
Yes, notice that there are two completely separate things going on in this exercise:
- The gradient descent to train the model (the optimize function), and that works on the full batch of training data.
- Then there is the sample function, which uses the trained model to generate one name at a time.
In the optimize() code (gradient descent), it takes in X and Y, where:
X – list of integers, where each integer is a number that maps to a character in the vocabulary.
Y – list of integers, exactly the same as X but shifted one index to the left.
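To make the relationship between X and Y concrete, here is a small sketch that builds both lists from one name. The vocabulary ordering (newline at index 0, then a to z), the None placeholder at the start of X, and the newline index appended to Y are assumptions for illustration, and make_xy is a hypothetical helper name.

```python
# Assumed vocabulary: newline first, then the lowercase letters.
chars = "\nabcdefghijklmnopqrstuvwxyz"
char_to_ix = {ch: i for i, ch in enumerate(chars)}

def make_xy(name):
    """Build the input and label lists for a single name (hypothetical helper)."""
    # X starts with None (assumed to stand for an all-zero first input),
    # followed by the integer index of each character.
    X = [None] + [char_to_ix[ch] for ch in name]
    # Y is X shifted one index to the left, ending with the newline index.
    Y = X[1:] + [char_to_ix["\n"]]
    return X, Y

X, Y = make_xy("dino")
print(X)  # [None, 4, 9, 14, 15]
print(Y)  # [4, 9, 14, 15, 0]
```

At every time step t, Y[t] is the character the model should predict after seeing X[t], which is why the two lists differ only by the shift.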
In the model() code:
x is a single example, 1 word.
y is the ground truth of the 1 example.
Using the for loop with j in num_iterations, the optimize function is called with one example as its input. With each loop iteration, the parameters are updated using the optimize function. In this sense, isn’t it doing gradient descent on every single example? (I might have misunderstood stochastic gradient descent.)
Anyway, my question is: why is it updating the parameters after every single example instead of on the whole batch of training data?
I think it’s what you said in your parenthetical comment: they are doing the RNN version of Stochastic GD and updating the parameters after each sample.
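As a rough illustration of that per-sample update pattern, here is a toy sketch. It is not the assignment's model() or optimize() code: it fits a single weight w in y = w * x, and sgd_step, the data, and the learning rate are all made up for the example. The only point is the shape of the loop, one parameter update per training example.

```python
def sgd_step(w, x, y, learning_rate):
    """One gradient descent update on a single (x, y) example (hypothetical helper)."""
    # Gradient of the squared error (w * x - y) ** 2 with respect to w.
    grad = 2 * (w * x - y) * x
    return w - learning_rate * grad

# Toy training set generated from y = 2 * x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0
num_iterations = 300
for j in range(num_iterations):
    # Pick one example per iteration, cycling through the data set.
    x, y = examples[j % len(examples)]
    # The parameter is updated right away, before the next example is seen.
    w = sgd_step(w, x, y, learning_rate=0.05)
```

A full batch version would instead sum or average the gradient over all examples before making a single update per pass.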