I understand how to determine the size of n_x. If the input x is a 5000-dimensional vector, then n_x would also be 5000. However, how do we determine the sizes of n_a and n_y for both Basic RNN and LSTM? Is it just a random choice? Can I select any number I want?
Thank you so much in advance!
It depends on understanding what those dimensions mean. n_a is the size of the “hidden state” of the RNN cell. In an LSTM the state has a second component (the cell state c, which has the same size), but the base hidden state is still a. Choosing its size is a hyperparameter choice, analogous to choosing the number and sizes of layers in a fully connected network or the filter sizes in a CNN. You want the hidden state to be large enough to capture whatever the network needs to remember to perform its task. But if it’s too big, that just makes training more expensive. As with other hyperparameter choices, a good starting point is to study worked examples that have succeeded on problems at least somewhat similar to the one your new RNN needs to solve.
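To get a feel for the cost side of that tradeoff, here is a rough sketch that counts trainable parameters as n_a grows. It assumes the course's parameter layout (Waa, Wax, ba, Wya, by for the basic cell, and four gate matrices of shape (n_a, n_a + n_x) for the LSTM). The function names are made up for illustration, n_x = 5000 comes from the question, and n_y = 5000 is just an assumed value.

```python
def rnn_param_count(n_x, n_a, n_y):
    """Trainable parameters of a basic RNN cell plus its output layer."""
    hidden = n_a * n_a + n_a * n_x + n_a  # Waa, Wax, ba
    output = n_y * n_a + n_y              # Wya, by
    return hidden + output


def lstm_param_count(n_x, n_a, n_y):
    """Trainable parameters of an LSTM cell plus its output layer."""
    # Forget gate, update gate, output gate, and candidate value:
    # each has a weight matrix of shape (n_a, n_a + n_x) and a bias of size n_a.
    gates = 4 * (n_a * (n_a + n_x) + n_a)
    output = n_y * n_a + n_y              # Wy, by
    return gates + output


n_x, n_y = 5000, 5000  # n_y here is an assumed value, not from the course
for n_a in (64, 128, 256):
    print(n_a, rnn_param_count(n_x, n_a, n_y), lstm_param_count(n_x, n_a, n_y))
```

Note that n_a shows up in every term, so doubling it at least doubles the parameter count, and the n_a * n_a term grows four times as large.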
Then n_y depends on what the output of your RNN is. Prof Ng shows us lots of different types of RNNs in Week 1, and we’ll see even more as we go through the rest of C5. For example, if you are predicting words from a vocabulary, then the output will be a one-hot vector whose dimension is the size of the vocabulary. But it all depends on what your output looks like. You’ll see several examples in the exercises in W1. In the Dinosaur Names exercise, we are predicting letters plus a few delimiters, so n_y is 26 or 28, I think. In the Jazz Improvisation assignment, the output is musical notes chosen from a scale with 90 notes, if I’m remembering correctly.
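As a small illustration of n_y following from the output vocabulary rather than being a free choice, here is a sketch for a character-level model. The vocabulary is an assumption for the example (26 lowercase letters plus a newline as the only delimiter); the actual exercise may use a different delimiter set, and the names vocab, char_to_ix, and one_hot are just illustrative.

```python
import string

# Assumed vocabulary: lowercase letters plus "\n" as an end-of-name delimiter.
vocab = sorted(string.ascii_lowercase + "\n")
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

# n_y is determined by the vocabulary, not chosen freely.
n_y = len(vocab)


def one_hot(ch):
    """Return the one-hot target vector of length n_y for one character."""
    vec = [0] * n_y
    vec[char_to_ix[ch]] = 1
    return vec


print(n_y)
print(one_hot("a"))
```

If the vocabulary changes (more delimiters, uppercase letters, whole words), n_y changes with it, while n_a stays whatever you picked as a hyperparameter.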
Awesome! Thank you for your clarification!