In week 1 Recurrent Neural Network Model video, what does the first dimension of Wa of 100 represent?

Same question for Waa matrix and Wax which are 100x100 and 100x10000 respectively?

You filed this under DLS Course 4, so I moved it to Course 5 by using the little “edit pencil” on the title.

The a^{<t>} values there are the “hidden state” of the RNN “cell”. Just as with the number of neurons in a layer of a DNN or the number of channels in the output of a CNN layer, this is a hyperparameter that you need to select. In this instance, the hidden state has been specified to have 100 units, which you can think of as neurons. The first expression on the upper left is the full forward activation of that cell to produce the output hidden state. If you are taking in 100 elements and producing 100 outputs with a matrix multiplication, then the W_{aa} weight matrix needs to be 100 x 100, right? There are two inputs to the cell at each “timestep”: the previous hidden state with 100 elements and the new x^{<t>} value, which in this particular case has 10k elements, meaning (e.g.) that it’s an NLP model with a 10k-word vocabulary. In that case, the x^{<t>} values will be one-hot vectors with 10k elements. Thus the weight matrix W_{ax} needs to be 100 x 10000, since it takes 10k inputs and produces 100 outputs.
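To make those dimensions concrete, here is a minimal NumPy sketch of one forward step of the cell. The weight values are random placeholders and the word index is arbitrary; only the shapes matter here:

```python
import numpy as np

n_a, n_x = 100, 10000  # hidden-state size and vocabulary size from the lecture

# Weight matrices with the dimensions discussed above (random placeholder values)
W_aa = np.random.randn(n_a, n_a)   # 100 x 100: maps previous hidden state to 100 outputs
W_ax = np.random.randn(n_a, n_x)   # 100 x 10000: maps one-hot input to 100 outputs
b_a = np.zeros((n_a, 1))

a_prev = np.zeros((n_a, 1))        # previous hidden state a^{<t-1>}
x_t = np.zeros((n_x, 1))           # one-hot input x^{<t>}
x_t[42] = 1.0                      # arbitrary word index, just for illustration

# The forward activation from the upper-left expression on the slide
a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)
print(a_t.shape)  # (100, 1)
```

The shapes check out: (100 x 100)(100 x 1) + (100 x 10000)(10000 x 1) gives a 100 x 1 hidden state.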

On the right side of the diagram, Prof Ng is just showing you another way to “package” all the data for the hidden state calculation by “stacking” the inputs so that you can just do one larger matrix multiply instead of two. It’s just a different way to express the same computation shown on the upper left.
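You can verify that the stacked form is the same computation with a few lines of NumPy. This just concatenates W_{aa} and W_{ax} side by side into a 100 x 10100 matrix and stacks the two inputs into one 10100-element vector:

```python
import numpy as np

n_a, n_x = 100, 10000
W_aa = np.random.randn(n_a, n_a)
W_ax = np.random.randn(n_a, n_x)
b_a = np.zeros((n_a, 1))
a_prev = np.random.randn(n_a, 1)
x_t = np.zeros((n_x, 1))
x_t[7] = 1.0  # arbitrary one-hot input

# "Stacked" form: W_a = [W_aa | W_ax] is 100 x 10100, input is [a_prev; x_t]
W_a = np.hstack([W_aa, W_ax])             # shape (100, 10100)
stacked_input = np.vstack([a_prev, x_t])  # shape (10100, 1)

a_two_multiplies = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)  # upper-left form
a_one_multiply = np.tanh(W_a @ stacked_input + b_a)           # stacked form

print(np.allclose(a_two_multiplies, a_one_multiply))  # True
```

Same numbers either way; the stacked version is just one bigger matrix multiply, which is usually more efficient on modern hardware.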

Just as there are two inputs to each timestep, there are also two outputs: the new hidden state and the y^{<t>} output value. The second expression on the lower left is the activation calculation for y^{<t>}.

This was all explained by Prof Ng in great detail in the lectures. You might want to watch them again with the description above in mind.