Matrix size of previous timestep W_aa vs matrix size of input X W_ax

Hello There,

In Week 1, when Prof. Ng talks about the matrix sizes for the previous timestep and for the current input X, why is W_aa (100, 100) while W_ax is (100, 10000)?
Does 10000 represent the total number of words we input to the RNN model?

Yes, the input vectors x^{<t>} are one-hot vectors representing the words in the vocabulary, which has 10000 entries, so each one is a (10000, 1) column vector.
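
As a quick NumPy sketch of what such a vector looks like (the word index 42 below is just a made-up example):

```python
import numpy as np

vocab_size = 10000
word_index = 42                  # hypothetical position of the word in the vocabulary

x_t = np.zeros((vocab_size, 1))  # (10000, 1) column vector of zeros
x_t[word_index] = 1.0            # a single 1 marks which word this timestep's input is
print(x_t.shape)                 # (10000, 1)
```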

The key formula is this one, which computes the new “cell state” of the RNN at time t:

a^{<t>} = g_1(W_{aa}a^{<t-1>} + W_{ax}x^{<t>} + b_a)

If you look at the graph of how the network works, you see that there are two inputs:

a^{<t-1>}, which is the cell state of the previous timestep.

x^{<t>}, which is the next entry in the sequence of actual inputs at time t.

The dimension of the cell state is 100 in the example Professor Ng is showing here, so the a vectors are (100, 1).

The actual input vectors x are one-hot vectors representing the words in the vocabulary, which has 10000 entries, so they are (10000, 1) column vectors.

Now look at the two matrix-vector products there and consider what the dimensions of W_{aa} and W_{ax} need to be, given the sizes of the a and x values and the fact that the output needs to have 100 elements (the new a value). Since W_{aa} multiplies a (100, 1) vector and W_{ax} multiplies a (10000, 1) vector, and each product must come out as (100, 1), W_{aa} must be (100, 100) and W_{ax} must be (100, 10000).
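
Here is a minimal NumPy sketch of that shape check (random parameters, tanh assumed as an example for g_1, purely illustrative):

```python
import numpy as np

n_a, n_x = 100, 10000                # cell state size, vocabulary size

W_aa = np.random.randn(n_a, n_a)     # (100, 100): maps a^{<t-1>} (100, 1) to (100, 1)
W_ax = np.random.randn(n_a, n_x)     # (100, 10000): maps x^{<t>} (10000, 1) to (100, 1)
b_a = np.zeros((n_a, 1))

a_prev = np.zeros((n_a, 1))          # a^{<t-1>}, the previous cell state
x_t = np.zeros((n_x, 1))             # x^{<t>}, a one-hot word vector
x_t[42] = 1.0                        # arbitrary word index for illustration

# a^{<t>} = g_1(W_aa a^{<t-1>} + W_ax x^{<t>} + b_a)
a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)
print(a_t.shape)                     # (100, 1)
```

If you try any other shapes for W_aa or W_ax, NumPy raises a shape mismatch error, which is a quick way to convince yourself the dimensions have to be (100, 100) and (100, 10000).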


Thanks for your answer