In the RNN model video, it is shown that Waa is a [100, 100] matrix when a is a 100-dimensional vector. Can someone explain why Waa is a [100, 100] matrix?
Basically, it is because of the element-wise summation of the weighted cell state and the weighted input:
[m, m] dot [m] + [m, n] dot [n]
Some details below, hopefully this helps (apologies for my English):
First, 100 here is just an example; it can be any number, chosen based on how well the model performs.
Let's break it down step by step:
Given: m = 100, n = 1000
x_t : one input at time t, with shape [n]
W_ax : with shape [m, n]
- m (arbitrary): the number of units (neurons) used to weight the input x_t
- n: the number of coefficients applied to the entries of x_t
eq_1:
(W_ax DOT x_t ), with shape [m]
The hidden state a_t built from this term (see eq_0 + eq_1 below) is passed on to the next time step as a_t-1, i.e. as the cell state, which is a weighted version of the input.
So a_t-1 has shape [m].
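To make the shapes concrete, here is a minimal numpy sketch of eq_1 (m, n and the random values are just placeholders for illustration):

```python
import numpy as np

m, n = 100, 1000
x_t  = np.random.randn(n)      # one input at time t, shape [n]
W_ax = np.random.randn(m, n)   # input weights, shape [m, n]

weighted_input = W_ax @ x_t    # eq_1: (W_ax DOT x_t)
print(weighted_input.shape)    # (100,), i.e. shape [m]
```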
According to the RNN architecture, the hidden state (which for the original plain RNN is also the output state) is produced by element-wise summation of the weighted input and the weighted cell state a_t-1 coming from the previous time step.
Agree?
This is where W_aa comes in:
W_aa holds the coefficients of a_t-1; in other words, a_t-1 is weighted by W_aa.
So we have
eq_0:
(W_aa DOT a_t-1 )
Then, finally, the hidden state (or output state), ignoring g() and the bias to ease explaining, is
eq_0 + eq_1:
(W_aa DOT a_t-1 ) + (W_ax DOT x_t )
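Put together as a minimal, self-contained numpy sketch (a_prev stands for a_t-1; all values are random placeholders, and g() and the bias are still omitted):

```python
import numpy as np

m, n   = 100, 1000
x_t    = np.random.randn(n)      # input at time t, shape [n]
W_ax   = np.random.randn(m, n)   # input weights, shape [m, n]
W_aa   = np.random.randn(m, m)   # recurrent weights, shape [m, m]
a_prev = np.random.randn(m)      # a_t-1 from the previous step, shape [m]

a_t = W_aa @ a_prev + W_ax @ x_t   # eq_0 + eq_1
print(a_t.shape)                   # (100,), i.e. shape [m]
```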
Conclusion:
(W_ax DOT x_t ) has shape [m]
a_t-1 has shape [m]
So W_aa must have shape [m, m], because the sum is
[m, m] dot [m] + [m, n] dot [n]
which gives [m] + [m] = [m]; only an [m, m] matrix maps the [m]-dimensional a_t-1 to another vector of shape [m].
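And a short sketch of why any other shape fails: with a hypothetical W_bad of shape [k, m] (k arbitrary), the two terms can no longer be summed element-wise:

```python
import numpy as np

m, n, k = 100, 1000, 50
x_t    = np.random.randn(n)
W_ax   = np.random.randn(m, n)
a_prev = np.random.randn(m)
W_bad  = np.random.randn(k, m)   # hypothetical wrong shape [k, m] instead of [m, m]

try:
    a_t = W_bad @ a_prev + W_ax @ x_t   # [k] + [m]: shapes do not match
except ValueError as err:
    print(err)   # "operands could not be broadcast together ..."
```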