Why is the dimension of W_aa (100, 100) in the RNN example?

In the RNN model video, it is shown that W_aa is a (100, 100) matrix when a is a 100-dimensional vector. Can someone explain why W_aa is a (100, 100) matrix?

@Rohaan_Manzoor

Basically, it is because of the element-wise summation of the weighted cell state and the weighted input:

[m, m] dot [m] + [m, n] dot [n]

Some details that will hopefully help (in my own English words): :stuck_out_tongue:

First, 100 here is just an example; it is arbitrary. In short, it can be any number, chosen based on how well the model performs.

Let’s break it down step by step:

Given: m = 100, n = 1000

x_t: one input at time step t, with shape [n]

W_ax, with shape [m, n]:

  • m (arbitrary): the number of units (neurons) used to weight the input x_t.
  • n: the number of parameters used as the coefficients of x_t, so it must match the dimension of x_t.

eq_1:
a_t = (W_ax DOT x_t ) with shape [m]

This a_t will be passed to the next time step as a_t-1, i.e. the cell state, which is basically a weighted input.

a_t-1 has shape [m].
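To make the shapes concrete, here is a tiny NumPy sketch of eq_1 (the variable names and random values are just for illustration, not from the course):

```python
import numpy as np

m, n = 100, 1000                 # hidden size (arbitrary) and input size
x_t = np.random.randn(n)         # one input at time step t, shape (n,)
W_ax = np.random.randn(m, n)     # weights applied to the input x_t

weighted_input = W_ax @ x_t      # eq_1: (W_ax DOT x_t)
print(weighted_input.shape)      # (100,)  ->  shape [m]
```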

According to the architecture of the RNN, the hidden state (for the original plain RNN this is also the output state) is produced by the element-wise summation of the weighted input (W_ax DOT x_t) and the weighted cell state a_t-1, which basically came from the last time step :point_up:

Agree? :point_up:

So this is where W_aa is used:

W_aa serves as the coefficients of a_t-1; in other words, a_t-1 will be weighted by W_aa:

so we have

eq_0:

(W_aa DOT a_t-1 )
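The same kind of sketch for eq_0, with a_prev standing in for a_t-1:

```python
import numpy as np

m = 100
a_prev = np.random.randn(m)      # a_t-1 from the previous time step, shape (m,)
W_aa = np.random.randn(m, m)     # coefficients applied to a_t-1

weighted_state = W_aa @ a_prev   # eq_0: (W_aa DOT a_t-1)
print(weighted_state.shape)      # (100,)  ->  shape [m]
```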

Then, finally, the hidden state (or output state) is (ignoring g() to ease the explanation):

eq_0 + eq_1:

(W_aa DOT a_t-1 ) + (W_ax DOT x_t )
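Putting the two together, here is a sketch of one whole step, with the activation g() (tanh assumed) and a bias b_a added back in, although both are ignored in the text above to keep it simple:

```python
import numpy as np

m, n = 100, 1000
x_t, a_prev = np.random.randn(n), np.random.randn(m)
W_ax, W_aa = np.random.randn(m, n), np.random.randn(m, m)
b_a = np.random.randn(m)         # bias term, assumed here for completeness

# eq_0 + eq_1, then g() = tanh applied element-wise
a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)
print(a_t.shape)                 # (100,)  ->  same shape as a_prev
```

Note that a_t comes out with shape [m], the same as a_t-1, which is exactly why the same W_aa of shape [m, m] can be reused at every time step.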

Conclusion:

(W_ax DOT x_t) has shape [m]

a_t-1 has shape [m]

Then W_aa must have shape [m, m], because the whole expression is

[m, m] dot [m] + [m, n] dot [n]

and both terms must come out with shape [m] for the element-wise summation to work.
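One last sketch to show why [m, m] is the only shape that works for W_aa (the wrong shapes below are made up just to trigger the errors):

```python
import numpy as np

m, n = 100, 1000
x_t, a_prev = np.random.randn(n), np.random.randn(m)
W_ax = np.random.randn(m, n)

# Wrong second dimension: (m, 50) cannot be multiplied with the (m,) vector a_t-1.
try:
    np.random.randn(m, 50) @ a_prev
except ValueError as e:
    print("matmul fails:", e)

# Wrong first dimension: (50, m) @ (m,) gives shape (50,), which cannot be
# added element-wise to the (m,) weighted input.
try:
    np.random.randn(50, m) @ a_prev + W_ax @ x_t
except ValueError as e:
    print("addition fails:", e)

# Only shape (m, m) works, producing a hidden state of shape (m,).
W_aa = np.random.randn(m, m)
print((W_aa @ a_prev + W_ax @ x_t).shape)    # (100,)
```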