Why are the parameters (W_{aa}, W_{ax}) arranged horizontally and (a^{<t-1>}, x^{<t>}) arranged vertically?

It is for computational efficiency: the arrangement lets us do the whole calculation with a single “dot product”.

If we stack W_{aa} and W_{ax} horizontally, and a^{<t-1>} and x^{<t>} vertically, we only need one operation, a “dot product”, instead of two separate products followed by an addition.
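Written out as a block-matrix product, the stacking looks roughly like this. The shapes are an assumption based on the usual course notation and are not stated in this thread: W_{aa} is (n_a, n_a), W_{ax} is (n_a, n_x), a^{<t-1>} is (n_a, m) and x^{<t>} is (n_x, m).

$$
\begin{bmatrix} W_{aa} & W_{ax} \end{bmatrix}
\begin{bmatrix} a^{<t-1>} \\ x^{<t>} \end{bmatrix}
= W_{aa}\cdot a^{<t-1>} + W_{ax}\cdot x^{<t>}
$$

The horizontal stack has n_a + n_x columns and the vertical stack has n_a + n_x rows, so the inner dimensions match and the product is defined.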

The figure above shows the calculation of a single element of the result. Two elements from W_{aa} are multiplied element-wise with two elements from a^{<t-1>}, and 6 elements from W_{ax} are multiplied element-wise with 6 elements from x^{<t>}. Since a dot product is the “sum” of those element-wise products, the result is W_{aa}\cdot a^{<t-1>} + W_{ax}\cdot x^{<t>}.
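Here is a minimal numpy sketch to check this numerically. The sizes (n_a = 2, n_x = 6, one example) are assumptions chosen to match the 2 + 6 elements described above, and the variable names are only illustrative, not taken from the course code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 2 hidden units, 6 input features, 1 example
n_a, n_x, m = 2, 6, 1

Waa = rng.standard_normal((n_a, n_a))
Wax = rng.standard_normal((n_a, n_x))
a_prev = rng.standard_normal((n_a, m))   # a^{<t-1>}
x_t = rng.standard_normal((n_x, m))      # x^{<t>}

# Stack the weights horizontally and the inputs vertically
Wa = np.hstack([Waa, Wax])       # shape (n_a, n_a + n_x)
ax = np.vstack([a_prev, x_t])    # shape (n_a + n_x, m)

# One dot product on the stacked arrays...
stacked = np.dot(Wa, ax)

# ...versus two dot products and an addition
separate = np.dot(Waa, a_prev) + np.dot(Wax, x_t)

print(np.allclose(stacked, separate))
```

Both versions give the same (n_a, m) result up to floating-point rounding, so the stacked form is just a more compact way to compute the same thing.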


There are 2 reasons for doing this:

- As mentioned in the lecture, you get to track 1 weight matrix (i.e. the stacked version) instead of two individual weight matrices (W_{aa} and W_{ax}).
- Performance. When using packages like numpy, one dot product on the stacked matrices usually runs faster than two separate dot products followed by an addition.