Difference in GRULM implementation and LSTM

Hi @Ming_Wei_H

A common mistake is mixing up what a “layer” is and what a “unit” is. In the GRULM there are 2 “layers” of GRU with 512 “units” each, while in the LSTM model there is 1 “layer” with 50 “units” (see the sketch below).
The terminology is confusing; if you want to learn more about it, you can read this post, and if it is still confusing - don’t worry.
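If it helps, here is a minimal sketch of the two configurations. It uses Keras rather than the framework from the assignment, and the vocabulary size (256) is just a placeholder I picked for illustration, but the layer/unit structure is the same idea:

```python
import tensorflow as tf

# GRULM-like stack: 2 GRU *layers*, each with 512 *units*.
grulm_like = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=256, output_dim=512),  # 256 = placeholder vocab size
    tf.keras.layers.GRU(512, return_sequences=True),  # layer 1, 512 units
    tf.keras.layers.GRU(512, return_sequences=True),  # layer 2, 512 units
])

# LSTM model: 1 LSTM *layer* with 50 *units*.
lstm_like = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=256, output_dim=50),
    tf.keras.layers.LSTM(50, return_sequences=True),  # 1 layer, 50 units
])
```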

Another common misconception is that an RNN’s number of units depends on the sentence length. That is not true. The number of units is the dimensionality of the vectors the RNN layer receives and produces at each time step. In the GRULM case, the inputs are 512-dimensional vectors and the outputs are also 512-dimensional vectors (each “unit” produces its own output). So the number of units is simply the size of the output you want from the layer, regardless of how long the sequence is - you can see this directly in the sketch below.
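Here is a quick way to convince yourself of that (again a Keras sketch, with made-up input lengths): the last dimension of the output equals the number of units no matter how many time steps you feed in.

```python
import numpy as np
import tensorflow as tf

gru = tf.keras.layers.GRU(512, return_sequences=True)  # 512 units

# Two batches with different sequence lengths (10 vs 100 time steps).
short_batch = np.random.rand(1, 10, 512).astype("float32")
long_batch = np.random.rand(1, 100, 512).astype("float32")

print(gru(short_batch).shape)  # (1, 10, 512)  -> last axis = 512 units
print(gru(long_batch).shape)   # (1, 100, 512) -> still 512; length doesn't change the units
```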

I recently answered a similar question with concrete calculations, which you might find informative (or confusing 🙂).

Cheers