Difference between the GRULM and LSTM implementations

I wanted to compare the C3_W3 assignment (LSTMs) and the C3_W2 assignment (GRUs). In the GRULM
model, the GRU unit is repeated for 2 units,

while in the LSTM assignment the LSTM layer has only 1 unit.

One has 2 units and the other only 1, and I'm curious why the number of units differs. Also, shouldn't the number of units be at least the max length of the input sentence? Why do the models above have only 1 and 2 units? I'm basing both the GRU and LSTM models on the vanilla RNN, which should look like this model, where each unit is responsible for one input word:

Hi @Ming_Wei_H

It's a common mistake to confuse what a “layer” is with what a “unit” is. In the GRULM there are 2 “layers” of GRU with 512 “units” each, while in the LSTM there is 1 “layer” with 50 “units”.
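To make the layer/unit distinction concrete, here is a toy parameter-count sketch. The 512 and 50 unit counts come from the assignments; the input dimensions (a 512-dim embedding for the GRU stack, a 50-dim input for the LSTM) are assumptions for illustration only:

```python
def gru_params(input_dim, units):
    # A GRU has 3 gate/candidate blocks, each with input weights,
    # recurrent weights, and a bias vector.
    return 3 * (input_dim * units + units * units + units)

def lstm_params(input_dim, units):
    # An LSTM has 4 gate blocks (input, forget, output, candidate).
    return 4 * (input_dim * units + units * units + units)

# GRULM-style stack: 2 GRU *layers* with 512 *units* each
# (512-dim embedding input is an assumption)
layer1 = gru_params(512, 512)
layer2 = gru_params(512, 512)  # layer 2 consumes layer 1's 512-dim outputs
print("2-layer GRU stack params:", layer1 + layer2)

# LSTM-style: 1 *layer* with 50 *units* (50-dim input assumed)
print("1-layer LSTM params:", lstm_params(50, 50))
```

Note that “2” counts whole layers stacked on top of each other, while “512” and “50” count the units inside one layer; the sentence length appears nowhere in these counts.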
The terminology is confusing and if you want to find more about it, you can read this post, if it’s confusing - don’t worry.

Another common misconception is that an RNN's number of units depends on sentence length. That is not true. The number of units is the dimensionality of the vectors the layer receives and outputs at each time step (each “unit” produces one component of the output). In the GRULM case, the inputs are 512-dimensional vectors and the outputs are also 512-dimensional vectors. So the number of units is just the size of the output you want from the layer; the same weights are reused at every time step, however long the sentence is.
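A minimal NumPy sketch of a single GRU cell makes this visible: the same weights are applied step by step over sequences of any length, and the output dimension is always `units` (toy sizes here, not the assignment's 512):

```python
import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU time step: x is (input_dim,), h is (units,)."""
    z = sigmoid(x @ Wz + h @ Uz)              # update gate
    r = sigmoid(x @ Wr + h @ Ur)              # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
    return (1 - z) * h + z * h_tilde

input_dim, units = 8, 4  # toy sizes; GRULM uses 512
rng = np.random.default_rng(0)
W = [rng.normal(size=(input_dim, units)) for _ in range(3)]
U = [rng.normal(size=(units, units)) for _ in range(3)]

for seq_len in (5, 20):           # two different "sentence lengths"
    h = np.zeros(units)
    for t in range(seq_len):      # the SAME weights are reused at every step
        x_t = rng.normal(size=input_dim)
        h = gru_step(x_t, h, W[0], U[0], W[1], U[1], W[2], U[2])
    print(seq_len, h.shape)       # the state is always (units,)
```

Longer sentences just mean more applications of the same cell; nothing about the weights or the output size changes with sequence length.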

I recently answered a similar question where you can find concrete calculations, which might be informative or confusing :slight_smile: