Questions on inputs for GRU model

So I am a little confused about how embeddings are used in this model. Are input characters first transformed into embeddings, and the number of hidden units in the GRU just happens to equal the embedding dimension? Or do the hidden units themselves use embeddings? The latter case would confuse me even more, since I thought the weights are the same throughout the sequence.

Hi @Jose_James

I’m not sure I fully understand the question. I would suggest browsing these related answers first:

Feel free to ask what parts are confusing.


  • LSTM(n_units) Builds an LSTM layer with hidden state and cell sizes equal to n_units. In trax, n_units should be equal to the size of the embeddings d_feature.

To clarify, I will use the description of LSTM provided in week 3. I think my question is: why does the dimension of the hidden units equal the size of the embeddings? In all the examples from the course, the input dimensions were different from the dimensions of the hidden units.

Could you clarify what you mean? Could you point to one particular example? Maybe you’re comparing with regular neural networks (MLPs like in Week 1 of this Course 3)?

The hidden dimension does not have to equal the embedding dimension, but usually it does. For the math to work, the dimensions of the weights have to match the dimensions of the inputs (features). In NLP, the inputs to RNNs are usually embeddings (words or characters converted to sequences of numbers, i.e. vectors). So to multiply these embeddings by a weight matrix, the dimensions have to align.

For example, in the simple RNN answer that I linked earlier, the word “Refrigerator” is the input to the RNN:

  1. it has to be converted to numbers, for example [3.14, 2.5, -0.2, 0.1, 1.2] (embedding dimension 5 - 5 numbers).
  2. if this is the input to the RNN (as it usually is), together with the previous hidden state, the first dimension of the RNN’s input weight matrix has to match the input dimension (5). In the example, W_xh.shape is (5, 4), where 4 is the hidden state dimension, which is also the output dimension (4 numbers).
  3. after passing through the RNN, the output will be, for example, [0.7956, 0.781, 0.179, -0.99] (dimension 4).
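The three steps above can be sketched in plain Python. This is only a minimal illustration of a vanilla RNN step, not the Trax implementation: the helper names (random_matrix, vec_mat, rnn_step), the random weights, and the second input vector are made up for the example. Note that the same W_xh and W_hh are reused at every time step, while the hidden state changes.

```python
import math
import random

random.seed(0)  # fixed seed so the illustrative weights are reproducible

EMBED_DIM = 5   # size of each input embedding
HIDDEN_DIM = 4  # size of the hidden state (and of the RNN output)


def random_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights (illustrative only)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def vec_mat(vec, mat):
    """Multiply a row vector of length n by an n x m matrix, giving length m."""
    if len(vec) != len(mat):
        raise ValueError(f"shape mismatch: vector {len(vec)} vs matrix rows {len(mat)}")
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: h_t = tanh(x @ W_xh + h_prev @ W_hh + b_h)."""
    from_input = vec_mat(x, W_xh)        # (5,) times (5, 4) -> (4,)
    from_state = vec_mat(h_prev, W_hh)   # (4,) times (4, 4) -> (4,)
    return [math.tanh(a + c + b) for a, c, b in zip(from_input, from_state, b_h)]


W_xh = random_matrix(EMBED_DIM, HIDDEN_DIM)   # shape (5, 4), as in the example
W_hh = random_matrix(HIDDEN_DIM, HIDDEN_DIM)  # shape (4, 4)
b_h = [0.0] * HIDDEN_DIM

sequence = [
    [3.14, 2.5, -0.2, 0.1, 1.2],  # embedding of "Refrigerator" from the example
    [0.3, -1.1, 0.8, 0.0, 2.0],   # a second, made-up embedding
]

h = [0.0] * HIDDEN_DIM  # initial hidden state
for x in sequence:
    h = rnn_step(x, h, W_xh, W_hh, b_h)  # same weights at every step
    print(len(x), "->", len(h))
```

Passing a vector of any length other than 5 as x makes vec_mat raise a ValueError, which is the "dimensions have to align" requirement in code form.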

You could insert a simple Dense layer after the Embedding layer, right before the RNN (to change the dimensions of the inputs to the RNN), but that would not make much sense, since you could just modify the embedding dimension and that would be equivalent. So usually in NLP the Embedding dimension matches the RNN dimension.
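To see why the extra Dense layer is redundant, here is a small sketch in plain Python. It assumes a purely linear Dense layer (weights plus bias, no activation), and the table E, the weights W, and the bias b are made-up numbers. Looking up a 5-dimensional embedding and projecting it to 4 dimensions gives the same vector as looking up a row in a single 4-dimensional table.

```python
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 3, 5, 4

# Hypothetical tiny embedding table: one row of EMBED_DIM numbers per token id.
E = [[0.1 * (i + j) for j in range(EMBED_DIM)] for i in range(VOCAB_SIZE)]
# Hypothetical Dense weights and bias mapping EMBED_DIM -> HIDDEN_DIM.
W = [[0.01 * (i - j) for j in range(HIDDEN_DIM)] for i in range(EMBED_DIM)]
b = [0.5, -0.5, 0.25, 0.0]


def dense(vec, weights, bias):
    """Linear layer without activation: vec @ weights + bias."""
    return [
        sum(v * row[j] for v, row in zip(vec, weights)) + bias[j]
        for j in range(len(bias))
    ]


def embed_then_dense(token_id):
    """Path 1: look up the 5-dim embedding, then project it to 4 dims."""
    return dense(E[token_id], W, b)


# Path 2: fold the Dense layer into the table once, then only look up.
E_folded = [dense(row, W, b) for row in E]


def embed_folded(token_id):
    """Look up a row of the already projected 4-dim table."""
    return E_folded[token_id]


for token_id in range(VOCAB_SIZE):
    print(token_id, embed_then_dense(token_id) == embed_folded(token_id))
```

The two paths compute the same function of the token id, so a model with a 4-dimensional Embedding can represent anything the Embedding(5) plus Dense(4) pair can. The parameter count and training behavior of the two setups are not identical, though.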

Hey sorry for the late reply. I think your answer makes sense and thank you for the response.

When I compare the PyTorch and Trax documentation for LSTMs, in PyTorch the hidden dimension can be explicitly defined as different from the input dimension.

For example, in PyTorch: nn.LSTM(embedding_dim, hidden_dim, num_layers)

whereas in Trax, trax.layers.rnn.LSTM(n_units, mode='train', return_state=False, initial_state=False) assumes the hidden dimension to be equal to the embedding dimension. Is there a way to have different embedding and hidden dimensions in Trax?

Hope this makes more sense

I don’t think there is a logical reason for that (in NLP). Do you have something particular in mind?

But to answer your question: if you want the embedding layer dimension to differ from the hidden dimension of the subsequent LSTM layer, you could put a Dense layer in between and achieve the same thing. This is of course redundant, since you could have changed the embedding dimension instead.

On the other hand, if you want to change the output dimension, you could use a Dense layer right after the LSTM layer (which is what is usually done).
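As a rough sketch of the layer stack described above, the snippet below follows the feature dimension through a list of layer descriptions. It is plain Python, not Trax code: trace_dims and the (name, in_dim, out_dim) tuples are hypothetical, and the LSTM entry simply encodes the assumption from this thread that its input size must equal n_units.

```python
def trace_dims(input_dim, layers):
    """Follow the feature dimension through (name, in_dim, out_dim) layer specs."""
    dims = [input_dim]
    for name, in_dim, out_dim in layers:
        if dims[-1] != in_dim:
            raise ValueError(f"{name} expects {in_dim} features but receives {dims[-1]}")
        dims.append(out_dim)
    return dims


EMBED_DIM, HIDDEN_DIM, OUTPUT_DIM = 5, 4, 3

with_dense = [
    ("Dense (before LSTM)", EMBED_DIM, HIDDEN_DIM),   # 5 -> 4
    ("LSTM", HIDDEN_DIM, HIDDEN_DIM),                 # assumed: input size == n_units
    ("Dense (after LSTM)", HIDDEN_DIM, OUTPUT_DIM),   # 4 -> 3
]
print(trace_dims(EMBED_DIM, with_dense))  # [5, 4, 4, 3]

# Without the first Dense layer, the 5-dim embeddings do not fit the 4-unit LSTM.
without_dense = with_dense[1:]
try:
    trace_dims(EMBED_DIM, without_dense)
except ValueError as err:
    print("mismatch:", err)
```

Setting EMBED_DIM to 4 and dropping the first Dense entry gives the usual setup, where the embedding dimension matches the LSTM size directly.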

I previously had the same question myself (why Trax does not allow different input and output dimensions for RNNs), and I could not find a definitive answer. The reasons could be design choices for performance or simplicity of code… who knows? :)