Questions on inputs for GRU model

So I am a little confused about how embeddings are used in this model. Are the input characters first transformed into embeddings, and then within the GRU the number of hidden units just happens to equal the embedding dimension? Or do the hidden units themselves use embeddings? In the latter case I would be even more confused, since I thought the weights are shared across the whole sequence.

Hi @Jose_James

I’m not sure I fully understand the question. I’d suggest browsing these related answers first:

Feel free to ask what parts are confusing.

Cheers

  • LSTM(n_units) Builds an LSTM layer with hidden state and cell sizes equal to n_units. In trax, n_units should be equal to the size of the embeddings d_feature.

I guess to clarify, I will use the description of the LSTM provided in Week 3. My question may be: why do the dimensions of the hidden units equal the size of the embeddings? In all the examples from the course, the input dimensions were different from the dimensions of the hidden units.

Could you clarify what you mean? Could you point to one particular example? Maybe you’re comparing with regular neural networks (MLPs, like Week 1 of this Course 3)?

It does not have to be that way, but it usually is. For the math to work, the dimensions of the weights have to match the dimensions of the inputs (features). In NLP, an RNN’s inputs are usually embeddings (words or characters converted to vectors of numbers). So to multiply these embeddings by a weight matrix, the dimensions have to align.

For example, in the simple RNN example I linked earlier, the word “Refrigerator” is the input to the RNN:

  1. it has to be converted to numbers, for example [3.14, 2.5, -0.2, 0.1, 1.2] (embedding dimension 5 - 5 numbers).
  2. if this is the input to the RNN (as it usually is), together with the previous hidden state, the RNN’s weight matrix has to match the input dimension (5); in the example W_xh.shape is (5, 4) (here 4 is the hidden state dimension = the output dimension - 4 numbers).
  3. after passing through the RNN, the output will be, for example, [0.7956, 0.781, 0.179, -0.99] (dimension 4)
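The three steps above can be sketched in plain NumPy (the embedding values are the ones from step 1; the weight matrices W_xh, W_hh and the bias b are hypothetical random values, and tanh is assumed as the activation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: "Refrigerator" as a 5-dimensional embedding
x = np.array([3.14, 2.5, -0.2, 0.1, 1.2])  # embedding dimension 5

# Step 2: the weights must align with the input dimension
W_xh = rng.standard_normal((5, 4))          # (embedding dim, hidden dim)
W_hh = rng.standard_normal((4, 4))          # (hidden dim, hidden dim)
b = np.zeros(4)
h_prev = np.zeros(4)                        # previous hidden state

# Step 3: one RNN step -> new hidden state of dimension 4
h = np.tanh(x @ W_xh + h_prev @ W_hh + b)
print(h.shape)                              # (4,)
```

The key point is the shape of W_xh: its first dimension is forced to 5 by the embedding, its second dimension is the hidden size.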

You could insert a simple Dense layer after the Embedding layer, right before the RNN (to change the dimension of the inputs to the RNN), but that would not make much sense, since you could just modify the embedding dimension and that would be equivalent. So usually in NLP the embedding dimension matches the RNN dimension.
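To see why the Dense layer would be redundant, here is a NumPy sketch (all sizes and weights are hypothetical): an embedding table followed by a linear projection is itself just a linear lookup, so it collapses into a single, smaller embedding table.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, d_emb, n_units = 10, 8, 4

# Embedding of dimension 8, then a Dense layer (no bias here, for
# simplicity) projecting down to the RNN's input dimension 4.
embedding = rng.standard_normal((vocab_size, d_emb))
W_dense = rng.standard_normal((d_emb, n_units))

token_id = 3
x = embedding[token_id] @ W_dense   # shape (4,), ready for an RNN with n_units=4

# A linear map of a table lookup equals a lookup into a pre-multiplied
# table, i.e. an embedding of dimension 4 in the first place:
merged = embedding @ W_dense        # shape (vocab_size, 4)
assert np.allclose(merged[token_id], x)
```

So training Embedding(8) + Dense(4) cannot represent anything that Embedding(4) couldn’t, which is why the extra layer is usually skipped.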

Hey, sorry for the late reply. Your answer makes sense - thank you for the response.

When I compare pytorch and trax documentation for LSTMs, in pytorch the hidden dimensions can be explicitly defined as different from the input dimensions.

For example in pytorch: nn.LSTM(embedding_dim, hidden_dim, num_layers)

vs in Trax: trax.layers.rnn.LSTM(n_units, mode='train', return_state=False, initial_state=False), which assumes the hidden dimension to be equal to the embedding dimension. Is there a way to have different embedding and hidden dimensions in Trax?

Hope this makes more sense

I don’t think there is a logical reason for that (in NLP). Do you have something particular in mind?

But to answer your question: if you wanted different dimensions for the embedding layer and the subsequent LSTM layer, you could use a Dense layer in between and achieve the same thing. This is of course redundant - you could have just changed the embedding dimension.

On the other hand, if you want to change the output dimensions, you can use a Dense layer right after the LSTM layer (as is usually the case).
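A minimal NumPy sketch of that output-side projection (the hidden state and the projection weights W_out, b_out are hypothetical): the LSTM’s hidden state of size 4 is mapped to, say, 30 output logits, e.g. scores over a character vocabulary.

```python
import numpy as np

rng = np.random.default_rng(2)
n_units, vocab_size = 4, 30

h = rng.standard_normal(n_units)                 # LSTM hidden state (dimension 4)
W_out = rng.standard_normal((n_units, vocab_size))
b_out = np.zeros(vocab_size)

logits = h @ W_out + b_out                       # Dense layer after the LSTM
print(logits.shape)                              # (30,)
```

Unlike the Dense-before-the-RNN case, this one is not redundant: the recurrent state size and the output size are genuinely independent choices.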

I too previously had the same question - why Trax does not allow different input and output dimensions for RNNs - and I could not find a definitive answer. The reasons could be design choices for performance or simplicity of code… who knows? :slight_smile: