In the last Programming Exercise (Jazz Improvisation), it says that the input has 30 text sequences. So, there are 30 hidden states. Therefore, I thought that the hidden state dimension is 30 (one for each sequence).
But, there is this argument:
# number of dimensions for the hidden state of each LSTM cell.
n_a = 64
and it is passed to the Keras LSTM layer as units.
What is the dimension of the hidden state, then?
Thank you!
Is the hidden state of an LSTM limited to the length of the input vector?
If I have 30 time points, will the LSTM have at most 30 hidden states?
The hidden state is a vector that the model learns how to update based on inputs, so that it can make whatever predictions your model's goal requires. There is one hidden state vector, and it gets updated at each timestep for each input. The number of elements in the hidden state is a "hyperparameter", meaning a value that you simply have to choose. The size reflects the level of complexity of whatever it is your model needs to detect and predict. The dimension (size) of the hidden state is not related to the number of timesteps in the inputs. Nor is it really directly related to the number or size of the inputs to your model. Thinking back to earlier types of networks we studied in DLS C1 and C2, choosing the size of the hidden state is like choosing the number of neurons in one of the layers of a Feed Forward Fully Connected network.
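To make that concrete, here is a minimal NumPy sketch of the basic recurrent update, using made-up sizes (30 timesteps, an input vector of length 10 at each step, and a hidden state of size n_a = 64 — none of these come from the assignment except n_a):

```python
import numpy as np

# Assumed toy sizes: 30 timesteps, input vector of length 10, hidden size 64
T_x, n_x, n_a = 30, 10, 64
rng = np.random.default_rng(0)

x = rng.standard_normal((T_x, n_x))        # one input sequence
a = np.zeros(n_a)                          # the single hidden state vector
W_aa = rng.standard_normal((n_a, n_a)) * 0.01
W_ax = rng.standard_normal((n_a, n_x)) * 0.01
b_a = np.zeros(n_a)

for t in range(T_x):
    # a is overwritten at every timestep; its size never changes
    a = np.tanh(W_aa @ a + W_ax @ x[t] + b_a)

print(a.shape)  # (64,)
```

Note that the hidden state is updated 30 times (once per timestep), but its dimension stays 64 throughout: the 30 and the 64 are completely independent choices.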
It’s been a while since I watched the lectures here in DLS C5 W1, but I’ve got to believe that Professor Ng spent some time discussing both the purpose of the hidden state and how to go about choosing the size.
hi @guilhermeparreira
The dimension of the LSTM hidden state at each time step is equal to the units argument passed to the Keras LSTM layer. This dimension is independent of the input vector's length and of the number of time points.
No, the dimension of the hidden state is not limited by the length of the features in your input vector.
Input Dimension - the input data usually has a shape like (batch_size, timesteps, input_dim), where input_dim is the length of the vector at each time point.
Hidden State Dimension - the hidden state's dimension (units) is a hyperparameter you choose, which can be larger than, smaller than, or equal to input_dim.
No, you will have a hidden state at each time point, as Paul mentioned, and each of these states will have the dimension defined by units.
If you have 30 time points and units=64:
- The LSTM processes all 30 time points sequentially.
- At each time point, a hidden state of dimension 64 is calculated.
- If you set return_sequences=True, the layer will output a tensor of shape (batch_size, 30, 64), which means 30 hidden states (one for each time point), each with a dimension of 64.
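The shape behavior of return_sequences=True can be sketched in plain NumPy. This is a hedged illustration, not Keras's implementation: it uses a simple tanh-RNN update instead of the full LSTM gates, because only the output shapes matter here. The batch size of 2 and input_dim of 10 are made-up numbers:

```python
import numpy as np

# Assumed toy sizes: batch of 2 sequences, 30 time points, input_dim 10, units 64
batch, T_x, n_x, n_a = 2, 30, 10, 64
rng = np.random.default_rng(1)

x = rng.standard_normal((batch, T_x, n_x))
W_ax = rng.standard_normal((n_x, n_a)) * 0.01
W_aa = rng.standard_normal((n_a, n_a)) * 0.01
b_a = np.zeros(n_a)

a = np.zeros((batch, n_a))     # one hidden state vector per sequence in the batch
steps = []
for t in range(T_x):
    a = np.tanh(x[:, t] @ W_ax + a @ W_aa + b_a)
    steps.append(a)            # return_sequences=True keeps the snapshot at every step

out = np.stack(steps, axis=1)
print(out.shape)  # (2, 30, 64)
```

With return_sequences=False (the Keras default), only the final value of a would be returned, giving shape (batch_size, 64).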
Right! Just to make sure that the terminology that Deepti and I are using above is clear, note that there is really only one hidden state vector, but its values change at every time step. So if you have 30 timesteps, then for each input that is processed the hidden state will get updated 30 times, but you don’t need to save all 30 values. They are just intermediate values that are used in calculating the actual outputs (\hat{y} values) of the network.
Note that the hidden state is updated based on learned (trained) weight and bias values, the previous hidden state and the inputs at each timestep, which can vary depending on the architecture of your particular RNN.
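For the basic RNN case, that update rule can be written in the notation used in the lectures (the weight shapes below show why the hidden state size n_a is independent of both the input size n_x and the number of timesteps):

```latex
a^{<t>} = \tanh\left(W_{aa}\, a^{<t-1>} + W_{ax}\, x^{<t>} + b_a\right)
```

Here W_{aa} has shape (n_a, n_a), W_{ax} has shape (n_a, n_x), and b_a has shape (n_a,): the timestep count never appears in any of them.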
Do you have any reference to where Professor Andrew Ng mentions this in the slides?
Yes, he talks about it the first time in the third lecture in Week 1 entitled Recurrent Neural Network Model. Note that at that point, he doesn’t use the term “hidden state” yet: he just calls that quantity the “activation” and denotes it a^{<t>}.
Things get more complicated with more sophisticated architectures like GRU and LSTM. In the case of LSTM, he specifies that there are two separate “state” vectors: the normal “activation” or “hidden state” a^{<t>} and also the “memory cell” or “memory state” denoted c^{<t>}. See the Week 1 lectures titled Gated Recurrent Unit (GRU) and Long Short Term Memory (LSTM).
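For reference, the LSTM update from that lecture looks like this (where [a^{<t-1>}, x^{<t>}] denotes concatenation, \sigma is the sigmoid, and * is elementwise multiplication):

```latex
\tilde{c}^{<t>} = \tanh\left(W_c [a^{<t-1>}, x^{<t>}] + b_c\right) \\
\Gamma_u = \sigma\left(W_u [a^{<t-1>}, x^{<t>}] + b_u\right) \\
\Gamma_f = \sigma\left(W_f [a^{<t-1>}, x^{<t>}] + b_f\right) \\
\Gamma_o = \sigma\left(W_o [a^{<t-1>}, x^{<t>}] + b_o\right) \\
c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + \Gamma_f * c^{<t-1>} \\
a^{<t>} = \Gamma_o * \tanh\left(c^{<t>}\right)
```

Note that both a^{<t>} and c^{<t>} have the same dimension, which is exactly the units value you pass to the Keras LSTM layer.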
Thank you!
Now I understand!