Concept behind gates

gkouro · August 19, 2022, 11:39am

I am trying to get my head around the GRU and LSTM concepts.
I understand that the gate value Gamma is computed from trainable weights, and that it controls how much memory is carried over from one time-step to the next. In the cat example, the memory is expected to capture the number of cats.
What if there are other things that need to be memorized?
Would these be captured in the gates of other neurons in the layer?
Can one unit capture one thing only?

Elemento · August 19, 2022, 4:23pm

Hey @gkouro,
Let’s consider the example of GRU here, and then you can extend the below description to LSTM pretty easily.

Also, I am assuming that you are aware that the memory cell holds a vector, for instance one with 32, 64, or 128 units, and not just a single value. In other words, if the memory cell has, say, 64 dimensions, then it can store 64 different values, and assuming that each unit can memorize one human-interpretable feature, the memory cell will be able to memorize 64 different features simultaneously.

Moreover, even a single unit in the memory cell doesn’t necessarily memorize only a single feature. For instance, it could combine 2-3 different features (each individually interpretable by humans) into one, and a single unit can then learn that combined feature instead.
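To make the "one gate value per unit" idea concrete, here is a minimal NumPy sketch of a single step of a simplified GRU (update gate only, no relevance gate). The function name, the sizes, and the random weights are all assumptions made for illustration; this is not the course's or TensorFlow's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def simplified_gru_step(c_prev, x_t, Wc, bc, Wu, bu):
    """One step of a simplified GRU (hypothetical helper, update gate only)."""
    concat = np.concatenate([c_prev, x_t])       # shape (units + n_x,)
    c_tilde = np.tanh(Wc @ concat + bc)          # candidate memory, shape (units,)
    gamma_u = sigmoid(Wu @ concat + bu)          # one gate value per unit, shape (units,)
    # Element-wise blend: each unit decides on its own how much to overwrite.
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev
    return c_t, gamma_u

rng = np.random.default_rng(0)
units, n_x = 4, 3                                # hypothetical sizes
Wc = rng.normal(size=(units, units + n_x))
Wu = rng.normal(size=(units, units + n_x))
bc = np.zeros(units)
bu = np.zeros(units)
c_prev = rng.normal(size=units)
x_t = rng.normal(size=n_x)

c_t, gamma_u = simplified_gru_step(c_prev, x_t, Wc, bc, Wu, bu)
print(c_t.shape, gamma_u.shape)                  # (4,) (4,)
```

Because the blend is element-wise, a unit whose gate is close to 0 keeps its old value while another unit in the same vector can be overwritten at the same time-step.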

Let me know if this helps.

Cheers,
Elemento

gkouro · August 22, 2022, 8:54am

Thanks. For clarification, when you say memory cell you mean a layer and the length of that vector is actually the number of units. Correct?

Elemento · August 22, 2022, 10:55am

Hey @gkouro,
We refer to an RNN/GRU/LSTM as a layer, so I don’t know whether it would be correct to refer to something inside it (in our case, the memory cell) as a layer too. Nonetheless, here you can find the TensorFlow documentation for an LSTM layer. In it, units determines the dimension of the memory cell as well as the dimension of the output. A simple piece of code can help you validate that.

import tensorflow as tf
import tensorflow.keras.layers as tfl

lstm = tfl.LSTM(units=45)
inputs = tf.random.normal([32, 10, 90])  # (batch, time-steps, features)
a0 = tf.zeros([32, 45])  # initial hidden state, shape (batch, units)
c0 = tf.zeros([32, 45])  # initial cell state, shape (batch, units)
output = lstm(inputs, initial_state=[a0, c0])
print(output.shape)  # (32, 45)

For this code, try changing the last dimension of a0 and/or c0 to anything other than 45, which is the value of units, and you will get an error. I hope this helps.

Cheers,
Elemento

gkouro · August 23, 2022, 11:06am

Right. So LSTM and GRU layers don’t have units in the same way a simple RNN layer does. The units are actually the dimensions of a single memory cell.

Elemento · August 23, 2022, 11:18am

Hey @gkouro,

This statement would be incorrect. If you check the docs of the SimpleRNN layer, you will find that it also has an argument units, which denotes the dimensionality of the output space, just as it does for the GRU and LSTM layers. The only difference is that in the case of GRU and LSTM layers, units also denotes the dimensionality of the memory cell.
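As a rough check of this, the sketch below passes the same input through the three Keras layers with `return_state=True` and prints the shapes of what comes back. The value 45 for units and the input shape are arbitrary choices for illustration. Note that the Keras GRU returns a single state, since its hidden state doubles as its memory, while the LSTM returns a hidden state and a separate cell state.

```python
import tensorflow as tf

units = 45                                   # arbitrary value for illustration
x = tf.random.normal([32, 10, 90])           # (batch, time-steps, features)

# return_state=True makes each layer also return its final state(s).
rnn_out, rnn_h = tf.keras.layers.SimpleRNN(units, return_state=True)(x)
gru_out, gru_h = tf.keras.layers.GRU(units, return_state=True)(x)
lstm_out, lstm_h, lstm_c = tf.keras.layers.LSTM(units, return_state=True)(x)

print(rnn_out.shape, rnn_h.shape)                   # (32, 45) (32, 45)
print(gru_out.shape, gru_h.shape)                   # (32, 45) (32, 45)
print(lstm_out.shape, lstm_h.shape, lstm_c.shape)   # (32, 45) (32, 45) (32, 45)
```

In every case the last dimension equals units; only the LSTM exposes an extra state tensor for the memory cell.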

Cheers,
Elemento

gkouro · August 23, 2022, 1:11pm

A Dense layer also has an argument “units”. Is that any different from the one in an RNN layer?

Elemento · August 23, 2022, 3:40pm

Hey @gkouro,
You can easily find the answers to this in the docs of Tensorflow itself. In fact, that will also help you gain confidence in finding answers by yourself. And if you want, you can easily write simple pieces of code like I wrote above to back your understanding further. I hope this helps.

Cheers,
Elemento

gkouro · August 23, 2022, 4:12pm

My question is about the concept and not about the programming

Elemento · August 23, 2022, 5:20pm

Hey @gkouro,
But you are referring to the argument “units”, if I am not wrong. And if you are clear on the theoretical difference between a Dense layer and an RNN layer (which I am assuming you are, since Prof Andrew explained it quite clearly in the dedicated videos), then all you need to know is how TensorFlow uses the “units” argument in both layers.

Cheers,
Elemento

gkouro · August 24, 2022, 8:33am

I was referring to units from a theoretical perspective. Are units/neurons in a Dense layer the same as units in an RNN? What I mean is, would you consider the units in an RNN to be neurons, just like in a Dense layer (ANN)?

Elemento · August 25, 2022, 3:17pm

Hey @gkouro,

First of all, the number of “units” is nothing but the number of “neurons”, i.e., a single unit is a single neuron. So, I don’t know what you are trying to refer to when you say “units in RNN neurons”. But assuming you are using them interchangeably, let me try to give my 2 cents.

One way to think about it is that “units”, i.e., the number of neurons, decides the dimensionality of the output in both an RNN layer and a Dense layer, and hence, I would consider them the same.

But at the same time, the computation that happens in a single neuron in a Dense layer and the computation that happens in a single neuron in an RNN layer, to produce the output for an input, are pretty different (which I am sure you are well aware of), and hence, I wouldn’t consider them the same.
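To illustrate both perspectives side by side, here is a small NumPy sketch. The sizes, the random weights, and the choice of tanh for both layers are assumptions for illustration only (a Keras Dense layer, for example, defaults to no activation).

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, units = 5, 3                       # hypothetical sizes

# Dense layer: each unit sees only the current input x.
W = rng.normal(size=(units, n_x))
b = np.zeros(units)

def dense_forward(x):
    return np.tanh(W @ x + b)

# Basic RNN layer: each unit also sees the previous activations of all units.
Wax = rng.normal(size=(units, n_x))
Waa = rng.normal(size=(units, units))
ba = np.zeros(units)

def rnn_step(a_prev, x_t):
    return np.tanh(Waa @ a_prev + Wax @ x_t + ba)

x_seq = rng.normal(size=(4, n_x))       # a sequence of 4 time-steps
a = np.zeros(units)                     # initial activation
for x_t in x_seq:
    a = rnn_step(a, x_t)                # the same weights are reused at every step

dense_out = dense_forward(x_seq[-1])
print(dense_out.shape, a.shape)         # (3,) (3,)
```

Both outputs have `units` entries, which is the sense in which the units are the same; the RNN output additionally depends on the earlier time-steps through `a_prev`, which is the sense in which they differ.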

So, I have presented both the perspectives that came to my mind, and now you can choose whichever you like. I hope this helps.

Cheers,
Elemento

gkouro · August 25, 2022, 3:49pm

I meant units in RNN layer. And of course the calculations are different. Anyway, I think I got my answer. Thanks

Pudja_Gemilang · December 6, 2022, 2:31pm

Hi @Elemento,
So could we say that the ‘units’ hyperparameter of LSTM() sets the dimension of C_t and H_t below?

Before seeing your replies, I had thought that ‘units’ determines the number of time-steps in the y_hat predictions.

Pudja_Gemilang · December 6, 2022, 2:39pm

Continuing from here, how do we design the model architecture in TensorFlow Keras to output y_hat with a desired number of time-steps? For instance, if we want to build a sentiment classifier, we would like to output y_hat for only 1 time-step (Tx = len_sentences, Ty = 1). In another application, say a neural machine translation model, we would like to output y_hat for Tx time-steps (Tx = Ty).

Elemento · December 7, 2022, 5:02am

Hey @Pudja_Gemilang,

In an RNN/LSTM/GRU layer, the argument “units” determines the dimension of the hidden activations h. In the case of an LSTM, it also determines the dimension of the memory cell state c. This argument has nothing to do with the time-steps. I hope we are clear up to this point.

Now, the number of time-steps in the y_hat predictions is determined either by the number of time-steps in the input or by the argument “return_sequences”. Consider the SimpleRNN layer of TensorFlow, which you can find here.

In the case of Sentiment Classification (Ty = 1), when you only want a single output corresponding to all the time-steps collectively, you can set return_sequences = False.
In the case of Named Entity Recognition or Language Modelling (Ty = Tx) when you want an output corresponding to every time-step, you can set return_sequences = True.
And finally, in the case of applications like Neural Machine Translation (Ty != Tx), you can use an Encoder-Decoder based architecture.
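The first two cases can be sketched as follows. The input shape, the 32 units, and the Dense heads (1 sigmoid unit for sentiment, 5 softmax classes for tagging) are hypothetical choices for illustration; the Encoder-Decoder case is not shown.

```python
import tensorflow as tf

x = tf.random.normal([8, 20, 16])            # (batch, Tx, features)

# Ty = 1: only the activation at the last time-step is returned.
last_only = tf.keras.layers.SimpleRNN(32, return_sequences=False)(x)
# Ty = Tx: the activation at every time-step is returned.
every_step = tf.keras.layers.SimpleRNN(32, return_sequences=True)(x)

print(last_only.shape)                       # (8, 32)
print(every_step.shape)                      # (8, 20, 32)

# Hypothetical output heads: one prediction per sentence vs. one per time-step.
sentiment = tf.keras.layers.Dense(1, activation="sigmoid")(last_only)
tags = tf.keras.layers.Dense(5, activation="softmax")(every_step)

print(sentiment.shape)                       # (8, 1)
print(tags.shape)                            # (8, 20, 5)
```

In both cases the last dimension of the recurrent output is units (32 here), while the presence of the time axis depends only on return_sequences.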

I hope this helps.

Cheers,
Elemento
