RNN Architecture, Why not multi-layer NN inside the cell?

John_Pan · December 12, 2023, 5:38am

It is my understanding that in RNN, GRU and LSTM layers, the TF parameter “units” refers to the dimensionality (number of neurons) of a single Dense hidden layer inside an individual RNN/GRU/LSTM cell.

Is there any reason why this single hidden layer NN can’t itself be a multi-layer NN?

Going by the TF/Keras documentation, it doesn’t seem possible to define an RNN/GRU/LSTM cell this way. The only parameter available appears to be units.
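As a rough illustration of what units controls, here is a minimal sketch that counts trainable parameters for one recurrent layer. It assumes the usual Keras weight layout (one input kernel, one recurrent kernel and one bias per gate, with the GRU carrying a second bias vector when reset_after=True, the TF2 default). The function name rnn_param_count is made up for this example and is not part of any library.

```python
def rnn_param_count(kind, units, input_dim, reset_after=True):
    """Count trainable parameters of one recurrent layer (hypothetical helper)."""
    # One "block" is a single-layer transform of [x_t, h_prev] to `units` outputs:
    # input kernel + recurrent kernel + bias.
    block = units * input_dim + units * units + units
    if kind == "simple_rnn":
        return block                      # one transform
    if kind == "lstm":
        return 4 * block                  # forget, input, output gates + candidate
    if kind == "gru":
        extra_bias = units if reset_after else 0
        return 3 * (block + extra_bias)   # update, reset gates + candidate
    raise ValueError(f"unknown layer kind: {kind}")


print(rnn_param_count("simple_rnn", units=4, input_dim=3))
print(rnn_param_count("lstm", units=4, input_dim=3))
print(rnn_param_count("gru", units=4, input_dim=3))
```

Every gate here is a single-layer transform whose output width is units, which is why that one parameter fixes all the weight shapes.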

Further, is it possible for the forget and update gates to use a differently sized NN, or a differently shaped multi-layer NN, than the output gate?
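To make the idea concrete, here is a minimal NumPy sketch of a recurrent cell whose per-timestep transform is a two-layer MLP instead of a single layer. It is an illustration under stated assumptions, not how Keras implements its built-in cells: the names init_deep_cell, deep_cell_step, run_sequence and inner_dim are invented for this example. The inner width can be anything, but the last layer has to produce a vector of size units, because that vector is fed back as the next hidden state. The same constraint would apply to gates, whose outputs are multiplied elementwise with the state.

```python
import numpy as np


def init_deep_cell(input_dim, inner_dim, units, seed=0):
    """Random weights for a cell with one extra hidden layer inside (hypothetical)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.1, size=(input_dim + units, inner_dim)),
        "b1": np.zeros(inner_dim),
        "W2": rng.normal(scale=0.1, size=(inner_dim, units)),
        "b2": np.zeros(units),
    }


def deep_cell_step(params, x_t, h_prev):
    """One time step: [x_t, h_prev] -> inner layer -> new hidden state."""
    z = np.concatenate([x_t, h_prev], axis=-1)
    inner = np.tanh(z @ params["W1"] + params["b1"])      # width: inner_dim
    return np.tanh(inner @ params["W2"] + params["b2"])   # width: units


def run_sequence(params, xs, units):
    """xs has shape (time, batch, input_dim); returns (time, batch, units)."""
    h = np.zeros((xs.shape[1], units))
    outputs = []
    for x_t in xs:
        h = deep_cell_step(params, x_t, h)
        outputs.append(h)
    return np.stack(outputs)


params = init_deep_cell(input_dim=3, inner_dim=8, units=4)
out = run_sequence(params, np.ones((5, 2, 3)), units=4)
print(out.shape)
```

In Keras, something along these lines would have to be written as a custom cell and wrapped in tf.keras.layers.RNN, since the built-in layers only expose units.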

gent.spah · December 12, 2023, 6:28am

If I am understanding this right, you can use several RNN layers instead of a multi-layer RNN. Maybe the reason is that it would make the network more complex…

balaji.ambresh · December 12, 2023, 6:31am

Please read up on Elman networks where we have 1 hidden layer and 1 output layer.

Most NN packages implement this solution with the same number of hidden and output units. While you’re welcome to play around with an RNN architecture any way you like, leaving the library implementation untouched and stacking RNN cells is a better approach.
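For reference, an Elman network is commonly written as follows. The symbol names are a notational assumption for this sketch and are not taken from the course material.

```latex
h_t = \sigma_h\left(W_h x_t + U_h h_{t-1} + b_h\right) \\
y_t = \sigma_y\left(W_y h_t + b_y\right)
```

Here $x_t$ is the input, $h_t$ the hidden state, $y_t$ the output, and $\sigma_h$, $\sigma_y$ are activation functions. The hidden layer is a single-layer transform of the current input and the previous hidden state.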

John_Pan · December 12, 2023, 7:39am

Ah, I guess that didn’t click for me.

I think you’re saying that adding an RNN layer is equivalent to having a multi-layer NN for each timestep.

I think I see now.

John_Pan · December 12, 2023, 7:43am

From the other mentor’s comment, I think I understand now that stacking layers like this is approximately equivalent to having a multi-layer NN within the RNN cell.

I say approximately because I think y[1]<1> of layer 1 is partially dependent on cell state gates between “hidden layers”, whereas what I proposed up top is not.

balaji.ambresh · December 12, 2023, 8:24am

Your 1st image on the top is a time unrolled view of an RNN. It’s the same RNN layer that gets used for all time steps.

The image above refers to chaining / stacking multiple RNN layers which looks like stacking hidden layers.

John_Pan · December 12, 2023, 8:39am

“The image above refers to chaining / stacking multiple RNN layers which looks like stacking hidden layers.”

Yes, I am aware.

In my initial question I was pondering if it was possible to put a multi-layer NN model inside an individual RNN cell time step, instead of just a single hidden layer of n units.

With the second image, I was posting about my new understanding from the other mentor that chaining/stacking RNN layers effectively creates a multi-layer NN model for each time step.
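A small NumPy sketch of stacking, under the assumption of plain tanh (Elman-style) layers rather than GRU/LSTM; init_layer and stacked_rnn are hypothetical names. At each time step the input passes up through the layers like an MLP, but each layer also mixes in its own previous state, which is where stacking differs from a multi-layer NN inside one cell.

```python
import numpy as np


def init_layer(input_dim, units, rng):
    """Weights for one simple recurrent layer (hypothetical helper)."""
    return {
        "Wx": rng.normal(scale=0.1, size=(input_dim, units)),
        "Wh": rng.normal(scale=0.1, size=(units, units)),
        "b": np.zeros(units),
    }


def stacked_rnn(layers, xs):
    """xs: (time, batch, input_dim). Returns the top layer's outputs over time."""
    batch = xs.shape[1]
    # Each layer keeps its own hidden state across time steps.
    states = [np.zeros((batch, p["Wh"].shape[0])) for p in layers]
    outputs = []
    for x_t in xs:
        inp = x_t
        for i, p in enumerate(layers):
            # Layer i sees the output of the layer below at time t
            # and its own state from time t-1.
            states[i] = np.tanh(inp @ p["Wx"] + states[i] @ p["Wh"] + p["b"])
            inp = states[i]
        outputs.append(inp)
    return np.stack(outputs)


rng = np.random.default_rng(0)
layers = [init_layer(3, 4, rng), init_layer(4, 6, rng)]
xs = rng.normal(size=(5, 2, 3))
print(stacked_rnn(layers, xs).shape)
```

If the upper layer's recurrent weights Wh were all zero, it would reduce to a plain dense layer applied at each time step, which is the sense in which stacking is only approximately the same as a deeper cell. In Keras the analogous setup stacks recurrent layers with return_sequences=True on every layer except possibly the last.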

balaji.ambresh · December 12, 2023, 10:48am

Got it.

paulinpaloalto · December 12, 2023, 4:05pm

Have you watched all the lectures in DLS C5 Week 1? Prof Ng specifically discusses a number of the different possible RNN architectures including the “stacked” multilayer approach that you show in your second diagram. He doesn’t go into a lot of detail specifically on that multi-layer architecture, but he definitely mentions it. I have never pursued that any further or tried to implement that in TF or PyTorch.
