RNN Architecture, Why not multi-layer NN inside the cell?

It is my understanding that in RNN, GRU and LSTM layers, the TF parameter “units” refers to the dimensionality (number of neurons) of a single Dense hidden layer inside an individual RNN/GRU/LSTM cell.

  1. Is there any reason why this single hidden layer NN can’t itself be a multi-layer NN?

It doesn’t seem to be possible to define the RNN/GRU/LSTM unit this way in the TF/Keras documentation. The only parameter available appears to be units.

  2. Further, is it possible for the forget and update gates to be differently sized or differently shaped multi-layer NNs than the output gate?
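To make both questions concrete, here is a minimal NumPy sketch of what such a cell could look like. It is not how TF/Keras implements its cells: it assumes a simplified GRU-style cell with only an update gate, and the helpers `mlp`, `forward`, and `step` are hypothetical names made up for illustration. The point it shows is that the gate network and the candidate network can have different depths and inner widths, as long as both end in the same number of outputs as the hidden state, because they are combined with it elementwise.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random weights and zero biases for a small fully connected net (hypothetical helper)."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, v, out_act):
    # tanh on the inner layers, out_act on the last layer
    for i, (W, b) in enumerate(params):
        v = v @ W + b
        v = out_act(v) if i == len(params) - 1 else np.tanh(v)
    return v

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, units = 8, 16
# Different depths and inner widths, but both nets must output `units` values.
gate_net = mlp([n_x + units, 4, units])        # one inner layer of width 4
cand_net = mlp([n_x + units, 32, 24, units])   # two inner layers

def step(h_prev, x_t):
    v = np.concatenate([h_prev, x_t], axis=-1)
    z = forward(gate_net, v, sigmoid)        # update gate, values in (0, 1)
    h_tilde = forward(cand_net, v, np.tanh)  # candidate state
    return (1 - z) * h_prev + z * h_tilde    # elementwise blend

x = rng.normal(size=(5, 3, n_x))   # (time, batch, features)
h = np.zeros((3, units))
for x_t in x:
    h = step(h, x_t)
print(h.shape)  # (3, 16)
```

So nothing in the math forbids it; the only hard constraint in this sketch is the output width of each internal network.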

If I am understanding this right, you can use several stacked RNN layers instead of a multi-layer NN inside the cell. Maybe it is because that would make the network more complex…


Please read up on Elman networks where we have 1 hidden layer and 1 output layer.

Most NN packages implement this solution with the same number of hidden and output units. While you’re welcome to play around with an RNN architecture any way you like, leaving the library implementation untouched and stacking RNN cells is a better approach.
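For reference, stacking in Keras could look like the sketch below. The layer sizes and input shape are arbitrary values chosen for illustration. Every recurrent layer except the last needs `return_sequences=True` so that the next layer receives one vector per time step.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10, 8)),                    # (timesteps, features)
    tf.keras.layers.LSTM(32, return_sequences=True),  # passes every time step up
    tf.keras.layers.LSTM(16),                         # returns only the last output
    tf.keras.layers.Dense(1),
])

x = np.random.rand(4, 10, 8).astype("float32")  # batch of 4 sequences
y = model(x)
print(y.shape)  # (4, 1)
```

Note that the two layers use different `units` values, so each level of the stack can have its own width.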


Ah, I guess that didn’t click for me.

I think you’re saying that adding an RNN layer is equivalent to having a multi-layer NN for each timestep.

I think I see now.


From the other mentor’s comment, I think I understand now that stacking layers like this is approximately equivalent to having a multi-layer NN within the RNN cell.

I say approximately because I think y[1]<1> of layer 1 is partially dependent on cell state gates between “hidden layers”, whereas what I proposed up top is not.
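The difference can be sketched in a few lines of NumPy. This is an illustration with plain tanh RNN cells (no gates) and made-up sizes, and the helper `w` is a hypothetical name. In the stacked version each layer carries its own recurrent state through time; in the "multi-layer NN inside the cell" version there is only one recurrent state, and the inner layer is recomputed from scratch at every step.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, T = 3, 4, 5

def w(rows, cols):
    # Small random weight matrix (hypothetical helper)
    return rng.normal(scale=0.5, size=(rows, cols))

x = rng.normal(size=(T, n_x))

# (a) Two stacked RNN layers: each layer keeps its own recurrent state.
Wx1, Wh1 = w(n_x, n_h), w(n_h, n_h)
Wx2, Wh2 = w(n_h, n_h), w(n_h, n_h)
h1, h2 = np.zeros(n_h), np.zeros(n_h)
for x_t in x:
    h1 = np.tanh(x_t @ Wx1 + h1 @ Wh1)   # layer 1 recurs on its own h1
    h2 = np.tanh(h1 @ Wx2 + h2 @ Wh2)    # layer 2 recurs on its own h2
stacked_out = h2

# (b) One cell containing a two-layer network: a single recurrent state.
Wx, Wh, W2 = w(n_x, n_h), w(n_h, n_h), w(n_h, n_h)
h = np.zeros(n_h)
for x_t in x:
    inner = np.tanh(x_t @ Wx + h @ Wh)   # inner layer, not carried over time
    h = np.tanh(inner @ W2)              # only h is fed back at the next step
deep_cell_out = h

print(stacked_out.shape, deep_cell_out.shape)
```

Both variants produce an output of the same shape, but (a) has two recurrent weight matrices (`Wh1`, `Wh2`) while (b) has only one (`Wh`).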


Your first image at the top is a time-unrolled view of an RNN. The same RNN layer is used for all time steps.

The image above refers to chaining / stacking multiple RNN layers which looks like stacking hidden layers.

Yes, I am aware.

In my initial question I was pondering whether it was possible to put a multi-layer NN model inside an individual RNN cell at each time step, instead of just a single hidden layer of n units.

With the second image, I was posting about my new understanding from the other mentor that chaining/stacking RNN layers effectively creates a multi-layer NN model for each time step.


Got it.


Have you watched all the lectures in DLS C5 Week 1? Prof Ng specifically discusses a number of the different possible RNN architectures including the “stacked” multilayer approach that you show in your second diagram. He doesn’t go into a lot of detail specifically on that multi-layer architecture, but he definitely mentions it. I have never pursued that any further or tried to implement that in TF or PyTorch.