I just finished Week 1 of Course 5 covering Recurrent Neural Networks. I'm wondering when, in practice, one would use a deep RNN vs. something like a single-layer BRNN. The video mentioned that more complex functions might require multiple layers, but I was hoping to see some practical examples.
Also, intuitively, what are the deeper layers learning in this case? Making an analogy to CNNs, I would expect deeper layers to learn more complex features like grammar structures, combinations of words, etc., but I have a hard time visualizing how that would be the case just from stacking multiple RNNs on top of each other.
I think a simple answer to your first question would be that, when you have high bias and increasing the number of hidden units in a single recurrent layer isn't providing good returns, you might want to try stacking multiple layers (see the sketch below).
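For a concrete picture, here's a minimal sketch of that progression, assuming TensorFlow/Keras and made-up vocabulary/layer sizes (not from the course itself):

```python
import tensorflow as tf

# Single recurrent layer: one way to grow capacity is simply more units (e.g. 64 -> 256).
shallow = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Deep RNN: if bias stays high, stack layers instead.
# return_sequences=True passes the full hidden-state sequence on to the next recurrent layer.
deep = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```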
With respect to the intuition question, when I think of an RNN layer, I sometimes think of what one of the output vectors would look like written as an explicit function of the sequence of input vectors. Over a long input sequence, you end up with something that looks like a series of nested functions -- f( f( f( f( … )))), with vectors further back in time sitting deeper in the nested structure -- but always using the same learned function, determined by the parameter matrices and the activation. So while this allows for a lot of complexity over a long sequence (some of which is lost due to vanishing/exploding gradients), it's limited by the behaviour of that single function. Imagine how much new behaviour becomes possible when you add another parameterized function into the mix.
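To make that nesting concrete, here's a toy NumPy sketch (made-up shapes, purely illustrative) of one recurrent layer applying the same function at every step, and of what stacking adds:

```python
import numpy as np

def rnn_layer(x_seq, h0, Wh, Wx, b):
    """One recurrent layer: the *same* learned function
    h_t = tanh(Wh @ h_{t-1} + Wx @ x_t + b)
    is applied at every time step, so written out it nests as f(f(f(...)))."""
    h, outputs = h0, []
    for x_t in x_seq:
        h = np.tanh(Wh @ h + Wx @ x_t + b)
        outputs.append(h)
    return outputs

# Stacking a second layer runs the first layer's outputs through a
# *different* parameterized function (say Wh2, Wx2, b2), which is where the
# extra expressive power of a deep RNN comes from.
```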
I hope this helps despite the fact that I'm still fairly new to these ideas myself.
Hey @jcleung11, it seems you've already got an answer; I just want to offer a slightly different perspective on the question.
Generally speaking, it's true for any neural network architecture (MLP, CNN, RNN, etc.) that deeper layers learn higher-level representations: stacking layers increases the model's capacity and lets it better fit patterns in the data.
The BRNN serves a different purpose: it encodes an assumption about how terms in the input sequence depend on each other, namely that each position can depend on both past and future context. For example, in speech recognition, knowing how a word ends narrows the choice of letters at its beginning, resulting in more accurate recognition.
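If it helps, a single-layer BRNN is just the recurrent layer wrapped so that each output sees both directions of the sequence. A hedged Keras sketch (hypothetical sizes, not the course's assignment code):

```python
import tensorflow as tf

# One LSTM reads left-to-right, a second reads right-to-left, and their hidden
# states are concatenated at each step, so the prediction at time t can use
# context from both ends of the input.
brnn = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Dense(29, activation="softmax"),  # e.g. a per-step character prediction
])
```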