Do Transformers obviate RNNs?

In this part of the deep learning specialization we are learning about RNNs. In general, I’m wondering how relevant RNNs remain given the advent of the Transformer. In my current understanding, the Transformer is a replacement for RNNs. Is that right?

Hi @edmundsecho

I wouldn’t say the Transformer is a replacement for RNNs, as the two differ in how they process data.

With RNNs, the input is processed sequentially, one step at a time. RNNs can handle variable-length sequences, but they struggle with long-term dependencies in long sequences: information is passed along at each step, so the longer the sequence, the more likely information is to be lost.
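
A minimal sketch of that sequential processing (PyTorch, toy shapes chosen just for illustration):

```python
import torch
import torch.nn as nn

seq_len, batch, d_in, d_hidden = 10, 2, 8, 16
x = torch.randn(seq_len, batch, d_in)   # a toy input sequence

cell = nn.RNNCell(d_in, d_hidden)
h = torch.zeros(batch, d_hidden)         # initial hidden state
for t in range(seq_len):                 # sequential: step t depends on step t-1
    h = cell(x[t], h)                    # all past information must fit into h
```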

Transformers, on the other hand, process sequences in parallel: all parts of the sequence are handled at the same time. This lets them handle both short and long sequences efficiently.
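
And the parallel counterpart (again a toy PyTorch sketch; sizes are illustrative): self-attention looks at every position in one pass, so position 0 can attend to position 9 directly without stepping through the positions in between.

```python
import torch
import torch.nn as nn

seq_len, batch, d_model = 10, 2, 16
x = torch.randn(batch, seq_len, d_model)

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
out = layer(x)          # all positions processed at once
print(out.shape)        # torch.Size([2, 10, 16])
```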

Hope this helps!

Regards
DP

Thank you for your response. I take your point that an RNN can take variable-length inputs. That said, if I’m willing to pad the input to some constant length, I’m still trying to figure out the motivation for an RNN when a Transformer is an option. Is there a scenario where an RNN would be preferred over a Transformer?
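
For concreteness, by padding I mean something like this (a minimal PyTorch sketch with toy data; the shapes are just for illustration):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.randn(5, 8), torch.randn(3, 8), torch.randn(9, 8)]  # variable lengths
padded = pad_sequence(seqs, batch_first=True)                     # (3, 9, 8), zero-padded
lengths = torch.tensor([s.size(0) for s in seqs])                 # kept so the model can ignore padding
```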

I am busy repainting the house today, but this might be what the learner is referring to, or at least I know it is making the rounds: [2410.01201] Were RNNs All We Needed?


hi @edmundsecho

Good question. As you know, the Transformer uses a self-attention mechanism, but the main advantage of an RNN is its ability to remember important things about the input it has received, which allows it to be precise in predicting what comes next.

RNNs have a temporal flow, which makes it easier to interpret their decisions and understand how information moves through the sequence, whereas Transformers rely on self-attention, which can make their decisions harder to interpret.

RNNs are well-suited for modeling sequential dependencies. They can capture contextual information from the past, making them effective for tasks like language modeling, speech recognition, and sentiment analysis.

Transformers excel at modeling dependencies between elements, irrespective of their positions in the sequence. They are particularly powerful for tasks involving long-range dependencies, such as machine translation, text classification, and image captioning.
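
A small sketch of the "memory" point above (PyTorch, toy shapes; vocabulary and layer sizes are illustrative assumptions): the RNN’s final hidden state summarizes everything seen so far, and a small head can use it to score what comes next.

```python
import torch
import torch.nn as nn

vocab, d_emb, d_hidden = 100, 16, 32
tokens = torch.randint(0, vocab, (2, 12))   # batch of 2 toy sequences, length 12

embed = nn.Embedding(vocab, d_emb)
rnn = nn.GRU(d_emb, d_hidden, batch_first=True)
head = nn.Linear(d_hidden, vocab)

_, h_last = rnn(embed(tokens))              # h_last: (1, 2, 32), context of the whole prefix
logits = head(h_last[-1])                   # next-token scores, shape (2, 100)
```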

Regards
DP


Transformers beat RNNs on many NLP tasks. But I have read somewhere that a more advanced form of RNN can surpass the Transformer; I am not sure. In my experience, an LSTM performs far better than a Transformer on time-series tasks.
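
For reference, this is roughly the kind of LSTM forecaster I mean (a minimal PyTorch sketch with toy data; the window size and layer sizes are illustrative assumptions, not a definitive setup):

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, d_in=1, d_hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(d_in, d_hidden, batch_first=True)
        self.head = nn.Linear(d_hidden, 1)    # predict the next value

    def forward(self, x):                     # x: (batch, window, d_in)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])          # use the last time step's output

model = LSTMForecaster()
window = torch.randn(8, 30, 1)                # 8 windows of 30 past values each
next_value = model(window)                    # (8, 1) one-step-ahead prediction
```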
