Do Transformers obviate RNNs?

In this part of the deep learning specialization we are learning about RNNs. In general, I’m wondering how relevant RNNs remain given the advent of the Transformer. In my current understanding, the Transformer is a replacement for RNNs. Is that right?

Hi @edmundsecho

I wouldn’t say the Transformer is a replacement for RNNs, as the two differ in how they process data.

With RNNs, the input is processed sequentially, one step at a time. RNNs can handle variable-length sequences, but they struggle with long-term dependencies in long sequences: information is passed along at each step, so the longer the sequence, the more likely information is to be lost.
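
A minimal sketch of that sequential processing (PyTorch, toy shapes chosen just for illustration):

```python
import torch
import torch.nn as nn

seq_len, batch, d_in, d_hidden = 10, 2, 8, 16
x = torch.randn(seq_len, batch, d_in)   # a toy input sequence

cell = nn.RNNCell(d_in, d_hidden)
h = torch.zeros(batch, d_hidden)         # initial hidden state
for t in range(seq_len):                 # sequential: step t depends on step t-1
    h = cell(x[t], h)                    # all past information must fit into h
```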

Transformers, on the other hand, process sequences in parallel: all parts of the sequence are handled at the same time. This lets them handle both short and long sequences efficiently.
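
And the parallel counterpart (again a toy PyTorch sketch; sizes are illustrative): self-attention looks at every position in one pass, so position 0 can attend to position 9 directly without stepping through the positions in between.

```python
import torch
import torch.nn as nn

seq_len, batch, d_model = 10, 2, 16
x = torch.randn(batch, seq_len, d_model)

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
out = layer(x)          # all positions processed at once
print(out.shape)        # torch.Size([2, 10, 16])
```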

Hope this helps!

Regards
DP

Thank you for your response. I take your point that an RNN can take variable-length inputs. That said, if I’m willing to pad the input to some constant length, I’m still trying to figure out the motivation for an RNN when a Transformer is an option. Is there a scenario where an RNN would be preferred over a Transformer?
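
For concreteness, by padding I mean something like this (a minimal PyTorch sketch with toy data; the shapes are just for illustration):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.randn(5, 8), torch.randn(3, 8), torch.randn(9, 8)]  # variable lengths
padded = pad_sequence(seqs, batch_first=True)                     # (3, 9, 8), zero-padded
lengths = torch.tensor([s.size(0) for s in seqs])                 # kept so the model can ignore padding
```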

I am busy repainting the house today, but this might be what the learner is referring to, or at least I know it is making the rounds: [2410.01201] Were RNNs All We Needed?


hi @edmundsecho

Good question. As you know, the Transformer uses a self-attention mechanism, but the main advantage of an RNN is its ability to remember important things about the input it has received, which allows it to be precise in predicting what comes next.

RNNs have a temporal flow, which makes it easier to interpret their decisions and understand how information moves through the sequence, whereas Transformers rely on self-attention, which can make their decisions harder to interpret.

RNNs are well-suited for modeling sequential dependencies. They can capture contextual information from the past, making them effective for tasks like language modeling, speech recognition, and sentiment analysis.

Transformers excel at modeling dependencies between elements, irrespective of their positions in the sequence. They are particularly powerful for tasks involving long-range dependencies, such as machine translation, text classification, and image captioning.
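
A small sketch of the "memory" point above (PyTorch, toy shapes; vocabulary and layer sizes are illustrative assumptions): the RNN’s final hidden state summarizes everything seen so far, and a small head can use it to score what comes next.

```python
import torch
import torch.nn as nn

vocab, d_emb, d_hidden = 100, 16, 32
tokens = torch.randint(0, vocab, (2, 12))   # batch of 2 toy sequences, length 12

embed = nn.Embedding(vocab, d_emb)
rnn = nn.GRU(d_emb, d_hidden, batch_first=True)
head = nn.Linear(d_hidden, vocab)

_, h_last = rnn(embed(tokens))              # h_last: (1, 2, 32), context of the whole prefix
logits = head(h_last[-1])                   # next-token scores, shape (2, 100)
```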

Regards
DP


Transformers beat RNNs on many NLP tasks. But I have read somewhere that a more advanced form of RNN can surpass the Transformer; I am not sure. In my experience, an LSTM performs far better than a Transformer on time-series tasks.
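
For reference, this is roughly the kind of LSTM forecaster I mean (a minimal PyTorch sketch with toy data; the window size and layer sizes are illustrative assumptions, not a definitive setup):

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, d_in=1, d_hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(d_in, d_hidden, batch_first=True)
        self.head = nn.Linear(d_hidden, 1)    # predict the next value

    def forward(self, x):                     # x: (batch, window, d_in)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])          # use the last time step's output

model = LSTMForecaster()
window = torch.randn(8, 30, 1)                # 8 windows of 30 past values each
next_value = model(window)                    # (8, 1) one-step-ahead prediction
```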
