In the lecture titled "Transformer Architecture" in the course "Intro to LLMs and Gen AI Project Lifecycle", Mike says, "The power of the transformer architecture lies in its ability to learn the relevance and context of all of the words in a sentence."
My understanding is that the ability to "learn the relevance and context of all words" should be credited to the attention mechanism itself, not to the transformer architecture specifically. For example, an RNN-based encoder-decoder with attention also learns from the context of all words in a sentence.
I also believe the real power of Transformers comes from their ability to be trained in parallel (processing the full sentence at once), whereas RNNs must process tokens sequentially, one step at a time.
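To illustrate the parallelism point, here is a minimal sketch of single-head self-attention in NumPy (toy random weights, not a trained model): the attention scores for every token against every other token come from one matrix multiply, with no sequential loop over positions as in an RNN.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy embeddings for a 4-token sentence, model dimension 8.
# The projection matrices are random here; real models learn them.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv

# One matmul scores all tokens against all tokens at once,
# so the whole sentence is processed in parallel.
scores = softmax(Q @ K.T / np.sqrt(8))  # shape (4, 4)
out = scores @ V                        # shape (4, 8)
```

Each row of `scores` is a distribution over all four tokens, which is the "context of all words" part; the fact that it is computed in a single batched operation is the parallel-training part.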
Do I have this right?