I may be misunderstanding transformers at a conceptual level, but I don’t think the lecture covers my question.
I noticed that when we implemented the encoder, we didn’t keep track of or update the attention weights, but we did in the decoder. It doesn’t look like those weights were used later on either; but even if they were, why wouldn’t the encoder’s weights be tracked as well?
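
To make the difference I mean concrete, here is a minimal sketch of the pattern, assuming a Keras-style `MultiHeadAttention`; the layer names and shapes are my own illustration, not the assignment’s code. The encoder-style layer computes attention weights internally and throws them away, while the decoder-style layer returns them to the caller:

```python
import tensorflow as tf

class EncoderStyleLayer(tf.keras.layers.Layer):
    """Self-attention block that discards its attention weights."""
    def __init__(self, d_model, num_heads):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model)
        self.norm = tf.keras.layers.LayerNormalization()

    def call(self, x):
        # Weights are computed inside the layer but never returned.
        attn_out = self.mha(query=x, value=x, key=x)
        return self.norm(x + attn_out)

class DecoderStyleLayer(tf.keras.layers.Layer):
    """Self-attention block that also hands the attention weights back."""
    def __init__(self, d_model, num_heads):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=d_model)
        self.norm = tf.keras.layers.LayerNormalization()

    def call(self, x):
        # return_attention_scores=True exposes the weights to the caller,
        # e.g. so they can be collected in a dict for later inspection.
        attn_out, attn_weights = self.mha(
            query=x, value=x, key=x, return_attention_scores=True)
        return self.norm(x + attn_out), attn_weights

# Only the decoder-style layer gives the caller access to the weights.
x = tf.random.uniform((2, 5, 16))          # (batch, seq_len, d_model)
enc_out = EncoderStyleLayer(16, 2)(x)
dec_out, weights = DecoderStyleLayer(16, 2)(x)
print(enc_out.shape, dec_out.shape, weights.shape)  # weights: (2, 2, 5, 5)
```

My question is why only the second pattern is used for the decoder when, as far as I can tell, the returned weights aren’t consumed anywhere downstream.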