W4 - Assignment: Why do we only update the attention weights in the decoder, but not in the encoder?

I may not be understanding transformers at a conceptual level, but I don’t think the lecture covers my question.

I noticed that when we implemented the encoder, we didn’t keep track of or return the attention weights, whereas we did in the decoder. It seems the weights weren’t used later on either; but even if they were, why wouldn’t the encoder’s weights be tracked as well?
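For anyone reading along, here is a minimal sketch of the pattern I mean. This is not the assignment’s actual code; the class and key names (`Encoder`, `Decoder`, `"decoder_self_attention"`, etc.) are illustrative. The point is just that the decoder collects its attention weights in a dictionary (typically so they can be plotted later), while the encoder discards them:

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Compute softmax(q k^T / sqrt(d)) v; masking omitted for brevity.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

class Encoder:
    def __call__(self, x):
        # Self-attention only; the weights are computed but thrown away.
        out, _ = scaled_dot_product_attention(x, x, x)
        return out

class Decoder:
    def __call__(self, x, enc_out):
        # The decoder stores each block's weights so they can be
        # inspected or visualized after the forward pass.
        attention_weights = {}
        out, w1 = scaled_dot_product_attention(x, x, x)
        attention_weights["decoder_self_attention"] = w1
        out, w2 = scaled_dot_product_attention(out, enc_out, enc_out)
        attention_weights["decoder_cross_attention"] = w2
        return out, attention_weights
```

So the question is why the assignment returns `attention_weights` only from the decoder side.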

Please see this topic. It’s best not to create duplicate topics; you can edit a reply to update your questions. Here’s the community user guide to get started.

Thanks for the heads-up. I meant to delete this after making my other post, but I ended up forgetting. Sorry for the inconvenience!