In the encoder block, the residual connection passes the input X directly to the Add & Norm layer, skipping the multi-head attention layer. If so, why is the output of the multi-head attention layer also passed as input to the Add & Norm layer?
As its name suggests, Add & Norm first adds the input X to the multi-head attention output and then applies layer normalization to the sum, i.e. it computes LayerNorm(X + MultiHeadAttention(X)). The residual connection does not replace the attention output; both tensors are needed, because the sublayer learns a residual on top of X and the identity path keeps gradients flowing through deep stacks. Layer normalization works similarly to batch normalization, but it recenters and rescales across the feature dimension of each token rather than across the batch.
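As a concrete illustration, here is a minimal sketch of an Add & Norm sublayer in PyTorch. The class name `AddNorm`, the dimensions, and the use of `nn.MultiheadAttention` are illustrative assumptions, not taken from any particular implementation:

```python
import torch
from torch import nn

class AddNorm(nn.Module):
    """Residual connection followed by layer normalization: LayerNorm(x + sublayer_out)."""
    def __init__(self, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, sublayer_out):
        # Both the original input x and the sublayer output are required:
        # the "Add" step sums them before normalization is applied.
        return self.norm(x + self.dropout(sublayer_out))

# Usage inside one encoder sublayer
d_model, num_heads = 512, 8
attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
add_norm = AddNorm(d_model)

x = torch.randn(2, 10, d_model)   # (batch, seq_len, d_model)
attn_out, _ = attn(x, x, x)       # self-attention over x
y = add_norm(x, attn_out)         # LayerNorm(x + attention(x))
print(y.shape)                    # torch.Size([2, 10, 512])
```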
Batch normalization is usually less effective than layer normalization in natural language processing tasks, because the inputs are often variable-length sequences: batch statistics would then mix real tokens with padding across examples, whereas layer normalization depends only on each token's own feature vector.
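To make the difference in normalization axes concrete, here is a small sketch (the tensor shapes are illustrative assumptions). Layer norm computes its statistics over the feature dimension of each token independently, while batch norm's statistics are computed across the batch and sequence positions per feature:

```python
import torch

x = torch.randn(4, 7, 512)   # (batch, seq_len, d_model); sequences assumed padded to length 7

# Layer norm: statistics over the feature dimension, computed per token.
ln_mean = x.mean(dim=-1, keepdim=True)                  # shape (4, 7, 1)
ln_std = x.std(dim=-1, keepdim=True, unbiased=False)
x_ln = (x - ln_mean) / (ln_std + 1e-5)

# Batch norm: statistics over batch and sequence positions, computed per feature.
bn_mean = x.mean(dim=(0, 1), keepdim=True)              # shape (1, 1, 512)
bn_std = x.std(dim=(0, 1), keepdim=True, unbiased=False)
x_bn = (x - bn_mean) / (bn_std + 1e-5)

# Layer-norm statistics ignore other examples entirely; batch-norm statistics
# mix examples (and padding positions) of possibly different lengths.
print(x_ln.shape, x_bn.shape)
```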