W4 Assignment, Exercise 6: why is the shape after the second Add & Norm layer (batch_size, n_target, fully_connected_dim) and not (batch_size, n_target, d_model)?

So I have two questions.
Q1:
In the Week 4 assignment ‘Transformer Architecture’, Exercise 6, I understand that the shape after the second multi-head attention is (batch_size, n_target, d_model). But how can the shape possibly change to (batch_size, n_target, fully_connected_dim) after the ‘Add & Norm’ layer?

The code I’m referring to is this:

mult_attn_out2, attn_weights_block2 = self.mha2(Q1, enc_output, enc_output, padding_mask, return_attention_scores=True)  # (batch_size, target_seq_len, d_model)

# apply layer normalization (layernorm2) to the sum of the attention output and the output of the first block (~1 line)
mult_attn_out2 = self.layernorm2(mult_attn_out2 + Q1)  # (batch_size, target_seq_len, fully_connected_dim)
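
For reference, here is a minimal, self-contained shape check I put together (a sketch that calls tf.keras.layers.MultiHeadAttention directly, with made-up sizes and without the assignment’s padding mask), showing that the last dimension stays at d_model / embedding_dim through both the attention call and the Add & Norm step:

import tensorflow as tf

# made-up sizes, only for illustrating the shapes
batch_size, target_seq_len, input_seq_len, embedding_dim = 2, 7, 5, 12

mha = tf.keras.layers.MultiHeadAttention(num_heads=3, key_dim=4)
layernorm = tf.keras.layers.LayerNormalization(epsilon=1e-6)

Q1 = tf.random.uniform((batch_size, target_seq_len, embedding_dim))
enc_output = tf.random.uniform((batch_size, input_seq_len, embedding_dim))

# cross-attention: query = Q1, value = key = enc_output
mult_attn_out2, attn_weights_block2 = mha(
    Q1, enc_output, enc_output, return_attention_scores=True
)
print(mult_attn_out2.shape)                  # (2, 7, 12) -- last dim is embedding_dim (d_model)
print(layernorm(mult_attn_out2 + Q1).shape)  # (2, 7, 12) -- Add & Norm does not change it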

Q2:
Also, in class EncoderLayer(tf.keras.layers.Layer), the shape of the encoder output is said to be (batch_size, target_seq_len, d_model), but in the docstring of class DecoderLayer(tf.keras.layers.Layer), the shape of enc_output is said to be (batch_size, n_target, fully_connected_dim). Why is that?

Thanks

Thanks for bringing this up.
The staff have been notified to fix this for the following reasons:

  1. Stacking encoder / decoder layers doesn’t make sense if the final dimensions of the inputs and outputs don’t match.
  2. def FullyConnected ultimately emits embedding_dim in the last dimension (see the sketch after this list).
  3. Layer normalization does not change the shape of its input.
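
To illustrate point 2, here is a sketch of what such a feed-forward block typically looks like (the assignment’s actual FullyConnected may differ in details): the final Dense layer projects back to embedding_dim, which is why the block’s output can be added to its input and fed into the next stacked layer.

import tensorflow as tf

def FullyConnected(embedding_dim, fully_connected_dim):
    # widen to fully_connected_dim, then project back to embedding_dim
    return tf.keras.Sequential([
        tf.keras.layers.Dense(fully_connected_dim, activation='relu'),  # (batch_size, seq_len, fully_connected_dim)
        tf.keras.layers.Dense(embedding_dim),                           # (batch_size, seq_len, embedding_dim)
    ])

ffn = FullyConnected(embedding_dim=12, fully_connected_dim=48)
x = tf.random.uniform((2, 7, 12))
print(ffn(x).shape)  # (2, 7, 12) -- last dimension is embedding_dim again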

Thanks for pointing out this error. That was a good catch. I confirm that the size of the last dimension of the EncoderLayer output is embedding_dim. Somehow we messed up, because we said this: x -- Tensor of shape (batch_size, input_seq_len, fully_connected_dim), which is the source of the confusion. The actual shape of x is (batch_size, input_seq_len, embedding_dim).

I’ll change the documentation and the comments accordingly.
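
In case it helps anyone landing on this thread later, the corrected lines would presumably read along these lines (a sketch of the intent, not the actual updated notebook text):

# EncoderLayer.call docstring:
#     x -- Tensor of shape (batch_size, input_seq_len, embedding_dim)

# DecoderLayer.call, shape comment on the second Add & Norm:
mult_attn_out2 = self.layernorm2(mult_attn_out2 + Q1)  # (batch_size, target_seq_len, embedding_dim)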
