Course 5 - Week 4 - understanding EncoderLayer dimensions

While I was able to get EncoderLayer to pass the tests, I remain confused about what’s going on with the tensor shapes in the guts of the encoder. What the comments say is not what I expect, which I take as a sign that I’m mistaken.

Specifically, after the attention output has been passed through the fully connected layer, the comments in the exercise say that the shape of ffn_output at this point should be
(batch_size, input_seq_len, fully_connected_dim)

This confuses me because the sequential FullyConnected layer, defined just above the exercise, has fully_connected_dim activations in the first layer, and then embedding_dim activations in the second and final layer. Doesn’t this mean that the output tensor should have a dimension of embedding_dim in its inner axis?
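For reference, the feed-forward block defined above the exercise looks roughly like this (a sketch reconstructed from memory, so the exact names may differ slightly); its final Dense layer is the one that projects back to embedding_dim:

import tensorflow as tf

def FullyConnected(embedding_dim, fully_connected_dim):
    # The hidden layer has fully_connected_dim units, but the final layer
    # projects back to embedding_dim units, so the block's output has
    # embedding_dim in its last axis.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(fully_connected_dim, activation='relu'),
        tf.keras.layers.Dense(embedding_dim)
    ])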

Related to this, I was trying to figure out why the input x is supposed to have dimension fully_connected_dim in its inner axis, as stated in the ‘def call()’ comments. In the test cell below the exercise, I noticed that the test encoder layer is defined with fully_connected_dim = 8, seen on this line.
encoder_layer1 = EncoderLayer(4, 2, 8)

But then the input tensor is
q = np.array([[[1, 0, 1, 1], [0, 1, 1, 1], [1, 0, 0, 1]]])
which clearly has dimension 4, not 8, in its inner axis; 4 happens to be the embedding_dim passed as the first argument to EncoderLayer.

All of this is making me think that the comments are wrong, and that the input x in fact has shape
(batch_size, input_seq_len, embedding_dim), and so does ffn_output, which makes much more sense to me.
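Here is roughly the sanity check I have in mind, a minimal sketch that assumes the notebook’s EncoderLayer class is in scope (the exact call arguments, e.g. how the mask is passed, may differ from the notebook’s test cell):

import numpy as np

encoder_layer1 = EncoderLayer(4, 2, 8)    # embedding_dim=4, num_heads=2, fully_connected_dim=8

q = np.array([[[1, 0, 1, 1], [0, 1, 1, 1], [1, 0, 0, 1]]], dtype=np.float32)
print(q.shape)                            # (1, 3, 4) = (batch_size, input_seq_len, embedding_dim)

out = encoder_layer1(q, training=False, mask=None)
print(out.shape)                          # (1, 3, 4) again, not (1, 3, 8)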

Admittedly my coding skills are very poor, so I’m checking in to ask whether there’s something I’m misunderstanding. I’d like to make sure that I understand every step of the transformer architecture.

Hi Alex,

Thanks for reporting the problem. Yes, you’re right: the output shape of the fully connected layer should be (batch_size, input_seq_len, embedding_dim). I’ll submit a git issue.

In fact, just as you found, the encoder layer input, the MultiHeadAttention layer output, and the fully connected layer input/output all MUST have the same shape: there are skip-connections between them (similar to ResNet in Course 4), so they have to maintain the same shape. The encoder layer input x should therefore have shape (batch_size, input_seq_len, embedding_dim) as well (because, for a language model, x is a sequence of embedding vectors).
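To make the skip-connections explicit, here is a simplified sketch of an encoder layer (attribute names and the dropout/epsilon settings approximate the notebook’s and may not match it exactly); every tensor in call() keeps the shape (batch_size, input_seq_len, embedding_dim), otherwise the two additions would fail:

import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, embedding_dim, num_heads, fully_connected_dim):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads, key_dim=embedding_dim)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(fully_connected_dim, activation='relu'),
            tf.keras.layers.Dense(embedding_dim)
        ])
        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout_ffn = tf.keras.layers.Dropout(0.1)

    def call(self, x, training, mask):
        # x: (batch_size, input_seq_len, embedding_dim)
        attn_output = self.mha(x, x, x, mask)        # (batch_size, input_seq_len, embedding_dim)
        out1 = self.layernorm1(x + attn_output)      # first skip-connection: shapes must match
        ffn_output = self.ffn(out1)                  # (batch_size, input_seq_len, embedding_dim)
        ffn_output = self.dropout_ffn(ffn_output, training=training)
        out2 = self.layernorm2(out1 + ffn_output)    # second skip-connection, same constraint
        return out2                                  # (batch_size, input_seq_len, embedding_dim)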


Hi Edward,

Thanks for the clarification! Good to know that it wasn’t me.