Encoder blocks dimension

YIHUI · August 4, 2022, 3:52am

The encoder of a Transformer is a stack of several encoder blocks. However, I am wondering whether the output of one encoder block matches the input expected by the next encoder block.

# `attention`, `dropout_` and `feed_forward` are defined earlier
encoder_block = [
    # add `Residual` layer
    tl.Residual(
        # add norm layer
        tl.LayerNorm(),
        # add attention
        attention,
        # add dropout
        dropout_,
    ),
    # add another `Residual` layer
    tl.Residual(
        # add feed forward
        feed_forward,
    ),
]

Since the encoder block starts with an attention layer, I think its input should be something like (batch_size, n_seq, n_heads, d_model); and since the last layer of an encoder block is a feed-forward layer, I think its output should be of shape (batch_size, n_seq, d_model). If my understanding is correct, how does that output fit the input dimensions of the next encoder block?

Thanks

reinoudbosch · August 7, 2022, 5:01am

Hi YIHUI,

The number of heads is a hyperparameter and is not a dimension of the input to the attention layer, so the input has shape (batch_size, n_seq, d_model). Also note that inside the layer the d_model dimension is split across the heads, so each head works on d_model / n_heads features, and the per-head results are concatenated back to d_model at the output. So the number of heads does not affect the input and output dimensions.
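
To make the shapes concrete, here is a rough NumPy sketch of the split and concatenation. It is not the Trax code from the assignment: the function name `multi_head_self_attention` and the random projection weights are my own assumptions for illustration, and a trained model would learn those weights.

```python
import numpy as np

def multi_head_self_attention(x, n_heads, rng):
    """Toy multi-head self-attention (illustrative, not the Trax layer).

    Returns the layer output and the shape of the per-head tensor.
    """
    batch, seq_len, d_model = x.shape
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_head = d_model // n_heads

    # Hypothetical random projection weights, one (d_model, d_model) matrix each.
    w_q, w_k, w_v, w_o = (
        rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(4)
    )

    def split_heads(t):
        # (batch, seq_len, d_model) -> (batch, n_heads, seq_len, d_head)
        return t.reshape(batch, seq_len, n_heads, d_head).transpose(0, 2, 1, 3)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)

    # Scaled dot-product attention, computed independently for every head.
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    per_head = weights @ v  # (batch, n_heads, seq_len, d_head)

    # Concatenate the heads: back to (batch, seq_len, d_model).
    merged = per_head.transpose(0, 2, 1, 3).reshape(batch, seq_len, d_model)
    return merged @ w_o, per_head.shape


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 8))  # (batch_size, n_seq, d_model)
for n_heads in (1, 2, 4):
    out, head_shape = multi_head_self_attention(x, n_heads, rng)
    print(n_heads, head_shape, out.shape)
```

Only the intermediate per-head shape changes with `n_heads`; the output shape stays (2, 5, 8), the same as the input.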

YIHUI · August 9, 2022, 1:54am

So does that mean the input and output have the same dimensions for each encoder and decoder block?

reinoudbosch · August 9, 2022, 12:06pm

That’s how I understand it
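
One way to convince yourself: a residual connection computes x + sublayer(x), which only makes sense if the sub-layer returns the same shape it was given. Below is a small NumPy sketch of that idea, not the Trax implementation. All the helper names are hypothetical, the attention is a single-head stand-in without learned projections, dropout is omitted, and the weights are random.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the feature (last) axis; no learned scale or bias here.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def residual(x, sublayer):
    # x + sublayer(x) requires the sub-layer to preserve the shape of x.
    y = sublayer(x)
    assert y.shape == x.shape, "sub-layer must preserve the input shape"
    return x + y

def self_attention(x):
    # Single-head, projection-free stand-in for the attention sub-layer.
    h = layer_norm(x)
    scores = h @ h.transpose(0, 2, 1) / np.sqrt(h.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ h  # (batch, seq_len, d_model)

def make_feed_forward(d_model, d_ff, rng):
    # Expand to d_ff, apply ReLU, then project back to d_model.
    w1 = rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model)
    w2 = rng.normal(size=(d_ff, d_model)) / np.sqrt(d_ff)
    return lambda x: np.maximum(layer_norm(x) @ w1, 0.0) @ w2

def stack_shapes(x, n_blocks, d_ff, rng):
    # Run x through n_blocks encoder-style blocks, recording each shape.
    shapes = [x.shape]
    for _ in range(n_blocks):
        feed_forward = make_feed_forward(x.shape[-1], d_ff, rng)
        x = residual(x, self_attention)
        x = residual(x, feed_forward)
        shapes.append(x.shape)
    return shapes


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 8))  # (batch_size, n_seq, d_model)
print(stack_shapes(x, n_blocks=3, d_ff=32, rng=rng))
```

Every entry in the printed list is (2, 5, 8), which is why the blocks can be stacked one after another. The feed-forward part widens to `d_ff` internally but projects back to `d_model` before the residual add.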
