Transformer Encoder tl.Select

YIHUI · August 3, 2022, 9:02pm

In the Transformer Encoder function of A3, why do we need to add tl.Select([0], n_in=2)? I learned that this line pops the top two elements off the stack and puts back only the original top one. In the context of this model structure, what exactly are we popping, and why?
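For readers following along, here is a minimal plain-Python sketch of the stack behavior described in the question. It does not call Trax; the `select` helper and the string placeholders are hypothetical stand-ins, and the stack is modeled as a list whose element 0 is the top. It assumes the documented behavior that the top `n_in` items are popped and only the items named in `indices` are pushed back.

```python
def select(stack, indices, n_in=None):
    """Plain-Python model of a Select-style stack operation (not Trax itself).

    `stack` is a list whose element 0 is the top of the stack.
    The top `n_in` items are popped, then the popped items at the
    positions listed in `indices` are pushed back in that order.
    """
    if n_in is None:
        # Assumed default: pop just enough items to cover the largest index.
        n_in = max(indices) + 1
    popped, rest = stack[:n_in], stack[n_in:]
    return [popped[i] for i in indices] + rest


# After the encoder blocks, the stack holds the activations on top of the mask.
stack = ["activations", "mask"]

# Pop both, push back only item 0: the mask is gone.
print(select(stack, [0], n_in=2))  # ['activations']
```

Under this model, anything sitting below the top two items would be left untouched, so the call only discards the second item.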

wesleyalmeida · August 3, 2022, 11:16pm

Hello @YIHUI!

With this we are dropping the masks. Does that make sense?

Best regards,
Wesley P.

YIHUI · August 4, 2022, 4:01am

Hi Wesley,

I just want to make sure that my understanding is correct. For each encoder block:

encoder_block = [ 
    # add `Residual` layer
    tl.Residual(
        # add norm layer
        tl.LayerNorm(),
        # add attention
        attention,
        # add dropout
        dropout_,
    ),
    # add another `Residual` layer
    tl.Residual(
        # add feed forward
        feed_forward,
    ),
]

In the attention part, the output is the activation and the mask, and only the activation goes on to the feed-forward part, leaving the mask unchanged. So the final output of an encoder block is the pair (feedforward(activation), mask), and as you mentioned, the mask is then dropped with tl.Select. Am I right?

Then what about the case where encoder_blocks in TransformerEncoder consists of more than one encoder block? How are the outputs (activation + mask) of the first encoder block passed to the next encoder block? Only the activation, or the activation + mask? I am thinking that if only the activation goes to the next encoder block, then after the chain of encoder blocks there would be multiple masks in the sequence instead of only one.

I hope my questions make sense. Thanks!
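One way to reason about the multi-block case is to trace the stack by hand. The sketch below is a plain-Python model, not Trax code and not the assignment solution: `attention`, `feed_forward`, `encoder_block`, and `drop_mask` are hypothetical stand-ins, LayerNorm and dropout are omitted, and it assumes the attention layer consumes (activation, mask) and returns (activation, mask). Under that assumption each block takes two items and leaves two items, so the same single mask is handed from block to block.

```python
def attention(x, mask):
    # Stand-in for attention: reads the mask, returns it unchanged.
    # Here it simply zeroes the padded positions.
    return [xi if keep else 0.0 for xi, keep in zip(x, mask)], mask


def feed_forward(x):
    # Stand-in for the feed-forward sublayer: sees only the activation.
    return [xi + 1.0 for xi in x]


def encoder_block(stack):
    # The stack is a list with the top at index 0: [activation, mask, ...].
    x, mask, *rest = stack
    # First residual: shortcut plus attention output; the mask passes through.
    attn_out, mask = attention(x, mask)
    x = [a + b for a, b in zip(x, attn_out)]
    # Second residual: feed-forward touches only the top item.
    x = [a + b for a, b in zip(x, feed_forward(x))]
    return [x, mask, *rest]


def drop_mask(stack):
    # Models the effect of tl.Select([0], n_in=2): keep item 0, drop item 1.
    return [stack[0]] + stack[2:]


mask = [True, True, False]
after_blocks = [[1.0, 1.0, 1.0], mask]
for _ in range(3):  # three chained encoder blocks
    after_blocks = encoder_block(after_blocks)

final = drop_mask(after_blocks)
print(len(after_blocks), len(final))  # 2 1
```

In this model the mask never accumulates: there is exactly one mask on the stack after any number of blocks, and it is removed once, after the last block.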
