The Transformer paper ("Attention Is All You Need") says the encoder is composed of a stack of N = 6 identical layers, and the decoder is likewise composed of a stack of N = 6 identical layers.
My question: taking N = 2 as an example, how are the two encoder layers connected to each other?
In the encoder's case, for example: does the output of the 1st layer feed into the multi-head attention of the 2nd layer, and does the output of the 2nd layer (i.e. the output of its feed-forward network) then supply the keys and values to the first layer of the decoder? Or are the stacked layers arranged in parallel?
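To make the question concrete, here is a minimal PyTorch sketch of the sequential wiring I have in mind; the hyperparameters, tensor shapes, and variable names are placeholders of my own, not taken from the paper:

```python
import torch
import torch.nn as nn

d_model, nhead, N = 512, 8, 2  # placeholder sizes; the paper uses N = 6
encoder_layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, nhead, batch_first=True) for _ in range(N)
)

x = torch.randn(1, 10, d_model)  # (batch, src_len, d_model)
for layer in encoder_layers:
    x = layer(x)  # layer 1's output becomes the input to layer 2's multi-head attention
memory = x  # final encoder output -- is this what the decoder attends to?
```

Is this sequential chaining what the paper means by "a stack of N identical layers"?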
Similarly, for the decoder: are the N layers arranged sequentially, so that the output of the 1st layer goes into the (masked) multi-head attention of the 2nd layer?
Also, regarding the decoder's 2nd multi-head attention block (the encoder-decoder attention): from which encoder layer does it receive its keys and values? For instance, does the 1st decoder layer receive its keys and values from the 1st encoder layer, or from the last encoder layer? And likewise, from which encoder layer does the 2nd decoder layer's 2nd multi-head attention block receive its keys and values?
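Here is the interpretation I want to confirm, again as a rough PyTorch sketch (names and shapes are mine, not the paper's): every decoder layer's encoder-decoder attention reads its keys and values from the same tensor, namely the last encoder layer's output, rather than decoder layer i being paired with encoder layer i:

```python
import torch
import torch.nn as nn

d_model, nhead, N = 512, 8, 2  # placeholder sizes
decoder_layers = nn.ModuleList(
    nn.TransformerDecoderLayer(d_model, nhead, batch_first=True) for _ in range(N)
)

memory = torch.randn(1, 10, d_model)  # stand-in for the final encoder output
y = torch.randn(1, 7, d_model)        # decoder-side input (batch, tgt_len, d_model)
for layer in decoder_layers:
    # Assumption I want to verify: every decoder layer takes its keys/values
    # from the SAME `memory` (the last encoder layer's output).
    y = layer(y, memory)
```

Is this the correct wiring, or is there some per-layer pairing between encoder and decoder layers?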