Consider the architecture of the encoder and decoder in the Transformer, as shown below:
- After the self-attention mechanism, is each token's output (z1, z2, z3, …) passed to its own separate feed-forward neural network, or are all the z's stacked into one matrix and then passed through a single FFNN?
- If all the z's are stacked into one matrix, how is the difference in shape between inputs of different lengths handled? (A minimal sketch of what I picture here follows this list.)
- If every z has its own feed-forward neural network, how is this implemented in practice for arbitrary input lengths?
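For reference, here is a minimal sketch (my own illustration, not taken from the paper) of what I mean by the "stacked into one matrix, single FFNN" option. The class name `PositionwiseFFN` and the sizes `d_model`/`d_ff` are just placeholders I chose:

```python
import torch
import torch.nn as nn

class PositionwiseFFN(nn.Module):
    """Sketch of one shared feed-forward network applied to the stacked z's."""
    def __init__(self, d_model=512, d_ff=2048):
        super().__init__()
        self.linear1 = nn.Linear(d_model, d_ff)
        self.linear2 = nn.Linear(d_ff, d_model)

    def forward(self, z):
        # z: (seq_len, d_model) -- all z_i stacked into one matrix.
        # nn.Linear operates on the last dimension, so the same weights are
        # applied to every row (token) independently of the sequence length.
        return self.linear2(torch.relu(self.linear1(z)))

ffn = PositionwiseFFN()
z_short = torch.randn(3, 512)   # 3 tokens
z_long  = torch.randn(7, 512)   # 7 tokens; the same module still works
print(ffn(z_short).shape, ffn(z_long).shape)
```

Is this roughly what happens, or does the actual implementation differ?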