N_units=d_ff in feed forward layer different from n_units=d_model

PZ2004 · November 5, 2023, 11:34pm

I noticed that the dense layer parameter n_units=d_ff in the feed-forward layer is different from the dense layer parameter n_units=d_model in the output. What is the difference between d_ff and d_model, and why is d_ff set to about four times the size of d_model?

arvyzukai · November 8, 2023, 8:28am

Hi @PZ2004

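These are hyperparameters of the model: d_model is the width used by the attention and embedding layers, and d_ff is the width of the hidden layer inside the feed-forward block. The output dense layer goes back to n_units=d_model so that the block's output has the same size as its input. Hyperparameters are not learned by the model; they are what data scientists usually search over to get the best performance out of it.
In the original paper, Attention Is All You Need, the authors found these values worked best for their task: the base model uses d_model=512 and d_ff=2048, which is where the factor of four comes from (check Section 6.2, Model Variations, to see what they tried and what the results were).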
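To make the two sizes concrete, here is a minimal sketch of a feed-forward block in plain NumPy (not the course's own layers). The weight names W1, b1, W2, b2 and the function name feed_forward are made up for illustration, and the weights are random rather than trained; only the values 512 and 2048 come from the paper's base model.

```python
import numpy as np

d_model = 512   # width of each token vector (base model in the paper)
d_ff = 2048     # hidden width of the feed-forward block (4 * d_model)

rng = np.random.default_rng(0)

# Hypothetical weights, randomly initialised for illustration only.
W1 = rng.normal(scale=0.02, size=(d_model, d_ff))
b1 = np.zeros(d_ff)
W2 = rng.normal(scale=0.02, size=(d_ff, d_model))
b2 = np.zeros(d_model)

def feed_forward(x):
    """Dense(d_ff) -> ReLU -> Dense(d_model), applied to every position."""
    hidden = np.maximum(0.0, x @ W1 + b1)  # expand: (..., d_model) -> (..., d_ff)
    return hidden @ W2 + b2                # project back: (..., d_ff) -> (..., d_model)

x = rng.normal(size=(10, d_model))  # 10 token positions, each of width d_model
y = feed_forward(x)

# Input and output share the same width; only the hidden layer is wider.
print(x.shape, y.shape)

# Weights plus biases of both dense layers.
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)
```

Changing d_ff only changes the hidden width and the parameter count; the input and output shapes stay tied to d_model.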

Cheers
