Understanding d_model and d_ff in the Attention block

Can you explain a bit how this works? What depends on these parameters? As I understand it, one of them is something like the number of neurons and the other like the number of layers in a standard NN?

(Solution code removed, as posting it publicly is against the honour code of this community, regardless of whether it is correct or not)

Hi @someone555777,

You can read about those in the function docstring.

Best,
Mubsi


This is not an answer to my question, mate. Can you explain what depends on these variables? What can we control in the model by changing them?
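
For anyone landing on this thread: in the standard Transformer layout, `d_model` is the width of every token vector that flows through the attention block (and the residual stream), while `d_ff` is the hidden width of the position-wise feed-forward sub-layer inside each block. Increasing `d_model` widens the whole representation; increasing `d_ff` only widens that inner feed-forward expansion. Below is a minimal sketch of where the two sizes show up; the names, values, and NumPy implementation are illustrative assumptions, not the course's exact code.

```python
import numpy as np

# Illustrative sizes only -- assumptions for this sketch, not the assignment's values.
d_model = 512   # width of every token representation flowing through the block
d_ff    = 2048  # hidden width of the position-wise feed-forward sub-layer
seq_len = 10    # number of tokens in the example sequence

x = np.random.randn(seq_len, d_model)   # block input: one d_model-sized vector per token

# Position-wise feed-forward: expand to d_ff, apply ReLU, project back to d_model.
W1 = np.random.randn(d_model, d_ff) * 0.01
W2 = np.random.randn(d_ff, d_model) * 0.01
hidden = np.maximum(0, x @ W1)          # shape (seq_len, d_ff)
out    = hidden @ W2                    # shape (seq_len, d_model), so the residual add still works

print(hidden.shape, out.shape)          # (10, 2048) (10, 512)
```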