Why are there two feedforward layers?

Can anyone let me know why there are two feedforward layers?
And what is the dropout rate for? Is that for the residual?


Hi, @Amazing_Patrick !

Having two dense layers gives you a more complex network and, therefore, more learning capacity (in a few words). On the other hand, the dropout rate is the fraction of the previous layer's outputs that are randomly “switched off” (set to zero) during training. That forces the model to “learn features” through different routes instead of relying on any single connection, so it generalizes better. A higher rate means more connections are turned off at each training step.
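To make this concrete, here is a minimal sketch in plain Python of a block with two dense layers followed by dropout. All the names (`dense`, `feed_forward`, and so on) are made up for illustration. Putting a ReLU between the two layers and applying dropout to the block's output are assumptions on my side, so check them against the code you are actually working with. One thing worth noticing: without the nonlinearity in the middle, two dense layers would collapse into a single linear map, so the extra capacity really comes from the pair "dense + activation + dense".

```python
import random


def relu(values):
    # Element-wise ReLU: negative values become 0.
    return [max(0.0, v) for v in values]


def dense(x, weights, bias):
    # One dense layer: each row of `weights` produces one output unit.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def dropout(values, rate, training, rng):
    # Inverted dropout: during training, zero each value with probability
    # `rate` and scale the survivors by 1 / (1 - rate) so the expected
    # value is unchanged. At inference time nothing is dropped.
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def feed_forward(x, w1, b1, w2, b2, rate=0.1, training=False, rng=None):
    # Two dense layers with a nonlinearity in between, then dropout
    # (dropout placement is an assumption of this sketch).
    rng = rng or random.Random(0)
    hidden = relu(dense(x, w1, b1))
    out = dense(hidden, w2, b2)
    return dropout(out, rate, training, rng)
```

Calling `feed_forward(..., training=False)` always returns the same output for the same input, while `training=True` randomly zeroes some outputs and rescales the rest, which is the “switching off” described above.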