Activation Functions & Final Parameters

Suppose you train two models with the same cost function and the same architecture, one using ReLU and one using tanh for the hidden layers, and give each enough iterations (assuming the final output activation is the same for both):

Would you still end up with the same parameters in both final models because the cost function is the same, or does the choice of hidden-layer activation affect the final parameters?

The choice of hidden-layer activation does affect the final parameters of the model. The two models will converge to different parameter values, and can even reach different accuracies.
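A minimal sketch of this: two identically initialized 2-layer networks, trained on the same data with the same cross-entropy cost, differing only in the hidden activation. (The network size, data, and hyperparameters here are illustrative choices, not from the original thread.)

```python
import numpy as np

def relu(z):   return np.maximum(0.0, z)
def relu_g(z): return (z > 0).astype(float)    # ReLU derivative: 0 for z <= 0
def tanh_g(z): return 1.0 - np.tanh(z) ** 2    # tanh derivative

def train(act, act_grad, steps=500, lr=0.5):
    rng = np.random.default_rng(0)             # same seed -> identical data and init
    X = rng.standard_normal((2, 200))
    Y = (X[0] * X[1] > 0).astype(float)[None, :]   # simple nonlinear target
    W1 = rng.standard_normal((4, 2)) * 0.5
    b1 = np.zeros((4, 1))
    W2 = rng.standard_normal((1, 4)) * 0.5
    b2 = np.zeros((1, 1))
    m = X.shape[1]
    for _ in range(steps):
        Z1 = W1 @ X + b1
        A1 = act(Z1)
        Z2 = W2 @ A1 + b2
        A2 = 1.0 / (1.0 + np.exp(-Z2))         # sigmoid output, same for both runs
        dZ2 = A2 - Y                           # gradient of the cross-entropy cost
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.mean(axis=1, keepdims=True)
        dZ1 = (W2.T @ dZ2) * act_grad(Z1)      # hidden activation enters here
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.mean(axis=1, keepdims=True)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1

W_relu = train(relu, relu_g)
W_tanh = train(np.tanh, tanh_g)
print(np.allclose(W_relu, W_tanh))   # False: same cost, same init, different parameters
```

The gradients diverge at the very first step because `act_grad(Z1)` differs between the two activations, so the parameter trajectories, and hence the final parameters, differ even though everything else is identical.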


In particular, ReLU networks tend to need more units, because negative pre-activations produce zero gradient. A unit whose pre-activation is negative for every input receives no gradient at all and becomes essentially inactive (a "dead" ReLU).
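A small illustration of that effect, assuming a single unit whose weight and bias keep its pre-activation negative on all inputs (the specific values are made up for the demo):

```python
import numpy as np

def relu_grad(z):
    return (z > 0).astype(float)   # ReLU passes gradient only where z > 0

rng = np.random.default_rng(1)
X = np.abs(rng.standard_normal((1, 50)))   # all-positive inputs
w, b = -2.0, -1.0                          # keeps the pre-activation negative
z = w * X + b                              # z < 0 for every sample

# The gradient through this unit is zero everywhere, so gradient descent
# never updates its weights: the unit is "dead".
print(relu_grad(z).sum())   # 0.0
```

With tanh the gradient `1 - tanh(z)**2` is small but never exactly zero for finite `z`, so the unit could still recover; with ReLU it cannot, which is one reason ReLU networks are often given extra hidden units.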
