Week3 - Choice of Activation function

udx1 · February 4, 2022, 11:17pm

In Week 3, the prof discusses different activation functions such as tanh, ReLU, and Leaky ReLU for hidden layers. I understand the use of sigmoid for the output layer for binary classification. Otherwise, how do you determine which function to use? Do you try out different options and see what fits best?

Thanks.

paulinpaloalto · February 5, 2022, 2:06am

Yes, the choice of hidden layer activations is one of the “hyperparameters”, meaning choices that you need to make. As you say, the output layer is fixed by the task: sigmoid for binary classification and softmax for multiclass classification (we haven’t learned about softmax yet, but we will in Course 2). But for the hidden layers, you have quite a few choices. What you will see in this and the subsequent courses is that Prof Ng normally uses ReLU for the hidden layer activations, although he uses tanh here in Week 3.

You can think of ReLU as the “minimalist” activation function: it’s dirt cheap to compute and provides just the minimum required amount of non-linearity. But it has some limitations as well: its gradient is exactly zero for all z < 0, so a neuron whose input stays negative stops updating (the “dead neuron” problem, a form of vanishing gradient), and it may not work well in all cases. But it seems to work remarkably well in lots of cases.

So there is a natural order in which to try the possible hidden layer activation functions: start with ReLU; if that doesn’t work well, try Leaky ReLU, which is almost as cheap to compute and eliminates the “dead neuron” problem. With Leaky ReLU you can also try different values of the slope for negative values. If that doesn’t work, then try the more expensive functions like tanh, sigmoid, swish or other possibilities.
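To make the “dead neuron” point concrete, here is a minimal NumPy sketch of the gradients involved. The function names and the default slope of 0.01 are my own choices for illustration, not code from the course notebooks.

```python
import numpy as np

def relu(z):
    # max(0, z), applied elementwise
    return np.maximum(0.0, z)

def relu_grad(z):
    # Gradient is 1 for z > 0 and exactly 0 otherwise
    return (z > 0).astype(float)

def leaky_relu(z, slope=0.01):
    # Same as ReLU for z > 0, but a small linear slope for z <= 0
    # (slope=0.01 is an assumed default; it is a tunable hyperparameter)
    return np.where(z > 0, z, slope * z)

def leaky_relu_grad(z, slope=0.01):
    # Gradient never reaches zero, so the unit can keep learning
    return np.where(z > 0, 1.0, slope)

def tanh_grad(z):
    # d/dz tanh(z) = 1 - tanh(z)^2; needs a tanh evaluation, so it costs more
    return 1.0 - np.tanh(z) ** 2

z = np.array([-2.0, -0.5, 0.5, 2.0])
print("relu grad:      ", relu_grad(z))
print("leaky relu grad:", leaky_relu_grad(z))
print("tanh grad:      ", tanh_grad(z))
```

For the two negative inputs the ReLU gradient is zero, while the Leaky ReLU gradient equals the chosen slope. Passing a different `slope` is how you would try other values for the negative side.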

udx1 · February 5, 2022, 3:19pm

Perfect, thank you very much for the detailed explanation.
