When enumerating the list of hyperparameters in Week 3, no reference was made to the selection of different activation functions. Is that because there are only a handful of options, because ReLU is generally the best alternative, or is there some other reason? In any case, how important is the activation function compared to tuning the learning rate, for example? Thank you.
Thanks for pointing this out. That is an omission! I did a quick scan of the transcripts of those lectures in Week 3. I think the reason is that Prof Ng is concentrating on the hyperparameters that have numerical ranges and on strategies for handling those choices; activation function selection doesn’t really fit that framework.
The choice of activation function for the hidden layers is an important hyperparameter, and there are lots of choices. A common practice is to start with ReLU, since it is by far the cheapest to compute. You can view it as the “minimalist” activation function: just the minimal amount of non-linearity, and dirt cheap to compute. If it works, that’s great. But ReLU definitely does not always work: its gradient is exactly zero for Z < 0, which can lead to “dead neurons” that stop learning. The next thing to try is Leaky ReLU: it’s almost as cheap to compute, but its small non-zero slope for Z < 0 avoids the dead neuron problem. If that doesn’t work, then you graduate to more expensive and sophisticated functions like tanh, Swish or sigmoid.
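Just to make that menu concrete, here is a minimal NumPy sketch of the functions mentioned above. The leaky slope of 0.01 is just a common default, not something prescribed by the course:

```python
import numpy as np

def relu(z):
    # Zero for z < 0, identity for z >= 0; gradient is exactly 0 on the negative side
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # Small positive slope (alpha) for z < 0 keeps the gradient non-zero there
    return np.where(z > 0, z, alpha * z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def swish(z):
    # Swish: z * sigmoid(z) -- smooth but costs an exponential per element
    return z * sigmoid(z)
```

You can see from the definitions why ReLU and Leaky ReLU are so cheap: they are just comparisons and multiplications, whereas sigmoid, tanh and Swish all require exponentials.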
In terms of the relative importance of the choice of activation function versus the learning rate, I don’t know a definitive answer, but I would say the learning rate is probably not worth worrying about too much. The reason is that pretty soon we will graduate to using TensorFlow for everything, and it provides more sophisticated optimization algorithms (like Adam) that adapt the effective step size during training, so the default learning rate usually works without manual tuning. In other words, pretty soon the learning rate will cease to be a knob you need to turn very often.
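For example, here is a minimal sketch using the Keras API in TensorFlow; the model and layer sizes are made up purely for illustration, but it shows that you can hand the learning rate question over to Adam’s defaults:

```python
import tensorflow as tf

# Hypothetical model just for illustration; the layer sizes are arbitrary.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Adam adapts per-parameter step sizes during training; its default
# learning rate (0.001) is often a reasonable starting point.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

Note that you can still pass an explicit learning rate to the optimizer if you need to, but in practice it becomes a much less sensitive knob than it is with plain gradient descent.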