There are two separate issues here:
- Which activation function to use at the output layer
- Which activation function(s) to use in the hidden layers of the network
For the output layer, the choice is determined by what your network is predicting. If it is a classification problem, then you use sigmoid for binary classification (cat/not a cat) and softmax for multiclass classification (cat, dog, zebra, horse, kangaroo …). Note also that there is a loss function that pairs naturally with sigmoid and softmax: the cross-entropy (“log loss”) loss function. You can think of softmax as the multiclass generalization of sigmoid.
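For concreteness, here’s a minimal PyTorch sketch of the two classification setups (the layer sizes, batch size, and class count are arbitrary placeholders). One detail worth knowing: PyTorch folds the activation into the loss for numerical stability, so `BCEWithLogitsLoss` applies the sigmoid internally and `CrossEntropyLoss` applies log-softmax internally, and the output layer itself just produces raw logits.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 16)  # batch of 8 examples, 16 features each (placeholder)

# Binary case (cat / not a cat): one output unit + sigmoid,
# paired with binary cross entropy. BCEWithLogitsLoss applies
# the sigmoid internally, so the head outputs a raw logit.
binary_head = nn.Linear(16, 1)
binary_targets = torch.randint(0, 2, (8, 1)).float()
binary_loss = nn.BCEWithLogitsLoss()(binary_head(x), binary_targets)

# Multiclass case (cat, dog, zebra, ...): one output unit per class
# + softmax, paired with cross entropy. CrossEntropyLoss applies
# log-softmax internally and takes class indices as targets.
num_classes = 5
multi_head = nn.Linear(16, num_classes)
multi_targets = torch.randint(0, num_classes, (8,))
multi_loss = nn.CrossEntropyLoss()(multi_head(x), multi_targets)

# At inference time you apply sigmoid / softmax explicitly
# to turn the logits into probabilities:
probs_binary = torch.sigmoid(binary_head(x))
probs_multi = torch.softmax(multi_head(x), dim=1)
```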
For “regression” problems where you are predicting a continuous numeric value (stock price, temperature, …), you’re right that it usually makes sense to use a linear output (i.e. no activation), or ReLU in the case that a negative output value does not make sense. For that type of problem you want a distance-based loss function, so typically that would be either MSE (mean squared error) or perhaps MAE (mean absolute error).
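And a matching sketch for the regression case, again in PyTorch with placeholder shapes (in PyTorch, MAE is called `L1Loss`):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 16)       # batch of 8 examples, 16 features (placeholder)
targets = torch.randn(8, 1)  # continuous targets

# Linear output: no activation, so the head can produce any real number.
linear_head = nn.Linear(16, 1)
pred = linear_head(x)

# If negative predictions make no sense (e.g. a price),
# pass the output through ReLU instead.
nonneg_pred = torch.relu(linear_head(x))

# Distance-based losses:
mse = nn.MSELoss()(pred, targets)  # mean squared error
mae = nn.L1Loss()(pred, targets)   # mean absolute error
```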
For the hidden layers of the network, you have a lot more freedom. Here’s a thread which discusses that.