Choice of activation function

Hi everyone! Today I learned about which activation function to use for the hidden layers.
Prof. Andrew mentions that using linear activations for all the hidden layers just gives you a logistic regression model (assuming the last layer uses the sigmoid function), but then he also says that the ReLU function would be an efficient alternative. I noticed that ReLU isn't very different from the linear function if we only consider positive values of x, so how exactly would it be an alternative if it's doing the same thing?

Hi @Srivaths_Gondi

I think this will help you: machine learning - Why is ReLU used as an activation function? - Data Science Stack Exchange.

Please feel free to ask any questions,
Thanks,
Abdelrahman

The ReLU function is different from a linear function: ReLU is linear on the range [0, \infty) but non-linear over the whole range (-\infty, \infty). It is this non-linearity that we exploit in the hidden layers, to be able to model any kind of output function.
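
As a quick numerical sketch of that point (plain NumPy with random toy weights, not anything from the course code): stacking two linear layers collapses into a single linear layer, while putting ReLU between them does not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                       # small batch: 4 inputs, 3 features
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

def relu(z):
    return np.maximum(z, 0.0)

# Two stacked linear layers are equivalent to one linear layer:
two_linear = (x @ W1 + b1) @ W2 + b2
collapsed = x @ (W1 @ W2) + (b1 @ W2 + b2)
print(np.allclose(two_linear, collapsed))          # True -> no extra modelling power

# With ReLU in between, the output is no longer that single linear map:
with_relu = relu(x @ W1 + b1) @ W2 + b2
print(np.allclose(with_relu, collapsed))           # False -> genuine non-linearity
```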

Hi! I think this article will help you clarify your question.

Right! In mathematics, there is no such thing as “almost linear”: it’s either linear or it’s not. ReLU is “piecewise linear”, but that is a very different thing from “linear”.

You could think of ReLU as the “minimalist” activation function: it is dirt cheap to compute and provides the most basic form of non-linearity. It doesn’t always give good results in every application, because it has the “dead neuron” problem for z < 0, but it’s the first thing to try because of its computational efficiency. If it doesn’t work in your case, then you try Leaky ReLU, which fixes the “dead neuron” issue and is still very cheap to compute. If that doesn’t work, only then do you “graduate” to more expensive activations based on the exponential function, like tanh, sigmoid, swish and so forth.
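
For concreteness, here is a minimal sketch of the two cheap options (plain NumPy; the 0.01 slope for Leaky ReLU is just a common default, not anything prescribed here):

```python
import numpy as np

def relu(z):
    # Dirt cheap: one element-wise max. The gradient is exactly 0 for z < 0,
    # which is where the "dead neuron" problem comes from.
    return np.maximum(z, 0.0)

def leaky_relu(z, alpha=0.01):
    # Still very cheap: a small slope alpha for z < 0 keeps the gradient
    # from vanishing completely, so neurons can recover.
    return np.where(z > 0, z, alpha * z)

z = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])
print(relu(z))        # [0.  0.  0.  0.1 3. ]
print(leaky_relu(z))  # [-0.03  -0.001  0.  0.1  3. ]
```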

Thank you for your answer… I think for us newbies, “non-linear” means “curved”, so at first it doesn’t seem like ReLU is curving at all. But when you explain that it’s a “dirt cheap” way to get non-linearity, I think I got it. Now it sounds interesting… I’m starting to wonder who thought of ReLU first.

I haven’t really studied the history here, but the idea and usefulness of ReLU predates ML/DL by quite a bit. I know it was used in Signal Processing for a long time. If you think about it, it’s what you would call a “high pass filter”: it drops everything below a certain value and passes through the values above that. You can think of ReLU as a high pass filter with the threshold = 0.
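
A tiny sketch of that thresholding view (plain NumPy; threshold_filter is just a name made up for this illustration):

```python
import numpy as np

def threshold_filter(x, threshold):
    # Hypothetical helper: drop everything below the threshold,
    # pass through the values above it.
    return np.where(x > threshold, x, 0.0)

def relu(x):
    return np.maximum(x, 0.0)

x = np.array([-2.0, -0.5, 0.0, 0.3, 1.7])
print(threshold_filter(x, 0.0))  # [0.  0.  0.  0.3 1.7]
print(relu(x))                   # identical: ReLU is thresholding at 0
```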

This is a pattern in a lot of ML/DL: the mathematics is not new. It’s being recycled or repurposed from earlier applications in Statistics, Physics and other fields. E.g. the sigmoid function and the ideas behind cross entropy loss have been around since the 19th century. Look up Maximum Likelihood Estimation from Statistics for the history in that example. I think much of that groundwork was laid by Gauss, who was one of the towering figures of 19th century mathematics. That’s Gauss as in Gaussian Distribution, Gaussian Elimination and many more …

And all the optimization techniques like Gradient Descent have many other earlier applications.

In addition to the very good answers, one more point for illustrative purposes (it may be obvious to many, but it has helped several students dealing with ReLU for the first time):

Every neuron has its own weights and bias and is linked to an activation function. By combining many such neurons, as the neural net in total does, the network can learn highly nonlinear behavior, even though (as you correctly pointed out) the activation function of a single neuron is only piecewise linear in the case of ReLU.
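
Here is a tiny sketch of that point (plain NumPy, with weights picked by hand purely for illustration): each hidden unit is piecewise linear, yet a weighted sum of just two of them already reproduces a clearly nonlinear shape, the absolute-value “V”.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

x = np.linspace(-2, 2, 9)

# Two hand-picked hidden units: relu(x) rises to the right, relu(-x) to the left.
hidden = np.stack([relu(x), relu(-x)], axis=1)

# A weighted sum of these piecewise-linear units gives |x|,
# which is already a nonlinear function of x.
output = hidden @ np.array([1.0, 1.0])
print(np.allclose(output, np.abs(x)))   # True
```

With more units and learned weights and biases, the same mechanism can bend the output at many points and approximate far more complicated shapes.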

Best
Christian