Doubt about ReLU activation in hidden layers

In “Why do we need activation functions?”, Andrew said that we can’t use a linear activation everywhere, because the network would collapse into ordinary linear regression (a linear function of a linear function is still a linear function). But he also said that we can use ReLU in every hidden layer. That is a little counter-intuitive to me, because ReLU seems very close to a linear activation, so I would not expect the result to be much different.
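To spell out the collapse I mean (my own notation, with W and b as the weights and biases of each layer), two stacked layers with a linear activation reduce to a single linear layer:

$$a^{[2]} = W^{[2]}\big(W^{[1]}x + b^{[1]}\big) + b^{[2]} = \big(W^{[2]}W^{[1]}\big)\,x + \big(W^{[2]}b^{[1]} + b^{[2]}\big)$$

which is just another linear function of x.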

Thank you for reading.

ReLU is not a linear function: it outputs zero for all negative values, and its slope changes abruptly at zero (the function itself is continuous there, but its derivative is not).
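A quick numeric check makes this concrete (my own toy snippet, not from the course): a linear function f must satisfy f(a + b) = f(a) + f(b) for all inputs, and ReLU does not.

```python
# Minimal check that ReLU violates linearity.
def relu(x):
    return max(0.0, x)

a, b = -1.0, 2.0
print(relu(a + b))         # 1.0
print(relu(a) + relu(b))   # 0.0 + 2.0 = 2.0 -> not equal, so ReLU is not linear
```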

Exactly as Tom says: a function is either linear or it isn’t, and ReLU is “piecewise” linear, which is nonlinear. It might seem counterintuitive, but it works. You can think of ReLU as the “minimalist” activation function: it is incredibly cheap to compute and provides just the bare minimum of nonlinearity. It acts like what they call a “high pass filter” in the signal processing world: it zeros all negative values and passes the positive values through unchanged.

It doesn’t always work, because returning zero for all the negative values is a version of what Prof Ng will later call the “dead neuron problem”. I haven’t taken MLS, so I’m not sure whether he discusses that there, but he does in DLS.

Because of its low compute cost, ReLU is commonly tried first as the hidden layer activation, and in a lot of cases it works just fine. If you don’t get good training results with it, you try Leaky ReLU, which is almost as cheap to compute. Only if that also doesn’t give good results do you graduate to more computationally expensive functions like tanh, sigmoid, swish and others.
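To make the “bare minimum of nonlinearity” point concrete, here is a small NumPy sketch (my own toy example with hand-picked weights, not code from the course). With a linear (identity) activation the two-layer network collapses to a single linear map, but with ReLU in the hidden layer the very same weights produce a bent, piecewise linear output that no single linear function can match:

```python
import numpy as np

# Toy network: x -> W1 x + b1 -> activation -> W2 a1 + b2 (two hidden units)
W1 = np.array([[1.0], [-1.0]])
b1 = np.array([[0.0], [1.0]])
W2 = np.array([[1.0, 1.0]])
b2 = np.array([[0.0]])

def forward(x, activation):
    a1 = activation(W1 @ x + b1)    # hidden layer
    return W2 @ a1 + b2             # linear output layer

identity = lambda z: z
relu = lambda z: np.maximum(0.0, z)
leaky_relu = lambda z: np.where(z > 0, z, 0.01 * z)

x = np.arange(-3.0, 4.0).reshape(1, -1)   # inputs -3, -2, ..., 3

# Identity activation: the network collapses to (W2 W1) x + (W2 b1 + b2),
# which with these weights is the constant 1 for every input.
print(forward(x, identity))   # [[1. 1. 1. 1. 1. 1. 1.]]

# ReLU activation: the output bends into a V shape -- genuinely nonlinear.
print(forward(x, relu))       # [[4. 3. 2. 1. 1. 2. 3.]]

# Leaky ReLU: same shape, but a small slope (0.01 here) on the negative side.
print(forward(x, leaky_relu))
```

The small negative-side slope in leaky_relu (0.01 is just an arbitrary choice) is what guards against the dead neuron problem I mentioned above: the unit still passes a gradient even when its input is negative.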

In addition to Tom’s and Paul’s excellent answers:
We recently had a thread on a similar topic which you will probably find interesting. Feel free to take a look!

Happy learning, @cpp219
and best regards
Christian
