Exactly as Tom says: a function is either linear or it isn't, and ReLU is "piecewise" linear, which makes it nonlinear. It might seem counterintuitive, but it works. You can think of ReLU as the "minimalist" activation function: it's incredibly cheap to compute and provides just the bare minimum of nonlinearity. It acts like what they call a "half-wave rectifier" in the signal processing world (that's where the name Rectified Linear Unit comes from): it zeros all negative values and passes the positive values through unchanged.

It doesn't always work, because returning zero for all negative inputs is a version of what Prof Ng will later call the "dead neuron" problem. I haven't taken MLS, so I'm not sure if he discusses that there, but he does in DLS.

Because of ReLU's low compute cost, it is common to try it first as the hidden layer activation, and in a lot of cases it works just fine. If you don't get good training results with it, you then try Leaky ReLU, which is almost as cheap to compute. If that also doesn't give good results, only then do you graduate to more computationally expensive functions like tanh, sigmoid, swish and others.
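Just to make the behavior concrete, here's a minimal NumPy sketch (my own illustration, not from the course materials) of ReLU and Leaky ReLU. The 0.01 negative-side slope in Leaky ReLU is a commonly used default, but it's really a hyperparameter you can tune:

```python
import numpy as np

def relu(z):
    # Zero out the negatives, pass the positives through unchanged
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # Same idea, but negatives get a small slope instead of a hard zero,
    # which keeps some gradient flowing and helps avoid "dead" neurons
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))        # roughly [0, 0, 0, 1.5, 3]
print(leaky_relu(z))  # roughly [-0.02, -0.005, 0, 1.5, 3]
```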