A similar question was asked here, but the answer explained why it works rather than the motivation: ReLU is used in hidden layers WHY?. My question is about the motivation for using ReLU in the hidden layers when the output is linear, as opposed to using linear activations in the hidden layers.
Never mind, the next lecture, “Why do we need activation functions”, addressed my question: a linear function of a linear function is still a linear function.
Exactly. Because of that mathematical fact, there is no point in adding layers to the network unless each hidden layer applies a non-linear activation: composing linear layers just gives W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), which is still a single linear function. You don’t get a more complex function without the non-linearity.
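To make that concrete, here is a minimal NumPy sketch (the weight names W1, W2, etc. are just illustrative, not from the lecture) showing that two stacked linear layers collapse into one linear layer, while putting a ReLU between them prevents the collapse:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "hidden layers" with purely linear activations:
#   layer 1: h = W1 @ x + b1
#   layer 2: y = W2 @ h + b2
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through the two linear layers.
y_stacked = W2 @ (W1 @ x + b1) + b2

# The exact same mapping as one linear layer: W = W2 W1, b = W2 b1 + b2.
W, b = W2 @ W1, W2 @ b1 + b2
y_single = W @ x + b

print(np.allclose(y_stacked, y_single))  # True: the two layers collapse into one

# Insert a ReLU between the layers and the collapse no longer happens,
# so the extra layer actually adds expressive power.
relu = lambda z: np.maximum(z, 0)
y_relu = W2 @ relu(W1 @ x + b1) + b2
print(np.allclose(y_relu, y_single))  # generally False
```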