In the video “Choosing activation functions”, Andrew outlines his recommendations for which activation function to use for the output layer depending on the range of possible values. This makes sense to me.
However, he then proceeds to recommend that “… then for the hidden layers, I would recommend just using ReLU as the default activation function.”
I’m struggling to understand how this could optimise the model performance if the range of possible values extends into the negative space. Wouldn’t the hidden layers effectively filter out/diminish all negative inputs before they even arrive at the output layer?
“Wouldn’t the hidden layers effectively filter out/diminish all negative inputs before they even arrive at the output layer?”
A ReLU unit outputs 0 when the linear combination of its inputs, z = w·x + b, is zero or negative, not when the raw inputs themselves are negative. A negative weight applied to a negative input still produces a positive contribution, so the ReLU unit can still output a non-zero value.
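Here is a small numeric sketch (the input, weight, and bias values are made up purely for illustration) showing how a negative input can still pass through a ReLU hidden unit when its weight is also negative:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# Hypothetical single hidden unit: negative input, negative weight.
x = np.array([-2.0])   # negative input feature
w = np.array([-1.5])   # negative weight learned by the unit
b = 0.5                # bias

z = np.dot(w, x) + b   # z = (-1.5)(-2.0) + 0.5 = 3.5, which is positive
a = relu(z)            # ReLU passes the positive value through: a = 3.5
print(z, a)
```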
I keep forgetting that the activation function is applied to z (the linear combination of inputs, weights, and bias), so a negative input can still produce a positive activation as long as, as you mention, the weight is negative (or, I suppose, the bias is a large enough positive number).
Since a ReLU unit outputs zero (with zero gradient) whenever its pre-activation z is negative, units can get stuck during training, so typically one has to use more ReLU units in a hidden layer than if another activation (like sigmoid) were used.
The benefit of ReLU is that the activation (and its gradient) is extremely cheap to compute.
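For comparison, here is a minimal sketch (assuming NumPy) of the two activations side by side: ReLU is just a threshold, while sigmoid requires an exponential for every element.

```python
import numpy as np

def relu(z):
    # One comparison per element; the gradient is simply 0 or 1.
    return np.maximum(0, z)

def sigmoid(z):
    # Requires an exponential per element, which is noticeably more expensive.
    return 1.0 / (1.0 + np.exp(-z))
```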