Normalizing activation in Neural Network

Anbu · July 24, 2021, 12:18pm

Hi Sir,

@paulinpaloalto @bahadir @eruzanski @Carina @neurogeek @lucapug @javier @kampamocha

The lecture notes say that if you have a sigmoid activation function, you don’t want your values to always be clustered here. You might want them to have a larger variance or a mean that’s different from 0, in order to better take advantage of the nonlinearity of the sigmoid function rather than have all your values be in just this linear regime.

My question is: what is the advantage of using the nonlinear region of the sigmoid function rather than the linear region?

[Image: linear_regime]

kampamocha · July 24, 2021, 7:03pm

Hi @Anbu, I understand it like this:

We need nonlinearities for the network to be able to behave as a universal function approximator. If you use only linear functions, then no matter how many layers your network has, it only performs a linear transformation of the input, so it is not well suited to solving nonlinear problems.
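To make the "no matter how many layers" point concrete, here is a minimal sketch in plain Python. The weights `W1` and `W2` are made-up values for illustration only, and biases are left out to keep it short. It shows that two stacked layers without an activation function give the same output as a single layer whose weight matrix is the product `W2 @ W1`.

```python
# Hypothetical 2x2 weights, chosen only for illustration (not from the course).
W1 = [[1.0, 2.0], [0.5, -1.0]]
W2 = [[-1.0, 0.5], [2.0, 1.0]]


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrices A and B (lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def two_linear_layers(x):
    """Two stacked layers with no activation function (biases omitted)."""
    return matvec(W2, matvec(W1, x))


# The two layers collapse into one layer with weights W2 * W1.
W_combined = matmul(W2, W1)


def one_linear_layer(x):
    """A single linear layer using the collapsed weights."""
    return matvec(W_combined, x)


x = [3.0, -2.0]
print(two_linear_layers(x))
print(one_linear_layer(x))
```

Both lines print the same vector, so the extra layer adds no expressive power. Putting a nonlinear activation between the two layers is what prevents this collapse.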

In the example you mention, if all the values are clustered in the central part of the graph, you essentially have a linear function of the input, so the activation is not helping you much.
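As a rough numerical check of this, the sketch below compares the sigmoid with its first-order Taylor expansion around 0, which is 0.5 + z/4. The helper name `linear_approx` and the sample values of z are my own choices for illustration. Near 0 the two are almost identical, while further out the sigmoid saturates and the gap grows.

```python
import math


def sigmoid(z):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_approx(z):
    """First-order Taylor expansion of the sigmoid around z = 0."""
    return 0.5 + z / 4.0


# Sample points picked arbitrarily: small z sits in the "linear regime",
# larger z reaches the curved, saturating part of the sigmoid.
for z in (0.1, 0.5, 1.0, 3.0, 5.0):
    gap = abs(sigmoid(z) - linear_approx(z))
    print(f"z={z:>4}: sigmoid={sigmoid(z):.4f}  linear={linear_approx(z):.4f}  gap={gap:.4f}")
```

If the inputs to the sigmoid all stay close to 0, the unit behaves almost like the straight line, which is why letting the values have a larger variance or a shifted mean can help the network use the nonlinearity.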

I think there’s much more to it, but I hope that helps you to understand better.
