So an activation function is used to map the output of a linear function into some other range. That I learnt in the first course with logistic regression. But why is the activation function so crucial?
I mean, why does having an activation make sense? I could just add sigmoid or another activation function in the output layer, because that is where I want to map the value into a certain range. What "meaning" does adding an activation in the hidden layers give to the next layer?
I hope you understand the question.
What the activation basically does is change a linear function into a non-linear function, i.e. you change its behavior (you dent it, break it, you shape it) in a way that a linear function cannot do on its own, and in a much simpler way than joining polynomials as in SVMs.
If you just add a sigmoid at the end, you only get logistic regression, no matter how many linear layers you have. You only get a dent at the very end.
Right! To state Gent’s point in another way, there is an easily provable theorem that the composition of linear functions is still a linear function. That’s the mathematical way to say that if you feed the output of a linear function into another linear function, the combined result is still a linear function, just with different coefficients. In other words, you don’t get any more complex a function by “stacking” linear layers in a network. You need the non-linear activation function at every layer of the network precisely because the whole point of multiple cascading layers in a Neural Network is that you want to create a more and more complex function. With the addition of non-linearity, the more layers, the more complexity you can get.
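Here is a minimal sketch in NumPy (the layer sizes and random weights are just made-up examples) showing that stacking two linear layers collapses into a single linear layer with different coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "linear layers" with made-up sizes: 4 -> 3 -> 2
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=(3, 1))
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=(2, 1))

x = rng.normal(size=(4, 1))

# Feed the output of one linear layer into the next...
stacked = W2 @ (W1 @ x + b1) + b2

# ...which is the same as a single linear layer with combined coefficients
W, b = W2 @ W1, W2 @ b1 + b2
single = W @ x + b

print(np.allclose(stacked, single))  # True: stacking linear layers adds no complexity
```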
At the output layer, you specifically need sigmoid as the activation, because it converts the output into something that you can interpret as the probability that the answer is “yes” for a binary classifier. For the hidden layers you have lots of choices. ReLU is one that is very commonly used. You can think of that as the “minimalist” activation function: it’s dirt cheap to compute since it’s just a “high pass filter”. It provides the bare minimum of non-linearity: it’s piecewise linear with a break at z = 0. But it also has the “dead neuron” problem for all inputs < 0 by definition, so it doesn’t always work. If it doesn’t, you can try Leaky ReLU, tanh, swish or sigmoid. Prof Ng will discuss this in more detail as you proceed through the various courses and specializations. You may need to wait until you get to DLS for the full explanation.
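For reference, here is a quick sketch of the activation functions mentioned above in plain NumPy, applied element-wise (the 0.01 slope for Leaky ReLU and beta = 1 for swish are just common defaults, not the only choices):

```python
import numpy as np

def sigmoid(z):
    # Squashes z into (0, 1); used at the output of a binary classifier
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Piecewise linear with a break at z = 0; outputs 0 for all negative inputs
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # Like ReLU but keeps a small slope for z < 0, avoiding "dead" neurons
    return np.where(z > 0, z, alpha * z)

def swish(z, beta=1.0):
    # Smooth ReLU-like curve: z * sigmoid(beta * z)
    return z * sigmoid(beta * z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))        # [0.  0.  0.  0.5 2. ]
print(leaky_relu(z))  # [-0.02  -0.005  0.     0.5    2.   ]
print(sigmoid(z))     # values strictly between 0 and 1
```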
Your explanations are always very detailed and informative, Paul, thank you.
So activation functions are just transformation functions. An output of 0 means that a neuron should not contribute to making decisions in the layers above (in a bottom-to-top layout).
I get it, the whole point of using neural nets over regular machine learning is that they help fit non-linear (complex) data. So as Gent said, we need to dent the linear function to give it a bent shape that matches the shape of the data.