Sigmoid Activation function

Noor_jamali · December 29, 2022, 8:32am

Hello, I have a question: why are we using “g”, which denotes a sigmoid activation function, in the computation formulas for hidden layers 1 and 2? As we have learned, sigmoid is used to produce a probability at the output layer.

Elemento · December 29, 2022, 8:37am

Hey @Noor_jamali,
By convention, g can be used to denote any activation function, and not just the sigmoid activation function. Additionally, we can use sigmoid as the activation function in the hidden layers too, though it might not be a popular choice due to vanishing gradients, about which you will learn in Week 3 of this course.
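
To make the vanishing-gradient remark concrete, here is a minimal sketch in plain Python. It assumes the simplest possible picture: backpropagation multiplies one sigmoid slope per layer, and it ignores the weights, which also scale the gradient in a real network. The helper name `max_gradient_factor` is made up for this illustration.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Slope of the sigmoid at z: sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def max_gradient_factor(num_layers):
    """Best-case product of sigmoid slopes across num_layers layers.

    The slope peaks at z = 0, where it equals 0.25, so this is an upper
    bound on the activation part of the chain rule (weights ignored).
    """
    return sigmoid_grad(0.0) ** num_layers

for n in (1, 5, 10):
    print(f"{n:2d} sigmoid layers -> slope product at most {max_gradient_factor(n):.2e}")
```

Even in this best case the slope product shrinks by a factor of four per layer, which is the intuition behind why sigmoid is not a popular hidden-layer choice in deep networks.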

I hope this helps.

Cheers,
Elemento

shanup · December 29, 2022, 8:54am

In one of the videos, Prof. Andrew shows with the math that if we do not use non-linear activation functions in the hidden layers, the linear equations of all the hidden-layer neurons compose into just another linear equation. In that case, the network would be no different from a single linear regression unit (with a linear output) or a single logistic regression unit (with a sigmoid output).
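
The collapse described here can be checked numerically. The sketch below is plain Python with made-up weights (`W1`, `b1`, `W2`, `b2` are illustrative values, not from the course): two stacked layers without an activation give exactly the same output as one linear layer with `W = W2·W1` and `b = W2·b1 + b2`.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

# Hypothetical parameters: a 3-unit hidden layer feeding a 1-unit output layer.
W1 = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
b1 = [0.5, -0.5, 1.0]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.25]

def two_linear_layers(x):
    """Forward pass with no activation function (g is the identity)."""
    a1 = add(matvec(W1, x), b1)
    return add(matvec(W2, a1), b2)

# Fold both layers into a single equivalent linear unit.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def single_linear_layer(x):
    return add(matvec(W, x), b)

x = [1.0, 2.0]
print(two_linear_layers(x), single_linear_layer(x))
```

Both calls print the same value, so the extra layer added no expressive power.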

TMosh · December 29, 2022, 9:07am

Sigmoid has many different uses.

At the output layer, it can be viewed as a probability.

But in general, it’s quite a handy non-linear function for compressing any real-valued input into the range 0.0 to 1.0, with the added benefits that its derivative is continuous and is very easily computed. These are good properties to have in a hidden layer activation function.
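
As a rough illustration of both properties, the plain-Python sketch below squashes a few inputs into the range 0 to 1 and computes the derivative directly from the activation value `a`, using the identity g'(z) = a(1 - a). The function names are chosen for this example only.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def sigmoid_derivative_from_output(a):
    """Derivative of the sigmoid expressed through its own output a = sigmoid(z)."""
    return a * (1.0 - a)

for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    a = sigmoid(z)
    print(f"z = {z:6.1f}   g(z) = {a:.5f}   g'(z) = {sigmoid_derivative_from_output(a):.5f}")
```

Because the derivative reuses the value already computed in the forward pass, no extra exponential is needed during backpropagation.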

Basit_Kareem · December 29, 2022, 10:54am

The g there doesn’t imply the sigmoid function. In fact, it never once explicitly implied the sigmoid function.

Rather, it means “a function of”.

Just like writing f(x).
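
One way to picture g as a placeholder is a unit that receives its activation as an argument. This is only a sketch: `dense_unit` and its parameters are hypothetical names for this example, not course code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def dense_unit(x, w, b, g):
    """Compute a = g(w . x + b) for one neuron, with g supplied by the caller."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return g(z)

x = [2.0, 0.5]
w = [1.0, -1.0]
b = -0.5

print(dense_unit(x, w, b, sigmoid))  # g is the sigmoid here
print(dense_unit(x, w, b, relu))     # same formula, different g
```

The formula a = g(w · x + b) stays the same in both calls; only the function bound to g changes.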

TMosh · December 29, 2022, 8:54pm

I think it’s pretty clear that g() implies sigmoid in this series of lectures (Course 2 Week 1 “Neural Network Model”).

At 2:42 in that video (“Inferencing: making predictions”), Andrew specifically mentions that we’re using sigmoid.
And in the two previous videos, g() is referred to as the logistic function or the sigmoid function.

Noor_jamali · January 7, 2023, 5:49am

Hey, @Elemento Sir, I understand the above explanation. Thanks

Sincerely
Noor Jamali

Elemento · January 7, 2023, 3:48pm

Hey @Noor_jamali,
We are glad we could help.

P.S. - There is absolutely no need to refer to any of us as “Sir”. All of us are learners just like you.

Cheers,
Elemento
