"Hello, I have a question about why we are using “g,” which denotes the sigmoid activation function, in the computation formulas for hidden layers 1 and 2. As we have learned, isn't sigmoid used for a probability distribution on the output layer?"
Hey @Noor_jamali,
By convention, g can be used to denote any activation function, not just the sigmoid activation function. Additionally, we can use sigmoid as the activation function in the hidden layers too, though it is not a popular choice due to vanishing gradients, which you will learn about in Week 3 of this course.
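To make the convention concrete, here is a minimal sketch (function and variable names are hypothetical, not from the course code) where g is just a parameter of the layer computation, so any activation function can slot in:

```python
import numpy as np

def dense_layer(x, W, b, g):
    # a = g(W x + b): g can be *any* activation function passed in.
    return g(W @ x + b)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
relu = lambda z: np.maximum(0.0, z)

# Toy weights and input for illustration only.
x = np.array([1.0, -2.0])
W = np.array([[0.5, -0.5], [1.0, 1.0]])
b = np.array([0.1, -0.1])

a_sig = dense_layer(x, W, b, sigmoid)   # hidden layer with sigmoid
a_relu = dense_layer(x, W, b, relu)     # same layer with ReLU
```

The same formula a = g(Wx + b) produces different layers purely by swapping g.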
I hope this helps.
Cheers,
Elemento
Prof. Andrew, in one of the videos, illustrates with the math that if we do not use nonlinear activation functions in the hidden layers, then the linear equations of all the neurons in the hidden layers can be composed to again yield a single linear equation. In that case, the network would be no different from a single linear regression unit or a single logistic regression unit.
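The collapse Prof. Andrew describes is easy to verify numerically. A hedged sketch (weights are random, purely for illustration): two stacked layers with no activation in between give exactly the same output as one merged linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "hidden layers" with no nonlinearity in between.
W1 = rng.standard_normal((4, 3)); b1 = rng.standard_normal(4)
W2 = rng.standard_normal((2, 4)); b2 = rng.standard_normal(2)

x = rng.standard_normal(3)

hidden = W1 @ x + b1          # first linear layer
out = W2 @ hidden + b2        # second linear layer

# Collapse the two layers into a single equivalent linear layer:
# W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W = W2 @ W1
b = W2 @ b1 + b2
out_collapsed = W @ x + b

assert np.allclose(out, out_collapsed)
```

No matter how many purely linear layers you stack, the composition is always a single linear map, which is why hidden layers need a nonlinear g.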
Sigmoid has many different uses.
At the output layer, it can be viewed as a probability.
But in general, it’s quite a handy nonlinear function for compressing the range of a real-valued input into the range 0.0 to 1.0, with the added benefits that its partial derivative is continuous and very easily computed. These are good properties for a hidden layer activation function.
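The "easily computed derivative" point is worth seeing: the derivative of the sigmoid can be written entirely in terms of the sigmoid's own output, so backpropagation can reuse the value from the forward pass. A small sketch:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into the open interval (0.0, 1.0).
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    # d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)).
    # Reusing the forward value makes this very cheap to compute.
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(0.0))             # 0.5, the midpoint of the output range
print(sigmoid_derivative(0.0))  # 0.25, the derivative's maximum
```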
The g there doesn’t imply the sigmoid function; in fact, it was never explicitly meant to denote sigmoid specifically.
It rather means “a function of”, just like writing f(x).
I think it’s pretty clear that g() implies sigmoid in this series of lectures (Course 2 Week 1 “Neural Network Model”).

At 2:42 in that video (“Inferencing: making predictions”), Andrew specifically mentions that we’re using sigmoid.

And in the two previous videos, g() is referred to as the logistic function or the sigmoid function.
Hey @Noor_jamali,
We are glad we could help.
P.S. There is absolutely no need to refer to any of us as “Sir”. All of us are learners, just like you.
Cheers,
Elemento